Procedural Methodologies

Discussion of various procedural methodologies and techniques

I find it best to divide up procedural software into "tasks" or "events". Each task reads and sometimes changes the "state" from/to the database. It is a variation of the old "input-->process-->output" paradigm, except that a good portion of the i/o is with the database. The database becomes the backbone, not the code. The tasks are like cows feeding from (and to, yuk) a trough (troff?), which is the database. (See BigSoupOfClasses)

There might be some "master" tasks which dispatch to the tasks. In larger systems, the database is often used as a guide to this dispatching. For example, a table may store menu and/or UI event items along with the name of the script (filename) to run upon that menu or event. (This is a little harder to do in static languages, but is still possible using big switch/case statements to translate from a name to a routine. I suggest it be auto-generated from a compile script if such approach is used.) Related: EvalVsPolymorphism.

In my observation, the size and complexity of these tasks stays roughly the same regardless of the application or database size, at least for interactive systems.

But as the system gets larger, it is the *interaction* between the tasks that will kill you. The impact of any change in any task cannot be assessed without going through ALL other tasks, and no one can keep all that in his head, unless the tasks are grouped into modules with associated tables which other tasks cannot touch (but that breaks the simple "input-->process-->output" paradigm). "Gee, what if taskX puts a '2' in this row here instead of '1'?" "Uh... taskY will barf because it only expects '0' or '1'...)".

I don't see how it is any worse than OOP. The "flapping butterfly" thing will apply to anything. Do you want a log of which procedure made the data change in the DB? I need more specifics about what is bothering you.

The simple "input-->process-->output" (IPO for short) paradigm means that the DB is basically a big global variables store. By interactions I mean Non-locality, No Access Control or Constraint Checking, Implicit coupling, see GlobalVariablesAreBad

in short, the IPO paradigm means every task is couple with every other tasks. In addition, since the data in the DB are not just some random bits, but represent some reality, there are rules about them that is *implicit* among *all* of the tasks, but is not spelled out anywhere. If you spell out the assumptions as DB checks/triggers, you have just rediscovered one reason of putting code and data together in OOP.

E.g., if a numeric column represents AGE, all tasks will probably assume the values will be positive. But what if the age is not known? Is it okay to put a NULL there? You wouldn't know unless you have all the details of the all tasks in your head, which will get impossible if the system has 20, 30 or more tasks. Any tasks that do a simple JOIN or any comparison will fail, such as counting records by age groups.

Sorry, but I find this example murky. Whether it is okay or not to put a Null there is part of the table interface (schema). One needs to know that just like they need to know a class interface. IOW, "ask the interface description" in both cases.

What's more, any tasks that takes the AGE, do something and put it back to some other table will spread the problem to other tables. So if you have a production system using this IPO paradigm, any bad/unexpected data may contaminate any number of records until someone notices the problem. Even when the problem is noticed, you got to have the production data to find out why, because the chance of you putting in the same bad data to reproduce the problem is practically zero.

And no, this does not apply to OOP because you can spell out any such assumptions in the Person objects representing such entity, in fact, the possibility of unknown age will be evident when you try to write the constructors of the Person class.

Triggers and referential integrity rules can do such validation. Plus, such validation applies to *any* language that uses the DB. If you do it with OO classes, then most likely only the language written in the same language as the constructors will be able to use it. IOW, it is a wider GateKeeper. Sure, you could perhaps have something like SOAP, but that is very similar to a "remote procedure call".

(BTW, this seems to be drifting toward the topic of FlawsOfProceduralDiscussion? or something rather than just descriptions of methodologies.)

Is it more appropriate to call this IPO a paradigm rather than a "methodology"?

I don't know. The borders between the two are blurry.

How is every task having potential access to the DB more evil than every class having potential access to every other class? In other words, talking to a People table or a People class is not that much different. They are a protocol of sorts. It is not "variables" which are global, but *protocols*, in both cases (DB and classes). ...[and later]... Sorry, but I find this example murky. Whether it is okay or not to put a Null there is part of the table interface (schema). One needs to know that just like they need to know a class interface. IOW, "ask the interface description" in both cases. ...[and then]... Triggers and referential integrity rules can do such validation. Plus, such validation applies to *any* language that uses the DB.

Either you are putting logic and constraints into the database or you are not. The initial description: "Each task reads and sometimes changes the "state" from/to the database. It is a variation of the old "input-->process-->output" paradigm, except that a good portion of the i/o is with the database" suggest that you should not, otherwise the it is no longer simply "input-->process-->output". If the logic is not in the database, you cannot "talk" to the People table as you could with a class, because the People table is just a data-store."

If you *do* put the logic and constraints in the DB, you are just putting code and data together as in OOP, but in the DB instead of the usual application program. So this ProceduralMethodologies is just doing OOP with the database, or is there more?

Well, I suppose this gets into the messy issue of what the precise definition of OOP is. It might be warrented to see if database constraints/triggers pre-date OOP and/or if they were motivated by OOP or created independently. Related is FileSystemAlternatives.

Let's ignore chronology for a moment, but concentrate on the difference between this ProceduralMethodologies and the usual OOP approach. What are the difference? From the description at the top, it seems that in ProceduralMethodologies, you should have:

a database with data (basically a dumb, "free for all" datastore)
many "tasks" reading/updating the database, doing anything they want with the data

But further down, it becomes:

a database with data, plus triggers and constraints forming an intelligent GateKeeper with some "protocol" you can "talk" to.
many "tasks" reading/updating the database through "talking" to the GateKeeper's "protocol", doing only updates allowed by the keeper.

Which is then similar to OOP with one class monopolizing all access to DB (which is how some OOP systems are done), and with a BigSoupOfClasses each representing a "task". What is the difference? Is the DB dumb or intelligent? Is the business rules in the "tasks" or in the DB?

Suppose it was "dumb". We could make functions that are the "official" access to DB entities (usuall for updating). Same role as a class. But if we consider the "smart" approach, I don't think that qualifies as OO. A BigSoupOfTasks? is usually easier to navigate than a BigSoupOfClasses because first, they are all tasks and only tasks. You know what they are. And second, often there is a table(s) or "dispatching" master task(s) that can be used to navigate to them. IOW, a kind of "task catalog". You can often easily navigate directly from the UI to the proper task. The relationship between UI's and tasks/events is usually cleaner and simpler and more consistent than between UI and OO classes IMO. There is usually a "backbone" which dispatches directly to tasks in procedural. That backbone is either a dispatching task, or an event and/or menu table(s). That backbone serves as a pretty good map to the tasks.

Regarding whether this is different from OO, OO tends to blur the distinction between the database and the application (see ReinventingTheDatabaseInApplication). Database tend to encourage one to divide stuff between shared concepts and application-specific concepts. OO tends to ignore the distinction or grow it organically. Second, is the "table shape" of relational that OO does not have to follow (see TablesAndObjectsAreTooDifferent and OoLacksMathArgument).

Re: "Is the business rules in the "tasks" or in the DB?"

It does not have to be mutually-exclusive. See FileSystemAlternatives.

I suppose "business rule" is kind of a vague term also. If the Customer table contains a code(s) indicating whether the best way to contact him/her is via email, phone, and/or paper letter, isn't that a "business rule"?

If it is just a character like 'P', 'M', 'E', then no. If there is another table mapping 'P' = phone, 'M' = mail, 'E' = email, and it gets to affect how the system works (i.e. shown to sales people on screen, or triggers the email system) then yes. The reason is if you only have the character 'P' in the DB, some task will have to know that 'P' = phone. OTOH, if there is another table or stored procedure storing 'P'=phone, any task just need to join that table or invoke that procedure and display/pass the result to other system.

The most important thing is the effect on the tasks. If you only have 'P' in the DB (i.e. business rules in tasks), you can change one task put 'T' in the DB meaning "telegram", BUT you have to update all other tasks that expects only P/M/E to understand T properly.

I find that a rather minor distinction to pivot the definition off of. Repeating the description rather than factor it to one spot might be bad factoring, but it does not really change the nature of what is going on. Related: ConstantTable.

Would it be correct to say that Procedural Methodologies group methods based on cohesion of functionality, while Object Oriented Methodologies group methods based on cohesion of data? For example, in a Procedural approach, one would start with a base Initialize() method and add lower level Initialize methods roughly in the chronological calling sequence. In an Object Oriented approach, one would have multiple objects each with an Initialize or Constructor method. In the Object Oriented approach, the chronological order is hidden, while the interactions between the Initialize method and other methods on common data is made obvious.

Some might consider that "scattered" rather than "hidden". What is the measure for "hidden" anyhow? One could argue that the sequence is easier to inspect if it is together. Relational can also make it easier to query by chronology or whatever factor you want to look at if you control that with data (see CodeAvoidance.) OO still has to deal with chronology, whether you can readily see it or not. Perhaps a sample application would help resolve this.

Re: "How do you share context between ever smaller tasks?"

I prefer subroutines/functions that have the option of "inheriting" the variable scope from the parent. (Support for it in most languages is often poor.) This is *not* good for general libraries or highly-shared subroutines, but is very helpful for breaking up specific tasks into sub-tasks. Unlike some others, I don't believe in trying to make every subroutine try to be globally "generic" by using only parameters for passing context. In other words, "leaky" scope is fine for nearby task-specific stuff. I generally divide up my designs into "tasks". Each task is usually composed of one "starting point" routine (similar to "main" in some languages), and the rest of the routines in the task are subservient to the main routine. Thus, the code tends to look like:

  ----Task A----
    mainRoutineForTaskA(...){...}
    supportingRoutine1(...){...}
    supportingRoutine2(...){...}
    supportingRoutine3(...){...}
    ....
  ----end Task A----
  ----Task B----
    mainRoutineForTaskB(...){...}
    supportingRoutine1(...){...}
    supportingRoutine2(...){...}
    supportingRoutine3(...){...}
    ....
  ----end Task B----
  ----Task C----
    mainRoutineForTaskC(...){...}
    supportingRoutine1(...){...}
    supportingRoutine2(...){...}
    supportingRoutine3(...){...}
    ....
  ----end Task C----
  ...etc...

Ideally, the supporting routines are invisible to other tasks to reduce name collisions. However, occasionally we do want to make them visible. Thus, such hiding should probably be an option per routine instead of forced by the language. We thus have two somewhat orthogal "settings" for a given routine:

Inherits scope from the caller Vs. Only through parameters.
Visible outside of module (task) versus visible only inside.

--top--

So how do you share context between the supporting routines? If it isn't global data and it isn't passed as arguments, I don't know where you are keeping it. (BTW, your design looks like most of my classes, with the only difference being that I can group tasks together to share a common context at a scope smaller than global.)

"Regional" variables. In otherwords, module-level. Many languages use regional variables instead of caller-scope to achieve more or less the same thing.

Your modules sound a lot like classes. What languages are these that use "regional variables"?

Now that is a different take on closure. If each source program - the entire text were designed as a class for one purpose - that of the routine name. No external references allowed, only the passed variables and returned variable(s) were in the scope of the caller. Any references to data outside the scope of the source program available only through a call. Closure enforced through structure. No variables in the source code would require any consideration of scope.

Most "scripting" languages use them. Perl for example. In most of them, variables declared in the "main" section become regional whether you want them to or not. In Pascal you make them by nesting routines. Maybe your OO is not really OO after all.

It doesn't matter if my OO is "real OO". It sounds like the only difference between your code and my code is that I can create distinct instances of these regional contexts instead of reusing a single instance all the time.

What does that get you? DeltaIsolation does not work very long.

It has nothing to do with DeltaIsolation. It lets me create as many regional contexts as I need.

Why do you need it? What is an example?

Why do I need multiple regional contexts? To build software. If I want 10 foobars, I get 10 foobars. Why would I settle for one foobar? And my regional contexts don't accidentally colide with other regional contexts.

Usually there is a data-centric solution to such. And, they don't suffer from PolymorphismLimits. Is InternationalUiExample related to "regional contexts"?

Procedural Design Rules of Thumb

Divide up code by "tasks" or "events" to be performed by the software
Study a list of proposed tasks to see if there is duplication that may suggest frameworks or libraries.
Information about nouns goes mostly in the database.
Most interactive tasks should be no more than about 100 lines. Batch tasks may be much longer.
Follow relational normalization rules. When in doubt, make one larger table instead of two smaller ones. (Some interpretations of normalization rules lead to too many tables.) See DatabaseBestPractices for more tips.
For larger teams, assign an experienced developer to coordinate and oversee shared libraries and frameworks.
Use a DuplicationRefactoringThreshold of 3 to 4. Thus, if you see a pattern repeated 3 or 4 times, then consider factoring it to a subroutine or shared library.
Lean toward using the database as the primary communication conduit between tasks instead of variables.
Hire a good DBA, or at least a good data modeler
Only use stored procedures for the more common or most costly queries. Excessive use of stored-procedures results in more code change-points for changes such as new columns.
Use languages with named parameters and built-in "local" tables if possible.
Understand the users' backgrounds, not just the requirements. For example, would they prefer easy-to-use or reduced typing/mousing? Optimizing interfaces for newbie understanding and power-users generally creates ConflictingRequirements. If you need both, then perhaps create two screens per user task, one for newbies and one for power users. However, this complicates the system.
Use DeclarativeGuiFrameworks and EventDrivenProgramming when creating GUIs rather than hard-wiring GUI arrangments into code calls.
Avoid having too many parameters. Lots of parameters usually indicates a problem with the design or the particular language you are using. See TooManyParameters for tips.
Dividing up by tasks should keep the name-space at any given (execution) time relatively small. If the name-space gets crowded such that naming collisions are common, then some refactoring may be needed to keep tasks small and relatively independent. Keeping the name-space small per task allows one to keep the variable names and scope management simple.
Strive for HelpersInsteadOfWrappers. GenericBusinessFrameworkUnobtainable for the most part. Make it relatively easy to abandon a given framework if need be.
DispatchOnFeaturesNotClassifications

A more extensive list can be found at TwentyFiveOrSoRulesToBuildSoftwareThatWorksAndWhichIsEasyToMaintain (pending a better topic name).

Why does this page have so much about databases on it?

Because TopMind was here. Everywhere he goes he talks about databases, regardless of context. I think he gets a penny for every database sold.

Why do you think LarryEllison is almost as wealthy as BillGates :-) Anyhow, I see procedural and relational a kind of Yin and Yang where they naturally complement each other with little or no fighting over territory (unlike say OO and relational). I agree that procedural by itself is not very pleasent. See also: AreTablesGeneralPurposeStructures. (As far as the "get a penny", you may be right. I have stock in database companies.)

PageAnchor: nested_reports

The notion that OO and relational (or, more likely in this case, SQL) development "fights over territory" whilst procedural does not is unsubstantiated. As a long-term developer of database-driven applications (i.e., "business applications") using SQL DBMSes with both OO and non-OO procedural languages, I have never found any down-side to introducing object-orientation, and it frequently makes certain aspects -- e.g., nested forms, nested reports, data-grids with arbitrary cell contents, re-usable abstractions, implementing stubs or MockObjects for testing purposes, etc. -- considerably easier to develop and maintain. For aspects that do not benefit from OO features, OO programming reduces to procedural programming, so I don't see how you can make a claim that procedural programming fits with "relational" development (i.e., employing a SQL DBMS back-end) while OO does not.
- It sounds like you are mostly talking about GUI engines. I think relational would work just fine as a GUI engine, but it's mostly an untried concept and will probably remain that way for at least a decade. (One advantage of such is being less language-specific, something OO GUIs cannot seem to solve.) But, GUI's are not the primary topic here. You seem to be saying that OO is a super-set of procedural and that supersets are always better than the subsets within. I disagree with that, but cannot find the topic that already covered that. --top
- {Relational won't work at all as a GUI engine... 'Relational' doesn't model communication, or even acknowledge that communication is a possibility, and the whole 'Interface' aspect of 'GUI' doesn't work so well without communication. I'm sure you meant something more than just Relational. Care to clarify?}
  - How about "90%" relational.
  - {Sure, so long as you remember the EightyTwentyRule. More seriously, that was a real awful clarification...}
  - EightyTwentyRule is about abstractions, not paradigm mix percent. Perhaps we can form the topic RelationalGui? to talk about such if you want to. This is not really the place for it.
  - {Call it 'abstraction if you like, so long as you remember that Relational neither models nor abstracts communications, or even generic calculations. I'm rather certain that attempting to fit all communication, display, handling, event management, pattern-recognition for gestures and hotkeys, etc. into 10% of the code will be near impossible, even if you can leverage Relational as a storage media. Are you assuming someone else has already written the million+ lines of code to handle all that stuff, so that your job is just writing up a relational schema and plugging in some data and code specific to your application? And are you assuming that code stored in a relational database is also 'relational paradigm'? But I agree that this isn't the place for this discussion. I'd be interested in RelationalGui? if you're willing to expand on your thoughts and clarify them to get it started.}
- {Anyhow, the above speaker does not speak of GUI engines; he speaks of non-interactive (but nested) reports and forms and data-grids as outputs as the result of a computational process.}
- How about a Nice Hawaiian Demo of this nested report betterment taking place. Assume HTML output to avoid getting into UI-API differences. --top
  - {If I recall correctly, HTML doesn't exactly do a lot of nesting.}
  - It's only the output medium/format in order to provide a neutral comparing format. This does NOT mean that you have to actually implement whatever it is you are trying to show using just HTML.
  - {Ah. That's reasonable, I suppose, so long as HTML is capable of representing the sort of complex reports that have nested components: output summaries with associated pie-charts and bar-charts and line-graphs and diagrams and captions and legends, and nested components, too: diagrams with pie-charts, pie-charts with captions, diagrams with line-graphs with legends. Since HTML doesn't exactly allow for that sort of output, you may need your program to take multiple components and output stuff in, say, PNG images using vector graphics. Writing low-level output code isn't exactly going to be fun for anyone, be they writing procedural or OO, but HTML may force it of you... which is why I was wondering whether it's an appropriate output format (or even appropriate to specify a fixed output format). But since I find report generation spectacularly uninteresting, I'll leave this battle to the original claimaint.}
  - May I suggest you restate the paragraph after PageAnchor "nested_report" (above). It is not clear what is being compared to what. Procedural is mentioned there. And, I don't think we need to demonstrate graph/chart-drawing algorithms themselves unless that is somehow related to it all. By the way, I've seen web-based charting engines that use images, Flash, and Java applets to generate the chart itself.
  - I'm not clear what you'd like re-stated. The paragraph is not about comparison, it is about the apparent inconsistency of claiming procedural programming is suited to relational systems while OO programming is not, yet OO programming is procedural programming that uses inheritance, polymorphism, and encapsulation. I agree, however, that graphing and charting specifics are beyond the scope of illustrating a simple report generator, in which it should hopefully suffice to merely output primitive data values (strings, integers, etc.) to some trivial device, such as a console or line printer.
  - {As the claimant, you should be aware that OO will only show its advantage regarding 'nesting' if your example actually has complex nesting. But the more primitive you make it, with fewer distinct report-types that need to be nested, the smaller will be any revealed advantage of OO over Procedural. E.g. if you only have integers and strings, with strings nesting integers and other strings, there really won't be much demonstrable difference between the two. I think diagram reports might be better as an example... e.g. reports on a network or router graph with usage statistics derived from data logged into a relational DBMS. And to top: yes, you could use Flash and Java applets to generate the chart (at least if you adopt a VERY loose interpretation of what means 'HTML report' that happens to include 'Flash report'). But your application would need to generate the Flash and applets for the report (which I can't imagine will be much easier than would be producing vector-graphics PNG images with nested reports). If you are going to have a fair measure of which 'paradigms' are involved, you MUST be responsible for producing EVERYTHING involved in that final report. Allocating it off to black-box programs without counting the paradigms in those programs as part of your own is... cheating, or 'cherry-picking', at least in the game of comparing paradigms and approaches.}
    - By primitive data values, I meant that I did not intend to render (i.e., output) any data type that cannot be trivially boiled down to a stream of characters, though I intended that the layout of these could be arbitrarily complex. However, I suppose there's no reason not to include images, graphs, charts, etc., as long as we presume rendering code already exists for these. Whether the rendered output is characters or a complex diagram (or both), the concepts and mechanisms are the same. I had intended my report generator to be a *trivial* character-oriented illustration, with perhaps at most three report types, such as columnar, row-oriented, and free-form. That should be sufficient for illustrative purposes, as the conceptual basis is the same whether there is one type of report, three, or a dozen. Also, business-oriented database-driven applications generally focus on textual data, which I presume would be of greater interest to Top than diagrams. That said, if time permits, I'll generalise it to handle diagrams, too.
    - "Type of report" suggests that the classification of report differences can smoothly broken down into a hierarchical taxonomy. I'm skeptical such is powerful enough for the real world, prefering some kind of set-based feature-management system to manage (track, find, print, etc.) multiple variations on a theme.
    - You wrote that already. See the discussion immediately below this one.
  - It is hard to comment without seeing the requirements details. As far as "report types", I am skeptical that reports can be generically well-defined as a tree taxonomy of sub-types. I used to try to generatize stuff via large-scale abstractions, but found that "pick a part from the junk-yard" works better (HelpersInsteadOfWrappers). And, it is not realistic to expect a custom biz-app builder to make chart-drawing modules from scratch. (At least PhP and ColdFusion both come with packages for such, BTW.)
  - {Top, when you finally "get it" that "type" DOES NOT mean "tree" or even "inheritance" or "taxonomy", perhaps you'll have a minor 'eureeka' moment. But it seems it hasn't happened yet. Types of reports that I can name off the top of my head include: pie charts, line graphs, bar graphs, statistical reports, flow charts, dataflow graphs for networks and programs, etc. Nesting these things can be putting statistical reports into a dataflow graph. And while it "is not realistic to build modules from scratch" whatever, I don't care - I don't expect you to build it from scratch. But if you include chart-drawing modules, YOU NEED TO COUNT THE PARADIGMS AND LOC USED IN THAT MODULE, otherwise you are "cheating" when you're saying "which paradigms your solution uses". Your 'solution' and which 'paradigms' it uses is the whole package, not just the little piece you want to count. Just counting the little pieces you want to count is the very definition of cherry-picking.}
    - "Types" means whatever the hell the debater wants it to mean. -- top
    - {And you wonder why people call you HumptyDumpty. I don't need a definition of 'type' to know that even in languages that support inheritance, different 'types' don't necessarily fall into the same hierarchy. Therefore, even in languages that support inheritance, "type" DOES NOT mean "tree" or "inheritance" or "taxonomy".}
    - It's turned into a useless overloaded term when applied to technical slicing of concepts. (Some claim it takes 500 pages of text and math to define it, which I find utterly ridiculous, but that's another topic.) And, interface polymorphism often suffers the same limits that "hard" hierarchies suffer. If you slice poly small enough to compete with set-based techniques, you usually merely are doing "dumb" set/gets with poly.
    - {It probably takes 500 pages of text and math to really understand types. Defining them is easy... but, like defining 'monad', it doesn't really help you understand or make use of them. ... But that's another topic. And I haven't a clue as to how 'interface polymorphism' got into this conversation.}
    - Ronald's "type" writing is a meandering Rube Goldberg model. I don't want to hear about that shit anymore. Bury that smelly shit.
    - {It really is quite amusing watching you throw a tantrum and fling shit at something you don't understand, at least when I get over my utter disgust at your behavior.}
    - Anyhow, like I said, usually one uses off-the-shelf tools to generate them because it is rarely economical to roll your own (Although I've done it before), and often one cannot count the lines of code because it is proprietary and compiled/encrypted. I don't know what metric you use to claim "better" anyhow. You need to be clear what you find better, why its better, and how you are measuring. I cannot do that for you because I don't share your betterment feeling. If most purchase components to do charting, why do you want to include it in YOUR comparison? Something smells fishy here. I don't expect one to count the code needed to execute a Print statement under the cover, so why should charting be any different? --top
    - {You can't excuse your cherry-picking by saying the data you need is inaccessible. Imagine a situation where you "purchased a component to do charting" where all you had to do was feed it the raw data and a basic description of the report you wanted and out pops HTML and PNG... i.e. exactly what your product is supposed to do. Then you can say, "hey! look at my procedural solution! it takes data, and passes it (unprocessed) straight to this other component, and out pops a report!". What did you prove about procedural? Exactly nothing. If you wish to be scientific, you are responsible for counting up everything in your solution that lies between receiving the report specification and outputting the HTML file. And why should charting be included in the comparison? Because the charts are part of many reports, and HTML - the output format YOU chose - lacks higher-level support for chart descriptions. Technically, you should include the 'Print' statement, too, if you 'Print' to the HTML file; it is also between receipt of the report specification and output of the report.}
    - As proposed in ChallengeSixVersusFpDiscussion, perhaps fat libraries are the key to productivity, not newfangled paradigms. And you haven't addressed the Print statement issue, and all the assembler inside the compiler/interpreter, etc. Where is the point where we stop counting?
    - {You seem to want the entire world to build you a super-library so that you can do anything you want by invoking a procedure. Your ignoring of the magic behind the curtain is the source of your cherry-picking. And I did address the 'Print' statement... see the last sentence of the prior statement. EVERYTHING in the process between receipt of report specification and output of the report should be counted. The 'magic' in the interpreter should be counted, too. It is also part of the 'business process... delivering'.}
    - {As for proving an approach 'better', you can look at the things that would matter to users of your solution. Among others, this would include code-change metrics (i.e. the amount of effort it takes to add three or four more new 'types' of nestable reports to the report-generator... or, from user perspective, the cost and time it takes to get a new report-type after asking for it), flexibility (i.e. the degree to which a user can specify the desired components of the report), and modularity (i.e. the ability for a third-party to add report-types).}
    - Go for it. Show show show. I'd love to see OO kicking P/R's ass by these metrics using inspectable code. OO fans have been flapping about such claims for 9 years, but never produce biz code that delivers. Bring it on!
    - {You keep yapping about 'biz code that delivers' as if all the other pieces your code glues together (e.g. charting modules, browsers, RDBMS's, pattern-matching systems, etc.) don't qualify as 'biz code'. I look forward to you attempting a 'RelationalGui?' that doesn't essentially fall back on a bunch of code outside the P/R approach of which you're so fond. And what made you think I'm a fan of OO? Do all us non-P/R fans look alike to you? I don't support OO much more than you do... and use it only as a platform for 'computation oriented' programming and design (for which OO languages provide encapsulation and runtime configurability). As to whether 'computation oriented' could beat P/R in 'biz app code' depends on how flexible and modular you need your 'biz app code' to be. What I can say for sure is that computation oriented design benefits well from relational or any equivalent knowledge management system, and thus OO/R aren't in conflict when OO is used for computation-oriented design.}
    - I copied the library issue to CodeComparingIssues. Let's continue it there. Remember, your primary goal is to demostrate to a rank-and-file custom biz developer working in the domain that OO would help them write and/or maintain your "nested reports" scenario. You are not selling DLL libraries or what-not (are you?). --top
    - {What needs to be demonstrated to a 'rank-and-file custom biz developer' is that they can write the 'DLL or libraries' they need to do the job, and maybe even sell them. If a tool exists that already provides everything they needed to do the nesting and write the reports out to HTML, they should use that, but much of the time solutions like that simply don't exist or can't be distributed, and 'custom biz developers' need to write their own. Where I take issue with you: it's okay to glue together solutions where they exist, but it is not okay to say that such 'solutions' are 'procedural' simply because the glue-code you chose happens to be procedural.}
    - Without studying the details of an actual scenario, I cannot fully comment on the likelihood of such a scenario. You need to explain the scenario, the budget, the time-limit, etc. Sure, a boss may want it exactly the way he has in his/her mind, but an exact fit is sometimes not worth it. I generally present an "A", "B", and "C" scenario with times and costs of each. When they see the cost of an exact fit to their desire, they usually settle for a workable compromise. By the way, MS-Excel has fairly good charting capabilities and with the help of some macros, OLE, and VBA, I've hooked them up as a chart driver before. I wanted them to purchase some components for smoother integration, but they didn't want to pay for it. By the way, have you tried GnuPlot? --top

What's a "Nice Hawaiian Demo"?
An old "Hawaiian Punch" advert... Great. That has what to do with this? You want a "punchy" demo or some such?
Just produce the goddam report nesting code. I don't wanna fall into your pedantic TV commercial debate.
Sorry, since it was capitalised, I thought "Nice Hawaiian Demo" was a specific technical reference, perhaps drawn from some prototyping tool or development methodology. I thought your link to the Hawaiian Punch ad was being deliberately snide, as if I should know what a "Nice Hawaiian Demo" means. If you don't mind a simple, stripped-down-for-illustration-purposes report generator, I'll work on some skeleton code, as a full-featured report generator is a big beast. Or do you mean a specific example of a report that benefits from nesting?
Whatever it is you mentioned above as "nested reports" that OO allegedly does better. Perhaps you are just a crappy procedural/relational programmer who does P/R messy and goofy and you are comparing decent OO to bad P/R. One cannot tell for sure without seeing the code. (I am not accusing you of being a bad P/R'er, only saying there's not enough info to remove that possibility.)
- {IIRC, the original point is that OO/R isn't in conflict. You keep saying P/R vs. OO as though that's the way it needs to be.}
- Confusion between R and P addressed elsewhere.
Perhaps you're right. Over the next few days, I'll try to find time to work up a very simple, stripped-down object-oriented report generator example. I'll have to leave out a lot of detail, but I'll try to capture the essential characteristics that illustrate the benefit of OO language features. You can create a corresponding procedural example. Then we'll compare and contrast them. By the way, I'd still like to know what you mean by a "Nice Hawaiian Demo."
Sounds like a plan. By the way, perhaps it should be moved to OoBusinessExamples.

{I can't speak for top's reasoning, but I understand it thusly: OO and Relational do not mix well when you utilize OO for what is essentially a 'simulator' with domain-object simulacra. See ObjectVsModel for a broader discussion. If you, however, utilize OO primarily as a discipline to organize computation - e.g. to build processing pipelines, to encapsulate and configure tasks and tests, to represent synchronization objects, to represent commands, messages, inputs, outputs, etc. - there is no conflict between the two; to the contrary, such design benefits immensely from good collection-management and external knowledge management, as is provided by a database. But this isn't really 'ObjectOrientedProgramming' - at least not in the sense it is taught in books and schools with dogs and ducks, cars and engines, and various other domain-object simulacra. It's more of a ComputationOrientedDesign? implemented in an ObjectOrientedLanguage by representing computations as objects, where the content of the tasks, inputs, outputs, commands, and data are as 'domain-specific' as it gets. Based on your description above of representing outputs (forms, reports, components thereof, cell contents) an test-configurations, it sounds to me like you are using OO in this manner.}
{I'd venture a guess that most highly experienced OO programmers primarily use their ObjectOrientedLanguage for ComputationOrientedDesign?. This notion is supported by the Boost libraries and the fact that a great many recommended 'design patterns' for OO seem to focus on communication, calculation, representation of messages and commands (command pattern), task orgnization, dispatching, and are otherwise ComputationOriented? (AreDesignPatternsMissingLanguageFeatures?). However, I lack statistics. This guess that may just be WishfulThinking on my part, possibly mixed with a little bit of NoTrueScotsman.}
Yes, I use OO for ComputationOrientedDesign?. I was operating under the (possibly unreasonable) assumption that most experienced OO developers use their ObjectOrientedLanguage for ComputationOrientedDesign?. I too have no real-world statistics on this (it would make an interesting research project, though, assuming appropriate rigour can be upheld), but my experience in twenty-odd years of developing database-driven applications is that the only developers who create business simulations -- as opposed to creating computational tools to support developing business fact presenters, generators and collectors (i.e., GUIs, reports, various processing engines) on top of a fact processor (i.e., SQL DBMS) -- are either very new to the business application development game, or are forced down that path by ObjectRelationalMapping tools. The former is easily cured in the discussion session after the student or new employee sheepishly comes up me to whilst working on his or her first significant database-driven project and says, "I've done the object model, now how do I hook it to the database?" The latter is a bit tougher, especially as the approach has devoted proponents despite solid arguments against it. DateAndDarwen call equating classes with tables the "First Great Blunder", and a blunder it is. However, I'm afraid the toy examples taught in texts and schools tend to encourage this, or at least allow it to happen, because they tend to illustrate OO concepts using the most superficial simulation-oriented examples. They rarely cover the use of OO for ComputationOrientedDesign?, and almost never touch on databases in any form. Perhaps this isn't surprising -- textbook authors and lecturers may be good communicators, but they're not necessarily good developers.
{I think it's safe to assume that top is not among the set of programmers described as "most experienced OO developers". Anyhow, due to support for encapsulation and runtime configuration, ObjectOrientedLanguages are capable of ComputationOrientedDesign?... but only barely so. If 'AreDesignPatternsMissingLanguageFeatures' has merit, there are a helluva lot of missing language features in the OO languages for ComputationOrientedDesign?. I made a list, but it was longer than this paragraphs and digresses too far; it's sufficient to note that there are many things with which you're fighting the language and/or using truckloads of macros and metaprogramming to implement. In any case, I think it a mistake to conflate ObjectOriented with ComputationOriented?. I mean, trumpeting ObjectOriented when you're actually doing ComputationOrientedDesign? is like exalting Procedural when you're actually using ObjectOrientedDesign - with either, you're fighting the language to implement it (e.g. OO in procedural by passing around pointers to data-structures that themselves have pointers to method-tables that contain lists of function-pointers that take as a first parameter a pointer to the data-structure... but without any language or syntax support for this design pattern). In my own mind, I call 'ObjectOriented' those methodologies that utilize domain-object simulacra (and tend to keep state for these simulated objects), and I call 'ComputationOriented?' those methodologies that focus primarily on representing computation (calculations, communications, concurrency, commands, messages, tasks, etc.) and tend to keep 'facts' about the real world in a 'reflective' database (see DataManipulation).}

There are many problem domains that do not involve interacting with databases. How do the above suggestions apply to scientific computing, signal processing, or interactive graphics, just to pick three?

My domain is custom business applications. I cannot speak for all domains. There may be some related discussion in AreRdbmsSlow. If you wish to propose a topic name change or add your own observations about procedural methodologies in other domains, be my guest. -- top

Noun Indexing

One of the complaints against task-oriented grouping of code is that the associations with entities are not included (contrast with ResponsibilityDrivenDesign). I agree that such association can be a nice thing to have when searching or exploring code. However, there are two problems I find with the tight associations found in other paradigms:

There are often multiple entities for any given task. It is a many-to-many relationship.
The association of entities and tasks tend to be fleeting or dynamic over time. We should avoid a DiscontinuitySpike for such changes.
Which nouns are used to carry out a task should perhaps be considered an implementation detail to be hidden inside the black box of modules or routines rather than "stick out" loudly.

Thus, although I agree that indexing by noun/entity is useful, it should be "soft" rather than hard-wired or imposing itself heavily into the code. Here are some suggestions for "gently" providing such information into a code repository:

Use a RDBMS instead of file systems for code management, and introduce an entity cross reference into the "module" and/or "routine" tables or schemas. For example, have a many-to-many "routineEntities" table with the following columns: routineRef (ref=foreign key), entityRef, rank, notes. Column "rank" would provide "fuzzy associations".
Put a specially-marked comment into the top of each module and/or routine that can be used by a code analysis tool to extract entity meta information. Those mentioned first may be given a higher ranking (see above). In the following example comment, the code analyzer could extract lists from comments starting with "entities:":

// entities: customers, orders, products

Cross reference table names with code using parsing tools. If perfect parsing is not possible, then at least such could provide a good guess. Extra wrongs are better than missing rights in my experience. So error on the side of "wide" token parsing if one has to chose. For example, parse comments if there is no guaranteed way to separate them all. This is common in web languages that mix languages (SQL, app language, JavaScript, style sheets, etc.)
Some source-code analysis tools may already provide such functionality. However, I like to be able to do open-ended queries on such info rather than live with just the presentations such tools give out-of-the-box. Personally I would rather just buy the parser and then load results into a DB myself based on the extracted info. See SeparateMeaningFromPresentation.

--top

In ProceduralProgramming, usually there is some kind of interface module or table that dispatches the "tasks" based on user input or other "events". For example, in LegacyCode it might like like:

  select on userOption
  case 'D' 
    deleteSalesOrder()
  case 'A'
    addSalesOrder()
  case 'V'
    viewSaleseOrder()
  ....
  case 'H',' '
    salesOrderHelp()
  end select

In larger systems, a ControlTable may be used for such. In GUI systems the "tasks" generally correspond to events. The code for dispatching is usually hidden from the application developer, but generally would look similar to the above or a ControlTable.

Batch setups are even simpler. You generally have something like:

   stepOne()
   if (condition....) {
      stepTwo()
   }
   if (condition....) {
      stepThree()
   }
   if (condition....) {
      stepFour()
   }
   .....
   cleanUp()

Note that I prefer to have the NounModel mostly in the database. Thus, my code is generally not shaped-by nor grouped by entities or noun taxonomies. Maybe some ProceduralProgramming or FunctionalProgramming fans do such, but I am assuming otherwise here. Well, I take that back: I might group some related task modules by entity if there is an opportunity, but this depends on the language and environment.

In short, there is a usually a "backbone" of some sort that dispatches the tasks (or event handlers). This backbone is fairly consistent and usually closely tied to the user interface so that it is fairly easy to map from user requests to tasks. Most user requests are described in terms of interfaces.

Re: if a numeric column represents AGE, all tasks will probably assume the values will be positive. But what if the age is not known? Is it okay to put a NULL there?

(Probably should move the below text to a new page, but discussing it here for now)

A practical option for eliminating the NULL in a column with numbers would be to use negative values, assuming your column supports negative numbers (you can change the type specification of the column if it currently only supports positive values).

Take an example: the AGE column was input via a data entry person, who was reading the AGE off some forms that were written in pencil or pen by customers. If the "Customer Did Not Provide Age" then put in -1 into the AGE column. "Cannot Read Age Due To Messy Handwriting" would be -2, and "Customer Did Not Wish to Disclose Age" is -3. The negative values act as an enumeration (-1, -2, -3, ...) and can be expanded when needed. This way, when anyone queries the database for bad AGE data, he can find the AGE data factually. One can query all the positive values of AGE and find legitimate ages. One can find all unknown AGE's by searching for negative values. In fact, it is no longer unknown - you know WHY the age is not entererd because now you have facts about why the age wasn't entered.

Using negative values, one can create another table that maps these negative values to their descriptions - then one can query this table and find out info about what the negative values mean. This is much much more factual and helpful than NULL.

I suspect this has to do with why Codd wanted to offer several null values later on in his research rather than a single NULL - because he wanted more descriptive "nulls" that had more meaning. Unfortunately, decisions about NULL'S and what to do with them are one of the biggest problems in databases. Deciding how to avoid NULL's elegantly is extremely tough - and no, I do not find Date and Darwen's complex "How to Handle Missing Information" solutions really elegant or convenient, although at least they attempt to resolve the NULL problem.

Possibly some would argue that using negative values as suggested above, is kind of like supporting several different types of nulls. Not so, because negative values can be queried and strictly specified in another table. This is much more specific and factual than a NULL or multi option NULL.

A problem arises when negative values already exist in the data column. No longer can negative values be used, then, for errors or missing information. This is a similar problem to error return codes from function, and the decisions that have to be made when returning errors in API's (and no, throwing exceptions and killing the program is not the most elegant solution either).

A problem arises with string columns, or columns of a type that do not support negative values.

Maybe databases need to support a special placeholder value for errors and missing information, which references another table that one can extend. However, BOOLEAN values should always be BOOLEAN.. without any choice of NULL or any special value for missing information.

Is this bringing back three valued logic? No: because booleans remain only TRUE and FALSE.. while other columns that aren't booleans can contain error values that are factually represented via negative numbers, or some other special representation such as ~~Error1, ~~Error2, ~~Error3, where these errors reference another table with key 1,2,3 referencing descriptions of these errors.

There was a big discussion a few weeks ago concerning complex values along with error messages or indicators, but I cannot remember where it is right now. I'll link back to it if/when I encounter it again. --top

DecemberZeroSeven

CategoryModellingLawsAndPrinciples CategoryMethodology