Data Centric Thinking

This is an attempt to do a "mind-dump" about how I think when I do software design. I try to make this neutral rather than bashing behavior-centric techniques, but my bias may leak out. I am only trying to explain my thinking process, not justify it. I am not saying this is the right way to think, but simply documenting my current way of thinking. I hope others here also show some mind-dumps.

I am also not going to talk much about relational theory here. Although I think it is the best data management tool right now, I think data-ness and relational are on somewhat different level of issues.

I like to describe nouns without worrying about how they are going to be processed at the moment. It helps me DivideAndConquer. Describing what you want can be split from how to get it. How to get it is a lower-level detail. It is kind of like making a grocery list and then handing it to somebody else to do the foot-work of getting the groceries. You put the information about nouns or "things" into a data holder/structure knowing it can be and fairly likely will be used by other processes in the future. For example, you can save the grocery list for another day. You may want to put priorities on it so that the most important things are obtained first if time is short. That way if you are short on time or money, you simply tell the shopper to pick the higher-ranked items before the rest.

Plus, descriptions of nouns are reusable for other tasks. Data is generally not an island, at least not in my domain. You store facts about something, and it may be usable for some future purpose which you cannot foresee yet. Maybe the grocery list could be used by my wife to plan a surprise party, or sell to a marketing research firm to earn some bucks, or to take inventory of the kitchen to see what is low. (Okay, weak examples, but you get the idea.)

Further, data handling can be standardized such that data browsers, query/sifting tools, persistence management, concurrency management etc. can be obtained or reused without reinventing the wheel. Automating data chomping is easier than automating behavior chomping, at least for me. I know tools that make such viewing and transformations readily available and easy to grok. There are more "data maths" than "behavior maths", at least ones that I find usable. I think this is one of the reasons why it is easier to make a language-neutral database than language-neutral behavior representations. I don't see a lot of "behaviorbases" on the market.

Now, I realize that data is not the best form of every piece of information about the world or our applications. Code is indeed better at some things. Complex conditional logic is one of them. I have yet to find a data-centric way to handle much of complex conditional logic in a nice way (see InterweavingOrthogonalFactors?). ExpertSystems have tried to do this, but the results are mixed. For one, they don't provide a good way(s) for one logic "cell" to communicate with another, and make it hard to take advantage of local context to simplify certain logic, resulting in OnceAndOnlyOnce violations or a lot of interface work. Thus, I tend to put conditional logic of raw business rules in code. However, the data side still fuels the conditional logic for the most part. It supplies the values or codes on which the conditional logic operates.

Given the grocery list example, suppose I put a column of rankings for foods I prefer on a cold day, and another ranking column for foods I prefer on a hot day. The list describes my preferences, but does not implement them. I can put that information into my list without worrying about implementation at the time. But to later implement them I might make code like:

  if (weather = cold) {
    ordering = "coldRank"
  } else if (weather = hot){
    ordering = "hotRank"
  } else {  // mild
    ordering = "hotRank + coldRank"
  }
  result = getView(groceryList, order=ordering)
  ...process result...
In some cases, there may be a way to push such processing out of IF statements and into our data-handling tools, or a mix of data handling tools and formulas:

   factor = (fahrenheit - 50) / 50
   factor = keepInRange(factor, 0, 1)  // keep between 0 and 1
   ordering = "$factor * hotRank + (1 - $factor) * coldRank"
   result = getView(groceryList, order=ordering)
   ...process result...
However, it is not always convenient or not intuitive to do such. Thus, my code tends to have lots of IF statements. But what you don't see much is code of mine that describes nouns or relationships between nouns for the most part. That is because describing nouns and relationships between them is best as data in my opinion. This is one area where OO falters (or at least some methodologies of OO) in my opinion. Now, some claim that OO improves upon the conditional logic side of things. However, I have yet to see a convincing demonstration.

OO is not intended for writing business logic. Business logic is never reusable except as part of a whole application. OO is intended for writing general-purpose libraries, not special-purpose applications.

Polymorphism requires too much "regularity" to be useful for real-world business logic that I see. Bending it to take care of all the "dirty fits" makes polymorphism even uglier than nested IF statements in my opinion. Plus, it tends to want to put noun taxonomies in code, which is a no-no in my mind (described above).

I don't like interweaving nested IFs, but so far there is no clear alternative that handles real-world conditions. Interweaving IFs are admittedly the ugliest part of my approach. But, I think this is because IFs are where I "dump" the messy parts of the model. Almost any messy nitty little business rule can be IF-atized if you put it in the right spot. I think there are some parts to modeling applications that are simply immune from relatively clean abstractions. The real world is messy. That is just a fact of life. If it were clean, it would already be available in a box at WalMart or Computers-R-Us. Often, one is modeling the whims and egos of marketers, managers, and customers in my domain, and these appear to defy logic and rationality. Generally you have to be modeling something shaped by rationality or predictable laws of nature to be able to apply rational clean abstractions. The mind of a marketer defies that, at least that a typical geek can see. Thus, these huge bumps in the road to abstraction are going to show up somewhere in ANY paradigm or technique, I suspect. In my technique, they show up as messy nested interweaving IF statements for the most part.

OO fans seem to think that coupling data and behavior or wrapping data with behavior can fix This Great Weakness. But, I have not seen a demonstration that is promising in this area. The demonstrations are no less fragile against the funny little rules and exceptions that pop up. Sure, it looks great in the pristine environment of textbooks and brochures, but doesn't everything? -- top


Is Behavior More Powerful?

It is sometimes claimed that behavior is more important than data, that data and data-centric processing is "too wimpy" or the like. But my observation is that most algorithms are composed of a large part DatabaseVerbs or at least potential DatabaseVerbs. I have been kicking around a table-centric language interpreter, and a large part of the processing is or can be composed of DatabaseVerbs. The more I study it, the more of the interpreter can be implemented using database or table transformations, and I doubt I have exhausted the possibilities.

It is also sometimes stated that nothing happens without behavior, and therefore behavior is more important. However, this would be like saying that a military general would not get anything done without front-line soldiers who execute the general's plan, and therefore the soldiers are more powerful than the general.

The soldiers are more important than the general. A general can't fight a war without soldiers, but soldiers can still fight without a general. General s decide when and where, but once the action starts, soldiers do all the work.

I often view behavior (code) as simply the front-line soldiers that carry out the "wishes" specified in the data. An "isRequired" column in a DataDictionary by itself does not execute the enforcement of field being required, just as the general does not actually carry out his own orders. (This example was first stated somewhere else and perhaps should be consolidated with this.)

The general decides how to fight the war. He can get soldiers from anywhere.

That statement does not contradict anything said above. It is why data tends to outlive languages.

Sorry, i'll try to say something contradictory next time.

[Yes, we wouldn't want too much agreement, we'd have nothing left to talk about!]

It meant that the purpose of the point is not understood. Perhaps if the comment said, "In other words, the general.....", the transition would be less confusing.

Chicken and egg? Apples and oranges? Data exists (or doesn't) without purpose. Behaviour has a specific purpose. Data may appear to have a purpose, but only because our minds implement behaviour; data looks more useful because our minds can do things with it without needing behaviour code, but we are not good at storing text or specific abstract information. I personally like OO because I find it easier to conceptualize OO, but I know that it can all be done functional/procedural/... And business logic will always be messy; in OO, I can abstract portions of it away into families of objects, and use patterns like StrategyPattern and StatePattern. There is always mess, but componentization helps to put it into discrete buckets. I think that FlowBasedProgramming may be doing the same thing with separate asynchronous processes. -- WillChamberlain

Can you provide an example of how you think in OO and why it helps? Also, can you kindly clarify "componentization"? Functions, modules, tables, etc. can be considered "components" or at least discrete units. Nobody seems to be proposing one big file of in-line code that I can tell. All but the very worse coders have some kind of approach(es) to dividing things. Often deciding which factor to divide stuff by is a big compromise among multiple orthogonal candidate factors. Very few experienced coders place stuff randomly. Non-OO techniques also have strategy-pattern equivalents. However, I find that "noun classifications" and "behavior classications" tend not to sync up one-to-one, and thus in practice the strategy pattern is not very practical. -- top

I may be just an AnonymousCoward, but I think I might be able to summarize why OO/polymorphism is seen as a step up over nested IF statements. Yes, it does sometimes make you hammer methods into classes where they don't quite fit. However, it also allows you to add new business rules closer to their domain. IF statements force you to make changes to business logic in exactly one place, meaning that adding a new branch of business logic types always requires two edits: one to the IF nest, and one to define a new class.

Just my 2c.

That is if you can define your domain cleanly as "types" or "sub-types", which I have generally found difficult to do. Related: VariationsTendTowardCartesianProduct, LimitsOfHierarchies. "Simple" OOP solutions tend to be rigidly hierarchical, and the more complicated ones that facilitate flexible indirection are messy to grok and manage as textual source code, at least for my eyes. Now, maybe some OO'ers have FastEyes and can deal with a textual web of cross-references. I can't. I prefer using tables with query systems such that I can re-project the info into a form that's more conducive to a particular situation or question. I'm not stuck viewing the relationships in the way the original coder placed them. Sure, you have ObjectBrowsers that may help some in OO, but I prefer relational over pointer graphs (NavigationalDatabase) to manage bunches of relationships. -t

You're clearly talking about object oriented databases, of course. Obviously, object oriented programming (I assume your use of "OOP" was a careless mistake) long ago evolved from simplistic domain modeling to abstract computational modeling. The former may well suffer from limitations of hierarchies, singular views and the like. The latter captures invariants, and so is not (as) subject to fragility due to domain changes.

I thought I addressed that above. I agreed that it "evolved"; but, that evolution created new problems. And I am not talking about OO databases. The bottom line is that the relationship between things needs to be managed somehow. Sometimes it's better to manage them in a database than as textual code. Non-trivial cross-references are difficult to manage in textual code.

I do not recognise the non-"OO databases" to which you refer; you certainly aren't addressing object oriented programming, unless you're referring to some table-oriented equivalent to code polymorphism, inheritance and encapsulation that I haven't seen. I do recognise your suggestions as something from the OO-vs-Relational database debates of nearly two decades ago. Things have moved on since then.

There are various examples floating around this wiki. One that comes to mind is DoubleDispatchExample.

Pointless. There are OO languages that implement MultipleDispatch. TutorialDee, also a RelationalLanguage, is one of them.

But they pull a GreencoddsTenthRuleOfProgramming. One is putting database-like features into the languages to achieve such.

Please define "database-like" and indicate how it is measured. I do not find that term in the literature.

I don't know what the formal literature says about it.

Therein lies your problem.

But I consider things like predicate dispatching to be "database-like" (compiler/interpreter searching a list of parameter and type profiles). I could also flip this on you and ask you to prove these extra features of your pet languages are OO and only OO features. You couldn't do it. People who live in glass houses shouldn't throw academic rocks. -t

Where did I suggest MultipleDispatch or DoubleDispatch are only OO features? They are, however, found in languages known for implementing OO features, such as DylanLanguage and TutorialDee.

And what does that tell the reader?

Huh?

The key issues here are whether we use NavigationalDatabase-like features or relational-like features to manage complexity; and if we use static (code-centric) or dynamic versions of such.

I don't know what complexities you think you're addressing. They appear to be imaginary.

Going back to the printer driver example, there is value in being able to sift, sort, filter, cross-reference, etc. all the different printers, shapes, etc. for trouble-shooting, checking, etc. If there's only a few and they rarely change, go ahead and hard-wire all that info into source code. But when there's hundreds or thousands, then DB-flavored tools should be an obvious help to all but the few die-hard DB haters. (In reality, there's usually in-between meta-languages such that the example is of questionable practical utility as stated.)

What makes you think OO programming precludes being able to "sift, sort, filter, cross-reference, etc. all the different printers, shapes, etc. for trouble-shooting, checking, etc."? How strange. Once again, you seem to be confusing a twenty year old notion of object oriented databases with modern object oriented programming.

They can, but they are then becoming more database-like to do it. And thus expose GreencoddsTenthRuleOfProgramming. Another problem is that extending OOP in such a way usually leads to navigational-like database features instead of relational-like ones. You may have to break encapsulation also. If every object "inherits" DatabaseVerbs, then encapsulation is broken. If only some do, then some are encapsulation-broken.

OO is based on the premise that each object defines and "carries" is own interface. With RDBMS, there is a standard interface for the units of info, and one uses those and only those standard units. (They are sufficient as building blocks in most case to define more complex behavior.) OO has no standard verbs/accessors, and as soon as you start adding them, you are reinventing a database of sorts. -t

I have no idea what point you're trying to make. I've repeatedly pointed out that object oriented programming is not the same thing as defining object oriented databases, yet everything you write persists in conflating them. Do you have any experience developing large database systems or object oriented programs?

You are still attempting to use DB-like features to fix the flaws of simple OO; and the larger and more complex the project becomes, the more you will want other DB-like features. For a somewhat simplistic example, you wouldn't want to hard-wire 2,000 printer brands into the OO double-dispatch example and maintain them that way, would you?

What does "DB-like" mean, and how do you identify and measure it? It appears that "DB-like" is simply your recognition that many aspects of ComputerScience rely on searching (...and sorting; see the third volume of TheArtOfComputerProgramming by DonaldKnuth.) That doesn't make them databases.

Until there is a consensus on paradigm categorization and methods of paradigm categorization, there probably is no reliable way to classify such features in a way that would satisfy you. Conversely, saying they are only OO features or only FP features would run into similar problems. I would put predicate-based searching more into the "data centric" or DB camp than I would the OO camp. If the reader wishes to disagree, that's their prerogative. OO out of the box generally lacks collection-oriented idioms. They are not part of the "atoms" of OOP. If you want them, you have to add them via library classes etc., and they may still be limited in terms of concurrency management, ACID, etc. Although some OO languages may include such features; that by itself does not make them "OO features". They could simply be languages stealing features from other paradigms to make up for what they lack. I'm not sure what you want. There is currently no precise consensus classification system. That's life. Deal with it.

You've confirmed my point: "DB-like" is meaninglessly subjective. However, searching and sorting are pervasive. I presume that's the source of your "DB-like" label, but it doesn't mean every language requires -- or even that many languages require -- built-in syntactic support for tables, collection-handling, multi-user concurrency, ACID, or other facilities typically found in DBMSes. That's what database languages like SqlLanguage, TutorialDee and ExBase are for. It's what libraries can be for in C/C#/C++/Java/etc. But there is a world of applications for which OO, functional, and logical languages are appropriate without benefiting from a single reference to tables, multi-user concurrency, or ACID. Furthermore, tables, multi-user concurrency, ACID, and any number of other features typically found in DBMSes may be present along with (say) MultipleDispatch, but MultipleDispatch need not use any of them, nor would it benefit from doing so.

Further, there's a question of whether source-code-based predicate dispatching is easier or harder to use/manage than database-based predicate dispatching, regardless of how one classifies it.

Feel free to provide a side-by-side comparison of both.

Most of it would involve personal WetWare descriptions, which you would probably balk at. Maybe some aliens are very skilled at BrainFsck, but that doesn't necessarily extrapolate to humans. I don't know what interfaces with YOUR WetWare better. I cannot test your head; at least not without bashing it open and putting your neurons under a scope. You are welcome to volunteer. I have power tools ready.


Under the NygaardClassification, OO primarily controls where data and decisions and control may flow. An object that receives a message can only query objects it already knows about, or those named in the message. The actual model for making decisions doesn't need to be OO. For example, ErlangLanguage fits nicely in NygaardClassification, and uses LogicProgramming-inspired code to decide how to progress with with each message. Polymorphic dispatch is what allows me to transparently replace one service with another, support forwarding and revocation patterns, et cetera. Due to the nature of popular implementations, "ObjectOrientedProgramming" tends to connote passive objects (as opposed to ActiveObject, ActorsModel, or AutonomousAgent) but that connotation isn't critical to the NygaardClassification.

Under the OneResponsibilityRule and OnceAndOnlyOnce, OO programs migrate towards a DivideAndConquer strategy with configurable delegation: an object will take a big problem, break it into a few smaller problems, delegate the smaller problems, then somehow combine or coordinate the solutions. For example, if part of the problem requires computing a grocery list, you might delegate that problem to an object that happens to be an expert system - though, all you need is a proxy interface that will return your list. The need for delegation creates a need for DependencyInjection. The fine granularity of objects results naturally from decomposing the problem with a DivideAndConquer strategy. I suggest: decomposing a difficult solution into smaller objects with dependencies indicates whether you're doing object oriented programming - as opposed to merely programming in the presence of objects.

One doesn't use polymorphism to compute the grocery list, and it would seem quite silly to even try. Delegating responsibility to compute the grocery list is legit. Encoding the rule for selecting warm foods on cool days in the dispatch mechanism? that's too damned many responsibilities for one language feature.

Now, the other camp might disagree. They might point me at MultipleDispatch and PredicateDispatch, and rules that say "on Wednesdays, questions about grocery lists get received by this procedure instead, unless it's a full moon...". Ugh. OopNotForDomainModeling, please. IMO, MultipleDispatch is a RulesBasedProgramming? feature, not OOP.


See: DataAndCodeAreExchangeableExamples, CodeAvoidance, EverythingIsRelative, DatabaseIsRepresenterOfFacts, ProceduralMethodologies, SeparationOfDataAndCode, DeclarativeProgramming, BusinessRulesMetabase


EditText of this page (last edited November 4, 2014) or FindPage with title or text search