Oop Biz Domain Gap


OOP may have a "business domain" gap, meaning that it either does not provide significant benefits for business applications, or that how to use it for business applications is poorly documented and/or poorly understood. The evidence is discussed below.

(Note that I tried to word this topic as a question to make it sound more open-minded, but it came out ugly in wiki-speak. If someone thinks they can fix the name, they are welcome to present suggestions.)

--top


I think the reason systems software seems to work more smoothly with OOP is that with systems software, engineers can control the model more, and can engineer the model to fit OOP in order to take advantage of polymorphism. With custom biz apps, the engineers cannot effectively control the business rules, and thus the business rules appear rather random and chaotic from the engineer's perspective. Set theory and feature-sets work better in such an environment. (Related: GranularityOfVariation) --top

OO may be better for certain domains where engineers control the "rules" and protocols involved instead of marketers, business managers, and politicians. Engineers tend to make cleaner/tighter taxonomies/classification systems than the others, and OO works better under such constraints. Procedural/Relational is superior when the domain classification systems are complex and/or messy. (A snippet of thought I plan to integrate and extend later --top)


In terms of using OO to model the business entities themselves, I'm inclined to agree. I've not found it particularly helpful to use OO to create persistent Customer, Employee, etc., classes, though I have tried this on some projects and achieved varying degrees of elegance, re-usability, maintainability and so forth. Though advocated in textbooks and various modeling techniques, I believe this approach is naive; it treats enterprise information systems as business simulators, rather than what they really are -- fact processors. When used with SQL or relational databases, it also tends to commit DateAndDarwen's FirstGreatBlunder: the erroneous equating of relations (or, more accurately in the case of SQL, tables) with classes.

Therefore, I prefer to create UI and/or presentation and/or processing-engine classes for manipulating collections of facts -- typically query results or source data for database updates -- rather than creating entity-classes. On a given business application implemented in an OO language, there is likely to be a collection of derivations from a Report base class; a collection of derivations from a base Form class plus instances of DBMS-aware form widgets; instances of other UI elements such as menus; classes defining various calculation or business-rule engines such as schedule generators, payroll generators and invoice generators (these tend to be the biggest part of the application); instances of various classes for managing database connections and constructing database queries; and so on, rather than a collection of strict analogues of real-world entities. The real-world entities themselves I model in the database. Individual Customer, Employee, Invoice, etc., instances -- such as they are -- only have a transient lifetime within the application, typically as attributes contained within a Row instance (a sketch of this style follows below). But that's just my approach; I don't deprecate any approach that works as long as there's a clear rationale for it when a rationale matters.

In a sense, I suppose my approach turns a business application problem into a systems software problem. -- DaveVoorhis
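
To make the above concrete, here is a minimal Java/JDBC sketch of that style, assuming a plain JDBC connection; the class names and the query are invented for illustration, not taken from any actual project:

  import java.sql.*;

  // Sketch: the Report base class holds the presentation machinery,
  // while domain facts (customers, invoices) exist only transiently in
  // the rows of a query result -- no persistent Customer class anywhere.
  abstract class Report {
      protected abstract String query();  // subclass supplies the facts
      protected abstract void formatRow(ResultSet row) throws SQLException;

      public void run(Connection db) throws SQLException {
          try (Statement st = db.createStatement();
               ResultSet rows = st.executeQuery(query())) {
              while (rows.next()) {
                  formatRow(rows);  // entity data lives only in this row
              }
          }
      }
  }

  class InvoiceReport extends Report {
      protected String query() {
          return "SELECT invoice_no, customer_name, total FROM invoice";
      }
      protected void formatRow(ResultSet row) throws SQLException {
          System.out.printf("%s  %s  %s%n", row.getString(1),
                            row.getString(2), row.getBigDecimal(3));
      }
  }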

I do agree that the few places I've seen where OOP may help a little are in preparing information for display or transfer, not in modeling the business objects themselves. I'd be interested to see your designs for such utilities. I've generally moved toward HelpersInsteadOfWrappers. A generic utility, or even a base class, is not sufficient in my observation. Each app has a different flavor, such that you don't know what to keep or scrap until you actually use the utility. A generic one would become a feature packrat. I used to strive hard for genericness, but eventually realized it's futile, as the future kept re-surprising me even after years of experience. CRUD may seem simple on the surface, but the potential combinations of all potential features can be enormous.

My base classes define (for example) generic, standard reusables such as database-aware widgets (the usual buttons, comboboxes, text boxes, listboxes, checkbuttons, etc. -- generally as wrappers around Java/Swing widgets or HTML-generators, depending on the targeted platform) plus navigation/search/sort widgets for forms, and equivalent print-widgets for reports plus facilities for columns, grids, headings, footers/subtotals, pages, sections, etc. These remain relatively constant and are the building-blocks of just about everything; the need to create a form or report component that is truly new and can't be inherited from an existing widget or class is relatively rare, and even then it generally makes use of some of the generic classes.
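
For illustration, such a database-aware widget might be little more than a binding between a Swing control and a column name. This is a hypothetical sketch, not the actual class library described above:

  import javax.swing.JTextField;
  import java.sql.ResultSet;
  import java.sql.SQLException;

  // Hypothetical database-aware widget: a thin wrapper binding a Swing
  // text field to one column of whatever row the form is showing.
  class DbTextField extends JTextField {
      private final String column;

      DbTextField(String column, int width) {
          super(width);  // width in columns, as in JTextField
          this.column = column;
      }

      // Refresh the widget from the current row of a query result.
      void load(ResultSet currentRow) throws SQLException {
          setText(currentRow.getString(column));
      }
  }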

Because I'm still in the (D)HTML world, sometimes I need to use JavaScript to try to coax real-GUI-like behavior out of a form. This often requires goofy tweaks, and a compromise may have to be custom-designed because JavaScript may not work right in a given vendor's browser. (JS+DHTML is a fricken mess and I miss real GUIs. It takes hacky rocket science to do what was 3 clicks in real GUIs.)

I do find a surprising amount of variation pops up in handling relationships between sub-forms, i.e., the usual master-detail, master-master-detail, master-detail-detail presentations and so on. In the early days I assumed I could trivially build generic tools that would handle every possible multi-subform scenario, but twenty years later I still haven't nailed it.

Another issue is that one usually does not need multiple simultaneous instances of, say, a report formatting util, and thus does not need a "handle" (object). One is usually doing only one at a time in the code. If you are doing more than one at the same time, such as generating two reports at once, then usually there's a design problem. Sometimes intermediate structures can be used to avoid having multiple such things going on at the same time: collect the data-sets first, and then display them. (Parallelism to speed throughput is another issue.) It also simplifies error handling and debugging. (Related: SeparateIoFromCalculation). I used to use legacy Fortran libraries for graphs and charts, and the biggest difference between those APIs and OO APIs is the assumption in Fortran apps of doing only one at a time, reducing the need for instance management (a few APIs did have handles, though, so multi-instance was not out of the question). I don't think single-instance is a limiting assumption in the vast majority of custom biz apps. Now, systems software may be another animal in that regard. --top

It's true that you usually don't need multiple simultaneous instances of a report formatting utility, but then you get a user who needs to be able to print invoices on demand while the monthly statistics report is churning out in the background. Or, the originally-Singleton schedule generator now needs to simultaneously produce both a staff schedule and a client-meeting schedule. I.e., exactly that parallelism issue you mentioned, which I wouldn't consider a design problem. Or, you (or the user) decide it's of value to allow sub-reports within reports. Since you've got to store the report state (or the schedule generator state) somewhere, why not define a Report (or whatever) class to hold the report state, and get the potential for parallelism and nesting (i.e., sub-reports) for free? I find instance management to be a negligible issue, especially in languages that support garbage collection.
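
A minimal sketch of that point, with invented names: once the report state lives in an instance rather than in module-level variables, two reports can run at once, and a sub-report is just another instance:

  // Because each instance carries its own state (page number, running
  // totals), parallel reports and nested sub-reports need no extra
  // machinery beyond "new".
  class ReportState {
      private int page = 1;
      private double runningTotal = 0.0;

      void addLine(double amount) { runningTotal += amount; }
      void newPage()              { page++; }
      double total()              { return runningTotal; }
  }

  class Demo {
      public static void main(String[] args) throws InterruptedException {
          ReportState monthly  = new ReportState();  // churning in background
          ReportState invoices = new ReportState();  // printed on demand

          Thread background = new Thread(() -> monthly.addLine(42.0));
          background.start();
          invoices.addLine(99.95);  // no clash: state is per instance
          background.join();
      }
  }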

I am not sure what you mean by the scheduling thingy, but if I was using a client-centric tool, I'd spin off an independent process (window group, form group, etc.). One can communicate between them using either file polling or tables. Same with web browsers: spin off an independent window and treat it like a separate user session.

{Sounds to me like object-oriented at the process granularity.}

I suppose one could view it that way, although it's not the only way. Possibly related: EverythingIsAnObject, EverythingIsa


A draft of an article I plan to put on my blog:

"Why OOP Fails Domain Modeling"

http://groups.google.com/group/comp.object/browse_frm/thread/853fca22ded31c00#

--top

As suggested by your first respondent on comp.object, it might be helpful to more clearly distinguish between types of domain, and definitively identify the scope of your criticism. OO may indeed be inappropriate for creating database-driven information systems when it is misused by creating classes to mirror domain entity types, i.e., if a business domain model is created in an OO language and then persisted by a relational database. However, for applications that are effectively simulations (and in a sense, a GUI widget set, DBMS, report generator, or other computational artifact can be considered a simulation of an abstract machine), OO appears to be well suited in a way that a relational DBMS -- which is a fact processor, not a simulator -- is not. OO appears to be particularly well-suited to creating games, for example. I suspect it would be much more awkward to create PacMan using an RDBMS than pure Java or C++, though a well-integrated implementation of relations, tuples and the RelationalAlgebra (sans persistence, etc.) could make an excellent way to manage run-time collections of class instances.

For GUIs and games, I agree up to a point. But when the game or GUI grows beyond a certain complexity, such that there are too many interrelated entities and too many instances to keep straight, then a database-like approach becomes more appropriate, especially for debugging. Debugging via pointer hopping can be a pain in the arse. NavigationalDatabases were already tried heavily in the 60's and 70's and were eventually rejected in favor of relational.

I would rather ask, "show me all enemy creatures with an energy level less than 30 having weapon X that are still allowed to cross bridges", than iterate via pointers (a sketch of this style follows below). You cannot tell me with a straight face that that is OO's strength. Many-to-many relationships are especially ugly in OO. With a database you get built-in CollectionOrientedProgramming. With OOP you have to roll your own (or inherit from a library that proves GreencoddsTenthRuleOfProgramming).

--top
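
For what it's worth, here is a hypothetical Java sketch of Top's query in collection-oriented style; the Creature fields are invented for illustration. In a pointer-chasing design the same question becomes a hand-written traversal:

  import java.util.List;

  // "All enemy creatures with energy below 30, carrying weapon X,
  // still allowed to cross bridges" as one declarative filter chain.
  record Creature(boolean enemy, int energy, String weapon,
                  boolean mayCrossBridges) {}

  class CreatureQueries {
      static List<Creature> bridgeThreats(List<Creature> all) {
          return all.stream()
                    .filter(Creature::enemy)
                    .filter(c -> c.energy() < 30)
                    .filter(c -> "X".equals(c.weapon()))
                    .filter(Creature::mayCrossBridges)
                    .toList();
      }
  }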

A game at run-time is overwhelmingly behaviour-oriented; the number of entities may be large but is almost invariably finite and predictable (and, hence, implicitly manageable); ad-hoc queries are typically non-existent and so can be effectively hard-coded; collections are typically trees or vectors rather than set-like; and persistence typically consists of serialising the entire live environment rather than iteratively (and unpredictably) saving/restoring portions at a time -- except in MMORPGs and the like, though these are arguably a special case. These general factors are a good match for OO programming, especially as the interactions between game entities (represented by class instances) are effectively modelled via method invocations. Debugging is definitely an issue (as it is in any complex system, pointers or not), but one that is dramatically reduced via TestDrivenDevelopment. As I noted, I would not preclude using the RelationalModel to manage run-time collections -- indeed, I feel it might be extremely effective -- but I would be hard-pressed to drop an overall OO approach when, in numerous projects and in my own experience, it has proven so effective in this domain. I can see no clear overall benefit to using an RDBMS to achieve the same thing, and plenty of down-sides.

Regarding "behavior-oriented", I am skeptical. People are just *used to* thinking of it behaviorally. A lot of it can be declarativificated (ouch) if one is used to such techniques. Or perhaps a combo of declarative and event-driven, like a good GUI system. The events typically are small snippets in GUI's, but a database could be used to keep track of the events. If you see a common pattern to the events, then you can create declarative versions of those, but still hand-code the less common ones. It would be roughly comparable to MicrosoftAccess macro's. They paramaterized about 150 or so common CRUD behaviors so that one does not have to code them. (The actual implementation of them often stinks, buts thats another issue.) Some allow WHERE-clause filters so that it is half parameterized and half programming. This may remind some of PayrollExample. A game example may resmemble:

  // Triggers battle mode based on proximity
  Trigger-type: Battle_Mode_At_Proximity
  DistanceMeters: 100
  EnemyCategories: Dragon, Wizard, Horse  // any of
  UserCategory: [blank]  // blank = all
  Filter: userEnergy < 300 and Not SpottedEnemy
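
A hypothetical sketch of an engine that might interpret such trigger records -- the record fields mirror the example above, and the Filter clause, which would really need a small expression parser, is hard-coded here:

  import java.util.List;

  // Declarative trigger record, as stored in the events table above.
  record ProximityTrigger(double distanceMeters,
                          List<String> enemyCategories) {}

  class TriggerEngine {
      // Common cases are table-driven; rare ones stay hand-coded.
      static boolean fires(ProximityTrigger t, String enemyCategory,
                           double distance, int userEnergy, boolean spotted) {
          return distance <= t.distanceMeters()
              && t.enemyCategories().contains(enemyCategory)
              && userEnergy < 300   // the Filter clause from the record
              && !spotted;
      }
  }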

There are certainly games and game development systems that have defined high-level languages to simplify manipulating the game primitives, but the underlying architecture is still OO -- modulo those games written in C/Assembly by developers fearing the performance hit of OO vtables and the like. This does not, however, deprecate the clean match between OO and simulation-oriented systems. It's worth noting that the first OO language, Simula, was designed specifically for creating simulations of event-driven systems.

Yes, but they dealt with hundreds of things instead of hundreds of thousands. I agree that the performance of a typical database is probably too unpredictable to be useful in an action game. (Related: AreRdbmsSlow) As databases get faster and cheaper, however, people may start to consider them over OOP.

The number of things is rather irrelevant here. I find your implication that OOP and RDBMSes are somehow interchangeable to be a bit odd; they are distinct technologies -- like hammers and screwdrivers -- despite whatever confusion in this area the notion of OO databases might have, at one time, inspired. The RelationalModel is certainly not a fundamental computational paradigm, any more than OO is. The RelationalModel is merely a way (and not the only one) of algebraically manipulating collections of values. OO merely provides a means (and not the only one) for defining encapsulated values that support polymorphic operations. Viewed this way, they are complementary. And, like the hammer and screwdriver, they are merely tools. Choose the right one for the job at hand.

Is "polymorphic operations" a result or a technique? I view it as a technique, one of many. Set-based operations can often replace polymorphism, and often more flexible at the same time. In other-words, polymorphism and set theory are both competitors to managing "variations on a theme".

"Polymorphic operations" are a characteristic. I certainly wouldn't deprecate set-based approaches to programming; SETL was one, and ExtendedSetTheory implies another. However, as the RelationalModel is merely one specific derivation of set processing, intended for managing collections of values, I have some doubts that it can demonstrate the same generality as OO without significant extensions -- which, intuitively, I suspect would turn it back into a general set processor. I've made some tentative stabs at turning this into an avenue of academic research (based on a general-purpose implementation of ExtendedSetTheory), but it's far, far too early to draw any conclusions.

The goal of a database is not to be general. It's a tool that rolls up common attribute- and collection-handling idioms into a standard package. Further, it does not have to be general-purpose to overlap with OOP. A general-purpose tool will overlap with everything, but that does not necessarily make it the best tool/language for the job. BrainFsck is also general-purpose. Ideally we'd meld the best of both, but the philosophies tend to contradict, making it difficult. Thus, we are forced by contradiction of philosophy to more or less pick one over the other, or duplicate functionality.

True, which is why I suspect the RelationalModel with TuringComplete extensions (as in TutorialDee), may not be as effective or powerful a general-purpose programming environment -- or as elegant a paradigm, perhaps -- as one based on ExtendedSetTheory. Of course, this would not preclude incorporating the RelationalModel (it can be trivially implemented in ExtendedSetTheory), but other models (perhaps some that are not yet defined) can be seamlessly incorporated as well. In short, ExtendedSetTheory might provide a unifying model over OOP, the RelationalModel, functional programming, typeful programming, and general set processing -- thus providing the non-contradictory melding of philosophies that we both seek.

I find that if one de-emphasizes types, then mixing relational and procedural is fairly easy, and they complement each other well without fighting over territory.

The apparent need to deprecate types might be eliminated under a unifying model like ExtendedSetTheory. Furthermore, I suspect you are in fact doing typeful programming, but are either implicitly relying on the type system facilities of certain high-level languages (and therefore aren't recognising their typeful nature because the language designer and implementer have done the heavy lifting for you), or you are effectively defining types via procedures that manipulate internal representations consisting of strings and/or integers.

This probably gets back to WhatAreTypes, and you really don't want to spark up that debate again, trust me.

You're right, I don't. Best we end this here, at the end.


RE: A draft of an article I plan to put on my blog: "Why OOP Fails Domain Modeling" (http://groups.google.com/group/comp.object/browse_frm/thread/853fca22ded31c00#) --top

In this draft article, Top says: "There is one big difference between most CA's and DA's: volume."

I do not believe this is the fundamental difference, which really comes down to reflection vs. projection. You might see DataManipulation and ObjectVsModel for examinations of this subject. In short, computational abstractions essentially create functors, pipelines, dispatchers, visitors, workflows, etc., then hook them together like a big RubeGoldbergDevice that will take your inputs (which could be represented as objects or values), do stuff, then produce outputs (also represented as objects or values, plus side-effects). When instantiated, these objects possess a 'real' existence within the system, forming a machine that can then process inputs to produce outputs. This, by nature, is entirely projective. Domain objects, however, tend to be reflective - the object is outside the system, slightly out of sync (due to the laws of physics), and not directly accessible. Attempting to interact with a domain object by manipulating its reflection is just silly, a bit like attempting to capture the moon by swallowing its image in the lake.

There is a very simple reason that domain objects favor a good DBMS. By nature, a reflection falls out of sync, so it must be kept up-to-date. One can call 'sensors' those services that keep everything up-to-date (which would include data-entry people), and 'subscribers' those that need up-to-date data (including those who would perform ad-hoc queries). In this system, a CRUD screen would be both a sensor and a subscriber. Suppose there are M sensors and N subscribers; then there would be MxN integration effort to keep everything up-to-date. With a middleman - a DBMS - this effort is reduced to M+N, as each party needs only to integrate with the DBMS. This makes it far cheaper to add, remove, modify, etc. sensors and subscribers, making the whole system more modular. Of course, any decent DBMS (and even many indecent ones, like filesystems) provides these advantages. The choice of an RDBMS over some other DBMS would be made based on various NFRs, such as ease of creating queries, demand for latency, support for delta-isolation, and so on.
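
To make the arithmetic concrete: with 4 sensors and 5 subscribers, point-to-point integration means 4 x 5 = 20 pairings, while routing through a DBMS needs only 4 + 5 = 9. A minimal sketch of the middleman, with invented names and a plain string standing in for a fact:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Consumer;

  // The middleman: sensors and subscribers each integrate once, with
  // the store, never with each other -- M+N couplings instead of MxN.
  class FactStore {
      private final List<Consumer<String>> subscribers = new ArrayList<>();

      void subscribe(Consumer<String> subscriber) {
          subscribers.add(subscriber);
      }

      // Called by any sensor when its reflection of the world changes.
      void update(String fact) {
          subscribers.forEach(s -> s.accept(fact));
      }
  }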

Anyhow, let's compare this situation with the storage needs for Computational Abstractions... along with the related modularity and coupling demands. First, there aren't any inherent 'sensors' because CAs are projective - they're creating reality, not reflecting it. However, there are 'manipulators' that can change this reality, and systems that can subscribe to it. Now suppose you add 'M' manipulators and 'N' subscribers to manipulate and view these CAs in the underlying model, complete with pre-built queries and triggers for useful little views, possibly even performing critical additional services that feed back into the original system. Neat, eh? You regularly point out such AdvantagesOfExposingTheRuntimeEngine. But there is a cost: you now have M+N couplings preventing you from changing or improving the computational 'model' you built to solve the problem... where without these couplings there were none at all. Essentially, you are now coupled to implementation rather than to interface. This makes the system very fragile and more expensive to change. This implementation-coupling and its harmful effect on cost of change is among the DisadvantagesOfExposingTheRuntimeEngine.

The difference in impacts here is significant when making judgements about what should go into the computation components and what should be part of the domain object model. However, one can usefully compromise if one is willing to accept a low rate of change for certain parts, such as items in a scene-graph. Doing so effectively creates a standard, allowing plug-in components to interact in a well defined manner. This may be a little counter-intuitive, but with Computational Abstractions, models with a high rate of change should stay OUT of a shared database in order to avoid coupling to premature choices of computational abstraction. Encapsulation should be favored where implementation coupling should be avoided.

With language or library support for component-specific databases and tables, the above distinction goes away. The issue above is one of coupling, and coupling can be avoided by not sharing the database. So there are ways to get the best of both worlds... it 'just' requires better language facilities than are readily and cheaply available today. Of course, JustIsaDangerousWord, especially if anything resembling SymmetryOfLanguage is to be achieved. Ideally, such tables can store arbitrary data types from the language, or at least a significant fraction of them, in order to avoid all the AccidentalDifficulty associated with your choice of either parsing/serialization or value composition/decomposition/collection when interacting with the tables. (Yes, relational DOES require support for types if you're interested in achieving both high performance and ease-of-use.) But nobody on this page would reject solutions that internally applied more 'table' components. Inside the machine, tables are another tool, another Computational Abstraction... a rather flexible and useful one if done right.
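
A minimal sketch of such a component-private table, assuming an ordinary generic container: it stores language values directly, so there is no parsing/serialization layer, and because it is never shared, no implementation coupling leaks out:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Predicate;

  // Component-private, in-process table holding ordinary typed values.
  class LocalTable<R> {
      private final List<R> rows = new ArrayList<>();

      void insert(R row) { rows.add(row); }

      // A restriction (WHERE clause) over the stored rows.
      List<R> select(Predicate<R> where) {
          return rows.stream().filter(where).toList();
      }
  }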

As a note, there is a category that one might call the Computation Domain, containing objects that are outside the system and its immediate control but are being interacted with, rather than reflected by, it. This includes files, services, hardware, processes, etc. There is some question of whether these things should be modeled, since you don't need to model that which you can directly interact with. My own conclusion is that they should be, in order to support better abstractions, mock-objects for debugging, transparent distribution, and transparent persistence. But it is worth pointing out that most OOP languages do a poor job with it because they don't support asynchronous message passing and immutable message objects. ErlangLanguage, which is not OOP but has FirstClass processes and abstracts hardware as such, does better than many OOP languages on this subject.


See also SemanticGap, XpSemanticHierarchy, OoBusinessExamples, DomainNicheDiscussion, ComputationalAbstractionTechniques, OopNotForDomainModeling

NovemberZeroSeven and again SeptemberZeroEight

CategoryBusinessDomain, CategoryEvidence, CategoryObjectOrientation

