Scaling Oop Discussion

(moved from ArgumentsAgainstOop)

On a small scale, OOP is overkill for BBR [Binding Behavior to References]. Procedural techniques usually work fine (at least with decent dynamic features). On a very large scale, databases are superior in my opinion, largely because they provide ready-made features to manage big collections of semi-similar stuff. There's a middle ground where OOP may have a slight edge, but this creates two problems: 1. How do we scale up to the DB when the complexity or size grows to exceed this middle area? If we start out with a DB for the middle ground, scaling is far easier than switching paradigms. 2. The confusion of mixing and integrating multiple paradigms. - t

I'll need to disagree with you here. On the very large scale - by which I mean DistributedSystems - objects are primitives; they are a basis for distribution, for security (ObjectCapabilityModel, access control), for LiveProgramming and persistence, and so on. On smaller scales, TotalFunctionalProgramming and LogicProgramming (including relations/databases and queries, DataLog style) work very well, offering strong opportunities for optimization (due to strong properties like guaranteed termination, independence of evaluation order, etc.). In the middle-layer, DataflowProgramming and FunctionalReactiveProgramming act as 'plumbing' between objects and services (i.e. MultiCaster is included here; related AlanKayOnMessaging). It is infeasible to create a single-Database-to-rule-all-services. It is a natural consequence of politics and technology that, on very large scales, long-lived data (anything longer lived than a message) is managed in hundreds of different databases - ranging from small databases in sensor devices to large databases in corporate clouds.

Putting the Database in the "middle layer" is a mistake, on the very large scale, because it requires pumping data to a common database, which means systems at the edge must know about a common database - i.e. carry a reference to the appropriate DatabaseManagementSystem?. Not only does this introduce management and modularity challenges; it also introduces security challenges, since you can't easily grant access to just a subset of data sources without introducing expensive filters. More practical is to pipe in the other direction: construct a 'virtual' Database by 'subscribing' (DataflowProgramming / FunctionalReactiveProgramming) to many smaller databases, along with performing any processing or transforms. An object might provide a reference for mutating a database, but the data itself can be accessed via SideEffect-free FunctionalReactiveProgramming, which allows many optimizations for very-large-scale multi-cast networks (up to exponential performance, bandwidth, and space savings). This is more secure, more modular (by which I mean it has better distributed management and partial-service sharing without violating security), and makes databases/LogicProgramming a suitable subordinate to both OOP and its lower-plumbing-layer FunctionalReactiveProgramming.
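The 'virtual database built by subscriptions' idea above can be sketched in a few lines. This is a hedged illustration, not code from the discussion: the class and field names (EdgeDatabase, VirtualDatabase, and so on) are invented, and a real FunctionalReactiveProgramming system would add the change-propagation and multi-cast optimizations this toy omits.

```python
# Sketch: edge databases publish updates, and a read-only 'virtual' view is
# composed by subscribing to them - rather than every edge system pushing
# its data into one common database it must hold a reference to.

class EdgeDatabase:
    """A small data source that notifies subscribers when rows change."""
    def __init__(self, name):
        self.name = name
        self.rows = []
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def insert(self, row):
        self.rows.append(row)
        for cb in self.subscribers:
            cb(self.name, row)

class VirtualDatabase:
    """A derived view built by subscribing to many edge databases."""
    def __init__(self, sources):
        self.view = []
        for src in sources:
            src.subscribe(self.on_update)

    def on_update(self, source_name, row):
        # Tag each row with its origin; a real system could also transform,
        # filter, or aggregate here, side-effect free.
        self.view.append(dict(row, _source=source_name))

sensor_a = EdgeDatabase("sensor-a")
sensor_b = EdgeDatabase("sensor-b")
virtual = VirtualDatabase([sensor_a, sensor_b])

sensor_a.insert({"temp": 21})
sensor_b.insert({"temp": 19})
# virtual.view now holds both rows, each tagged by source
```

Note the direction of the references: the virtual view knows about its sources, but the edge databases never need a reference to a common master.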

DistributedSystems programming is my passion. I have looked, but I do not see a practical alternative to OOP on the very-large scale. I've done much work on DistributedTransactions and distributed DataflowProgramming / FunctionalReactiveProgramming to make programming this very-large-scale ever more feasible. Of course, distributed OOP itself may serve as a substrate for even-higher-level programming (i.e. MetaProgramming - describing distributed applications via LogicProgramming or FunctionalProgramming - thus completing the cycle).

I'm not an OOP fanatic. I am, however, an OOP expert: I know the good, the bad, and the ugly when it comes to OOP and its most popular implementations; I've worked with OOP for over ten years; I keep up-to-date with OO design patterns (though I believe DesignPatternsAreMissingLanguageFeatures), and I contributed much to ArgumentsAgainstOop. I've known a few OO fanatics, so I know what you're fighting against, but you seem to assume anyone who defends OO is an OO fanatic. I think it ridiculous to assume use of OOP means rejecting other techniques, yet your own arguments tend to all have that as a basis: assume relational or functional or whatever is mutually exclusive with OOP, then argue that OOP is bad because some feature might be better performed by alternative paradigm <pick one>. I also feel you often blindly ignore the larger scale; e.g. procedural access to a named database (remote reference) as opposed to a 'global' database is fundamentally an OOP technique, and processes are objects at the level of the operating system, and so on.

Your statements are too general for me to inspect. Can you provide a specific example or scenario that doesn't require too much domain background education for the reader? Otherwise, they appear to be either WalledGardens or plugs for pet technologies. I agree there may perhaps be domains that existing RDBMS cannot handle, but they are not documented here.

No. (a) YouCantLearnSomethingUntilYouAlreadyAlmostKnowIt, (b) therefore, I can't know what you're missing until I already almost know it, (c) therefore any example I might provide is as likely to fly well above your head as it is to be too trivial for you to grok the distinction you need. If you want clarification, you'll need to ask the right questions to milk it from me, and you'll need to detail scenarios that give me some clue of what you're failing to understand.

And there is not even ONE domain RDBMS can handle on the 'very large scale'. To say otherwise strongly implies we could put everything (literally everything - across all organizations) associated with a common domain into just one RDBMS. This conflicts with the communications, technology, and political requirements of every domain I can imagine. ReductioAdAbsurdum: it must be the case that RDBMS does not handle even one domain (at least not any I can imagine) on the 'very large scale'. Why should I document a list of every domain I can imagine? On the very large scale, we use multiple RDBMS's - i.e. different RDBMS's for different data management and security domains. The moment you have two or more RDBMS's in the world, you need something on the larger scale (above RDBMS) to distinguish, reference, and access the different RDBMS's. Use of references tied to different RDBMS's suggests OOP. QED.

You are the only one I know of that claims that OOP "scales better" than RDBMS. It's not a claim I wish to dig into right now due to its low popularity/commonality and due to the difficulty of getting specifics out of you. Maybe another day. By the way, semi-distributed RDBMS are fairly common. For example, store branches usually have their own independent RDBMS that feeds info to a "master" or "HQ" DB during the night or asynchronously. Sometimes the HQ system copies everything, sometimes only a subset. Unique keys are often the store ID plus a local counter. This approach has drawbacks, but generally works in practice. I invite you to show how OOP can improve on this model. -t
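The branch-to-HQ pattern described above (unique keys formed from a store ID plus a local counter, with an asynchronous feed to a master) can be sketched with in-memory sqlite3 databases standing in for the branch and HQ servers. The table name, columns, and helper functions are illustrative assumptions, not drawn from any real store system.

```python
# Sketch of semi-distributed RDBMS: each branch keeps an independent DB,
# keys are (store_id, local counter), and rows are fed to an HQ master.
import sqlite3

def make_db():
    """Branch and HQ share the same illustrative schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (store_id TEXT, local_id INTEGER,"
               " amount REAL, PRIMARY KEY (store_id, local_id))")
    return db

def record_sale(db, store_id, amount):
    """Unique key is the store ID plus a per-store counter, as described."""
    cur = db.execute("SELECT COALESCE(MAX(local_id), 0) + 1 FROM sales"
                     " WHERE store_id = ?", (store_id,))
    next_id = cur.fetchone()[0]
    db.execute("INSERT INTO sales VALUES (?, ?, ?)",
               (store_id, next_id, amount))
    return (store_id, next_id)

def feed_to_hq(branch_db, hq_db):
    """Nightly/asynchronous feed: copy branch rows to the HQ master.
    'OR IGNORE' makes the feed safe to re-run; keys never collide
    across stores because the store ID is part of the key."""
    for row in branch_db.execute("SELECT * FROM sales"):
        hq_db.execute("INSERT OR IGNORE INTO sales VALUES (?, ?, ?)", row)

branch_a, branch_b, hq = make_db(), make_db(), make_db()

record_sale(branch_a, "A01", 19.99)
record_sale(branch_a, "A01", 5.00)
record_sale(branch_b, "B02", 12.50)

feed_to_hq(branch_a, hq)
feed_to_hq(branch_b, hq)
# HQ now holds all three rows
```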

RDBMS typically distributes via mirroring and caching, which is especially useful for read-only interactions (LDAP). Any object can distribute in this manner (EeLanguage calls it the 'unum pattern'; OpenCroquet is based on it). The mechanism you name for RDBMS distribution isn't anything special, and it doesn't cover the cases where: (a) different objects are on different machines, or (b) the data itself is distributed in the sense that there is no "master" server/HDD. I have worked with 'truly' distributed RDBMS data before (i.e. where five computers each hold one third of the data, overlapping for double-failure redundancy). RDBMS performance scales poorly for joins and complex queries in such cases: if you need to distribute the data, you're already in serious trouble because you won't have enough space to perform a join! This is why almost nobody does it; it's much easier to buy a bigger RAID drive, and perform regular backups and mirroring.

As far as suggesting I'm the "only one you know" who thinks OOP scales better than RDBMS on the very large scale: Ponder the relationships between SOA and OOP. Peruse the motivations behind ErlangLanguage, and concern yourself with why MnesiaDatabase is subordinate to the process objects. I've never found anyone who thinks RDBMS scales better than OOP on the 'very large scale'. RDBMS scales well, certainly better than certain OOP implementations. But on the very large scale, it's all services and processes communicating via messages across named references, and always has been. On the broad scale, a given database in a given RDBMS is one object instance.

So, how can OOP improve on a 'distributed global RDBMS'? I named how earlier, in the discussion that was 'too general' for you - security, distributed management, and the various other features associated with having distinct objects. A specific case of improvement: a sensor device keeps its own RDBMS and doesn't need to feed or query the global RDBMS. Suppose for a moment that the contrary was true: you cannot name RDBMS objects, there is only one global RDBMS in the entire world, and all code in the world references this global RDBMS. Each domain gets its own set of tables, and all sensor devices store data to these tables: every camera in the entire world stores photos to the global 'photo' table, for example. I invite you to consider how well this scales in terms of a few common SoftwareEngineering requirements: performance, security and secrecy, and safety (including disruption tolerance).

The code may not "know" whether there's only one or not. And it largely depends on how the RDBMS manages name-spaces. If we need more indirection, then add more indirection to the naming system. It's not an inherent, in-born fault of the relational paradigm. And what do you mean by "each domain"? And you have not stated why the cameras "must" use one central DB. It's still unclear exactly what you are trying to achieve. I cannot "build" a system without a requirements document.

Logically, there are exactly two possibilities: (1) there is a global, implicit DB. (2) there are many DB's distinguished by some feature - which will essentially be a 'name' or 'reference'. In case (2), you have already admitted a higher scaling factor than the DB - in particular, you are now using names to distinguish 'DB objects' and scale upwards. Thus, you may not logically use case (2) to argue that RDBMS scales better than OOP; attempting to do so is utterly counter-productive and logically inconsistent. Therefore, you must (to be logical) use case (1): there is a global, implicit DB. It doesn't need to be a "centralized" DB. (I never said "centralized" RDBMS above, did I? It could be a distributed RDBMS.) But it does need to be a common DB to every bit of running software in the whole damn world. Anything else is logically equivalent to admitting objects scale above RDBMS.
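Case (2) can be made concrete with a tiny sketch: the moment there are two or more databases, client code must hold named references to them, and each database behaves as one object instance. All names here are invented for illustration.

```python
# Sketch: a registry of databases distinguished by name. The registry is
# the 'larger scale above RDBMS' - client code dispatches through a
# reference, exactly as it would with any other object.

class Database:
    """On this scale, each database is one object instance."""
    def __init__(self, name):
        self.name = name
        self.tables = {}

    def query(self, table):
        # Return the rows of a table, or an empty result if absent.
        return self.tables.get(table, [])

# Name-to-object mapping: the feature that distinguishes the many DB's.
registry = {
    "sensor-17": Database("sensor-17"),
    "hq-cloud": Database("hq-cloud"),
}

db = registry["sensor-17"]   # obtain a reference by name
rows = db.query("readings")  # then message the object
```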

By "each domain" I mean each and every given domain you can possibly name, plus the imaginary ones, that might involve software: photography, oceanography, physical security, cryptozoology, software engineering, etc. Some domains may overlap, of course. E.g. oceanography could also use the photography tables for some things, due to domain overlap (oceanography includes photography). But any attempts to start playing with "the naming system" as you suggest is likely to just reinvent OOP inside the global DB via 'table objects', which is similarly counter-productive to demonstrating RDBMS scalability (in the absence of OOP). So I said there is some near-constant set of tables per domain. As to exactly how these tables are chosen - perhaps a panel of domain experts chooses and maintains the tables for everyone and all software in the world... but how the tables are chosen isn't too relevant to the challenge I set before you. Assume the tables for each domain are very well chosen, and work from there.

Again, without specifics, I cannot tell if it's really necessary to "invent OO in the DB". But if such is required, it does not mean that "relational fails", since the set of design possibilities for relational and for OO are not necessarily mutually-exclusive. Sometimes the solutions will converge into the same or very similar design. The issue here is whether "scaling" forces DB's to be more OO.

Since you need specifics, consider this problem: you have some constant number - say 50000 tables - to play with to store, access, and share all data for every bit of software from pacemakers to usenet. You can have namespaces, but you can't get more tables by using them. Now, clearly 50000 tables is not enough to avoid sharing... i.e. there are more than 50000 pacemakers in this world, and certainly more than 50000 cars, more than 50000 websites, more than 50000 news groups, and more than 50000 small businesses. So you'll be sharing these tables: all pacemakers use the same set of pacemaker tables. All shipping businesses use the same set of shipping-business tables. All e-mail clients and servers use the same, global set of e-mail tables. And this isn't simply the 'same schema'; by 'same set of tables' I literally mean that you can find data for every pacemaker in the entire world by performing a query unless you add a security filter (which someone will need to be trusted to maintain). You can assume the schema will be well-chosen, of course, since there'll be a lot of really smart people thinking hard and standardizing it. Are you grokking so far? Because that is (in essence) what it means to say "relational can achieve the very large scale" without introducing 'DB objects'. (Allowing each small business, news-group, or pacemaker to create its own set of tables is just cheating - it isn't fundamentally different than giving each its own dedicated database-object.) Introducing 'DB objects' allows for different pacemakers to each have their own database with their own tables, but also means admitting a relational database - on the very-large scale - is just another plain-old-object. That doesn't mean relational "fails", but it does mean that objects/OO - above relational - were the key to scalability.

In theory, how the tables are partitioned, replicated, or whatever could be pushed down almost entirely to the implementor's or configurator's viewpoint/concern. The SQL query designer may not have to care, and it may be changed under the hood without the designer ever knowing the difference. (In practice there may be some performance and timing trade-off changes that affect design decisions.) If you are implying that OOP better fits the physical realm of the implementor (systems software programmer) dealing with server boxes and cables, I would not necessarily disagree. I've already agreed that SystemsSoftware is an area that OO seems to better fit than domain development. OO was invented for physical modeling, and as long as the physical parts don't hit high quantities or have to interface with something that involves high quantities, it may be just dandy there. -t

You say "in theory", but which one? Will I also find perfect caches, GodRamIllusion, and a SufficientlySmartCompiler in this anonymous theory of yours? In this theory, how do you ensure the "implementor's or configurator's viewpoint/concern" will be secure, especially after you scale to multiple users and organizations? ... Also, what does "very large scale" mean to you, TopMind? To me 'very large scale' will encompass millions of machines across different users and organizations and generations. The Internet is 'very large scale'.

Again, my answer is "it depends on the specific requirements". I cannot give specific answers to generalities; I can only give specifics about specifics. Questions about whether you partition row-wise, column-wise, or both-wise (copy) have roughly analogous issues even for an OO-only solution. When dealing with physical separation of data in situations where you want to "hide" this separation from users such that it looks like one big data-set (when wanted), there's a ton of trade-offs to consider with regard to pitting time issues against bandwidth issues against integrity issues, etc. No paradigm removes the need to make these trade-offs; it only gives us tools to manage them. I generally start backwards: what does the user ideally want to see? If we cannot fully deliver it due to limited disk costs and/or the speed of light (for example), then which trade-off combinations best fit their need profiles? I cannot tell you that the users would rate recency (up-to-date info) over integrity issues, for example. Only the user or knowledge of the user's needs can tell us where to set the trade-off dials. - t

You aim to suggest that design is a ZeroSumGame of trade-offs. But that conclusion is easily debunked: it doesn't take much effort to take a high-quality product and (intentionally) design something that is 'worse' by every quality and metric you're using; and, starting from that worse design, the original is an improvement by every metric, so it must also be possible in the general case to take a design and improve it by every metric and quality. The possibility exists that solution S2 is better in every way you measure compared to solution S1. In those cases, it doesn't matter how you 'weight' features. Given this is true, you cannot assume that there must be a context where a particular paradigm-set - Relational+Procedural, for example - will be stronger than some other paradigm-set. Instead, you actually need to do some work and find those contexts in order to have a valid point.

Further, on the 'very large scale', you need to ensure that all relevant forms of 'scalability' have a suitably high 'weight' when it comes to these trade-offs. That constraint further restricts your ability to assume there is some context where Relational will provide superior scalability.

Earlier you stated, "On a very large scale, databases are superior in my opinion, largely because they provide ready-made features to manage big collections of semi-similar stuff." Some questions for you: (A) What did you mean by 'very large scale'? (Are you just scaling the amount of data? Or are you also scaling the number of developers, number of users, number of CPUs, geographic scaling - distribution, temporal scaling - how long programs live, etc.?) (B) Can you provide contexts and convincing arguments where Relational will 'generally' scale better than use of Objects?

I cannot give a simple metric because it depends on lots of details and interrelations, such as how many other tables/lists it joins with and how often. And I cannot provide objective evidence because I believe that relational's benefits are largely psychological; that is, "mind fit" (remember the story behind my handle). Further, some individuals may have "OOP heads", which I don't dispute. The computer doesn't really care what paradigm it's running. Human management of complexity is the primary issue. See SoftwareEngineeringAsManagementOfSoftware. (I'm sure you believe that computer-assisted validation via type-checking is a big factor, but I don't want to rekindle that debate here.)

Well, I think this argument is done. I personally think you're HandWaving above and have been for a while now, but I'll LetTheReaderDecide.

I personally think you are HandWaving on the requirements. More precisely, making non-confirmable excuses for why you cannot provide them. Why can't you just give a scenario where "here's OO doing X and here's relational trying to do X but failing right here at line 123 because one cannot rename the table while it's being used" or whatnot? You only have to provide one scenario of failure to demonstrate the existence of a weakness or flaw. You don't need generalities to achieve that. I didn't ask for a theory lesson or a list of your pet topics; I only asked for a semi-realistic specific example of where relational allegedly fails over OOP with regard to scaling. -t

I've described what I mean by scaling. You refuse to provide even that much. And if you were intelligent, perhaps you'd already be in the know as to why 'small examples' don't do much for showing 'scaling failures'. Let me know when you figure that one out.

I thought you were trying to describe the scenario in slightly more detail (and not doing so well), not define "scaling".

You're looking at the wrong text. Look to the top, where I said: "the very large scale - by which I mean DistributedSystems". I even provided an example: "The Internet is 'very large scale'."

The "scaling" issue I originally referred to is not so much about computer systems breaking or crashing more. It's more about managing all the instances/records/data from a human standpoint. If there is a problem, how does one go about trouble-shooting, for example? If I can query my 100,000 "units" using a decent query tool, it's easier to hunt for problems or clues. Say some of the cameras you mentioned are sending corrupted data and we want to see if we can find a pattern to the problem cameras and/or their attributes to give us clues. We may create a quick report/query by location, by corruption time (picture/vid time-stamp), by installation date, by model number, by installer employee ID, by combos of these, etc. This kind of troubleshooting is quite common in large systems.
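The ad-hoc troubleshooting query described above is easy to illustrate with sqlite3. The schema and the sample data are invented for the camera scenario; the point is only how cheap such a throwaway report is once the attributes live in queryable tables.

```python
# Sketch: hunt for a pattern among 'problem' cameras with a throwaway query.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cameras (id INTEGER, model TEXT, installer TEXT,"
           " install_date TEXT, corrupted INTEGER)")
db.executemany("INSERT INTO cameras VALUES (?, ?, ?, ?, ?)", [
    (1, "X100", "emp7", "2012-03-01", 1),
    (2, "X100", "emp7", "2012-03-02", 1),
    (3, "Z300", "emp9", "2011-11-15", 0),
    (4, "X100", "emp2", "2012-04-20", 1),
    (5, "Z300", "emp7", "2012-01-05", 0),
])

# Quick report: corruption count grouped by model. Swapping 'model' for
# installer, install_date, or combos of these is a one-line change.
report = db.execute(
    "SELECT model, COUNT(*) AS total, SUM(corrupted) AS bad"
    " FROM cameras GROUP BY model ORDER BY model").fetchall()
# report -> [('X100', 3, 3), ('Z300', 2, 0)]: the X100 model looks suspect
```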

OOP does not provide that out of the box, and "encapsulation" tends to make it difficult because it says we have to explicitly add collection-oriented "services" to each object or object "kind" we want to analyze.

That really isn't much of a problem. One of those "kinds" could be "relational database", which could then be used repeatedly. But OOP does have issues for data-management when taken by itself. But just like you're presumably using something like 'procedural' on the lower-level scale for relational, you should assume that OOP gets its pick of lower-level scale components. I earlier named which ones I'd choose: functional reactive and event-flow programming for 'plumbing', and functional + logic for transformations and pattern-matching. Relational gets to use these too, if it wants them; the question is overall scalability at the top-level.

An RDBMS that partitions data across multiple locations is more likely to automatically handle indexing, joining, and combining results that may come from physically split-up tables or DB's. Those features are far less likely to come out-of-the-box in an OOP system. You don't just build that kind of stuff into an OOP app because you "might need it someday". But it would be natural to have partition management in an RDBMS. Nobody would ever go, "hmmmm, I wonder why they added all that into this RDBMS". But it would look out of place in an OOP app.

Regarding examples being too small: It's usually possible, with enough thinking, to create a simpler demo that isolates the problem and only the problem. Now I agree that sometimes if you strip out all the side stuff, one may wonder why it was done that way. After a few question and response cycles, one may either start to agree with the reason for "doing it that way", or suggest alternatives that may have avoided the demo'd problem.

Example:

A: "We can't use the file system to store and access all the NY mug-shot photos. It cannot handle it, so we have to buy something expensive instead."

B: "What do you mean? What's happening?"

A: "If you put all 100,000 mug-shots in a single folder, the folder takes more than 15 minutes to open in Windows Explorer, and it's also rather slow to access from a program."

B: "Have you thought of using a hash to divide them into multiple folders?"

A: "How?"

B: "You use the booking number digits. Take the last 2 digits of the booking number, since they appear to be either sequential or random, and put each file into the corresponding folder. You'd have 100 folders, each named after all combinations of the last 2 digits. Thus, no one folder gets overly bulky. Here's a little script to save and retrieve a given photo..."

A: "Oh, okay. I didn't think of that. You are right, we can use the file system effectively."
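B's "little script" might look something like the following sketch. The base directory, the .jpg extension, and the helper names are invented for illustration; only the bucketing rule (last two digits of the booking number) comes from the dialog.

```python
# Sketch of B's script: derive a folder from the last two digits of the
# booking number so no one folder gets overly bulky (100 buckets total).
import os

def photo_path(base_dir, booking_number):
    """Map a booking number to base_dir/<last-2-digits>/<booking>.jpg."""
    bucket = str(booking_number)[-2:].zfill(2)  # pad short numbers, e.g. '05'
    return os.path.join(base_dir, bucket, "%s.jpg" % booking_number)

def save_photo(base_dir, booking_number, data):
    """Write a photo into its bucket folder, creating the folder on demand."""
    path = photo_path(base_dir, booking_number)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Retrieval uses the same photo_path function, so no index or lookup table is needed: the booking number alone determines where the file lives.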

That's not a 'demo', nor does it exhibit a fundamental scaling problem (bad scaling for folders of certain FileSystems is an accident of implementation, not a problem in B-tree based filesystems). I suspect you'll have a much harder time coming up with a 'demo' for scaling problems that are more fundamental, and any attempt to explain why it is fundamental will necessarily move into an argument based on known truths and logic. I'm relatively strong with logic, so I'd rather jump straight to the meat. But I'll try to remember that artificial dialog and logically ineffectual demonstrations help you grok things.

When there are communication problems, it's generally wise to try different approaches rather than simply repeat the same technique louder.


Is the WWW Object Oriented?

This question is interesting enough to deserve its own page. Content moved to ObjectOrientedInternet.


Pooh with small voice says: "To have a scalable OO Internet, you need a UnifiedDataModel."


EditText of this page (last edited May 28, 2013) or FindPage with title or text search