Pernicious Ingrown Sql

In my travels I've noticed a troubling disease. At first I thought it was just the occasional isolated case, but now I'm worried that it's an epidemic.

Yet another case of ObjectRelationalPsychologicalMismatch.

I've observed that a lot of people have the notion that a web application should be looked at as a contraption for generating SQL strings and then wrapping HTML around SQL result sets.

If that's what gets the UseCase done, that's what should be used.

(This is in stark contrast to my perspective, that a database is a place where good objects go to rest when we don't need them, and that the fact that a database speaks SQL mainly influences how I optimize certain operations.)

Symptoms include some or all of of the following:

lots of untyped strings and ints with semi-mystical values
large amounts of hand-written SQL throughout the code base
extensive use of the database as a processing engine
a muddy object model
an object model that doesn't map well to the business domain
an inability to change the schema because of the code
an inability to change the code because of the schema
lock-in to a particular database vendor and version
pain, misery, and woe

Email WilliamPietri sent to the XpMailingList, reproduced here by RobSartin.

My point in posting this wasn't really a request for technical solutions; I'm looking for ways to wake programmers up, to get them to understand that there are other ways of looking at their code. Once people get it, I can get them to use an ObjectRelationalMapping tool. But how do I get them to think about things in the right way? -- WilliamPietri

Getting people to "think about things in the right way", is, of course, non-sense, and it won't succeed as a matter of practicality, nevermind that there's no "ideal"/"theoretical" case to be made for TheRightWay?(TM). Often times an ObjectRelationalMapping tool is the ("not so good but it works") solution to the wrong problem.

I HaveThisPattern. True story: in giving a series of tutorials on a ModelViewController-style web application framework, explaining the SeparationOfConcerns, how to configure the controller to invoke actions in response to specific request parameters, database interaction via a simplified ObjectRelationalMapping layer, and presentation generation, the most frequent questions is, "Where do I put my SQL?" -- StevenNewton

What I do to avoid this is use a data-driven ObjectRelationalMapping framework. It's a lot easier than writing or maintaining SQL to do various queries, and it winds up making applications more flexible (for instance, by providing database independence). The one I use is the EnterpriseObjectsFramework included with WebObjects; it's 100% Java, and you don't have to use it with WebObjects if you'd rather use some other web application server. (WebObjects is built on the EnterpriseObjectsFramework, not the other way around.)

One of the things that's great about the EnterpriseObjectsFramework is that you can reverse-engineer an existing database into objects very easily. You point the graphical EOModeler tool at your database, select which tables and stored procedures you want to use in your object model, and it will build an object model for you automatically. Then you can customize it and start using it in your application quickly. -- ChrisHanson

I HaveThisPattern, too. We required from a subcontractor that an application had to have a distinct object-relational mapping layer. They said, sure, they did this every time anyway, and had a finished framework ready for that. Later, it turned out that their "framework" for "object-relational mapping" consisted of a class that took a sql query string, executed the query and converted the result set into a vector of String arrays - never mind whether a field was a number, a boolean or a date, everything ended up as Strings. These were then sent to the servlets / JSPs for processing and display (<%=d[i][5]%> etc.). After all, Strings are objects, right? So it must be an object-relational mapping, right? - It would be more funny for me if it wasn't true. -- FalkBruegmann

Amazing! I've seen the same thing as FalkBruegmann talks about, except the ResultSet was converted to a two dimensional String array. Of course this was on a EJB project, with tons of duplicated code and no tests. Fun stuff. -- TomCopeland

I guess this Pattern / AntiPattern originated in the days of cgi-script web applications, where the database was the only place for data to persist between requests within an interactive session. If every piece of data with a lifetime of greater than one request had to get onto the database within the request that created it, it is easier to think of the data in SQL terms, rather than craft them into objects that would be mapped to SQL and destroyed within milliseconds anyway. -- AndrewMcGuinness

I think there is an ObjectRelationalPsychologicalMismatch (philosophical divide) issue here. OO proponents tend to see SQL as a mere storage mechanism, but relational proponents see relational as a powerful tool that can simplify code and structure management, and thus not something to be merely "hidden away" but used to its fullest potential. If you only use RelationalDatabases as a storage mechanism, I can understand such a distaste for "naked" SQL. But I tend to side with the ChrisDate viewpoint on this one. OO proponents are making a mistake by pushing aside relational's potential and end up reinventing the database in their own application code without realizing that they are doing such. (I agree that SQL is not the ideal relational interface language though. But, still better than hand-coding DB features from scratch.)

Further, in order to have "swappable vendor" database code, you would have to use a rather wimpy subset of RDBMS features, creating more code to do what the DB might otherwise do. Plus, the SQL standard is often not closely followed or leaves too many gray areas. The alternative is go with an open-source RDBMS or roll-your-own from scratch, which OO proponents seem fond of doing anyhow it appears to me and some other relational proponents.

SQL text found in applications is mostly monotonous. See WhereAndAnd. So IMHO creating "vendor swappable" code is not a big issue.

This problem is not limited to web applications. I have seen many VB applications that is also a contraption for generating SQL strings and then wrapping VB gui around SQL result sets. As the saying goes, you can write Fortran in any language, this kind of bad practice can be seen in any kind of system. Web apps are just the latest fad that attracted the newest batch of newbie programmers.

This problem may not be limited to web applications, but the issue mentioned above with objects being destroyed at the browser still exists and is a very strong factor in modern web app architecture decisions. Cookies and sessions are both unreliable on the browser. Is it any wonder we cling to the one data storage method over which we have the most control?

Besides, a well-designed relational database will take care of so much work for you. Maybe this is just laziness talking, but I trust an established database to police variable types more than my own code. Provided you don't ScatterSqlEverywhere, I see no problem in letting it do its job. See DatabaseIsSoftwareToo.

By the way...

"Web apps are just the latest fad that attracted the newest batch of newbie programmers."

Does this comment belong on this page?

-- LaraKramer

I would like to see a semi-realistic example of why it is a bad practice. I have seen some cases where wrapping adds and updates may be prudent because most RDBMS don't allow updatable views. But beyond this, I don't see read-based queries a problem. The maintenance effort caused by wrapping is equal or greater than that of not wrapping it in my observation. SQL used to its full potential is too powerful to be wrapped by a "flat" API anyhow.

One can make utilities to generate the repetitious parts of SQL (See HelpersInsteadOfWrappers for examples). This is automating the repetitous parts without trying completely push SQL into a closet. Another way of saying this is that automating repetition and hiding SQL are not necessarily the same thing.

Ok, I HaveThisPattern as well. I would love to write code in a nice, totally non-coupled fashion. Unfortunately, I have to react to business need and I'm sure I would end up taking too long writing code in this fashion. Sometimes having totally decoupled code with processing engines, data engines and the like are function of institutions that have a tight relationship with an IvoryTower or a very large corporation. Let's face it the medium and small businesses of the world (most of them), don't have the resources for this kind of intellectual twiddling. With that said, I would like to say I would l love to be in a spot where I had the luxury how to do it TheRightWay?. I don't believe this is a newbie or intellectual problem. It's probably more of a time-resource problem. -- JoshuaRobinson

I don't really agree with the overall premise of this page, but there are some legitimate concerns.

lots of untyped strings and ints with semi-mystical values

I'm not sure what an "untyped string" is, but I think the mention of integers probably refers to enumerated type codes in the database. One way of addressing this sort of thing is to accompany the schema with code that defines such things symbolically, and being consistent about using the symbolic names.

large amounts of hand-written SQL throughout the code base
lock-in to a particular database vendor and version
an inability to change the schema because of the code
an inability to change the code because of the schema

These are all legitimate concerns. Centralizing the definitions of non-trivial queries, sticking to standard SQL whenever possible, and having a flexible compatibility layer to hide details of the DBMS API all help here.

extensive use of the database as a processing engine

This, I don't see as a problem at all. For many processes, especially those involving large updates to the database, the DBMS can do things much more efficiently than application code can.

a muddy object model
an object model that doesn't map well to the business domain

These don't sound like problems with the use of SQL, unless it's perhaps an indirect consequence of trying to force the output of every query into an object.

pain, misery, and woe

That just sounds like a characteristic of programming large systems. :)

Wrapping every database in business objects discourages the use of queries tailored to the immediate needs of the application, which in turn leads to gross inefficiencies. A pattern I've seen often, for instance, occurs if you have need to display a list of only a few columns of some business entity; loading each business object in order to display only a few attributes in each row can be terribly inefficient, especially if the business objects contain complex hierarchical structure that's not needed for this particular display. (Consider for instance bank account transactions with detail records, when you only want to display a summary list of check numbers, description, and total amount.)

[That's an excellent point. Display shouldn't be considered business objects. Perhaps operations from the display are on business objects.]

The problem isn't limited to display, that's just a common example. Any sort of batch process or "macro" process that might have to examine numerous records in the database will be up against the same problem. Yes, you could use business objects to encapsulate some of this, as they also often encapsulate basic loading and updating of their corresponding database entity. But eventually, you'll run into some process that relates to multiple business objects. Personally, I think that attempts to map the entire database schema to business objects causes more code organization headaches than it solves, and, as implied above, discourages use of the inherent flexibility of a relational database system.

On the other hand, the inherent flexibility of SQL is what makes using it so difficult. When any statement can at any time, access and modify data, and you give up specific data structures for generic datastructures like resultsets, code becomes essentially weakly typed and data completely unprotected. Every time you write a new statement, you have to hope you aren't violating any business rules. With objects you know you aren't violating any business rules, they are all contained within the model. Trading Object.Name for Record["Name"], means giving up strong types, for those of us who like that. It means no longer having the compiler assist us when changing things around. It means resorting to hunting down field names with search instead of compile errors. IMHO, this leads to a lot of broken brittle code, it also leads to spending copious amounts of time chasing SQL statements just to find out what fields exist in the record you're working with. Using business objects, one can generate a base system to work with, and tune and optimize from there, you can always drop back to SQL when the need arises, and it always does. Starting with SQL on the other hand, forces one to start from scratch. I'd rather start with something and tune the slow parts, that start from nothing each time. It's not like using the object model forces one to give up SQL, it simply let's you only have to write the slow parts by hand, rather than all by hand.

I've been working as a DB designer for ~20 years now and on the way dabbled with OO (my favourite language being Smalltalk). This is mainly to try and overcome some of the difficulties people experience connecting OO with (relational) databases: nowadays mostly working with Java/JDBC. There are a lot of interesting issues involved but the above comments about SQL are both true and a little unfair. SQL is a flexible language precisely because, as Hugh Darwen states, a database is a set of axioms (the data asserted as tuples in a relation [translation: rows in a table]). The result of a query is a theorem (new data derived from the orignal facts/axioms/data). The process of deriving the theorem is a proof (we can prove a piece of SQL correct because it is based on relational calculus/algebra). Now, would you want your mathematical theorem generator to be inflexible? How would that limit you to what you could prove correct? In practise, not many people bother to prove things, but the point is that RDBMS's are based on a sound mathematical footing and hence one could prove things if you wanted to. -- PaulHorth?

"...code becomes essentially weakly typed and data completely unprotected. Every time you write a new statement, you have to hope you aren't violating any business rules."

If you're using an industrial-strength DBMS, you put the database constraints in the database, where they will be checked regardless of who accesses the database from where. This ability is fundamental to relational databases, for exactly the reasons you cite. Many database constraints can be easily expressed declaratively.

I agree, but find that most business rules can't be enforced with simple constraints. Most rules aren't simple validation, they are far more complex and the business objects make them easy to express. So I stand by my original statement, you basically have to hope you aren't violating rules when you allow sql instead of business objects.

Well, as a database is a set of axioms (see above), I would like to ensure they are correct and not inconsistent before I use them. At least with SQL, being based on relational calculus/algebra, can be 'proven to be correct. In the DB I use (Oracle) complex constraints can be enforced. I like to stick constraints that are company/system wide in the DB. Application-level constraints should go in the application layer. -- PaulHorth?

If you rely on business objects to enforce all the rules, then you will have a very hard time indeed when you have to "start tuning the hard parts" - you'll have to heavily refactor your validation code to decouple it from the business objects, or you'll have to replicate a lot of validation in different forms.

I use the database for simple validation that constraints can handle, but a system which works and is slow is far better than a system that doesn't work at all, and I find most SQL approaches are so riddles with bugs and violations of the rules that I consider them not working sufficiently, it doesn't matter how fast they are.

My experiences have been colored by working on a proprietary system that uses such an object-based approach. I've been "tuning" this system for several years.

I'm sympathetic to many of the concerns cited, though. I also preferred - vehemently - static type checking several years ago. I'm very concerned about maintainability and vendor lock-in. However, what we have done, and it seems to be serving us relatively well, is to add a heavy dose of unit tests to the system, along with centralizing the definition of non-trivial queries so that it's easy to automate many of the unit tests. In addition, we have our own DatabaseAbstractionLayer that isolates us from SQL dialectic differences -- but this was admittedly an expensive architecture to create, and the write path through this layer is only rudimentary at the moment, although the read path is working very well. For writing data, we currently do still rely on the business objects, where all of our data validation is encapsulated. (We're not using a very capable DBMS at the moment, although we hope to address this problem soon. We're using MS Jet, via DAO. You may laugh at me now, I won't be offended. Jet's numerous SQL idiosyncrasies are a major source of headaches for our abstraction layer, which generates SQL from relational expressions. The good news is that once I determine a workaround for a problem, nobody else has to worry about it again.)

I won't laugh, databases are a pain for most of us, and I know sql quite well. Knowing sql has little to do with the problems in using it. I've even considered not using objects and unit-testing every single procedure as an alternative method, I'm still on the fence. I like how fun and easy programming is with objects. I like pretending I simply have unlimited memory and no database. As a language, I think SQL sucks, it has little in abstraction capabilities and makes reuse a pain, maybe one day I'll find an approach I like on all fronts, until then, I'll take objects.

You may know SQL, but knowing it and knowing why RDBMSs are important and how to design a proper relational model and how to interface with objects and how to optimise it are two different things. Too many people spend a lot of effort trying to tie OO and DB together when nowadays it is probably cheaper for most medium to large scale projects to buy in (or get for free) a third-party tool like TopLink or Hibernate to do the ORM mapping. -- PaulHorth?

Maintenance of references to database names (tables and fields) is definitely an issue, and it would be nice to have the compiler warn you about invalid references. However, searching for these names does not have to be so difficult, especially if you use a simple trick. Use your DatabaseAbstractionLayer to enforce a uniform syntax for referring to them. For instance a small class named "fld" could be used to represent field names, so all field references appear as fld("FOO"). Now a simple regexp such as "\bfld\s*\(\s*"FOO"\s*\)" (Perl-style) can reliably find all references to field FOO, outside of SQL.

If you leave performance tuning to the end, in many applications it will end up consuming a huge amount of effort.

I disagree. I find it's quite easy to trace the database, find the most repetitive hits, and combine them, cache them, or write a proc to replace them. Few parts of an application really need that much effort, at least imho.

I keep telling our application-level programmers the same thing: retrieve only what you need from the database. This is the single most important guideline for performance when using a database, and following it requires bypassing business objects. Our small group of DB experts spend a lot of time rewriting application code to fix performance problems, but with decent schema documentation and well-designed schema, it should rarely require an expert to do this right in the first place.

Personally, I think dba's are a waste of time. I've yet to find a dba that knew as much as a solid programmer did. I've always found dba's to be programmers who couldn't cut it, of course I'm sure they're not all like that, just speaking from personal experience.

That's interesting, I've yet to find a programmer who knew as much as a solid DBA/designer about designing databases. Most programmers seem to treat an RDBMS as a place to shove data in a 'flat-file' structured manner: with all the horrific consequences we identified in the 1960's from that approach. Please do not judge people on how well they know your area of expertise, particularly when you may be ignorant of their area. -- PaulHorth?

[That's a dangerous attitude to take, whether it is supported or not. A similar attitude is that testers are programmers who couldn't cut it. The problem is that testing and DBA work and lots of other things are all essential, and it is counterproductive to sneer at those other professions.]

[A famous example is garbagemen and janitors. Civilization would collapse without them. It's much better to appreciate what they do for us, than to sneer that they can't do anything else - which may be true, may be false, but is irrelevant. We need them, and should be very glad they exist.]

<But that doesn't mean that we should put them in charge of anything but collecting garbage...>
- Are DBAs put in charge of anything but DB administration?
There is often a communication's gap between both groups. They both have different goals and different perspectives, which can create conflicts. Technical people are often poor communicators and this also contributes to the problem.

[Of course, good DBAs are quite skilled compared to garbagemen and other unskilled/semi-skilled professions, and that should be appreciated in its own right.]

[http://www.thefirstpost.co.uk/index.php?storyID=7092 (external link) The photos here illustrate the pressing requirements for garbage men and janitors quite effectively. Sorry if external links are frowned upon given that they are likely to disappear, but I was reading this page trying to decide how to tackle my current DB-based application (in terms of access layers) and while reading about janitors above I simultaneously ran into this link on another tab - A coincidence too good to pass up. Remove it if you will. More on topic, I will be the DBA and developer for this application, given the small size of the company I work for and I've previously built a (my first) RDBMS-based application using a single DatabaseAccessLayer object that dealt with ALL queries and attempted to return "business objects" representing the various outputs. I had mixed feelings about this - it helped in some ways but not others. More reading required... -- Chris R-M]

Like I said, I know it's not all like that, just recounting my experiences. I think dba's hamper development. I think programmers work better without them. I don't see a need to not allow the programmer to do the work himself, at least the one's who are good enough to do the dba's job, and there are always several. I like having a dba to maintain productions systems, upgrades, maintenance kind of stuff, but not in development. I'm sure other opinions will differ, but that's my take.

You do realize that your response has nothing to do with my critique, right? Note where I said it doesn't matter if you're right or wrong?
As to who should be allowed or forbidden to do certain kinds of work, to me this seems an issue of project ownership (much discussed on other pages here), not of job title/profession.

No, I don't realize that, you said that it was a dangerous position to take, so I justified why I take that position, how is that unrelated?

100% of what I said was about why it is a dangerous position to take whether it is true or not, so justifying why you think it is true is a complete non sequitur.

Except that I disagree with your assertion that it's a dangerous position to take.

That's fine, that's a direct disagreement, rather than going off at cross-purposes to my point. Ok, I explained why I thought it was a dangerous attitude, whether factually right or wrong; what part of my explanation is it that you differ with?

You said you think it's counter productive, I disagree, I think it's a more productive attitude. I think having that attitude is what makes one get the work done himself, rather than depending on a dba. I think programmers who don't need nor want dba's and can do the work themselves are far better programmers. I don't sneer at dba's, that's your projection, I just think they aren't needed, at least in development.

Allow me to make a distinction: the ability and willingness to do DB work without needing a DBA is what is productive, and can be a sign of better programmers.
That's different than the attitude that DBAs are useless, and are "programmers who couldn't cut it", as you said. It is the latter phrase that I would call sneering, and so this is not at all a projection (although it might be debatable).

Ah, I didn't realize you were just disagreeing with the "programmers who couldn't cut it" piece. OK, I was arguing the first distinction you made. So basically we agree, because I didn't say all dba's were programmers who couldn't cut it... I just said the ones I knew.

DbasGoneBad

Actually, I'm not sure how DBAs got into this conversation. I used the term "database experts", and the next response was about DBAs. The database experts in my work group are all programmers. I'm one of them.

I haven't worked with enough people titled "DBA" to have a strong opinion one way or the other. The ones I've met seem long on implementation-specific knowledge and short on fundamentals and theory, but the same could be said of most of the programmers that I've worked with. -- DanMuller

To you fans of heavily wrapping the DB behind API's and methods, I am curious about something. Would you still recommend it if the RelationalLanguage was consistent across vendors? If so, why? I am trying to figure out if it is the differences in vendor implementations that you are trying to protect from, or just the strings and their lack of compile-time checking, or being that it is a different paradigm than your favorite language? -- top

It's because it's strings. Even with a single perfect implementation, it'll get wrapped. Consider that the OO wrapper is likely to be an interpreter pattern that takes an abstract syntax tree of objects and creates sql out of it, the point of which is to make the sql operators into language elements rather than strings. I'd rather have to type ...

    new Or(
        new Eq("value", aVar), 
        new Eq("another value", aVar))

consider the lisp version

    (or
      (eq? "value"  aVar)
      (eq? "another value" aVar))

notice that wrapping queries doesn't necessarily weaken them, making insert, update, select, and delete into objects doesn't somehow cripple sql, it just makes them into language elements rather than strings.

    "'value'='"+aVar+"' or 'another value'='"+aVar+"'";

the object version is structured, and can be manipulated, changed, and can be translated directly to equivalent sql. It's just our way of making SQL native to the language. Strings suck.

But less than the alternative. I have not seen any SQL wrapper language that is simple, uncluttered and flexible. Sometimes there are patterns such as WhereAndAnd that one can take advantage of, but if something more complex comes along you may end up with a DiscontinuitySpike (have to rework the whole thing). Thus, I keep such wrappers light-duty so that they are easy to abandon if more complex SQL expressions are needed. They might resemble something such as:

  appendAndClause(clauseName, varName, value, validationOptions)

This simply adds an expression to a criteria clause (string). If simple appending no longer works, then I don't use such anymore and work with more direct strings. This way one can take advantage of patterns in SQL expressions, but resort to raw strings if/when needed without having to use wrappers for every other expression. Your approach seems all-or-nothing.

Boolean expressions in general don't convert to OOP API's very well.

Oh, explain to me how this is less functional ....

  new Or(
    new Eq(1,4),
    new And(
      new Eq(3,3),
      new Eq(5,5)))

than this

   (1==4)||(3==3&&5==5)

Oh that's right, it's not, is it more too type, certainly, it's called the interpreter pattern and it let's us have mini languages directly in the code. Sql's a mini language, thus we can interpret it from objects to sql, without losing any of its capabilities, so you don't have a clue what you're talking about.

They are harder to read and expand the code by about a factor of 3. I see no net advantage to bloating up the code with such.

Because it let's us work with a language that has semantics, rather than strings concatenated together.

If they make you feel comfortable, so be it. Just don't extrapolate your personal favorites to everybody else. I find them much harder to read if there is no trivial pattern. They slow me down, period! I have seen others laugh at such code also. It might be digestable for simpler expressions, but conditionals can get pretty elaborate and are hard enough to read in their original form. See ExpressionApiComplaints.

This wildly bloated code, isn't wildly bloated, and [?] be quite easy to work with, and is used for far more than SQL, those same expressions can be passed to HigherOrderFunctions and applied to collections, and lists. That's OO using functional programming idioms, unfortunately, languages like Java and CSharp don't have expressions, so we fake them. It's harder to read because it's prefix formatted, just like lisp and scheme, but once you get used to it, it's quite natural.

[Take a look at the C# 3.0 specification, where one can write this kind of code

  table.Where(m => m.Field != 'Value')

They've added expressions to the language and a generalized query syntax (lookup LINQ in google) -- PabloMontilla? ]

One can say that about anything ugly. Anyhow, do you have a specific example of it being wonderful and shining shoes? Further, what is your recommendation for those using weak meta-abilitied languages like Java?

To answer top's questions from my own perspective, here were some of my motivations for creating a relational API wrapper for SQL:

Portability among vendor implementations. (So, yes, this was a motivation. But not the only one.)
You need more than just simple expression wrappers. Translating the differences between vendors is not always a one-to-one translation or swap.
- Well, duh. Hence the next point.
Relational correctness. People get lost in the dark corners of SQL; I find it easier to use and to teach a more consistent relational algebra.
Huh? You created a new relational language that is translated back to SQL at run-time?
- Basically, yes. A C++ API modelled on the semantics of Date & Darwen's TutorialDee, not on SQL. D&D did all the heavy theoretical lifting; I plucked out the ideas that seemed most feasible and usefule, then figured out how to map them automatically to SQL.
Ease of use.
- Working in C++, I was able to create an API that was at least as easy to use as common SQL-based APIs. (Certainly less verbose to use than the examples above involving explicit "new" operations.) My original intention was also to have compile-time typing and name checking, but I lacked the time to create the tools to facilitate this, and people on the project had concerns regarding compile-time coupling overhead. We ended up with a dynamic API, which has advantages and disadvantages. Unit tests made up for some of the latter.
- Simplified concurrency models. We stuck to conservative, strong transaction modes, which are easily modelled in the API and easy to use, since there is less opportunity for concurrency programming errors. Our application serves small numbers of users at a time, however, so I'm not certain how well this would scale.
- Sounds like you are trying to create/emulate a whole damned database. I am surprised your boss lets you do such. Things must be slow there.
  - It was more work than expected, but it falls far, far short of writing an entire database system. No need to deal with implementing all the dirty details of storage and atomicity, for instance -- that's all still left to the underlying DBMS.

-- DanMuller

What's wrong with pushing computation onto the db? Unless I can use domain specific knowledge to short circuit some of the computation, the optimized db routines are likely to be a lot more performant than what I would otherwise write. Yes. working with strings is a pain in the ass.. but given that the industry seems to have standardized on p_programs_ expressed as strings (at least at the programmer level) I'm hard pressed to see the disadvantage, in practice...

I'd say that this entry also points to it's corollary, especially talk about getting developers to think about things in the 'right' - i.e. OO - way. This doesn't encourage developers to consider alternative approaches, but in my experience swap from one default pattern to another.

I work with a team of programmers and designers with many years experience, but most of it OO (while my programming is mostly server-side). They have translated the rule 'database access is expensive' - to become variously 'databases aren't performant' and 'to avoid going to the database, get as much information as possible every time you do'. Our default pattern for web search screens is to go off, query every possible result (possible hundreds of rows) back from the database to the Java layer, then let the user navigate through the record set a page at a time. Given that in 95% of cases the result to user wants is on the first result page, this is a complete waste of time. Of course, the second we want to make the application allow, say, flexible reordering or filtering of the results on the page, then it's NOT a waste of time.

Related: IsSearchPagingSmelly

This isn't specifically an OO/relational thing - it would be perfectly possible to have defined things all the way through to get the Top 5, but in my experience it does stem from a combination of always wanting to use the same approach, rather than thinking what is the most appropriate approach, and secondly an ignorance about databases - i.e. how to write an efficient query to get the nth page of data from a set in a stateless system - or even knowing that you can. Lack of knowledge of what your tools offer is the biggest cause of re-inventing the wheel and inappropriate decision making.

Of course, doing an efficient nth page query might be something your framework or O-R mapping tool provides, but as with any framework, how it operates should be something you understand. A framework should be a tool to save writing repetitive code, rather than a tool you use to avoid understanding 'what lies beneath'. It's good to know that doing operation X might be badly performing, but it's better to know that it's actually causing the server to page 6 Gb of virtual memory, not just waiting to lock a row - it gives a sense of perspective.

For hand-tuned SQL (i.e. queries, multi-row updates) I recommend storing it on the database - where views won't do, most implementations of SQL support functions / procedures that can accept bind parameters and either return result sets or pointers to result sets. This allows responsibility for SQL performance to be passed off to a smaller expert team / individual, who may not even know the target language. It also allows the same query to be shared across multiple languages (useful if you need to provide .NET and Java interfaces into the same system), or the same query to be tuned for different RDBMS versions (more useful), or even RDBMS (currently difficult due to variation in implementation).

I still think most of the problem is with Java's static typing (it's a Static/Dynamic conceptual split as much as Object/Relational). Ideally you want something that can automatically extend classes when the database structure is changed (or deprecate on column or table removal) - the sort of syntax that makes Oracle's PL/SQL a very concise language - you can define a collection of %ROWTYPE, fetch into it, and iterate through it, without having to know anything about the underlying table structure.

I like your statement:

"A framework should be a tool to save writing repetitive code, rather than a tool you use to avoid understanding 'what lies beneath'."

Seconded. It's so powerful a statement that it perhaps deserves its own topic. I boldified it, if you don't mind.

This is a quote in ASP.NET 4 Unleashed, page 935, that chapped my hide:

"A huge amount of developer time is spent performing brain-dead, tedious translations between the object and relational universes. I cringe when I think of the number of hours I've spent declaring classes that contain one-to-one mapping between properties and database columns. This is time I could have devoted to going to the park with my children, seeing a moving, walking my dog, and so on."

"LINQ to SQL promises to finally enable us to put SQL to a well-deserved death. Or more accurately, its promises to make SQL a subterranean language that we never need to interact with again. (SQL is plumbing, and I am not a plumber.) This is a good thing. Death to SQL"

(end quote)

Spot on. SQL is, at best, an unpleasantly-exposed low-level protocol for communicating with certain DBMSs.

And verbose period-trains with tons of post-fix set/get's is "high level"? That's progress? I'm all for improvement, but those things are not it. SQL is a higher level of abstraction than OOP in many ways, not lower.
Look out, your StrawMan is on fire.
I am resisting being drawn into a flame-war. (I mellowed out my prior language above a bit.)
Also note that I am not against "helper" utilities for the repetitious parts of SQL generation, but am against trying to wrap it away out of view (HelpersInsteadOfWrappers).

My solution would be to make app languages more like SQL (or a better query language), not SQL more like OOP. LINQ is a partial step in the right direction, but it cannot go the whole way because encapsulation inherently prevents it. RelationalBreaksEncapsulation. Yes, one could break encapsulation to "escape" into collection-orientation land with enough tweaking, but it will lead to GreencoddsTenthRuleOfProgramming. --top

Absolutely, we should "make app languages more like ... a better query language", as in (for example) TutorialDee. LINQ is also a step in that direction. Your bit about encapsulation, however, is utter nonsense.

If you like the ADA family of languages. If it's nonsense, you haven't articulated clearly why.

It's nonsense because "Relational" doesn't break encapsulation. DateAndDarwensTypeSystem demonstrates that inheritance, polymorphism, and encapsulation can all exist effectively in the context of a pure implementation of the RelationalModel.

Not as OOP is currently being used in most shops. Further, I vote to separate the type system from the relational engine (DoesRelationalRequireTypes). I know others prefer a kind of GodLanguage approach that integrates everything, but that prevents mix-and-match progress, experiments, and sharing in my opinion. Let's not rehash those debates here, just offer links for those interested in those debates.

You know how OOP is currently being used in most shops?

As an informal observation, yes. I've taken no formal survey. If your personal observation suggests otherwise, you are free and welcome to state so.

My personal observation suggests otherwise. Furthermore, my personal observation here is that except for rare and brief rays of lucidity, most of your "observations" are heavily coloured by what you wish was true, rather than what is true.

You mean my observation that OOP is more popular currently than data-driven techniques really means that deep-down I prefer OOP? This makes no sense. Anyhow, we are all subject to bias, bias being a human trait. Thus, you are not immune. CommonSenseIsAnIllusion.

Huh? How could you possibly derive that from what I wrote, and where have you acquired the apparent strange (and incorrect) notion that OOP and data-driven programming are an "either/or" relationship?

I'll get back to the "bias" discussion another day. As far as either/or, it is because they have fundamentally different underlying philosophies and base assumptions. In short, RelationalBreaksEncapsulation.

Utter nonsense. You are viewing the world through a foggy, distorting haze of your own peculiar biases and wishes. RelationalBreaksEncapsulation is a dire soup of misinterpretation and misunderstanding except for the clear bits written by yrs trly. As I noted above, DateAndDarwensTypeSystem demonstrates that inheritance, polymorphism, and encapsulation can all exist effectively in the context of a pure implementation of the RelationalModel.

Most OOP shops make the so-called FirstGreatBlunder, and this creates many of the mismatch problems. I am approaching this from the perspective that I am one of many developers, probably with zilch rank, and that I likely could not change the practice of FirstGreatBlunder. Thus, my statement is essentially correct in such an environment.

The fact that some shops code badly does not diminish my point -- or make your claim correct -- by even one iota.

Technically you may be correct because with enough indirection everything is everything. One can write a Fortran compiler in Cobol, and a Cobol compiler in Fortran, etc. However, that says little about the practical benefits of doing things a certain way. We appear to be running into the same kind of "idealism wall" we had in the bags-versus-sets battles where I approach it from what an "average" developer faces in terms of the work environment and jurisdiction/control versus what "could be" if the whole world was overhauled and rebooted. Being technically correct does not make you "practically" correct. If you want to claim a technical victory, be my guest. However, that won't change the reality of available choices of developers in the trenches who may not be able to change certain practices in an org. This wiki should welcome both perspectives. Ward never claimed it was a theory-only wiki. -t
Huh???
Where does the "huh" start? One can "bend" OOP to have features of relational and vice verse with enough programming such that technically they can each gain features of the others. However, paradigms are not about "features", but organizational philosophies. GrandParadigmUnification gives examples.
The entire paragraph, and your response immediately above, is rambling nonsense. You're just making noise.
If I'm schizoid, I'm not aware of it. It reads just fine to me. The problem may be you. Anyhow, if you point to specific words or phrases, then perhaps I can use a form of StepwiseRefinement to explain it. (I've adjusted it a bit since. Is it better now?)

I'm also skeptical that DateAndDarwensTypeSystem would be practical, but that's secondary.

I'm giving your skepticism, accompanied by no evidence whatsoever, precisely the weight it deserves: None.

Can you point to a successful production system that uses DateAndDarwensTypeSystem?

Depending on your definition of "successful" and "production", yes: The RelProject is the first successful implementation of the majority of the version 3 specification, though Alphora's Dataphor was apparently strongly influenced by an earlier version. The professional DBAs to whom it has been demonstrated have applauded it for eliminating accidental erroneous joins between unrelated types (e.g., invoice number joined to customer number) and for trivially handling multiple units in a single type. E.g., a LENGTH type can handle both metres and feet, or a TEMPERATURE type that can represent Fahrenheit, Kelvin, and Celsius. Programmers to whom it has been demonstrated have applauded it for supporting MultipleDispatch and workable specialisation by constraint.

I didn't mean implementation of a language that supplies the concepts, but rather a production domain application built with such a language. That's where most of the complaints about PerniciousIngrownSql come from.

I am aware of two such applications. One is used for birdwatching. The other was part of an evaluation comparing TutorialDee to SQL done by one of my students, based on a real-world scenario I supplied. In the latter, DateAndDarwensTypeSystem worked effectively and resulted in an implementation that was subjectively considerably more readable (hence, more easily maintained and perhaps less likely to result in user/developer error) than the SQL equivalent.

As far as joining invoice number to customer number, those kinds of bugs are usually caught early in testing of an app. It's "almost right" implementations that tend to cause the biggest bugs. I suppose it could be useful for ad-hoc querying for one-off requests, although it would have to be weighed against the "bloat" of most type systems. Further, a built-in natural join mechanism or "common join catalog" (AlternativesToNaturalJoins) could also prevent problem joins.

Any so-called "bloat" will be offset by the clarity of the schema design, and the elimination of any need for a "common join catalog".

But how is customized column types related to breaking encapsulation? It's not really a problem at the column-level.

There is no breaking of encapsulation. That's my point. Encapsulation, however, is only meaningful in the context of column types. It has no meaning at a relation or tuple level unless you're deliberately trying to make the FirstGreatBlunder, and even then it's questionable.

Most shops want stuff to work effectively under FirstGreatBlunder conditions. Like I already stated, an individual may not have the ability to overhaul usage of FirstGreatBlunder. Thus, how about we explore it under the assumption that FirstGreatBlunder is being done. -t
Your claim was that, and I quote, "LINQ is a partial step in the right direction, but it cannot go the whole way because encapsulation inherently prevents it. RelationalBreaksEncapsulation." That's nonsense and I've proven it. Your mumbling about what you think "most shops want" is utterly irrelevant.
[And besides, what more can you say about it than, "If you work in an environment where encapsulation is required to be broken, then encapsulation will be broken."?]
Your "proven" is a UselessTruth, as described above in the "fortran = cobol" discussion above. And "broken" is a continuum. Features/implementation hidden to some components may not be hidden for others.
[How is it useless? It rather clearly demonstrates that your claim that encapsulation is necessarily broken by relational is false. That's a use, and it's the one that's important here. No, what's quite clear is that RelationalBreaksEncapsulation is a UselessLie. BTW, if you want to demonstrate that something is a continuum, you might consider using like a continuum instead of using it like it only had true/false values.]
Re: "and that's the one important here" - I have to disagree. I'm generally talking about practical issues here, not theoretical ones. I already gave you credit for a theoretical victory and am now moving to the practical issues. Please clarify your continuum statement. -t
[How much more "practical" do you want than real, concrete implementations? That's not just theory. It's been implemented, and is in use. Since it's been implemented, the claim that encapsulation is necessarily broken by relational is false, both in theory and in practice.]
- Like I said already repeatedly, a given shop would likely not want to switch languages and/or base practices at this point in software engineering history. If you don't want to explore that scenario, just say so. I'm not forcing it on you. Although I'm working on it, my browser has no "force him" button so far. (I think I'll finish the "Bop Head" button first.)
- [You can explore it all you want, but it won't have much to do with your claim.]
- Your version of the claim is divorced from reality.
- [So now you are saying that "LINQ is a partial step in the right direction, but it cannot go the whole way because encapsulation inherently prevents it. RelationalBreaksEncapsulation." is divorced from reality. I'm glad we've cleared that up.]
- I wouldn't use the word "inherently". (If I did in the past, I misspoke.) I'm merely saying trying to make relational look too much like OO and vice versa can get messy and ugly from a practical standpoint. More specifically, it takes on the lowest-common-denominator of the two paradigms, making bad OO and bad relational at the same time. -t
[Re: the continuum statement - Your use of broken used it with true/false values. It was either broken or not, nothing inbetween, nor would the language you use allow for something inbetween. That rather dramatically undermines your claim to it being a continuum.]
Nothing is 100% encapsulated because one can break into the disk or RAM cabinet and sample raw bits directly in the chips/wires. Encapsulation is a UsefulLie, but never absolute (barring some new encryption technology). It's relative to a given perspective. Similarly, we may use a certain abstraction layer in say 95% of our application, but go around it 5% of the time for say performance reasons. In HelpersInsteadOfWrappers, it's also suggested that the repetitious grunty parts of interacting with SQL, HTML, and other sub-languages can be wrapped via API's, but without the expectation that we are always hiding it away. -t
[Encapsulation isn't broken because one can get at the encapsulated stuff. (One can have encapsulation by convention.) It's broken when one does get at the encapsulated stuff. So, if you've "gone around" the encapsulation 5% of the time in your application, you've broken encapsulation. You can't change what was encapsulated without breaking other parts of your application.]
Convention can make just about anything "encapsulated", even machine language.
No, convention cannot make "just about anything" encapsulated. However, machine language can employ encapsulation by convention, and it is good practice to do so.
Anyhow, I'd rather deal with specific scenarios rather than try to process and parse generalities here because what-if's can expand quickly.
I've provided specific scenarios. You've pulled numbers out of your ass, such as your "95%" and "5%" a few bullet points up from here.
Where, I haven't seen them. And of course the numbers are "made up"; they are part of an example. Please read more carefully before doing your rude Asperger impression.
[Try reading this page starting with the text "I am aware of two such applications.".]
One can claim they played leapfrog with a unicorn, but if the reader cannot verify and inspect the existence and nature of the alleged product, it does very little good for your case. I'm not outright dismissing anecdotal statements, but it's not what I was asking for above. Readers would like to see a semi-realistic code-inspect-able app that can apply to their own domain, along with references to the ways and spots it makes for better design and maintainability than a typical or common alternative, such as a FirstGreatBlunder design or OO-lite design.
[Why would the readers want to see that? None of the disputed claims were that it made "for better design and maintanability", so why cloud the issue by going into that here? Anyway, you asked for "production domain application built with such a language". You got them. Now you are claiming you want "a semi-realistic code-inspect-able app". Could you make up your mind?]
Inspect-able code is implied. Anecdotes are a dime-a-dozen.
[Really? Then why didn't you ask for inspectable code at that time those examples were given instead of waiting till now? It's quite clear that neither of us picked up on that implicit requirement. BTW, you weren't given anecdotes, you were given real production systems that use Date and Darwen's type system.]
- I meant domain applications, not SystemsSoftware implementations. I suppose I should have spelled it out, but thought we've been down similar roads enough times that it didn't need stating. Further, SystemsSoftware generally will not typically use SQL directly (as a current practice), and thus won't face PerniciousIngrownSql to begin with. -t
- This is the first time I've heard a birdwatching application described as "systems software".
- What, I thought reporting and CRUD were "too narrow" to serve as a good test ground, per below. My enemies cannot make up their/its mind(s). Anyhow, I'd like to see more details on your......bird CRUD. -t
Indeed. I'm happy to provide TutorialDee examples, but here is not the place. The original claim that I disputed was RelationalBreaksEncapsulation, and I believe it now adequately proven that the RelationalModel can support fully encapsulated types. DateAndDarwensTypeSystem is proof. Subsequent discussion was mere deviation and distraction; certainly nothing that warrants code samples. Top, if you still feel RelationalBreaksEncapsulation, you are free to make the case by writing a critique of TheThirdManifesto and DateAndDarwensTypeSystem.
- See note above about SystemsSoftware. As far as "deviation and distraction", I have to disagree. I'm simply focusing on practical issues and you wish to focus on theoretical issues. Neither perspective is "wrong" or "deviant", just different. If you don't wish to discuss the practical side, just say so and move on rather than insult the other scenario/path as objectively wrong or deviant or off-topic.
- You're quibbling in order to sustain an argument, not discussing the practical side. You've no apparent interest in a dialogue, only in sustaining your narrow views -- informed (it would appear) by nothing more than developing business reporting applications -- as general principles.
- Projection. "Narrow view" is ridiculous in light of the fact that probably more than 99% of shops run and create FirstGreatBlunder systems. If 99% is "narrow", then I truly am as delusional as you claim and have a busted-up brain. (Again, I'm not necessarily condoning FirstGreatBlunder, only looking at ways to better live with it.) -t
{Fully encapsulated types, eh? Okay, tell me then how relational allows me to fully encapsulate foreign keys. Because, after all, what OO 'fully encapsulates' is relationships between independently mutable entities.}
The phrase "fully encapsulate foreign keys" is meaningless in the RelationalModel. In the relational model, foreign keys are a particular kind of database constraint. They are not type constraints.
{Oh? So you're saying this form of encapsulation doesn't count because RelationalModel cannot express it? No, no, that's just wrong. RelationalBreaksEncapsulation because that form of encapsulation is "meaningless in the RelationalModel." After all, it is NOT meaningless in other models. Indeed, there is even a security model based around encapsulation of relationships: ObjectCapabilityModel. RelationalBreaksEncapsulation and CantEncapsulateLinks are both comparing relational to OO, and so it is the forms of encapsulation utilized in OO that matter in an argument refuting those claims. DateAndDarwensTypeSystem is useful and flexible for similar reasons that pure, HigherOrderFunctions and corecursion are useful and flexible. But it does not allow you to protect shared relationships, e.g. where you can talk to Alice and Charlie, and Alice and Charlie can talk to Bob, but you cannot talk to Bob. It does not support what OO developers would recognize as being objects, and certainly breaks the encapsulation that an ocap system would depend upon.}
Encapsulating relationships between RelVars is an odd notion. DateAndDarwensTypeSystem certainly does support what OO developers would typically accomplish using objects, but strictly employing immutable value semantics (see ValueObject). Applying ObjectCapabilityModel concepts to the RelationalModel is peculiar at best; it requires alternative security models.
{The value judgement of whether such encapsulation is a GoodThing, a BadThing, or an OddThing seems irrelevant, logically, to the argument about whether relational supports its expression. And your other statement would be less misleading as: DateAndDarwensTypeSystem supports a small but useful subset of what OO developers would accomplish using objects - in particular, the subset that can be accomplished with pure, immutable ValueObjects.}
The RelationalModel encapsulates relationships at a database level, where a database consists of a defined set of relvars and constraints on those relvars. As for your supposedly "less misleading" statement, I agree, though that was due to carelessness on my part. I should have written "DateAndDarwensTypeSystem combined with the RelationalModel certainly does support what OO developers would typically accomplish..." It remains to be formally and/or empirically determined whether that combination encompasses a small or large subset of what OO developers would accomplish using (typical) objects, but I suspect it's the latter.
{RelationalModel doesn't encapsulate relationships at all. It exposes them. You could encapsulate relationships to a database (a fixed set of relvars) if you forbid surrogate keys from being returned in a query, but that is neither a property of RelationalModel nor of the traditional RDBMS.}
Information hiding is a notion separate from encapsulation, though they are often -- but not necessarily -- found together.
{Wrong. Encapsulation of relationships is data encapsulation. Data encapsulation is information hiding.}
"Encapsulation of relationships is data encapsulation." Huh?
{Look it up. Or are you denying that relationships are data?}
I keep telling you guys, with enough indirection, everything is everything. -t

But a practical buggaboo at the column level is how does RDBMS type info "get into" an existing app language? This is one reason why "type dumb" scriptish languages seem to work better for RDBMS interaction.

The canonical<->custom type value conversions that must occur between the RDBMS and client language are trivially specified. DateAndDarwensTypeSystem defines explicit mechanisms by which values in a given type may be instantiated (or "selected", in Date & Darwen's parlance) from values of more primitive types, or by which the explicitly exposed (i.e., not violating encapsulation) component values of a given type's value may be extracted. That said, research is being done into converting RDBMS type definitions into native client language definitions, in a manner not unlike the object broker mechanisms of CORBA.

Meanwhile, PerniciousIngrownRun-onORMBuilder-NotationCalls are now widely considered a "GoodThing".

JanuaryEleven

CategoryAntiPattern