Database Abstraction Layer

A DatabaseAbstractionLayer is an API layer that attempts to abstract from details of a particular database or query language.

The following discussion of pros and cons was originally at SqlStringsAndSecurity:

Lots of code is not a free lunch. More code is associated with more maintenance cost, more errors, et cetera. Generally, it depends on the nature and audience of the application. For small intranets, parameterized queries are often overkill.

First, it's not lot's of code, second, parameterized queries are never overkill, they're the simplest thing that works, concatenated sql doesn't work, it leaves you open to attack.

If you work hard not to ScatterSqlEverywhere then the problem on this page mostly disappears. --francis

Oh boy. I smell a HolyWar. SQL is a high-level language. Wrapping it into lower-level API's can make for more cluttery, verbose busy-work layers and is generally anti-YagNi. See ExpressionApiComplaints for some counter-arguments.

It doesn't matter what level it is. Access to external services (printers, thermometers, databases) should be localized to manage dependencies. In the case of small intranet applications, that shouldn't be too hard. Parameterized queries are cheap and easy. -- EricHodges

Databases are not the low-level service, OO is. Wrap the OO away to hide its ugliness, not the database. With a good DB-centric design, the DB is integral to the system design, not a far off planet. Sometimes the DB portions do more real work than the imperative/OOP app code. A comparison to printers is naive. (More on "external" below.) [Added to paragraph post-replies.] -- top

It doesn't matter what level it is. The database is an external service used by the application, just like it uses a printer, a thermometer, a disk drive or a display. The database is a server and the application is a client. It's good practice to localize access to external services. It makes it easier to swap them out, deal with upgrades, etc. This is basic dependency management. -- EricHodges

I disagree. See DatabaseVendorLock.

You disagree that access to external services should be localized? Or you disagree that dependencies should be managed? I see nothing on DatabaseVendorLock that argues against localizing database accesses, except where you say the chance of changing vendors is small. The chance of changing vendors is 100% when you write code to run with multiple RDBMSs, and even if you stay with the same vendor you can be sure they will release new versions. -- EH

[They are both high-level, that's not the point, within the application layer, OO is easier to use than a bunch of resultsets, especially in compiled languages, and parameterized queries are simple and easy to do, no reason to execute dangerous raw sql.]

Well, at least we seem to semi-agree on something: Data-centric development is harder under compiled languages (I am a dynamic/type-free fan). But, we don't need OO to "protect" code from raw SQL, as shown above. Thus, there may be a false dichotomy here somewhere. -- top

[[OO is not quite, but almost, orthogonal to this issue.]] Comments only relevant to the original context in SqlStringsAndSecurity snipped.

Okay, lets agree to not bring OO versus procedural into this discussion. Confer? -- top

Let's leave OO out of it entirely. It has nothing to do with OO. No matter what paradigm you use, it's a good idea to isolate access to external services.

"External" is relative. I view the database as a PARTNER to the software code, not an external service. None of the issues here has to do with "externalness" anyhow, so please stop using that word. If the SQL or query engine was built into the language, similar issues pop up. String queries share many things in common with RegularExpressions, yet RE are not necessarily "external".

And yet they are isolated in their own class and/or package. Let's agree that if the database is not built into the language and is an external service then access to it should be isolated. -- EH

Regular expressions are "isolated"? Anyhow, your rule usually fails DecisionMathAndYagni in my observation. If your experience differs, then let's just AgreeToDisagree.

Yes, regular expressions are isolated in their own class and/or package in most languages (Perl is a significant exception.) How do you manage dependencies on external services in your own code? Do you create a "zone of shear" that can be modified when the external service changes? -- EH

Do you create a special module for accessing the file system? Or do you sprinkle interactions with java.io.* all over your code?

java.io is the special module for accessing the file system. If and when the file system changes the impact is isolated to java.io.

[[That does not directly answer his question, yet I think that a direct answer would be beneficial.]]

I intend it as a direct answer. java.io is a perfect example of a package that provides a "zone of shear" between applications and external services. It's at a different level of abstraction than prepared SQL statements, but the same rules apply. If you want to insulate your application code from changes in external services, build a zone of shear between them. -- EH

So is java.sql and javax.sql.

[[Having calls to java.io sprinkled throughout an app is a bad idea in the same way that having calls to java.sql is.]] How about having calls to java.lang.String sprinkled throughout the application?

Different levels of abstraction. java.sql isolates your app from SQL (well, it ought to). A class containing prepared statements for a specific table isolates your app from that table. A class that generates prepared statements in different SQL dialects isolates your app from different SQL dialects. -- EH

Of course different level of abstraction. SQL is a much higher level of abstraction, than i/o or most OO apis for that matter. Hiding a more powerful abstraction behind a less powerful abstraction is not always a good idea. As far as the dialects go, it is rarely an issue worth considering. Writing a class that generates different SQL strings for different dialects is for a lot of projects just a waste of time.

Again, it doesn't matter how "high" the level of abstraction is. The forces and patterns are fractal. They can be applied at any level of abstraction. Considering different SQL dialects is always an issue worth considering if you know you're going to support multiple SQL dialects. The person who started this page knows he wants to validate SQL statements before executing them. Therefore using prepared statements is a valid solution and wrapping them in a module will make maintaining them simpler.

You are blowing out of proportion: 1) the differences between SQL implementations, 2) the likelihood that a project needs to support several databases, 3) the importance of dealing with 1 and 2 upfront rather than when and if is needed.

I'm just using SQL dialect isolation as an example, perhaps because I'm in the middle of doing just that. I'm not claiming all applications need that. I do believe that any application that is expected to survive more than a year will need isolation from RDBMS vendor upgrades. -- EH

More than a year? Again, I think you are assuming a rather high database churn rate/probability. Perhaps the nature of your resume experience leads you into database conversion and/or cross-platform DB projects. Thus, your perspective may be tilted that way.

I'm working off of Oracle's release history:

1997 - Oracle 8
1999 - Oracle 8i
2001 - Oracle 9i
2003 - Oracle 10g

For the last 7 years they've released a new version every 2 years. If an application is supported for more than a year it has a better than 50% chance of being in production when a new version of Oracle is released. -- EH

And for the record, here's Microsoft's SQL Server release history:

1995 - SQL Server 6.0
1996 - SQL Server 6.5
1998 - SQL Server 7.0
2000 - SQL Server 2000

Great. Now give me a SQL string that is a valid statement in Oracle 8 (or Oracle 7.3 for that matter) and is not valid in Oracle 10. The same about SQL Server.

"SELECT column FROM table WHERE REGEXP_LIKE(column, '[^[:digit:]]')"

That uses Oracle 10g's new regex feature. It isn't valid in Oracle 7, 8 or 9. SQL Server 2000 added bigint, sql_variant and table data types, INSTEAD OF and AFTER triggers, ON UPDATE and ON DELETE clauses in the REFERENCES clauses of CREATE TABLE and ALTER TABLE, etc. -- EH

Maybe you should exercise more care in reading. I asked you for a string that was valid in a previous version and was later made invalid by a new version. You have given the opposite situation which is uninteresting. Bottom line is that both Oracle and MSSql and Sybase and IBM and whoever else maintain a strict backward compatibility. Therefore an application that runs on Oracle 7 can run on Oracle 10g no problem, and you can take advantage of new features at your own leisure when and if necessary. --Costin

Why are you opposed to this, Costin? -- EH

How is wrapping going to help in this situation? Also, does it make sense to wrap 1000 statements in case 2 change? Would you like to buy some meteor insurance?

Wrapping helps by creating a zone of shear. It makes it easier to change the way your app talks to the database by routing all access through a common set of methods, classes and packages. It takes very little effort to create, so the cost is usually worth it. You have to put those 1000 lines someplace; you might as well put them in their own package. It also makes it easier to refactor the common aspects and reduce the amount of code involved. Why are you so opposed to this idea? What's the advantage to distributing them throughout the code? -- EH

Because it separates (pushes apart) related business logic and creates more code and more unnecessary indirection (layers). I guess I lean more towards YagNi than you. Sure, code generators and copy-n-paste might make it relatively easy to create all the formality in the code, but just because code is easy to write does not necessarily make it easy to read or navigate.

You mean you don't like it because it adds a method call? Think of the method call as a chance to comment the intent of the SQL statement. That makes the code easier to read. I don't use code generators or copy and paste. I use a refactoring editor. I don't worry about the performance of extra method calls because my compiler is very good at optimizing them away.

[[This does seem to be a matter of whether YAGNI applies or not. Costin has been making some very astute observations that support YAGNI here, but I think that it nonetheless doesn't apply, because when this stuff is scattered throughout the code, if it ever needs severe changes, then it being scattered will quite sharply raise the cost of changing it -- so much so that the cost alone will discourage the changes (including switching to a different vendor), which is not in the spirit of YAGNI at all. But if instead it is gathered up into a module, you don't have to worry about breaking unrelated code, and it's much easier to see the scope of what needs to be changed without accidentally leaving anything out, and it's much easier to unit test the changes.]]

[[The short version: following YAGNI in this case will lead to needing to change every module in the system if a big change is ever needed, and even though Costin doubtless can do such a change to e.g. 150 files all in a day's work, on the other hand, most companies panic when they see someone check in 150 files all at once, and will often insist on general principles that the change be undone because it looks too sweeping.]]

[[Having said that, on the other hand it doubtless depends on the nature of the codebase. If you've only got 5 files and 1KLOC to start with, then perhaps YAGNI does apply.]]

[[Similarly on the question raised above as to whether direct use of Java strings everywhere is appropriate: no, not in general, because "string" is a concrete data structure choice, and in the general case that decision should be encapsulated behind an abstract facade. But obviously there are many exceptions. -- dm]]

In response to Eric: why I am opposed to this? To begin with, I corrected some matters of fact. New versions of a particular DBMS rarely are a justification for massive SQL rewriting. Packaged application that need to run on different DBMSes are a rarity (it is worth mentioning SAP that does run on several DBMSes, but then SAP has a whole subsystem and a language for mapping tables and stored procedures with screen editors and business rules and whatever else. However it is clear that the economies of scale for SAP do no apply on most any other project. On the other hand I have internal knowledge of other big applications like Oracle Financials, that don't give a dime about making it portable across different DBMS vendor. Your mom and pop, custom made project, even if for a big enterprise would better focus on the situation at hand. Once the decision is made to go with Oracle it is unlikely to ever be reversed to go with MS SQL Server. Therefore wasting effort to program for the general case is very rarely a good investment. On the other hand, making applications DBMS dependent, has a cohort of advantages like better performance, better data modeling options for example you can implement polymorphic references (foreign keys) in most DBMSes using vendor specific SQL extensions,but you cannot do it generically, simpler and more elegant SQL code etc.

As to the pressing question where to put the damn SQL strings? Let's say you have 1000 of them. Putting them in one centralized "module" (package, class, resource, whatever ) is hardly a good choice. It breaks locality of reasoning, to begin with. It may slow down development, and doesn't buy you anything for real.

I'd put them close to where they are used. Typically module owners have access to a restricted set of tables. For example the shipping module deals with the a few tables related to shipping. That module should host most of the statements related to those tables, and typically all statements that perform updates. Like DavidParnas teaches us very clearly, a module is a work assignment for one developer (ahem, two in case of pair programming). Why maximize intermodule dependencies then? What's the purpose of having a module containing all queries? Who will be responsible for that module?

The purpose could be to have all queries available for whoever wants to ask. Although I doubt the usefulness of such inquiries. If there's a modification to be made to the shipping tables, than the guy who's responsible for shipping (collective ownership included) is the guy to write the modification. So if (huge if) changes are ever needed because an incompatible SQL dialect has popped out of the blue, then it is obvious that changes should not be made by one persons, but by all persons responsible for individual modules that are issuing queries. Voila, division of work and parallelism at its best. For the purpose of reviews, it's very simple, Eclipse search, or in case of other languages tools like grep are your friends . You'll get a list of methods making calls in no time. For the purpose of security instruct your developers to only use prepared statements, or else. There's no extra magic that you get for free when you crowd all your SQL strings into AllDatabaseQueriesAreDefinedHere? module. To appease the "true believers", you can have a compromise that each module will have a (convention name goes here) file/class whatever that defines its queries, but again, this is just for everybody to save face. Pragmatism always trumps fundamentalism in programming.

We have very different philosophies about code organization and ownership. Modules aren't work assignments, they are logical groups of classes. There is no single person responsible for the shipping tables. Everyone on the team is equally responsible for them. And we have different experience with database vendor lock. 3 of my last 5 employers refused to be locked into a single database vendor. I brought up the same point over on DatabaseVendorLock. -- EH

Note that CompaniesHireLikeMinded. Companies that want to be double sure they are not locked into a vendor may be more likely to hire someone who has done or mentioned vendor wrapping before, for example.

We have very different philosophies indeed. However, you are quick to pretend that your particular philosophy is the way of developing software, without bothering much to substantiate it other than "your experience". It happened again on this page. As to what modules are you are kindly asked to review your SoftwareFundamentals. "Logical group of classes" tells you very little. What's so logical about grouping unrelated SQL strings in one module? Such logic is only in the eye of the beholder. When you assign somebody a task, however, it's a different ball game. I'm not going to repeat DavidParnas argumentation here.

My experience is all I have. That's all any of us have, even if it's the experience of reading DavidParnas. I don't group unrelated SQL strings in one class. I group related SQL strings in one class, and I group those classes in a package. When I assign someone a task they know exactly where the database interface is. -- EH

Well, there are experiences and experiences. Reading what DavidParnas has to say can be an enlightening experience. I highly recommend it.
Comparing arrays to a database is apples and oranges. If I only had Parnas' tools, I probably would have concluded the same thing. A database is a protocol, not an implementation any more than classes or API's are an implementation because they are a specific paradigm. See WrappingWhatYouDontLike.

As for database vendor lock, there's nothing much to debate. The database vendor turnover rate is minimal, and this can be verified with statistical data from institutions that research DBMS market. That some managers have been deluded into "database vendor independence is good regardless" marketing slogan of lately has little consideration in comparison to actual data. How many of those guys did actually move from one database to another, and of those who did, what were the objective technical reasons (because the economical penalty is high regardless of the development strategy).

All of them moved from one database to another. Two of the three were selling software that worked with multiple RDBMSs. The other was trying to switch their in-house applications from one RDBMS to another. One of the employers that didn't mandate vendor neutrality later wished they had. None of this was based on marketing slogans. All of it was based on simple economics. -- EH

Sliding Layers - General Problem

The issue about DBMS version changes is not specific to SQL. It's part of a bigger issue. Any time a software product uses components from different vendors or different "stack" parts, there is always the risk of change conflict. And wrapping it with an API often does not solve it because many changes also require interface changes, and/or changes to the wrapper kit. It may be a GUI system, a file system, a network, a content manager, etc. Maintaining and coordinating with a middle-man "shear" layer in between is not free. The trade-offs must be weighed, and as usual, I view the process of weighing the trade-offs through the eye of economic and finance-like metrics. Generally I recommend only wrapping the portions likely to change in the future rather wrap the whole thing up-front. --top

Pragmatism trumps, yes, but that's different than pure expediency. Practices and methodologies and patterns etc arise (ideally anyway) out of finding pragmatic solutions for real world problems.

So I am not following your reasoning completely here. There seems to be a big disconnect on one particular topic, which I don't see being addressed head on. I mentioned abstract facades. Would you litter linked list manipulation all over the code base? I would not, because it exposes an implementation choice. Better to encapsulate. Surely you agree on that?

You are very unclear here. I wouldn't litter all the code with while (p= p->next), no. But this is not a valid analogy. The abstract interface is your SQL language and database client libraries (JDBC or whatever). It can be a very powerful abstraction that doesn't need to be encapsulated by lesser abstractions. Domain-specific wrappers to reduce repetition or "encapsulate" a concept is fine by me, by the way. -t

But if it applies to linked lists, why doesn't it apply to SQL, which is just another implementation choice? It's not the only possible relational language -- obviously it is the pragmatic choice 99% of the time, but that's not the point. It does sometimes happen that SQL/RDBMS turns out to be overkill, and their use gets backed out in favor of some more trivial persistence solution.

The important thing to remember is that SQL DBMSes are not necessarily "persistence solutions".

Also you said locality of reasoning is broken. I don't see that. If you replace sql_exec(blahblah, "SELECT * FROM....blah blah") with "FindAllRedCars?()", surely locality of reasoning works as well or better with the encapsulation?

{In my opinion, it is unnecessary formality that just clutters up code if it is only being called once. A similar debate can be found in LongFunctions.}

If you put this function in the same file , or in the same module, maybe it's almost just as good, as SELECT * FROM ... blah blah", but if you put it however into the "all database interactions go here module", I just don't see it as a strategy that buys you anything.

Another thing: companies do change database vendors sometimes, I've seen it happen and been involved with it sometimes (having been a contractor for decades helps see a very wide breadth of practice in the field). I don't know enough to say how frequently it happens -- probably not too very often -- but you can't say it "never" happens (you said "unlikely", but the general context seemed to have the gloss of "never").

Companies need to know that changing the DBMS from under a project involves big costs regardless. And they also need to know that nowadays there's almost never an objective engineering reason to switch from SQL Server to Oracle or the other way around. Therefore the effort put into making the code DBMS vendor independent will not pay itself in a majority of cases.

Oh, and as for Eclipse: that's probably getting too far afield, but any discussion that assumes that everyone is using a refactoring browser is...well, that's a rather optimistic assumption, and at the very least, needs to be made clear as a cornerstone of the conversation. Most of the industry still does not use such things. And it can change the conclusion of many arguments whether it is assumed or not. -- dm

Well, if you program Java you don't have a reason not to. If you program C++, you still have grep, and that works pretty reliable. If you still are not comfortable, you can temporarily remove dbtools.h, oci.h (or whatever you might be using) and you'll get a nice list of all the places that interact with your database. --Costin

Sure, but you shouldn't have too, you should have a layer for all that database stuff, a layer separate from the layer containing business logic and classes. The database layer won't be one big class either, it'll have many, hopefully organized in some logical manner. It should be simple to know where all the code is because it should be organized logically in the first place, you shouldn't have to grep for it, or search for it at all.

Do you mind if I ask you, what is so "logical" about organizing all database access in one module? And whom do you put in charge of that module?

[Do you mind if I answer you? It's logical to organize all database access in one module because it isolates database dependencies in that single module, the same way it's logical to isolate dependencies on any external service. I put the team in charge of that module. -- EH]

Does it also reduce the number of dependencies?

[Moving the dependencies doesn't reduce their number. Sometimes moving them makes redundant code more obvious, but that's a side effect.]

Does it make the interface between a module and the database worse (in the sense of lower level of abstraction, less stable, etc.)?

[No.]

I rather do not take your word for it. All the people that tried that failed one way or the other to provide an equally powerful abstraction as SQL is.

[I don't try to provide an equally powerful abstraction. I only abstract my use of SQL. Why would moving the SQL statements make the interface worse?]

You just moved some bytes around to no good avail. Whenever something about say, cars changes, now we have to change 2 modules, the module about cars, and the module containing database query strings.

[I don't understand the scenario. If the way the app talks to the database changes, then we only change the database abstraction layer. If the way cars are used in the app changes we only change the app code. If both change then we change both, but we'd have to change all that code no matter where we put the database access methods.]

Therefore where you can say that it "isolates", I can also say that it "entangles", because you put together in one "logical" unit, things that are logically unrelated. So, is it more likely to change the database tomorrow, or is it more likely that the guys that deal with the cars table needs to modify one or the other query?

[Once again, I don't put things that are not logically related in one logical unit. All of the database access for a specific view or table goes in its own class. All of those classes go in a package. When someone goes to change a query he can see all of the other queries on that table or view and he may realize that the query he needs already exists. If the queries are distributed throughout the app that's much less likely to happen.]

That we should isolate dependencies on an external service, is also not a definite piece of wisdom that applies regardless of the context. Again, take any random piece of software and you'll likely see pervasive calls to i/o libraries, without needing a central module that "encapsulates/isolates" all I/O requests. Why? Because I/O is very generic and pervasive thing.

[The I/O libraries encapsulate and isolate the I/O requests. In the same way, if you look at my software you'll see pervasive calls to the database abstraction layer.]

In contrast, searching for a good example of "isolating" dependency to an external service, we can easily find example like using a fax service, or a credit card processing service. Why? Because these are very narrow services.

So the more specific and the more restricted an external service is, the more likely that it is good to isolate that dependency. The more generic and the more pervasive it is (like I/O) the less likely is that "isolating"/"entangling" is a good idea. Now all is left to do is to observe that SQL+JDBC is very generic, high power abstraction, it is pervasive (in the sense that most functional modules or put it the other way around most use case interactions are going to touch it), therefore it is more to the I/O side of the spectrum than to the credit card processing service side.

[You're arguing from the viewpoint that I/O access isn't already encapsulated. I don't hold that view. It doesn't matter to me how "powerful", "pervasive" or "generic" an external service is. If I access it more than once I'll isolate those accesses. It's cheap, easy and it makes my life better. -- EH]

So do you or do you not have a module that encapsulates all I/O thingies on top and above of java.io just like you have something on top iof java.sql? In case you do have such a thing, please reference me a decent open-source software that has something similar.

[I do. For instance, I have a class that creates ZIP files containing images and an associated XML file. Look at Log4J for an open source example of something similar.]

And reading configuration parameters (if any) is going to use I/O intermediated by the same module that writes ZIP files?

[No.]

And if the owner of module X needs a subset of the database schema reflected in X1, X2, ... Xn, what is so bloody illogical for him to write his own queries and have them at hand (i.e., in his module)? Concrete justification required.

[You're speaking of module "owners", as if an individual owned some piece of the code instead of the team. If you work in that sort of environment, I guess you have to keep "your" code in "your" modules.]

'Yes, in most of places I worked and where my friends work, regardless of the XP handwaving on the subject, an individual has the primary responsibility for creating and maintaining a software module. That is ownership: it means responsibility. Now management 101 tells you this basic principle: When everyone's responsible, no one's really responsible. If everyone is knowledgeable about everything in a project, no one has in-depth knowledge about anything. Further XP unwarranted assumptions about the only glorious way software should be developed, and derisory comments about how other honorable people develop software is going to be regarded as trolling and safely deleted.''

Are your systems divided into modules by ownership? That's not very logical at all. I'd much prefer software where the modules were logically dived up into subsystems that make sense to the domain at hand. If Joe Bob and Billy both need to use the car module, then they should... rather than putting car queries into modules they own, just because they own them.

Regardless of how we divide them (by functionality, so that they can form a meaningful work assignment), the fact is that some person will take ownership (aka responsibility) for it. This happens in a majority of software houses.

[But it isn't that way where I work, or where I've worked for many years. If none of the modules are "mine" then I can't use ownership as a criteria for grouping behavior. Even if some of them were mine, I think there are much better criteria.]

If none of the modules are "yours", then maybe you've been working on rather trivial software. For most projects it makes no economical sense not to assign engineers in charge of a module.

[No.]

To further explain it, the layer for database access is already there and it's called JDBC in Java, ODBC in C, ADO.NET in guess what, and so on so forth. It is a high abstraction layer and it is pervasive. Just like programmers in Scheme, Ml etc, use a concrete list implementation together with standard functions on list (map,fold,concat, car, cdr, etc) without ever trying to hide the lists in their own module, so programmers of business application can use SQL as a high level and pervasive abstraction. Building a concrete "database access layer" with functions like getAllRedCars() is obviously going to reduce the level of abstraction, and you can see that immediately when you need getAllGreenCars(). You can further consult amusing mishaps in designing "database abstraction" layers in pages like EjbTernaryRelationshipExample.

[I used JDBC and ODBC. They abstract the RDBMS, but not tables or views. I create my own abstractions for those. If you have getAllRedCars() and need getAllGreenCars, consider refactoring them both to getAllCars(Color).]

So you're willing to recreate SQL in API form? Good luck.

[No.]

The alternative is very simple: the owner of the module that deals with cars, gets access to the car tables. He writes queries as needed to implement his functionality. These queries are integral part of the implementation of this module. Putting them in another module where they'll share the space with queries for donuts and queries for submarines, do not a proper module make. Furthermore, in the unlikely scenario that you need to change databases, it is the owner of the cars module that needs to change queries related to cars, the owner of the donuts modules that needs to change the queries related to donuts and the owner of the submarines module that will change queries about submarines. If you have been looking at a database schema lately you may have noticed that tables cluster nicely together in sub-schema that correspond to modules (subsystems) of the software. If more discipline is needed the DBA can even enforce that the car guy can only query car-related tables, and so on, so forth. Very logical.

That's not so logical. In the likely scenario that the schema changes, using raw sql pretty much guarantees that you'll break something, because you'll have to chase down all those "strings" to fix them, the compiler won't assist you at all. So you need the queries in one definable package, or subsystem, and separated from the business logic so it's easy to find and fix sql strings. Sql mixed with business logic and spread all over the app is a bad way to write programs because it's hard to change one without breaking the other if they're mixed.

?? Because ... ?? The reasonable conclusion should be the other way around, when you change the SQL code and it is out of the context that it serves, it is more likely to break that context.

Their change reasons are different, so they belong in different packages/modules. SQL needs to be wrapped and isolated from business logic precisely because within the programming language, SQL is nothing more than a string. Programs are more stable when making a schema change only requires and update to the data layer, if you don't touch the business logic layer, you're unlikely to break it. If SQL were a native language, it might be different, but it's not, and SQL is nothing more than a very fragile string.

You could pre-compile it if you want, or you can run your favorite unit tests. Putting all fragile strings into the big ball of mud, centralized SQL strings repository is not going to make them less fragile.

Learn to read, a package is not a big ball of mud, it could be composed of many well organized classes making it all very clean and logical and easy to maintain, and far less fragile than if it were mixed in with the business classes.

That alone is the reason the OO guys map objects to the database, it's rock solid reliable, and allows you to make drastic changes to both the model and the schema and not have to worry about breaking stuff. Any sufficiently large program that uses inline sql mixed with business rules is almost impossible to change drastically without breaking a ton of things, and not always knowing it. No one's claiming that you can wrap SQL and keeps it's expressiveness, we're claiming that wrapping it makes software easier to change and more stable, we'll trade it's expressiveness for some stability and ease of change, two things far more important to most programmers.

The above paragraphs has so many misconceptions and misrepresentation that is hard to begin with. it is not true that OO mapping of objects to database aka O/R mapping is rock solid. Results have been so and so, with notable and spectacular failures.

Bull. Fist, I didn't say it was the fastest approach, but it is a reliable approach, that's why it's so popular in the OO community. Secondly, you didn't address the main point of the paragraph, that schema change in those large apps breaks lot's of code when sql is inlined.

There's then a fallacy that the database schema changes and the application doesn't need to change. When a well designed schema changes it is because some business rules change. Then you may need to modify some app code and some SQL code.

It's not a fallacy, schema changes often happen that don't require any changes to the business rules, it could be for performance reasons, or normalization reasons, but schemas do change, and don't always require changing code, other than sql code.

One way or the other there's the same amount of change regardless of where you put your queries. The only question is whether you go to a known big place for all queries , or if you do a grep, or the most likely example, if the product table changes the guy responsible for the module that plays with product, he'll know where to make the changes, and we've just contained changes in one module. In any case, in order to "fix" any SQL string you need to have a picture of what's that SQL doing. And that picture belongs with the module that needs it. The guy who knows about submarines will know what to do with something changes about submarines. He should own the SQL code accessing submarine tables.

Then there's this wild claim that sufficiently large programs cannot absolutely be maintained with inline SQL. It is not inline SQL or SQL in one big ball of mud (ahem SQL strings) that is the problem, but rather with code modularity. If the module about submarines is accessing the tables about jelly beans, you've got yourself problems. Else, you've got mostly no big deal. There are lots of big software systems that I know of that have lots of SQL statements inline, some using (shock and horror here from OO bigots) SQL pre-processors that statically compile everything (or most of the things). They run fine and have been maintained years and years and years.

Didn't say they didn't run or couldn't be maintained, I said they are harder to maintain, and grow and that's not a wild claim, it's a fairly common claim.

On the other side you just need to google a bit to hear the horror stories about EJBs and toplink, and other database insulation layers. It works both ways in either case, the criteria that makes it or breaks it is not where you put your SQL strings.

It's all about where you put your sql strings, and I'm waiting for links to some of those horror stories.

The bottom line is that the claim that "SQL has to be wrapped and isolated" has been backed largely by "because I said so, from my experience", doubled by conveniently made assumptions, and nothing concrete. I'm tired of this discussion of deaf people. Show me some code that proves your points or else there's LongFunctionDiscussion? all over again. Good luck.

You know, you keep asking for code.. and time and time again, it never does any good because you ignore it, so quit asking for proof of something you're not willing to even have an open mind about. You don't seem to accept that discussions can be had using logic alone as proof. Fine, be hard headed, use your embedded sql strings, and continue to write insecure unsafe applications. Either you use procs, parameterized queries, or unsafe strings, there are no other options, so what are you arguing about?

[Then don't put them in a module where they share space with queries for donuts and submarines. I don't think anyone has advocated that. Put them in a class that only has queries for cars. Put that class in a package that only has queries for the sub-system. Put that package in a package that only has sub-system queries. What's the big deal?]

Be my guest and do just that. Or put them in the class that needs whatever particular query. What's the very big deal about that? It's not me who started the troll fest. It's always you repeating your assertive "experience" as the one and only model to follow.

[It isn't a troll. It's free advice. It's worth twice what you paid for it. -- EricHodges]

Let me explain to you when you were a troll so you can pay attention in the future. When you implied that DBMS software upgrades every 2 years or so, therefore apps need to be ready to change their SQL code because of that. When I further challenge you to make the connection between the two unrelated facts (on the contrary DBMS vendors are very careful not to break compatibility with existing apps), you came back with a willful misunderstanding when you showed just the opposite of what was needed . When further challenged, on this obvious inadvertence you responded "why are you opposed to this, Costin?", like in "why can't you give me a pass, and challenge me to get the facts straight, because the conclusion is already obvious and you should be supporting it". The further trolling behavior is when you dismiss a normal practice that is quite prevalent in the industry (module ownership) with derisive comments that are valid only if one blindly believes that a particular XP practice is the only valid one. This is not a form an honest argumentation or debate, but has all the characteristic signs of trolling. If you did it inadvertently, just be mindful in the future. You keep repeating this behavior, and at some point in time somebody has to call you on that.

[[I don't think Eric is trolling, but there does seem to be a question as to which of several principles should apply. I personally think that it is a major design flaw that the embedding language doesn't understand the query syntax, and that bandaids are required to get around the problems caused by that. This would fix many issues, including the lack of compile time checking.]]

[[On the other hand, this would seem to clash with the design philosophy of Java, so I can't see it ever happening there. On the third hand, Java is no pinnacle of perfection, so, so what, this is still in some sense "the right approach". -- dm]]

To further some discussion about modularity, one needs to consider that module boundaries cross through into the database. The database schema can be divided into sub-schemas with contracts between them. SQL and JDBC is just a very high level abstraction, and that's it; the semantic about particular tables belongs with the modules in charge of the information in those particular tables. Extracting the corresponding SQL out of those modules is going to buy you mostly nothing. If you were programming a distributed system you wouldn't be putting all accesses through corba layer, or all accesses through RMI, Web Services and what have you, in their own separated module, but rather the payment module will be solely responsible of using CORBA for interacting with payment gateway, and accounting module will be solely responsible for using CORBA for interacting with maybe tax software (substitute CORBA with Web Services or even File I/O). The fact that both accesses may be using CORBA has nothing to do with whether the interactions should be "isolated" into a module called "all distributed calls go through here, so we could know where to fix if tomorrow the sky falls down".

Let's assume you distribute SQL access instead of centralizing it. Two new requirements show up:

1, Each select, insert, update and delete needs security checking at a table or column level. Security info needs to be kept for 100,000 web app users and administered by the web app admins, not the database admins, so you can't use the RDBMSs built in security for this.

2. Each insert, update and delete needs to leave an audit trail that includes the user ID from the web app.

How would you satisfy these requirements?

-- EricHodges

To begin with, you do not know what you're asking for. But even if you had your SQL in one big "centralizing" thingie, the worst solution would be to modify each SQL string. Much better solutions are bloody obvious.

I don't know what I'm asking for? I'm asking how you would satisfy those requirements, aren't I? And why would you modify the SQL strings? And what's a better solution?

{How about we ask Eric for clarification rather than accuse him of not knowing what he is asking for. Generally one does not put the security checks right at the SQL, but at the top of modules. No use in starting a module if we cannot finish it due to security issue detections. Example:}

  module foo(){
    if (hasAccess('foo', userProfile, autoMessage=true)) {
       ...
       rs = query(stdConn, "select blah from glah....");
       ...
    }
  }
  // this code doesn't seem to have anything to do with Eric's stated requirements
  // that user access should be controlled at the table and column level.

(Roughly based on examples from AccessControlList

The "requirements" are so fuzzy it's not worth considering giving you a response straight away, and further important details are left as an exercise. As to what's a better solution it's easy to give you the best of solution once you come with proper requirements (for example, what exactly is this security check, please define it. But I'd rather wait for somebody to make the case that indeed but putting your SQL strings in one big place is going to help you with meeting these requirements.

Meeting the second requirement, is again absolutely trivial, and no DBA worth his name would let that be handled at the application level. Again, I'm waiting for the miraculous solution that would be possible if all the SQL strings were in one big place. You made me really curious. If you don't come up with that solution you don't have an argument to begin with.

The security check is that any user can be granted or denied access to any table or column within a table. Access is denied unless explicitly granted. Table grants can be overridden by column denies.

I usually see it defined by access to "features", which correspond imperfectly with entities.

User: Martha

Feature Descript. Access Type ----------- ------------ Feature foo read Feature Bar none Feature Grog alter Feature Zog admin Feature Meef alter ....

But it isn't in this case.

{Then I appeal to the low-probability argument again. I am not going to go around wrapping stuff in anticipate of remote changes. Violate YagNi only for high-probability stuff.}

This isn't low probability. It's an example taken from life. We aren't violating YAGNI once we need it.

Something can be taken from life, AND still be low probability

The fact that it has happened means the probability is 100%. I'm not talking about wrapping in anticipation. I'm asking how you would satisfy these requirements.

How will the DBA handle it? The DBA has no way of knowing the web app user's ID.

[It is my observation that the application level is often the best place to handle most user security. The relationship between entities and user access is often not tight enough to rely on it alone. Plus, the DBA is not the person to micromanage security for each app, but rather an "app administrator" of some sort.]

And that's the case with this hypothetical scenario. The product will be sold to users who have no DBA. Those users will administer their own security through the web app. They may share database services with competitors who have also bought the product.

Please clarify. Why would they share database services? Each company can put their data into a separate named database. The database login info is not replicated all over the app. At least it shouldn't be. But that is different than queries.

One way the app is deployed is as a central server. Multiple customers can buy contracts to use it. The database includes much more than a single customer's information. Nothing is replicated. They use the same database.

[The DBA (or the competent app developer in absence of a real DBA) will write SQL DBMS-specific code that will check that a session variable containing web user id will be set for the connection. If the variable is not set, then it will refuse the execution of update. But if set, it will log the update thus fulfilling the requirement. The app will enforce that the session variable is always set to the corresponding web user id, by technical means easily within the reach of competent application developer, and that code can easily be put in one place, to follow OnceAndOnlyOnce.]

By "session", do you mean a web app session or a database session? And this app is non-DBMS specific.

[Regardless of technical details, let's hear the argument first that this "challenge" has a spectacular solution under the presumption that all SQL strings are under one big umbrella.]

I suspect the technical details are crucial to communication here. For web apps, generally I approach security as follows. The user has to log-in to the web app. Once they log in, their user ID is set as a session variable. A standard header or "include" file checks to see that this session variable is set for each and every web script/module. If not, they are redirected to a log-in screen or expiration notice. Thus, the user ID must be available to go any further. We don't have to keep checking its existence for the rest of the script. However, its mere existence does not necessarily mean the user has access to anything. The rest resembles the examples above more or less. A many-to-many table stores the access type/level for each feature for each user. What the "features" are, is part of the app design. -- top

So I'm still waiting to here about that spectacular solution, that is not feasible unless all SQL strings are aligned nicely in one big place (however you want to call it). In absence of any such realistic solution, the whole discussion is a waste of time.

I asked first. How would you meet the requirements above if the SQL access was distributed throughout the app?

The "I asked first" child play does not work, because when you did ask, provided that you wanted to make a valid argument and not just continue your trollfest, you had to have something already in mind for a solution. And that solution would make the case for what you argue (putting all the SQL strings under one umbrella) and the same solution must not be possible or must be quite inconvenient if SQl strings are spread across different modules by functionality. Showing what you had in mind (if anything) would not cost you anything, but would be just a fair display of honesty, which would be a welcome change, and something I have the right to ask after the gratuitous hostility and trolling you've displayed so far.

Hostility? What are you talking about? When you call my contributions a "trollfest", I sense hostility on your part. What have I written that seems hostile to you?

Yes, you have a right to ask. I'd like to hear your answer first before I give mine. I also have a right to ask.

I would go ahead and comb the code and change it. I made an probability estimation, and in this particular case was wrong. SoftwareDevelopmentIsGambling.

By "comb the code and change it", do you mean insert security and audit trail method calls around each SQL access? Or move all SQL access to it's own abstraction?

I need more details or samples about your requirements to answer that. I am not fully understanding them. They don't fit what I usually see.

Feel free to ask for details. I don't know what you want to know.

How about we focus on just the audit trail for now. Ideally that would be done through the database and is built-in to most of the big-name ones. However, the tricky part is coordinating the userID with the DBMS. If we do it at the app level, then perhaps we could simply alter/override the Query() function/method so that it tracks it there. Every SQL statement would (should?) go through the Query() function anyhow.

  function query(dbHandle, theSQL) {
    if (containsSubstring(lowerCase(theSQL),
                ' insert ',' update ',' delete ')){
      logTheSql(theSql, appContext['userID']);
    }
    ....
  }

That assumes we can somehow change the query method. In the case of Java, you can't change the query method returned by java.sql.Connection.createStatement() or java.sql.Connection.prepareStatement(). To change the query method, we need to introduce our own "zone of shear" between the code and JDBC.

The strategy proposed above can be implemented in different ways (most elegantly with AOP, but not only). However it is unreliable and doesn't quite meet the spec. The fact that a SQL string has ' insert ' in it may provide false positives and the other way around, SQL strings without insert may actually trigger inserts. The trivial case is 'insert into ...' see the first space is missing. In comparison, writing corresponding triggers for all the tables is trivial and can be done so as to completely meet the spec.

How will the triggers get the web app user IDs?

All major DBMSes allows connections to support session variables (variables that have independent values for each session). Now how the app sets this session values automatically, depends on the design of a particular application but it is quite an easy problem to solve. For example you can use per-thread variables, then intercept before executing the first statement in a usecase case, get the web user id value from web session, and execute the corresponding SQL that sets it with the connection. Depending on connection pooling and other parameters you may need an extra step to unset it. Quite easy to solve. But the best solution depends on particular settings.

Can you point me at an example of this? How standardized is this across the different major RDBMS vendors? Can these RDBMS sessions last as long as web app sessions? Can popular RDBMSs maintain hundreds of thousands of them at a time? I've never used one. My web apps always grab a connection from a connection pool, so web app sessions don't have any correspondence to RDBMS sessions.

Here's for Oracle. One time setup for the session variable (actually its called package variable)

 create or replace package WebSessionCapsule? 
 is 
 WebID VARCHAR2(100); 
 end;

Then the client needs to announce his identity, with the statement:

  begin WebSessionPackage?.WebID:='test'; end;

Then a trigger will just have to check whether this variable is not null - otherwise the trigger raises an exception, and if it is not null, insert that value in a log table:

  begin insert into WebLogs(WEBID) values (WebSessionCapsule?.WebID); end;

For similar mechanism, please know your SQL from your friendly vendor supplied manuals.

If you use connection pooling, then of course you have to glue in some app code that will write the web identity to the session variable at the beginning of a "web request transaction" and clear it out at the end. This is left as an exercise for the curious reader. In the meantime, I'll kindly and patiently wait for Eric to make a display of his good faith.

A bit more patience. Are you saying that each time the web app modifies a record in the database it has to send the strings above to let the database know who the web app user is? Are you going to add that code before every SQL statement that might modify the database?

Not at all. To begin with there's no record anywhere.

Sure there are. There are records in the database. Read my question again. Your trace below shows that the web app has to tell the database the user ID before it modifies a record.

Please read the posted code again, that WEBID is a variable in the server memory and is not stored in any record

It looks like the WEBID variable has to be set before the web app can modify a record. That's what I asked about.

It is called a variable, and that's what it is. If you are ready to worry about performance hits, just forget it. Now how you set this variable depends on your settings, but in the general case, the processing "trace" of handling a web request with database is:

 A:  WebReqComesIn?()
     SomeMethodIsCalled? ()
 B:       TransactionManagerIfAny?.beginTransaction() // alternatively this happen automatically
 C:          ConnectionPool. handMeAConnection() // alternatively datasource.getConnection()
 D_1:             connection.stmt1()
               ....
 D_N              connection.stmtN()
 E:            Transaction.Commit() // alternatively this is done automatically
 F:          ConnectionPool.releaseConnection() // alternatively connection.close()
 G:   FlushTheBuffersAndFinnish?

So now your technical problem is to set the SQL session variable for the connection before step D_1, and to set it to NULL after step D_N. That's your exercise for today. If you can't figure it out, yes you can set it N times from D_1 to D_N and that will also work, but is unwarranted.

Where exactly does the user-ID come from? We have to store the user-ID in our logging table.

No need to be condescending. Is there a standard way to do this across the popular RDBMSs, or will I have to write specific code for each one? (I just found a page http://vyaskn.tripod.com/oracle_sql_server_differences_equivalents.htm that claims there is no equivalent to package variables in Microsoft's SQL Server.)

Although I am not 100% sure, not having programmed Transact SQL in a few years and also not having a handy reference, I think you can use temporary tables (tables whose names begin with #), for exactly this purpose, as their content is defined to be per session.

With a little bit of care I'm sure we can devise a quasi-generic and mechanism for transmitting a session-specific and hidden variable from the application layer to a database trigger. That's all you need. And all DBAs and their grandmothers know that triggers are the foolproof mechanism to implement an audit trail on database updates. The most trivial solution is to create a table (even if not all databases support temporary tables, you can even make it permanent), insert the needed value in it at the beginning of the use case and delete it at the end. A SELECT against that table will only have the value visible for the current session. To make it safer you can add a DBMS specific identifier for the current session, such as you can get from the special dictionary table V$SESSION in Oracle.

Thanks, now I see how I could make that work. I'd need to wrap each servlet method with the setup and teardown for the DBMS context. I's need to write triggers. I'd need to write DMBS specific code, or just use tables for the context.

Well, it looks to me that you lack a bit of imagination and/or experience. Setting the session id and removing it is exactly one bunch of code that works for all contexts, it is not necessary to wrap each servlet method individually. The trigger code is again one bunch of lines for all your tables. All in all, under 50 lines of code for a fullproof solution.

What I did was to move the DBMS access into my model objects. They all know how to insert, update, delete and populate themselves. They share the common behavior, so I just added auditing and security in that common behavior.

That may also work. Now the downside is that you've just dumbed down database access patterns to objects inserting/updating/deleting themselves. For simple applications that might be enough, for complex business apps, it typically isn't. Then you have the extra burden of proof to show that all the code in the application follows your object model and necessary conventions.

Except I haven't had to dumb down anything. Our model objects really represent DB tables, views and relationships. We can use any database access pattern we want to and the model objects will encapsulate the applications view of the data. It's easy to show that all the code follows the rules. Any occurrence of Statement or PreparedStatement outside the DB model package gives it away.

Whatever your object model may be, as a client of that module, I have to use a restricted set of APIs that obviously does not have the full expressive power of SQL. That's the unwarranted drawback of having a database layer module. If, on the other hand you give me rights to modify the module, or send whatever SQL string command I want to pass it through the module, then you're back to square one, because you can't guarantee that all update/uinsert/delete are properly logged. Doing this in the database will still leave the app developer with full expressive power of SQL while 100% meeting the auditing requirements.

To take a look at a practical example, here's your classic demo object/data model with Order and Order_Lines one to many relationships. I need to insert the data for one order and the related lines in exactly one client/server roundtrip to the database having 2 INSERT statements - obviously I could do it in exactly 1 by using a stored procedure, but I don't want to loose some flexibility. Most database wrapping layers will do that in n+1 statements. It is obvious that most database wrapping solutions in existence restrict severely the expressive power of SQL that is sent to the database from the application layer.

Your right, but n+1 statements is more than good enough for most applications I've encountered, and n+1 comes free, no hand written code. I can always jump in and hand write something specific when the need arises, but that's fairly rare, n+1 is usually sufficient.

And when you jump in and hand write something specific, you may just have blown the nice architecture that was suppose to guarantee the auditing of all updates.

They both achieve the same outcome... that fact that hand written code can do it in one trip, is just an optimization, and a premature one much of the time. Gotta get it working before worring about getting it fast, and n+1 tends to get it working very fast IMHO.

It depends on unwarranted assumptions that you are making to call it premature optimization. For other people it is not an optimization it is just the normal way of doing business. In terms of lines of code, solutions that build upon database abstraction layers are typically longer than solutions that just use plain old SQL statements, as it is well known that SQL solves lots of use cases easier and more concisely than any procedural language. SQL in itself is one of the most powerful abstraction languages, much more powerful for regular data intensive application than any OO API ever invented.

[I agree in spirit, but that's not the right way to express it. It's not SQL per se that is so powerful, it is the relational calculus, of which SQL is one expression. This is not a completely trivial nitpick, because people have in fact written OO APIs that are as powerful as SQL, by differently expressing the relational calculus.]

Perhaps we could live with a few false positives. One would have to ask the customer/manager. As far as Java not allowing you to override stuff, that is a deep flaw in something (OnlyUseOverrideableOperations). If we do have to visit each and every query, then perhaps create another kind of query function, "loggedQuery()", or perhaps add a named parameter for those that need logging.

No false positives. If we create a loggedQuery, we could also create a loggedInsert, a loggedUpdate and a loggedDelete. These would avoid the false positives.

[[There are plenty of false positives and missed hits if you rely on searching in strings. Otherwise you may need to impose limitations in constructing the sql. For example all insert statements may go through an executeInsert(String theRestOfTheCommand) that will prepend an "INSERT " to theRestOfTheCommand, and execute it. Kind of clumsy if you ask me and prevents other useful functionality from being available to the developers, but in any case it has no bearing on this discussion as it works just as well (bad) no matter if you put theRestOfTheCommands under one big (centralized and isolated) umbrella, or sprinkle them throughout the code.]]

Perhaps it is kind of a side-issue, but it would be interesting to see what kind of flaws there are in substring detection-based techniques. I think one could tune it to get say 99% right.
- Something that is 99% right in one environment will suddenly become 0% right in another. The only industrial strength way to do this is by completely parsing the string.
  - That is the issue: do we really need industrial strength. Why not just log them all to avoid any string diddling? Perhaps run some tests with them all logged, and then remove the most frequent ones by replacing individual query calls with "readOnlyQuery()" or the like. This is kind of a compromise appraoch. (Note that if we have to hand-replace each one, we may misclassify a few anyhow. Either we log them all, or risk errors. Fully parsing each one would probably slow down the thing too much.)
  - Addressing only the speed issue: I doubt it; full fledged lexing and parsing may only be tens of machine instructions per byte parsed (in a compiled language or assuming HotSpot does its job). You wouldn't want to do this in a speed-critical environment, but these days, you probably wouldn't even notice the overhead. It all depends, of course.
  - Sometimes you have to do a query inside a loop because SQL is not powerful enough to do it in a single query, or because there are some bad programmers out there.
  - True. So there is always some situation where parsing might be too expensive (although note that the parsing speed and other such issues tend to be dwarfed by the time it takes for the RDBMS to do disk accesses)
  - I don't think you can even begin to contemplate parsing "DECLARE <variables> BEGIN <pl/sql statrements galore> END;" which is a perfectly valid and many times extremely useful statement valid with all Oracle clients, and the same apply to most major databases.

None of that matters. False positives are not allowed. We're dealing with the movement of real money.

Either way, it requires going through the code and making a lot of changes. That particular change is indeed something that would be easier under the centralized SQL approach [[not true at all -- Costin]]. However, the probability of having to do that depends on ALL of these being true:

Logging is requested after the application is written
Logging was never anticipated as fairly likely before first coding
We cannot use the RDBMS to log for us (such as not finding a solution to the user-ID communication problem.)
Overriding/altering the Query() function with SQL command detection is deemed unsatisfactory by the managers.

There are just too many alternatives to visiting each and every SQL statement (whether centralized or not.) The probability still appears too small to bypass YagNi.

And yet each of these are true in this scenario. The question is simple. What would you do?

I think Dougs point was that to meet those requirements, you can't just throw strings at the database, parsing strings is not a solution, because it'll always fall short. That kind of security needs to be done in the application, and that's where an object model comes into play, either that, or you put security checks in every procedure that gets called. Either way, sql is poor at handling that requirement, you need the app to do it. AOP would be good at this, Meta programming would be good at this, and an object model would be good at this, without too much work at all, but sql strings are horrible at this.

Quite the contrary is the case, at least for the logging of updates. Technically there's no need to parse strings that was just an anonymously posted straw man tryout. Using SQL triggers the logging of (insert/update/delete) can be dealt with much more reliably and securely, with much less programming effort. Actually within 10-20 lines of code you can generate all needed triggers, some other tens of LOC are needed to put the corresponding glue in the application layer. In under 100 LOC, you have a 100% secure and verifiable/provable/auditable solution. I'd be amused to see a comparable object model feature.

[You need to reread the requirement, it was more than just logging, and in a web environment..users don't generally get their own connections, so you can't use built in database security.]

I still don't understand how the RDBMS gets the web app user IDs for the logs. Can you elucidate?

Because the application tier sets that user identity for the DBMS connection.

{I am curious about this also. How do the triggers know what the user ID is?}

Someone (I assume Costin) explains it in the section above this. The example shows the application using Oracle package variable.

The options are:

Don't wrap the SQL
Wrap locally (within same module/class)
Centralize all SQL into one module/class/file
Using a language in which SQL (or the equivalent) is an integral part, and parsed at compile time.

(Hmm, 6 hours without an update; everyone is in agreement now, eh? ;-)

[No, just tired of the discussion, neither side will budge.]

It is very simple: no convincing argument was produced that if you don't centralize SQL under one big umbrella, the sky is going to fall.

A package is not one big umbrella, it's a well organized place to put things that are related. Plenty of convincing arguments have been made, just none you agree with.

It's one thing to write down an argument, and it's entirely another thing to be able to defend it convincingly rather than spinning the issues. And recurring to unreasonable "debate tactics". Could you recap the "convincing" arguments?

Doug was far more reasonable than you Costin, so watch slinging ego insults about the one true way, you don't have a leg to stand on.

Just because there are SQL strings don't make them related. There is a more principled and substantial debate to be made whether modules of code should be organized by layers or by functionality (horizontally or vertically vis-a-vis architectural layers and use cases resolution). But whatever anyone calls that package, it will be a big umbrella because a non-trivial software with SQL inside will contain many unrelated SQL queries, and the implication that if you don't put it in one place/module/package/whatever the sky is going to fall, was supported so far only by trolling and handwaving.

It's not about "slinging ego insults", it's about judging objectively what happened on this page, and a careful reading will make it easy to recognize. I hope the person in cause, has the strength to delete that crap. Anyway, this page is kind of ready for junk, I kind of didn't want that unsupported handwaving to pass unchallenged as pearls of software engineering wisdom.

[I hope it's clear that I (Doug) have made comparatively few comments; there have been few signatures, and varying use of italic and [] markers, and hence the possibility of confusion about who wrote what.]

[I've been thinking about Costin's point that SQL is very abstract already, and that wrapping it in a lower level abstraction isn't the right way to go. There have been a lot of points and counterpoints made - too many for me to track, actually - but I am sympathetic to Costin's argument on that particular point. I believe that Sun's attempts to simplify access to databases has actually backfired; nor do there seem to be a lot of e.g. raving EJB fans in the world.]

[On the other hand, apparently unlike Costin, I don't think that this means that the SQL shouldn't be confined to its own module(s), I just think that means there are a lot of bad ways to do it, and that if that's the only choice, then maybe scattering SQL all over is the better of the two.]

The better of the two evils? It has not been established that SQL being where it is needed, and not under some arbitrary global umbrella, has major drawbacks.

[But I do believe in encapsulation, and to me this seems to mean hiding the names of tables and columns. Hiding the SQL operators (and miscellaneous keywords), on the other hand, may not be necessary.]

[Costin's critique of this, if I understand correctly, is that this isn't necessary if the schema is done correctly in the first place. That may be true, but I don't find it comforting. Costin is an unusual guy, and may well be able to do perfect schemas that never need to change, but I observe that the rest of the world has troubles doing so.]

My critique of this is very simple: modules should be divided as much as possible by functionality and not by arbitrary layering. Therefore if Doug is in charge of the payments module, we should trust him to be competent enough to write whatever necessary queries against tables related to payments. And where is the natural place for those? Of course, in Doug's payments module. If we are under the assumption that a module is a work assignment, and somebody's responsible for every module (quite a common practice in the industry), we'd have the folllowing alternatives, with AllQueriesGoHere? module:

We put someone in charge of it. Therefore that poor guy would have to create rather trivial queries for Doug where Doug is quite capable of doing it. That becomes easily a tremendous and unnecessary bottleneck. Been there done that, seen that, didn't like it.
We leave it with no owner (the far more preferable solution), and everybody can check-in all the modification he/shhe needs related to their own functionality. Relying on developer's competence should be the norm rather than the exception. But because that module contains pieces of code from everybody, pieces that have nothing in common other than the misfortune of being SQL strings (what a miserable fate), and it is overall nobody's responsibility but responsibility for pieces thereof, we've just exposed a module that is fake and artificial. It's just a side effect of the arbitrary decision to put SQL in one place, so no one can really take responsibility of it as an work assignment. It's free check-in for everybody. Not really a module, but a place whose sole purpose in life is to make it easier to find SQL than using a code browser or a simple grep.

For further enlightment about modules, modularity, and why modules should be work assignments, SoftwareFundamentals is highly recommended.

Furthermore if we trust Doug to be a software professional (and at a minimum any software professional should be able to use SQL competently - even without prior exposure he/she should pick it on the job and hit the ground running - it is after all just trivial mathematics), and then the goary scenario of big changes comes in (we switch from Oracle to Db2, or we support them both, or the idiot DBA "refactored" a master detail into a in-place array column). Whom do we go to make adjustments in the queries needed by the payments module? Should it be Doug, because he knows what's all about? We're not going to bottleneck on one guy, owner of the big hairy AllQueriesGoHere? module, are we? And since Doug is a competent and professional developer, he should have no big trobule and no big fuss finding those queries within its very own payments module, right? Voila, it was much ado about nothing, after all.

[So to me, the question is how to correctly encapsulate the high level SQL semantics without making things worse with a clumsy low level wrapper layer. If you assume that's impossible, you get one conclusion, if you assume it can be done, you get a different conclusion. -- DougMerritt]

And if I ask you again, Doug, what was the case for encapsulating SQL? Why don't we encapsulate String library as well, or all access to java.io library, or all access through HTTP, or web services, or Corba? Should we let all these unreliable module writers to play willy-nilly with I/O and with strings or with HTTP? What if they mess up? What if tomorrow we have a monster of all changes scenario in which we replace all accesses through SOAP by accesses through CORBA?

It is the never-ending battle about what to wrap and what not to wrap. FuzzyDistinctionBetweenInterfaceAndImplementation

I thought Costin and I agreed it's possible to wrap the "high level" SQL semantics. I thought he just didn't want to put it in the same package. -- EricHodges

You don't need to encapsulate SQL at all, this is just fairy tales told by the ChurchOfOoBigots?. On the contrary it is very simple: the module that deals with shipments should throw any SQL it needs towards the shipments related tables, and the module that deals with payments should throw any SQL it needs to the payments related tables. Very simple. Nobody needs a half-baked, clumsy API to "encapsulate" database access. JDBC, ODBC, ADO.NET or what have you are your friends. -- Anonymous

It's very simple if your application cleaves into well defined "shipment" and "payment" modules. Mine doesn't, so that isn't a realistic option. Of course, this could all be a fairy tale from the ChurchOfOoBigots?. We love to make up lies while we aren't buzy hating SQL. -- EricHodges

I would like to explore some actual examples. Until I see naked SQL flopping and flunking, it is hard for me to believe it is common. Yes, sometimes it causes problems, but again, SoftwareDevelopmentIsGambling. You got to know when to hold 'em, know when to..... - WillieNelson?

Hey Willie, you neither wrote nor performed that song. If you want serious discussion about SQL and encapsulation you'll have to go somewhere else. This area is just for OO bigots and our fairy tales. -- EricHodges

All us country singers are interchangeable, just like PlugCompatibleInterchangeableEngineers. -- WillieNelson?

Kenny Rogers never smoked pot on the roof of the White House.

He did, but he did not inhale, per tradition.

"It is very simple: no convincing argument was produced that if you don't centralize SQL under one big umbrella, the sky is going to fall."

No one said the sky was falling. The two requirements I listed above were easier to meet because I had already created abstractions for RDBMS accesses. I added security and auditing to 4 methods in one class. If I used Costin's approach I would write triggers for each RDBMS vendor (how many triggers would I have to write? Exactly one SQL batch - some 20 lines of code) and wrap every servlet method with code to establish the context. -- EricHodges

Handwaving your solution is not a serious way of having a conversation. It is most likely that you dumbed down the database access completely, but you just hide behind the bushes. Can you tell us what interfaces are exposed to other modules? How would one write queries against the database through that fancy 4 methods?

There are dozens of interfaces exposed to other modules. A typical one is a DeliveryEvent?. All of the queries related to delivry events reside inside the DeliveryEvent? implementation, or one of the classes that can have delivery events. I haven't "dumbed down" the database access. I've encapsulated it. -- EricHodges

You ain't gonna write no big business software through that dumbed down interface. When you have an extremely powerful language and you dumb it down to go through those 4 methods, it's just a bad idea in principle. At times it might work for trivial applications, but most business systems have a complex load of simple CRUD screens, stored procedures, complex SQL transactions, batch processing, and it ain't gonna be written through a dumbed down API replacing SQL.

It ain't? Oh well. I guess I'll go back to waving my hands, trolling Wiki, being hostile and dumbing down interfaces. It was a good run while it lasted. -- EricHodges

Actually you haven't posted any code or a sketch of what you did, you just brag that you solved it in 4 methods, without any possible drawbacks, but haven't shown anything for it. Whatever.

Come on guys, let's go back to pretending to be constructive and civilized. -- DougMerritt

Nah, the cat is out of the bag. They are wise to the fairy tales of the ChurchOfOoBigots?. There's no point in pretending now. I might as well come right out and say it: Object oriented programming is the One True Way. Everything else, from assembly to relational to functional, is false doctrine. An inquisition must be launched to purge Wiki of such heresies. Burn a few table lovers and set an example for the rest. -- EricHodges

Fool! You have not been authorized to disclose these secrets! Boy, are you in big trouble now...

The time for action is now! False dogma must be stamped out immediately! Suffer not a DBA to live!

Actually DBAs have quite a grip on their turf. Too politically strong, and you ain't gonna see many DBAs flying away from companies because of OO bigots. On the contrary, the OO bigots are in danger because some mighty fine outsourcing companies in Banglore and elsewhere, know very well their VB and how to throw some SQL get the data, and put it on the screen.

EditHint: Somebody should move most of this text to DatabaseAbstractionLayerDiscussion? and leave only links and definitions here...

I see this as a question about the level of language SQL is. One school of thought is comfortable with it as an application level language, and therefore the SQL belongs in the application source code. The other school is comfortable with the prime language as the application source code, and is happy to delegate the formatting of the SQL code to lower level processes, just as they would with a sockets api or any other external interface. It is personal preference, or experience. I am in the second school. Using SQL within application source code to me is the same as saying Enter Neat3 in COBOL - which was a way of writing inline assembler in an old NCR cobol. It is a pragmatic solution to a dissonant interface. -- PeterLynch

While arguments about RDBMS changes, VendorLockIn, SQL versions, and other stuff is interesting; an important reasson for a DatabaseAbstractionLayer and against PerniciousIngrownSql--regardless of whether OO is used or not--has been left on the floor:

Relational schemata frequently change.

Sometimes the change reflects a change in business rules, which the application programmer has to deal with (semantic schema evoluation). Sometimes the change reflects little more than the DBA monkeying with the works to tune the database (syntactic schema evolution). In either case, SQL queries in application code that used to work, now don't anymore. And since SQL is a loosely-coupled interface, recompiling won't fix the problem.

It is much easier for application programmers to deal with such changes when they are contained in one place; rather than sprinkled throughout the codebase. Plus, if you DO use a statically-typed language like JavaLanguage (OO or otherwise), the interface to the DatabaseAbstractionLayer is usable by the compiler's type-checker, the RefactoringBrowser, and other SW development tools, which makes assisted ReFactoring of the application much more tractable.

Schema evolution - especially the first case - occurs far more often than does migration/upgrade to the RDBMS.

"Contained in one place" often depends on the kind of change. Sometimes doing such adds a fair amount of indirection which can be expensive up front, kind of a PrematureAbstraction. It may result in having to make changes to both the caller and callee interface instead of just in one place if no wrapper is used. Needless to say, it is a sticky choice.

CategoryAbstraction