Where And And

Although StructuredQueryLanguage and WhereClause? has rich constructs, many SQL statements are like below:

 SELECT A, B, C
 FROM FOO
 WHERE X=5
   AND Y=6
   AND Z=7

And for most cases, the order of ANDs reflects the index structure of the table.

Is this good? Or bad? Or somewhere in-between? Does it have pitfalls?

It just shows that most SQL code is not exciting, and has a recurring pattern. This indicates automatically generating SQL code is feasible.

I have used DataDictionary(s) of sorts to generate SQL fitting such a pattern for QueryByExample and edit forms. However, I kept running into odd problems. For example, some RDBMS don't let you reference ID columns in certain spots because they appear in more than one table (joins). My solution is to use "ref" instead of "ID" for foreign key names (see DatabaseBestPractices), but existing tables or organizations may not follow this convention.

You should be able to use JOIN ON or NATURAL JOIN to match the ID columns together; then they will be de-duped in the result of the join. Unfortunately few RDMSes support this feature.

Another solution would be aliasing the columnnames or prefixing the columnnames with the tablenames or tablealiases.

NATURAL JOIN joins on all columns that have the same name in both tables, which can get funny results if both your 'company' table and your 'employee' table have a field 'name' but you wanted a join on 'company_id'.

JOIN USING (foo_field, bar_field) has the less severe drawback that you're not allowed to use a table prefix on the de-duped fields anywhere in query, which is annoying when you join with yet another table that has a field with the same name that is not part of the join condition.

When you have this many ANDs strung together, you're usually looking for a particular row. I've noticed that the relational model, while great for doing the same operation to a huge number of things, gets clunky when you try to get particular. -- GeorgePaci

If the result is supposed to return one row, but it returns many, then you simply asked for the wrong thing. Do you have an example of such a difficulty? The "difficulty" may be a sign of a deeper problem, such as bad schema design or something.

Any time you're looking for a particular row, you _need_ to specify a candidate key. To do anything else is to rest the correctness of your program on a fluke. Now, the clunkiness aspect is merely one of clunky candidate keys, which would be an issue regardless: if you have a list of objects, and you need a particular one, you need to check all the same data.

[Some vendor-specific SQL implementations do not default to distinct or unique rows. Which actually can cause data integrity issues. As a safety, I always use those keywords.]

I don't mind joins being made automatic somehow in SQL (to reduce ANDs). However, some way should remain to use CalculatedRelations in case we want to use a relationship outside of an EntityRelationshipDiagram link. Pre-limiting the link/join paths is anti-relational IMO.

One part of me like the idea of natural joins / automatic joining but another can see that one advantage of SQL as it stands is that everything required is in the statement - the statement specifies the relationships as well as the 'query'. This is why it's a verbose language compared to alternatives.

But there should at least be the option of natural joins.

In my experience, this pattern is subject to the EightyTwentyRule where often roughly 80% of the query fits this pattern, but there are often exceptions, such as OR clauses. -t

CategorySqlProgramming