(moved from MergingOoAndRelational)
Often an application (or database query) requests an entire record from a database table, even if only some of the columns are going to be used.
Some programmers believe this is a bad practice; others disagree.
Advantages of requesting the entire record:
- The query is simpler and more compact.
- The field names are not scattered throughout the SQL code.
- It can greatly simplify interface code.
- Fewer ChangePoints for typical additions or removal of column usage. Long lists of attribute or column names are a yellow-alert CodeSmell related to TooManyParameters.
- If the DBA knows that it is safe to delete a field, the DBA can delete the field without changing the queries.
- If the DBA knows that it is safe to delete a field, the DBA does not need to deprecate the field before deleting the field. This saves hassle.
Advantages of specifying just the fields that are going to be used:
- Always ask the database for what you need, and no more. Then you won't be surprised by a performance problem the first time that your application encounters a larger data set than you expected.
- If the entire record is requested, even the best logger may never be able to know which columns an application actually uses. "Solving" that might create security problems.
- As the system changes, it is often necessary to find queries that reference some particular field. If a query explicitly mentions required fields by name, this makes it somewhat easier to find queries that are affected by proposed changes to schema.
- Combined with DISTINCT (if using SQL and not a smarter RelationalAlgebra), it guarantees that the query is correct. If you retrieve many fields, but only examine a few, your application may see duplication of the effective records that it's actually using. (See AlwaysUseSelectDistinct, SelectDistinctIsaCodeSmell.)
Context / Scalability:
- The performance impact of requesting more fields than needed increases with the number of records retrieved, and may be affected by transaction modes and query method. Performance details will vary wildly among different database system implementations and their features for dealing (or not) with concurrency.
Work-arounds:
- One could flag any such query against a table with deprecated fields as a warning. The warning can be addressed by altering the code to explicitly request the fields it will use like it probably ought to have done in the first place. Regarding query tools that look at the existing table definition, perhaps, just filter out warnings from those client applications and ignore them.
It can greatly simplify interface code.
How? I can't think of any host language APIs I've worked with that would be affected one way or the other. The fields you need, you mention. The others, you don't.
Long lists of attribute or column names are a yellow-alert CodeSmell related to TooManyParameters.
Agree, as a general principle. However, the number of fields needed by the code running the query is whatever it is; this seems orthogonal to the choice of retrieving more fields than necessary from a query.
If the DBA knows that it is safe to delete a field, the DBA can delete the field without changing the queries.
And how would a DBA "know" this? You have to search through code anyway to find references to such a field. If you find references to a field only in the code, but not in the query itself, you have to study the code enough to backtrack to the query supplying the field -- if in fact the hit was a legitimate reference to the field.
If a query mentions the field explicitly, it will almost always be easier to find the field reference, and recognize it as such quickly and without doubt. (The only exception to this is in code that dynamically builds query strings.) You can then work forward from there to the code that uses the query's results.
This also relates to: The field names are not scattered throughout the SQL code. I find this to be an advantage when refactoring schema, rather than a disadvantage -- particularly if field names have been chosen carefully.