Strive For One Table

Whenever I am writing a fairly large query, I find one of the best rules of thumb it to start with the ideal table and work backward. A query may involve several tables, but generally you don't want to work with several tables. Thus, you first focus on generating a final table (virtual) in which most of the processing is applied to. It will thus have all the needed peices of data and flags/indicators to carry out the primary job. You then explore how to achieve this final table by various joins and pre-processing.

However, it is not always possible to reach that ideal, often for performance reasons. For example, a fair amount of filtering may need to take place to shrink the size of an intermediate join. For instance, if you are only dealing with one State of the US, you want to filter out all other states before joining those records to yet other tables. (Some optimization engines will do this automatically, but it depends on many factors.)

Some people try to apply different activities to the various tables involved along the way, which is often messier than pushing most of the processing onto the final table. One of the main reasons for postponing the more complex processing is tracability: you generally cannot "see" the intermediate tables without a lot of copy-and-paste experimentation.

(I am not happy with the name of this topic. It needs some pondering.)

--top