Normalization By Synthesis

A means of automated database schema refactoring ("normalization"), given the invariant that column names prove unique across the entire table-space of a database. Using a higher-level data description language (DDL), tools automatically convert from a more natural expression of data layout into a normalized form considered more suitable for use on an SQL database backend. In this way, a well-structured schema always results, given input even from those without dedicated database design experience.


According to the ArTech developers, TheTheoryOfRelationalDatabases, by DavidMaier, is the best book on NormalizationBySynthesis.

The only systems I know that implement NormalizationBySynthesis are GeneXus and DeKlarit therefore the following example is based on them:

Apparently a requirement for this NormalizationBySynthesis is that EachColumnNameBeUniqueAmongTheTablespace. So, for example, to define many to many relationship between Invoice and Product first you define an "transaction" ('it is somewhat like a view, but transaction is the name used by GeneXus documentation) with the name "Invoice":

 Invoice{
  InvoiceId:Integer

InvoiceLine{ ProductId:Integer ProductName:VarChar ProductPrice:Money ProductNumberOfItems:Integer } }

At this point, it generates this SQL ddl:

 create table Invoice (
   InvoiceId Integer
 )

create table InvoiceLine ( InvoiceId Integer, ProductId Integer, ProductName VarChar, ProductPrice Money, ProductNumberOfItems Integer, )

But, then a you add "transaction" with the name "Product" (in another file in the same project):

 Product{
  ProductId:Integer
  ProductName:VarChar
  ProductPrice:Money
 }

Note that ProductNumberOfItems is defined inside Invoice.InvoiceLine but not inside Product, based on this (and the name coincidence of ProductId, ProductName and ProductPrice) GeneXus generates the necessary "alter" statements to transform the previous model into (it is important to note here that GeneXus is way older than RubyMigrations?) this Sql ddl:

 create table Invoice (
   InvoiceId Integer
 )

create table InvoiceLine ( InvoiceId Integer ProductId Integer ProductNumberOfItems Integer }

create table Product ( ProductId Integer ProductName VarChar ProductPrice Money )

The normalization is correct, and the initial input is "more intuitive" for people not familiarized with normalization rules: The developer only needs to think about one particular view/perspective/transaction at a time (first the hierarchical structure of the Invoice, pretty much as it appears from the end user perspective, and then think only about data that describes a Product, also pretty much as conceptualized by an end user with no formal training of database normalization), and GeneXus automatically normalizes stuff, apparently by matching the column names, and using the local hierarchical structure of its "transaction/view" as guidance. This does not mean this is a tool to be used by end users, but it does means that it is going to be a lot easier for the developer doing the modeling to use the UbiquitousLanguage to design the database model (and to validate it with the user)


There is, of course, something that bothers me about the rule of having EachColumnNameBeUniqueAmongTheTablespace; I feel that those prefixes (Product before ProductId, ProductName and ProductPrice, for example) and the apparent fact that they make it possible to automate normalization reminds me of ObjectWeenies (like me) that believe (like me) that by creating object model we automatically and effortlessly (almost without noticing) we get a normalized model just by grouping stuff in objects to avoid violating DontRepeatYourself by repeating the class name in the field names.


Another advantage of this way of describing the databases, is that the code that was originally written to deal with the view/transaction "Invoice" is not altered when the view/transaction "Product" is added to the project, and therefore the code is not affected by changes in the underlying structure of the database (multiplicity changes do not have a damaging ripple effect on the code of the application).


Can we please place definitions of concepts on top, so those of us who've never seen this page before have some context in understanding what's being discussed? Thanks! Here's my attempt to summarize based on the first part of the ensuing discussion.

Thanks a lot for writing that definition at the top, I wasn't sure my self of how to define NormalizationBySynthesis, so I was hoping that someone smart like you could help me with that...


Wondering if the way NormalizationBySynthesis works has something to do with HegelianDialectic...


EditText of this page (last edited April 9, 2010) or FindPage with title or text search