Generic Data Model

Generic Data Model is a pattern where all the attributes are separated from the object and saved in different table.

For example instead of:

We use:
And data is like:

It is a very flexible solution but in large databases the performance could be terrible. Should be in category DataModelPatterns. I don't think it is an AntiPattern, maybe more like a GreyPattern.

Nah, it's a great pattern. Patterns have context, this one is great for rapid prototyping, that table becomes an object store! Though I usually do it with three tables, Object, Fields, Values, or something like that.

One reason I'm still not sure about this pattern is what it does to commands in SQL.

 UPDATE USER SET EMAIL = '', PASSWORD = 'xxxxx' WHERE NAME = 'thomas';
AnswerMe: Can anybody share experiences with these kind of tables and more complex statements? Was it a net loss or gain? Are there any pointers one should keep in mind? -- AnonymousDonor

I can share experiences. This sort of normalization can produce very bad performance. People who write reports with commercial reporting software (like BusinessObjects?/CrystalReports) hate tables like this. They make their job much more difficult than specific tables who's schema contain information from the problem domain.

Response to comment above: Most database environments can allow you to wrap these generic models into views/queries that appear to be tables. That can keep the CrystalReports people happy. One agile approach is to use a Generic Data Model during the initial stages of the design, when people don't really know what they need, and then move it to a conventional data model when the requirements are more well-understood and the performance starts to become an issue.

It would be nice if query languages more easily allowed one to transform between row-wise/column-wise. That way one could adjust it to get the view they wanted. However, such is difficult to implement unless one has a type-free database, but this is not the place for a pro-type-free rant. Related: SurveyReportingExample.

Definitely not an anti-pattern, and it can often greatly simplify table design for some corner scenarios. Apart from giving rapid prototyping benefits it also helps with handling 'jagged' data. Instead of introducing null columns using GenericDataModel enables us to save only relevant data thereby actually decresing complexity. Reporting and manual labor against the table can be handled via views as mentioned above.

The page title and commentary above seem to be a misuse of the term "Generic Data Model." The above is more commonly known as EntityAttributeValue. A generic data model generally refers to a predefined, non-specific data model intended to be used as a template for specific implementations. For example, a generic "patient records" model may be used as a template or starting point by hospitals wishing to implement their own health records systems.

ThomasKyte (the Tom behind argues strongly against Generic Data Model:

Loosely structured data isn't necessarily 'bad'. It gives a high level of flexibility, which is very useful in 'open' systems when you don't know your full ontology in advance, or when you must integrate many independently developed subsystems. Indexing is feasible for a generic data model (cf. EuclideanProximitySearchEngine, SemanticBinaryModel, DataSpace, LatentSemanticIndexing), though your DBMS must be designed for it. Support for 'views' can potentially inform a DBMS what sort of queries and updates to expect to achieve more efficient indexing.

I wouldn't recommend the structure presented above, but only because I feel it's incomplete. Some extra meta-data per record (who published it? when is it valid?) is useful for various properties (e.g. monotonic temporal updates support concurrency control and eventual consistency; identifying the publisher allows us to prioritize when conflicting data is published). And the idea of 'object' is something I'd rather be rid of - we should be able to offer data about regions (areas, volumes), markets, phenomena, subgraphs and trees, et cetera, so we need an ad-hoc concept of identity (cf. MultiCaster). My vision is more of an ad-hoc, composable registries of records, with a few standard metadata attributes per entry. Composition of registries allows us to publish one registry into another, functionally filter and transform some of the data and metadata, and support distributed queries. Composition replaces 'views'.

In general, the more structured your data model, the more efficiently you can deal with the data (to a point, of course). At least in an SQL database (the original context in which this antipattern was presented), EAV is a terrible idea. It does not give a "high level of flexibility"; rather, it makes the system inflexible because all the stuff that a decently designed DB could do well (with a proper data model) has to be pushed to the application layer. This makes the application harder to write, not to mention less efficient in terms of performance. If you don't know your full ontology in advance, then either you have more analysis to do, or (in the case of a truly open system), you'd probably be better advised to use a schemaless database such as CouchDb or MongoDb?. I've occasionally seen 2 or 3 fields' worth of "open" data profitably munged into an SQL database by this trick, but not the whole object.

And again, your claim that support for DB views (in the SQL sense, anyway) will help this is completely untrue as far as I can tell. Correct me if I'm wrong, but so far as I know, you can only ameliorate this situation with views if your data is amenable to a proper schema definition (which is essentially what a view is). And if the data is amenable to a proper schema definition, then there is no need for this antipattern in the first place.

RE: more structured your data model, the more efficiently you can deal with the data -- Not quite. We optimize some queries and updates at the expense of others. The goal, naturally, is to optimize the queries and updates we use often at expense to those we use rarely or never. A less structured data model can support similar tradeoffs through caching, annotations, views, and runtime profiling - e.g. a EuclideanProximitySearchEngine can learn which attributes are the most popular query targets, and adjust the index to split on those attributes appropriately more often. Without those techniques, we can generally achieve a performance within C*log(N) of ideal, but for a likely wider array of queries.

RE: you can only ameliorate this situation with views if your data is amenable to a proper schema definition (which is essentially what a view is) -- A view is akin to a query, not a schema. With a schema, we are concerned about redundant expressions of data (i.e. OnceAndOnlyOnce, DontRepeatYourself), avoiding loss of data, and data integrity (missing fields, relationships between fields or entries). Any fixed schema or ontology will be incomplete and inadequate in an open system - e.g. the field I work in is robotics, where there are thousands of diverse (and pluggable) payloads, platforms, actuators, sensors, protocols, and power supplies. A 'view', however, corresponds to a relationship between a data and its observer, e.g. a robotic system and a controller (such as a user interface, world model, or planning system); there can easily be dozens of concurrent views, with a lot of redundancy between them. It is vastly easier to develop an adequate view for a specific task than to develop a common schema.

RE: in an SQL database, EAV is a terrible idea -- I agree that SQL is not the most effective language for expressing rich, ad-hoc EAV queries. But EAV still gives you flexibility, in the formal sense that you can introduce new attributes for existing objects without modifying the data model. Ad-hoc integrity constraints could be added via triggered procedures. But I do not assume that a simple SQL DBMS is necessarily the 'context' of this discussion, but rather just a language to help explain what the GenericDataModel means. I acknowledge the possibility of constructing a DBMS and query language specifically for the GenericDataModel. We could have ad-hoc declarative integrity constraints. I would like to eventually build a generic, composable DBMS to support live, reactive, distributed queries and updates.

RE: metadata [...] is once again better left to the DB -- We can replace use of transactions with TemporalLogic and a 'time' field on every datum, and achieve a much more scalable and disruption-tolerant data model than supported by transactional systems. Similarly, we can achieve or augment security mechanisms (trust, auditing, access-control and revocation, spam killing, etc.) by keeping metadata about data's source and authentication, and achieve a much more composable security model (suitable for distributed views and queries) than is possible and convenient using 'identity'. The use of metadata has pervasive, non-obvious impacts on how we express our domain data models and queries - e.g. 6NF was developed for the RelationalModel in order to avoid anomalies in a temporal database, and in a temporal model developers will want to group queries based on what typically changes together. I do not believe such systematic applications of metadata should be entirely 'left to' the domain data model developers.

In my experience. Except for very specific scenarios this pattern is a very bad idea in SQL word. Performance and complexity are horrible. Actually an RDMBS is internally designed that way. Look at system tables to see EAV. RDBMS is mostly about giving language and tools to manage structure that we want extract from data. Building again EAV on top of RDBMS means loosing all those tools RDBMS gives.

Both! At the same time, up-hill, both ways, barefooted.

It would be nice if each "kind" (thin table or wide table) was made a kind of view (interface) such that one could switch between one view or the other as needed rather than have to choose up front. I agree there may be performance issues with the choice; but even then, both views could be offered even if one or the other has to be favored performance-wise. Emphasizing one would not preclude getting the other view, rather the non-favored view (or something else) would be slower. And it should be possible to change the emphasized view if we later find we need to swap trade-offs (perhaps bringing the DB down during re-projection and re-indexing.) I've been kicking around some of these issues while pondering DynamicRelational. --top

This is feasible under some very tight constraints... such as lossless, mutable views.

I have seen tables like this. The next step is to combine this with versioning, i.e. each entry has a version field so we can roll back single entries or groups to previous versions. And then the next step is gargabe collection because to delete anything (which you must to avoid killing your db) you have to find out which data is unreferenced. Effectively a more or less complete gargabe collection algorithm. I warned that that would happen. It happend. And it was quite a mess. kind of GreenspunsTenthRule? for DB. Any sufficiently advanced schema contains a garbage collection algorithm. -- GunnarZarncke


See also: AttributeTable, MultiParadigmDatabase, UniversalStatement, RdfTriples

EditText of this page (last edited December 10, 2013) or FindPage with title or text search