Meta Data

Literally "data about data." (See DefinitionOfMeta.) The behavior of systems that contain and process metadata can be changed at runtime by the users. This gives the users much more control of the system they're using, allowing them to "reprogram" it on the fly (within strictly limited boundaries) to meet changing requirements.

Examples of usage:

BeOs TaggedDatabaseFileSystem
CompositeTypes
DublinCore
EXIF data in digital camera images
File info (IPTC) in PhotoShop images
FileTypingSystem
ResourceDescriptionFramework
MacOs ResourceFork
MetadataEncodingTransmissionStandard? (METS)
MetadataObjectDescriptionSchema? (MODS)
MP3 ID tags
TIFF image format tags

Criticism:

MythOfMetadata

Related to:

MetaLevel

Advantage: Productivity -- The behavior of the system can be extended and customized by adding and changing data, without making any changes to the code. This can be much easier than changing code, and is often done by different people. (So, much functionality gets added to the system without coders expending any effort at all. Result = infinite productivity.)

Disadvantage: Complexity -- Writing a program that builds N different screen layouts based on its input data is more complex than building a program that just displays one screen.

Disadvantage: Scheduling difficulties -- Nothing works until the data driven engine starts to work. Thus, there can be weeks (or months!) of development effort before the first screen works. After that, screens are developed much more rapidly than if each one were done custom, but the "discontinuous slope" of the "development productivity" curve can cause ulcers in naive development managers.

But: With the right planning a meta data driven display can be constructed in days; more complex and custom data and field types and customization of display and data validation can all be added incrementally once the basics are in place.

Disadvantage: Documentation of an interface created using meta data tends to document the interface at too low a level, i.e. at the level of individual fields. This is encouraged by the meta data decomposition - and may be partially generated automatically. Users generally require a higher level description, a description at the form rather than at the field level.

Metadata is real. And it more simple and common than you might imagine.

Metadata is data that describes other data. It enables one to process the other data in a generic manner.

Let's consider a common example, that you'll find in any relational database:

I have a table called AUTHOR. It contains data. I type "SELECT * FROM AUTHOR;" and the system lists the contents of the AUTHOR records. This is data, just ordinary data -- Names, like "Joe" or "Sam", for example.

Now I type "DESCRIBE AUTHOR;" and it displays this:

mysql> DESCRIBE AUTHOR;

 +---------+-------------+------+-----+---------+-------+
 | Field   | Type        | Null | Key | Default | Extra |
 +---------+-------------+------+-----+---------+-------+
 | A_ID    | int(11)     |      | PRI | 0       |       |
 | A_FNAME | varchar(20) | YES  |     | NULL    |       |
 | A_LNAME | varchar(20) | YES  | MUL | NULL    |       |
 | A_MNAME | varchar(20) | YES  |     | NULL    |       |
 | A_DOB   | date        | YES  |     | NULL    |       |
 | A_BIO   | blob        | YES  |     | NULL    |       |
 +---------+-------------+------+-----+---------+-------+
 6 rows in set (0.00 sec)
 mysql>

This is metadata: It is data that describes other data. This data describes the format of other data, so that it can be processed. Within the AUTHOR table is a column called A_FNAME. It may contain values, like "Joe" or "Sam". That's data. In the relational database's "system tables" is more data, like the string "A_FNAME".

This is metadata; in this case it's the name of a column in a user table. As you can see, the metadata shows that the maximum length of the data in the "A_FNAME" field is 20 characters. Also, NULL values *ARE* allowed in this column. All relevant data about the format of your user tables is stored in the database's "system tables." It is data that describes the format of other data. It's "metadata." Metadata is also ordinary data. You can SELECT and process it just like any other data. The database engine fetches and uses metadata in a "data-driven" manner, to process *your* data, in the user tables.

Now for the deep philosophical issues:

As is often the case with metadata, the metadata can describe itself. In a relational database, the metadata describing all the tables in the system is stored in the "system catalog" tables. The system catalog tables are also tables. So, therefore, they're described by data in the system catalog tables. It's recursive. How interesting. ;->

Metadata is "data about data".

For example: "2" is data. "The number of people in the car" is metadata. It provides meaning to the number "2" and places it in some kind of context.

The confusion comes from the fact that "metadata + data == data".

"There are 2 people in the car" is a single piece of data that is self-describing because it combines a piece of data (2) with a piece of metadata (this is the number of people in the car). Together, though, it becomes data again.

Thus, you can layer metadata miles high. A Java method is a whole heap of data and metadata bundled together in such a way as it instructs the computer to do something. Then with a tool like XDoclet, I can add some _further_ metadata to this method to say "this method is part of the remote interface of the FooEnterpriseBean?". That's metadata again.

Lisp exploits this layering of data and metadata by treating code as data (and vice versa). This way, at the most basic level, you can take advantage of the abstraction-building techniques of combining data with metadata, and layering the whole thing up into a program.

It's a very pure form of program-building, and helps explain why Lisp programmers are even worse than Smalltalk programmers in the "Why the Hell did my perfect language fail?" department.

XML was also designed to be metadata-rich and self-describing, although most people actually _producing_ XML ignore this (as anyone who has had the pleasure to do a CVS merge on the XML generated by Websphere Studio will attest to). This also leads to the fact that most XML-based programming languages end up looking like a poor imitation of Lisp.

XML has essentially become a way of serializing object hierarchies and sending them as data back and forth between services. the 'self-describing' PromiseOfXml has been sort of lost in the shuffle. --JonGrover

Another way to look at MetaData is to consider the ConceptStructure. The above example provides a StructuralDefinition? of the Container of the data. Structure need not be exclusively about data, included can be methods and locations. Thus such a structure can not only define what the data looks like, but also what process or processes act upon the data, and the location of the data, the processes, and processed data. -- DonaldNoyes 20070718