Database Is Representer Of Facts

It is as simple as that. A database represents facts about the reality of an enterprise. More exactly, a database stores a set of propositions that are believed to be true (after all, it is mostly humans who input data into the database) about the realities of an enterprise.

I don't know who originally had this beautiful idea; I first read it in AnIntroductionToDatabaseSystems. Some attribute it to HughDarwen; it probably also belongs to those who pioneered research into deductive databases. Whoever originated it, the idea is splendid, and it has some very practical implications for how we design databases and database applications.

For the relational model, it translates quite simply: every tuple T(A1=a1, A2=a2, ..., An=an) is in fact a proposition.

For example, if a table contains the tuple (user_id=user1, first_name=Costin, last_name=Cozianu, password=<md5_hash>), that can be read as the proposition: "there is a person who is identified to the system by the user_id user1, whose first name is Costin, whose last name is Cozianu, and whose chosen password has that particular hash code."
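A minimal Python sketch of reading a tuple as a proposition, assuming the table's predicate can be written as a sentence template with one placeholder per attribute. USERS_PREDICATE and as_proposition are hypothetical names used only for illustration:

```python
from typing import Mapping

# Hypothetical predicate of the users table: each tuple is read as the
# proposition obtained by substituting its attribute values into this template.
USERS_PREDICATE = (
    "There is a person identified to the system by the user_id {user_id}, "
    "whose first name is {first_name}, whose last name is {last_name}, "
    "and whose chosen password has the hash {password}."
)

def as_proposition(predicate: str, tuple_: Mapping[str, str]) -> str:
    """Instantiate the table predicate with one tuple's attribute values."""
    return predicate.format(**tuple_)

row = {
    "user_id": "user1",
    "first_name": "Costin",
    "last_name": "Cozianu",
    "password": "<md5_hash>",
}
print(as_proposition(USERS_PREDICATE, row))
```

The table heading plays the role of the predicate; each row supplies the values that turn it into one concrete assertion.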

When we formulate queries, the database uses a set of rules to make deductions and gives us other propositions derived from the basic facts that are stored directly.
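A small sketch of a query as a deduction rule, assuming two hypothetical base relations (works_in and located_in) held as Python sets of tuples. The derived relation is never stored; it follows from the stored facts by a join:

```python
# Base facts: each tuple is a stored proposition (hypothetical data).
works_in = {("alice", "sales"), ("bob", "research")}         # employee works in department
located_in = {("sales", "Bucharest"), ("research", "Cluj")}  # department is located in city

def works_in_city(works, located):
    """Rule: if E works in D and D is located in C, then E works in city C."""
    return {(emp, city)
            for (emp, dept) in works
            for (dept2, city) in located
            if dept == dept2}

print(sorted(works_in_city(works_in, located_in)))
# [('alice', 'Bucharest'), ('bob', 'Cluj')]
```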

When we issue an update to the database (I mean UPDATE or INSERT or DELETE), it can be interpreted as saying: the reality has changed, and the new facts are the following: ...

The applications of this principle are too many to put on this page, but I'll try to present some.

-- CostinCozianu


Denormalization

One immediate consequence is a way to interpret what a normalized schema is and what a denormalized one is.

A schema is fully normalized when it does not allow us to state the same fact more than once. A denormalized schema, on the contrary, allows us to state a fact more than once, with the (negative) consequences described in DenormalizationIsOk.
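A toy illustration of what "stating a fact more than once" costs, using hypothetical order and customer data in plain Python structures rather than a real DBMS. Updating one copy of a repeated fact leaves the data asserting two contradictory propositions; the normalized form cannot do that:

```python
# Denormalized: "customer C42 lives in <city>" is repeated in every order row.
orders_denormalized = [
    {"order_id": 1, "customer_id": "C42", "customer_city": "Iasi"},
    {"order_id": 2, "customer_id": "C42", "customer_city": "Iasi"},
]
# Updating only one copy leaves two contradictory facts about the same customer.
orders_denormalized[0]["customer_city"] = "Timisoara"
cities = {r["customer_city"] for r in orders_denormalized if r["customer_id"] == "C42"}
print(cities)  # a set holding two different cities for one customer

# Normalized: the fact is stated exactly once, so it cannot contradict itself.
customers = {"C42": {"city": "Iasi"}}
orders = [{"order_id": 1, "customer_id": "C42"}, {"order_id": 2, "customer_id": "C42"}]
customers["C42"]["city"] = "Timisoara"
print({customers[o["customer_id"]]["city"] for o in orders})  # {'Timisoara'}
```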

Objects and Databases

A database is not supposed to store objects, just as an engineer's design of a bridge is not supposed to contain a bridge. The design abstracts the bridge and gives it a mathematical representation, allowing the engineer to perform calculations, formulate hypotheses, and prove theorems (e.g., that the bridge will not collapse). The whole purpose of modeling, not only in software but in every domain I can think of, is to give us a mathematical and logical representation of reality in as abstract a form as possible, in order to allow mathematical and logical reasoning. The purpose of modeling is not to store reality itself, but to store relevant facts about reality.

From here it is easy to see the fallacy exemplified in RelationalHasNoObjectIdentity, where somebody objected that relational databases cannot store objects with no data: objects that carry no information, no facts about the underlying reality at all, only the circumstance that we had an instance in our runtime memory whose class was X and which we cannot identify any further.

Object identity cannot properly model anything at all in reality; the only real modeling is through information (e.g., the car in the dealer's stock with VIN 'XXYYZZZ', or the [??]). In order to model something, our information about objects in reality must be identifiable to users, like a VIN, an SSN, or any primary key at all. Otherwise, if the only distinguishing information is object identity, the information is essentially lost.

It is lost because we can't connect a user to information that lived at a particular address in the runtime, or that has a particular address (physical or logical) in persistent storage.
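A short Python sketch of the contrast, assuming a hypothetical dataless Car class and a hypothetical CarFact record keyed by VIN. Two dataless objects are distinguishable only by runtime identity, which no user can name; the keyed fact can be found again by anyone who knows the VIN:

```python
from dataclasses import dataclass

class Car:
    """An object with no data: only its runtime identity distinguishes instances."""

a, b = Car(), Car()
print(a is b, a == b)  # False False: distinct, but nothing a user could name

@dataclass(frozen=True)
class CarFact:
    vin: str  # a user-visible key that identifies the car in reality

stock = {CarFact("XXYYZZZ")}
print(CarFact("XXYYZZZ") in stock)  # True: the fact can be looked up by its key
```

Once the process holding a and b ends, nothing remains that would let a user tell which was which; the VIN survives any runtime.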

-- CostinCozianu


I'm a big fan of relational theory. However, we must treat the theoretical soundness of relational theory as a matter independent from its effectiveness in expressing business requirements, or from how it compares with OO languages for that purpose.

Object Identity is a *perfect* model of a property of reality - the distinction between objects. Sometimes other unique information may be available and preferred for key purposes; sometimes not, and the identity is used. Business requirements typically do not want to turn away clients because they have the same first & last name as an existing customer. As a general principle the business *relationship* is considered primary, and this relationship is an *identity* for which rules may vary & occasional exceptions be made.

E.g., customers holding separate accounts for home & work use.

Additionally, there are cases where an object is identified by its context; e.g., sections of a text document. Now, with our theoretically perfect relational theory and our stated approach of 'concrete properties', how are we going to manage a document structure for updates? -- ThomasWhitmore

I have kicked around relational representations of documents. I am not sure nesting (hierarchies) is the best approach. It is kind of unintuitive to create a whole other nesting level just to italicize a section of text. That is letting the tail wag the dog. Plus, the boundaries between "treatments" of text are not always nicely represented as hierarchical. Further, cross references and referenced "styles" are well suited for relational. Perhaps start a ModelingDocuments? topic? It is a fun abstraction exercise. -- top


Even if we are to grant the idealized world of completely normalized data across an enterprise, why should the enterprise's data necessarily be stored in a RelationalDatabase? ObjectOrientedDatabases are just as capable of storing a completely normalized domain model. Likewise, I may choose not to store a normalized model in either one. -- MarkAddleman

You can store a completely normalized model in an ObjectOrientedDatabase as well. But object oriented databases of today are limiting the form of facts that you can store in there (they HAVE to be image of some objects), the way you make deductions from there (queries), the way you transform informations (update facts) - you cannot perform set operations, and so on. In particular, you break DatabaseApplicationIndependence, which can itself be seen as a consequence of the above principle. Facts that we use to model reality are essentially, and by their very nature, not tied to one OO language. See the tuple/proposition about me above. It should be true, meaningful, and available for further interpretation (for example, get me the usernames of all users who haven't paid the bill in the last 3 months) in any OO or non-OO language.
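The example query can be sketched with Python's standard sqlite3 module, assuming a hypothetical schema of users(user_id, username) and payments(user_id, paid_on) with ISO-8601 dates; the usernames and dates below are made up. The reference date is passed in explicitly so the result does not depend on today's date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (user_id TEXT PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE payments (user_id TEXT NOT NULL REFERENCES users(user_id),
                           paid_on TEXT NOT NULL);   -- ISO-8601 date
    INSERT INTO users VALUES ('user1', 'ccozianu'), ('user2', 'jdoe');
    INSERT INTO payments VALUES ('user1', '2024-05-20'), ('user2', '2023-11-02');
""")

def overdue_usernames(conn, as_of):
    """Usernames of users with no payment in the 3 months before `as_of`."""
    rows = conn.execute("""
        SELECT u.username
        FROM users u
        WHERE NOT EXISTS (
            SELECT 1 FROM payments p
            WHERE p.user_id = u.user_id
              AND p.paid_on >= date(?, '-3 months'))
        ORDER BY u.username
    """, (as_of,)).fetchall()
    return [username for (username,) in rows]

print(overdue_usernames(conn, "2024-06-30"))  # ['jdoe']
```

The query states only which propositions we want, not how to navigate any object graph to find them, so any language with a SQL driver can ask it.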

Moreover, with object databases it is a little tougher to assess whether you really have a fully normalized model, because object identity kind of gets in your way. In terms of associations, aggregations, compositions, and so on, it's much tougher to judge whether an object model is in the higher normal forms (4th and 5th), even if such models are usually in 3rd normal form.

As C.J. Date has stated, on closer analysis one application of the DatabaseIsRepresenterOfFacts principle is that any proper OO database should be relational, and any proper relational database should also be OO. Unfortunately, this assertion is quite far from today's reality (for both OO and relational database implementations). SemanticBinaryModel tries to bridge this gap. -- CostinCozianu


object oriented databases of today are limiting the form of facts that you can store in there (they HAVE to be image of some objects)

I don't see how this is any more limiting than representing all the data as tables and columns.

the way you make deductions from there (queries)

True. It would be nice if every ObjectOrientedDatabase supported a common query language like OQL. In practice, is this really an issue? Importing data into a RelationalDatabase is a well-understood problem. I don't see how this is any different from using one system for OLTP and another for OLAP.

<thinking out loud> I think the ODMG has a proposal for an XmlSchema of data exported from an ObjectOrientedDatabase. Is XPath a fully relational query language? </thinking out loud>

the way you transform informations (update facts)

Just as having a single system of record for an enterprise's data is a GoodThing, having a single source of business rules is a GoodThing. In the best of circumstances, allowing multiple applications to update data violates OnceAndOnlyOnce. More likely, subtle differences in the business rules will cause updates that are incompatible with each other or completely contradictory. ObjectOrientedDatabases solve this problem by encapsulating the facts with the logic that maintains them.

Not only should an enterprise strive for DataNormalization?, it should also strive for CodeNormalization.

-- MarkAddleman

The problem with code normalization is that you absolutely can't trust the code to maintain your data integrity. So object databases don't really solve anything at all, unless they offer you an alternative way to specify constraints, because today's languages and formal proofs of correctness are still too far apart. It's very hard to reason about whether a particular piece of code is absolutely correct, but if instead you can specify CHECK predicate_about_this_tuple, CHECK predicate_about_this_transition, or CHECK predicate_on_this_whole_relation, it is much easier to be confident you won't have data integrity problems.
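The three kinds of declared predicate can be sketched in plain Python, assuming a hypothetical accounts relation held as a list of dicts. The business rules (non-negative balance, opened_year never changes, unique ids and a total cap) are invented for illustration; the point is that each constraint is a predicate over data, checked before a new state is accepted, rather than logic scattered through update code:

```python
from typing import Dict, List

Row = Dict[str, int]
Relation = List[Row]

# 1. Predicate about a single tuple (the kind of thing SQL's CHECK covers).
def tuple_ok(row: Row) -> bool:
    return row["balance"] >= 0

# 2. Predicate about a transition from the old state to the new state.
def transition_ok(old: Relation, new: Relation) -> bool:
    # Hypothetical rule: an existing account's opened_year may never change.
    before = {r["account_id"]: r["opened_year"] for r in old}
    return all(before.get(r["account_id"], r["opened_year"]) == r["opened_year"]
               for r in new)

# 3. Predicate about the whole relation.
def relation_ok(rel: Relation) -> bool:
    ids = [r["account_id"] for r in rel]
    return len(ids) == len(set(ids)) and sum(r["balance"] for r in rel) <= 1_000_000

def apply_update(old: Relation, new: Relation) -> Relation:
    """Accept the new state only if all three kinds of constraint hold."""
    if not all(tuple_ok(r) for r in new):
        raise ValueError("tuple constraint violated")
    if not transition_ok(old, new):
        raise ValueError("transition constraint violated")
    if not relation_ok(new):
        raise ValueError("relation constraint violated")
    return new

accounts = [{"account_id": 1, "balance": 100, "opened_year": 2001}]
accounts = apply_update(accounts, [{"account_id": 1, "balance": 50, "opened_year": 2001}])
```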

Unfortunately, today's relational databases give you only a limited form of the first; the second can be partially achieved with triggers; as for the 3rd ... I'll address encapsulation on another page; encapsulation limits the usefulness of data, while integrity is best enforced at a logical level. -- CostinCozianu


I can check multiple tables from a single trigger on one table in Microsoft SQL Server. I presume many popular DBMS support this. So all three checks are possible. -- JeffWinchell

The basic idea is to have a declarative language for specifying integrity constraints, rather than programming them. This has several advantages, the most important being that a DBMS engine cannot fully evaluate the implications of triggers the way it can for a declarative SQL operation. But more importantly, triggers are hidden from the client, or at least they are a serious barrier for the client programmer trying to reason about the implications of a SQL statement.

Let me give you an example of something that you probably can't do within a TransactSQL trigger or any other kind of trigger (warning: I might be wrong here): check the same (not another) table for a valid transition to the new state as a result of an UPDATE, DELETE, or bulk INSERT statement. That's because row-level triggers give you only the old and new images of each individual row. Nor can you have transaction-level checks that are performed before the whole transaction is committed, rather than after each individual statement. -- CostinCozianu
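For one class of constraint, SQL does offer a transaction-level check: a constraint declared DEFERRABLE INITIALLY DEFERRED is evaluated at COMMIT rather than per statement. A minimal sketch with Python's sqlite3 module, assuming hypothetical orders/order_lines tables; note that SQLite supports deferral only for foreign keys, so this does not cover arbitrary transition or whole-relation predicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL
            REFERENCES orders(order_id) DEFERRABLE INITIALLY DEFERRED,
        item TEXT NOT NULL);
""")

# Inside one transaction an intermediate state may be inconsistent:
# the line is inserted before its order exists.
conn.execute("BEGIN")
conn.execute("INSERT INTO order_lines VALUES (1, 'peas')")
conn.execute("INSERT INTO orders VALUES (1)")
conn.execute("COMMIT")  # checked here, against the final state: OK

# If the final state is still inconsistent, the COMMIT itself is refused.
rejected = False
conn.execute("BEGIN")
conn.execute("INSERT INTO order_lines VALUES (2, 'beans')")
try:
    conn.execute("COMMIT")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected at commit:", exc)
    if conn.in_transaction:
        conn.execute("ROLLBACK")
```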

Triggers are like tigers, hidden in the woods lurking. You cannot see them until they attack and then they are the last thing you will ever see. :-)


What I'm trying to discern is the difference between DatabaseIsRepresenterOfFacts as described versus SystemOfRecord?. I think the problem that I have with your description of DatabaseIsRepresenterOfFacts is that it assumes a particular implementation. I have yet to find a business person who cares about NormalForms or even technology of any kind. They care about answers and the technology is merely a tool to achieve it and as long as the tools provide consistent answers at a reasonable cost, they're happy.

It may be true that a RelationalDatabase is able to provide more consistent answers than an ObjectOrientedDatabase because you can describe data integrity constraints. However, it may also be true that an ObjectOrientedDatabase can provide more consistent answers than a RelationalDatabase because business rules are more tightly coupled with the data and the rules can be more expressive than you would practically want in a RelationalDatabase. I think the decision between the two is an engineering decision (meaning it involves more art than science). -- MarkAddleman

Your assertion is nice, but on close analysis "business rules coupled to the data" actually means nothing, or at least to me it falls under StopUsingMetaphors. In a RelationalDatabase, more rules about data integrity (consistency of stored facts) can be stated in a declarative manner, which is safer and easier to verify than an imperative manner in an OO programming language. The imperative approach is more subject to application bugs, and no matter how much you trust your programmers' capabilities, it is harder to verify the correctness of their code than if you simply declare business rules. Of course, some business rules and business processing logic have to be encoded in an imperative programming style, which is also what you do in an application that deals with a relational database. -- CostinCozianu


Both declarative and imperative styles are possible in OO languages, to a substantial degree. Assertions and unit tests are both examples of declarative constraints. Having done major work with declarative and imperative systems in OO languages, I have to state there are some very clear advantages and reliability factors to imperative structure in certain areas.

If your triggers support before/after image accessibility to allow useful transition processing, and they also support joined evaluations, then perhaps they'll provide before/after images of the joined data for the trigger/constraint to use. I have worked on engines and algorithms to produce such results. Don't know how much mileage you expect to get out of hierarchy recursion, though. -- ThomasWhitmore


Database is a Filing Cabinet

A database is merely a collection of text that has been typed in by users over an extended period of time. I'm not sure what elevates this data to a set of "facts". The database is often a combination of truth, conjecture, and lies. The latter often comes into play when a user is forced to enter a value by some business rule even though the user may not know the actual value. The database is not a representer of facts; it is merely a persistent storage of manually entered data. -- WayneMack

If that's how you look at it, that's what you're going to get. No more, no less. Good luck using your friendly database as a filing cabinet. -- Costin

It is useful to consider the implications there, though. An over-literal interpretation of the database as a collection of "facts" precludes an "UPDATE" verb: if the database says it, it must be true! It is a fact! The presence of facilities to allow users to correct each others' and the system's mistakes, and to allow different people to be given the praise or blame for a particular fact, is important. -- GlyphLefkowitz

Facts can change. Or rather, something that's true at one time may not be true later. Codd gets around this by describing relations as "time varying", though that's fudging things a bit. Date's formulation is a bit better: A relvar can change (just like any variable), but its current value, the underlying relation, is constant and refers to the facts as they exist at a given point in time.

Personally, I think we should go the SpecialRelativity route and treat time as an inherent component of the relation. Then you'd never have to update - it's just inserting a new tuple to show that the facts are now different. And you automatically get versioning. -- JonathanTang
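A minimal sketch of time as a component of the relation, assuming a hypothetical append-only price history in Python. A price change is an insert of a new proposition with its own valid_from date, never an update, so yesterday's price remains answerable:

```python
from datetime import date
from typing import List, NamedTuple, Optional

class PriceFact(NamedTuple):
    product_id: str
    price_cents: int
    valid_from: date  # when this proposition starts to hold

price_history: List[PriceFact] = []  # append-only: no UPDATE, no DELETE

def record_price(product_id: str, price_cents: int, valid_from: date) -> None:
    price_history.append(PriceFact(product_id, price_cents, valid_from))

def price_on(product_id: str, day: date) -> Optional[int]:
    """The price in effect on `day`: the latest fact with valid_from <= day."""
    candidates = [f for f in price_history
                  if f.product_id == product_id and f.valid_from <= day]
    return max(candidates, key=lambda f: f.valid_from).price_cents if candidates else None

record_price("peas", 99, date(2002, 1, 1))
record_price("peas", 109, date(2002, 3, 4))  # the new price does not erase the old one
print(price_on("peas", date(2002, 3, 3)), price_on("peas", date(2002, 3, 5)))  # 99 109
```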

It's been done. The question is, what to do to prevent running out of always-finite media in the face of frequent updates.

Maybe that is why the Borg invaded a neighboring universe (triggering a nasty war in the process).

I'm a big fan of insert-only data stores (meaning no updates or explicit deletes). The finite storage problem isn't as big as it seems at first. If the data is garbage-collected when no longer referenced, then it shouldn't take any extra storage over what a time-varying data store would. Just because it's fully versioned doesn't mean that anybody can just walk up and ask to see every version that ever existed.

To extend the example from above, dealing with product prices, consider this. There is likely to be a need in this same system to record each customer transaction ("transaction" as in an exchange of goods for money). Along with each transaction, we will need to know what products were purchased, and what the price was at the time of the transaction. Maybe we also need to know some other time-varying data about the product in addition to the current price. This scenario crops up pretty often, in a multitude of different applications.

Using normal techniques, we can't get away with just relating the purchase record to the product record. Later on, the price of the product may change, and our historical record of the purchase would become inaccurate. I usually see this handled by copying the time-varying aspects into the historical record. This denormalizes the data, causing duplication. Often, there are orders of magnitude more purchases than there are distinct products.

My approach here is to create a versioned product table to record the time-varying aspects of the product. Each purchase record links to a specific version of the product. Later, when the price changes, the historical record is still accurate, and the data is still normalized.
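The versioned product table can be sketched with Python's sqlite3 module, assuming a hypothetical schema: products, product_versions (one row per version of the time-varying attributes), and purchases that reference a version rather than a product. After the price change, the old purchase still resolves to the price it was sold at:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- Each row is one version of the product's time-varying attributes.
    CREATE TABLE product_versions (
        version_id  INTEGER PRIMARY KEY,
        product_id  TEXT NOT NULL REFERENCES products(product_id),
        price_cents INTEGER NOT NULL,
        valid_from  TEXT NOT NULL);
    -- A purchase points at the version in effect when it happened.
    CREATE TABLE purchases (
        purchase_id INTEGER PRIMARY KEY,
        version_id  INTEGER NOT NULL REFERENCES product_versions(version_id),
        quantity    INTEGER NOT NULL);

    INSERT INTO products VALUES ('P1', 'can of peas');
    INSERT INTO product_versions VALUES (1, 'P1', 99, '2002-01-01');
    INSERT INTO purchases VALUES (1, 1, 3);                            -- bought at 99
    INSERT INTO product_versions VALUES (2, 'P1', 109, '2002-03-04');  -- price change
""")

def purchase_price(conn, purchase_id):
    (price,) = conn.execute("""
        SELECT v.price_cents FROM purchases p
        JOIN product_versions v ON v.version_id = p.version_id
        WHERE p.purchase_id = ?""", (purchase_id,)).fetchone()
    return price

print(purchase_price(conn, 1))  # 99: the historical record survives the price change
```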

If you work out the numbers, it almost always takes less storage space to use this solution. The only exceptions I know of are where the time-varying data is smaller than or the same size as a foreign key to the versioned table.
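The arithmetic can be sketched as two functions. This is a deliberately crude model with stated assumptions: row overhead and the product_id column are ignored, and a version row is assumed to hold the time-varying attributes plus its own key of the same size as a foreign key. The sample figures are hypothetical:

```python
def bytes_copying(purchases: int, attr_bytes: int) -> int:
    """Each purchase row carries its own copy of the time-varying attributes."""
    return purchases * attr_bytes

def bytes_versioned(purchases: int, versions: int, fk_bytes: int, attr_bytes: int) -> int:
    """Each purchase carries only a foreign key; each version stores the attributes once."""
    return purchases * fk_bytes + versions * (attr_bytes + fk_bytes)

# Hypothetical: 1,000,000 purchases, 10,000 versions, 8-byte keys, 40 bytes of attributes.
print(bytes_copying(1_000_000, 40))            # 40000000
print(bytes_versioned(1_000_000, 10_000, 8, 40))  # 8480000
```

If the attributes are no bigger than the key (attr_bytes == fk_bytes), the versioned layout comes out slightly larger, which matches the exception described above.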

The problem with this approach is in the assignment of ownership of the "facts". The price (or at least the billing price) needs to be determined by the salesman, the sales organization, or some other cognizant group. Prices may vary based on quantity, bundling, inventory, promotions, coupons, damage, and many other factors. The price is not a "fact" to be determined by programmers and DBAs and provided to the sales force. The price is a piece of information that is determined by a salesman and recorded into the database. The database is a filing cabinet and the responsibility for the "correctness" of the data is in the hands of those who enter the data.

One of my major frustrations is to hear "the computer is wrong and I cannot correct it." The database is not a "representer of facts". It is a collection of information entered by fallible people. Information entered by programmers and DBAs is not necessarily any more correct than information entered by users. Information that is correct today or even for this transaction may not be correct tomorrow or for the next transaction. Incorrect information can and will be entered into a database, and the correctness of the information will vary over time. When information is found to be incorrect or no longer valid, it needs to be corrected immediately. The user needs to go into the filing cabinet and correct the information in real time, not request an audience with the oracles (pun intended) of the database and be forced to convince them that the database facts are incorrect.

In some cases, you are right - the salesman determines the price. In others (think Wal*Mart), the price charged to the customer is in fact solely determined by what is in the database. Wal*Mart cashiers don't have any negotiating power. Damaged goods are also not haggled over - there is a clearly defined procedure for discounting them if appropriate. That can all be modeled in the database.

When dealing with historical records, it doesn't make sense to change the data. In some cases, it may make sense to supersede the data with more, newer data. However, that is not the same as changing the one and only copy of the data. If the price changes, then put a new record showing that the current price is X. That doesn't change the fact that yesterday the price was Y. If you stick with the database, then the price yesterday is exactly what the customer was charged. If you need to provide a refund, then you refund Y dollars, not X dollars.

We may have different shopping experiences, but I have found the price charged to the customer is based on the price marked on the shelf. This is often different from the price in the database. Sometimes the item does not even appear in the database. The cashier stands there, article and bar code reader in hand, with the marked price and the database price obviously in disagreement, but the cashier is prevented from making a correction, because the database is the representer of "facts" and the cashier is obviously not competent to determine the facts. Instead of solving the problem the moment it is identified, we submit a request with appropriate documentation to the programming department. In order to process the customer in less than two weeks, the cashier is usually allowed to enter the item using a generic product code, reducing the inventory control benefit that was the intended result of the system. Business decisions need to be in the hands of the people doing the work, not in the hands of some headquarters folk doing database modelling.

Most retailers I have encountered use the receipt as the historical record of a transaction. The cash register exchange is transient data. Is it reasonable to expect a customer returning merchandise to provide a primary key to locate the transaction? Customers often cannot remember the day they made their purchases, much less the specific time and cash register. There is no value in knowing the price paid for a can of peas at register 7 at 5:23:02 on March 3, 2002. This is transient information. Once the inventory counts have been updated and the appropriate accounts credited and debited, the information is no longer of value. Give the customer the historical record and be done with it.

Developers and DBAs need to view databases as filing cabinets. The responsibility for using those filing cabinets and maintaining the correctness of their contents needs to be in the hands of those who use them. It is simply debilitating for a worker to stand there knowing that when the item is scanned, the incorrect price will show up, and there is nothing to be done about it. There is a widespread problem based on treating the database as some sort of shrine that can only be changed by the privileged few. The end result is reduced accuracy, as users are forced to bypass the system in order to do their jobs.

Why is the price marked on the shelf different from what is in the database? If you ask Wal*Mart, they will tell you that's because the shelf has been marked incorrectly. You brought up that humans are fallible. Put yourself in Wal*Mart's shoes. Who is more fallible? The team of people who determine the price of a product based on market analysis and sophisticated planning, or the minimum wage worker who puts labels on the shelf? Even putting aside their fallibility, it is clear to see that the data in the database is checked by thousands of people constantly (If it is wrong, then thousands of cashiers will notice). The price that is marked on a single shelf in a single store does not benefit from this redundant fact-checking.

In the case that the two prices don't match, then yes, the cashier usually has the ability to override the system. The reason is subtle. It is not because the cashier has any authority whatsoever to conclude that the computer is wrong. It is a customer relations move. In a systematic, prearranged decision, Wal*Mart decided that if the shelf is mismarked, the shelf price will be honored. There is a special procedure for carrying out that discount in the system. That does *not* mean that the cashier can just on a whim decide to discount a product. The cashier reports the problem, and the shelf price is corrected. If by some strange alignment of the stars, the data in the database is actually suspected of being wrong, then it is reported up the chain of command to a user of the system who has the necessary privileges to change the data.

I can see your point, as it applies to retail outlets not operating at the scale that Wal*Mart does. If there is a mom-and-pop grocery store with one or two locations, then it definitely makes sense for the user to override any data he deems necessary. That is because such a store cannot afford to take as much care with the accuracy of the data. In Wal*Mart's case, the system actually provides a necessary security feature to prevent some disgruntled nameless cashier from defrauding the company.

This is getting off on a bit of a tangent. The point was not to discuss the nature of cash registers and customer transactions. Besides that, even in the Wal*Mart case, someone can still change the data - it is just not the cashier. The reason for that is corporate policy, not DBA policy.

If you prefer to abandon the retail sales example, I'm game for that. Another good case (and actually one I know more about, since I work in telecom) is call logging for telephone billing. It is similar, but we may be able to avoid some of the collateral confusion that seems to go along with retail sales.

Tread lightly, though. I am in the second month of trying to get a correction to my telephone bill. An erroneous "voluntary" service was added ($1.00 per month, but I am enough of a miser that I refuse to pay it). The customer service representatives are unable to go back and correct the mistake. Instead they are only permitted to add on credits, but due to prorated amounts, late charges, etc., they still cannot undo the problem. I expect to spend more time with customer service when my next bill arrives, but I will probably give in after that and pay the billing error just to make the problem go away. This is not an industry specific problem, but a general attitude.

I think the point I'm trying to make is that it shouldn't be possible to change historical records, regardless of whether they are correct. The customer definitely wants his bill corrected, but there are other uses of the data to consider. If the bill is wrong, then by all means, correct it - but correct it non-destructively. Instead of replacing the incorrect data, add some new data that supersedes it. The customer should be happy that way. Also, since the old, incorrect, data is still accessible, the admin who is responsible for tracking down the source of billing errors will have some data to work with. The accountant/market analyst who ran a report yesterday won't be confused by the report showing slightly different numbers today. In this sense, I think that in fact, the DatabaseIsRepresenterOfFacts. It is a fact that the customer was billed $0.30 per minute. Wrong or not, that is what we billed him - it is a fact. It is also a fact that we corrected the situation by crediting $6 to his account.
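A minimal sketch of non-destructive correction, assuming a hypothetical append-only ledger in Python. The wrong charge ($0.30/min for 20 minutes, i.e. $6) stays on record as a fact, and a new credit entry that references it supersedes its effect:

```python
from decimal import Decimal
from typing import List, NamedTuple, Optional

class Entry(NamedTuple):
    entry_id: int
    account: str
    amount: Decimal            # positive = charge, negative = credit
    note: str
    corrects: Optional[int]    # entry this one supersedes, if any

ledger: List[Entry] = []       # entries are only ever appended, never edited

def post(account: str, amount: str, note: str, corrects: Optional[int] = None) -> Entry:
    entry = Entry(len(ledger) + 1, account, Decimal(amount), note, corrects)
    ledger.append(entry)
    return entry

def balance(account: str) -> Decimal:
    return sum((e.amount for e in ledger if e.account == account), Decimal("0"))

charge = post("cust-7", "6.00", "20 min at $0.30/min")   # what we actually billed
post("cust-7", "-6.00", "credit: billing error", corrects=charge.entry_id)
print(balance("cust-7"))                                 # 0.00
print([e.note for e in ledger if e.account == "cust-7"]) # both facts remain
```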

I guess that argument may not make sense if your database doesn't have any historical data. For example, if you have a database of your stamp collection, then there's no big deal for you to correct a misspelling or something like that. It's not important that we have a permanent record that you originally misspelt a word or put down that it's a 25 cent stamp instead of a 30 cent stamp.

However, if the database isn't your personal stamp collection, but instead is the USPS's stamp database, then it is probably necessary that we retain knowledge of our mistakes. If the USPS meant to issue a 25 cent stamp but mis-typed it as 35 cents, then that is something we need to know in the future. Sure, we'll correct it going forward, but somewhere there is an external artifact of that mistake (the customer who bought the mislabeled stamp). We can't just pretend it didn't happen.

Phew... perhaps we should rename this page to DatabaseIsRepresenterOfWhatIsBelievedNecessaryByTheFallibleAndPossiblyUnscrupulousBusiness?, before we end up deciding that database products should be omniscient, prescient, or psychic.


To return for a moment to the original point: DatabaseIsRepresenterOfPropositions?, or assertions, or predicates, not facts. And, contrary to Costin's assertion at the top of the page, there may be no belief that the proposition is, or ever has been, true. Each such proposition may be the subject of many other propositions, which assert, for example, whether the proposition is believed to be true, with what degree of certainty, since when, by whom, for what reasons. (These propositions about propositions are sometimes termed "meta-data", but that adds little but obfuscation to the topic. The important point is that "meta-data" itself consists of propositions with their own "meta-data"... recursively, asserting who says that a given person believes something, with what degree of certainty...)
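A sketch of propositions about propositions, assuming hypothetical data: base propositions are stored whether or not anyone believes them, and separate Belief tuples assert who believes which proposition, how strongly, and since when. Only one level of recursion is shown; a belief could itself be given an id and become the subject of further beliefs:

```python
from typing import Dict, List, NamedTuple

class Belief(NamedTuple):
    proposition_id: int   # which proposition this one is about
    believer: str
    certainty: float      # hypothetical 0..1 scale
    since: str

# Base propositions, stored regardless of whether anyone believes them.
propositions: Dict[int, str] = {
    1: "Customer C42 lives in Iasi.",
    2: "Customer C42 lives in Timisoara.",
}
# Propositions about propositions.
beliefs = [
    Belief(1, "sales", 0.9, "2003-01-10"),
    Belief(2, "billing", 0.4, "2003-02-01"),
]

def believed_by(who: str, min_certainty: float = 0.5) -> List[str]:
    return [propositions[b.proposition_id] for b in beliefs
            if b.believer == who and b.certainty >= min_certainty]

print(believed_by("sales"))    # ['Customer C42 lives in Iasi.']
print(believed_by("billing"))  # []
```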

It seems to follow from such a view that a table should define a set of propositions which may be made about some thing of a certain type (viz., its predicates), where the subject of the predicates is uniquely identified by the primary key. Strictly, this means that any table with more than one non-key column is a denormalized table (unless all the columns are taken together to form a single predicate). Such extreme normalization is often best avoided in practice, although all too often it is not even considered as an option. So DatabaseIsDenormalizationOfProposition?...
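The extreme normalization described here can be sketched in Python, assuming a hypothetical employees table held as a dict keyed by primary key. decompose produces one relation per non-key attribute, each stating a single predicate about the key, and recompose joins them back on the key without loss:

```python
# A conventional table: several non-key columns, i.e. several predicates bundled together.
employees_wide = {
    "E1": {"name": "Ana", "dept": "sales", "salary": 5000},
    "E2": {"name": "Radu", "dept": "research", "salary": 6000},
}

def decompose(wide):
    """One relation per non-key attribute: each states a single predicate about the key."""
    relations = {}
    for key, attrs in wide.items():
        for attr, value in attrs.items():
            relations.setdefault(attr, {})[key] = value
    return relations

def recompose(relations):
    """Join the single-predicate relations back together on the key."""
    wide = {}
    for attr, rel in relations.items():
        for key, value in rel.items():
            wide.setdefault(key, {})[attr] = value
    return wide

narrow = decompose(employees_wide)
print(narrow["salary"])  # {'E1': 5000, 'E2': 6000}
```

The practical cost mentioned above is visible even here: every ordinary read of a whole employee needs a join across all the narrow relations.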

In reality, though, it seems to me that enterprises are disproportionately interested in their own records, rather than the entities that the records relate to. The transaction is more important than the customer; the accounting records are more important than the transaction itself. Interestingly, this reflects human history: in the beginning were the tally marks, then other records... in the real world, ObjectIdentity is still in its infancy...


A database is where I keep inactive data. I make sure it is named in a manner which allows me to bring it into focus when I need it. -- PeterLynch

The database is where dead objects live... HaHaOnlySerious

You can certainly treat them that way, but it is not objectively better to do so.


Dissenting Opinion

The word "fact" bothers me. It's too fuzzy, and thus covers too much territory. An expert system doesn't qualify as a "database" by typical usage, for example, because it also has things like logic expressions. Databases for the most part are not logic-expression managers. Implicit in the concept of "database" is that nouns and noun-related items are kept separated from verbs.

The term grew up in the days when a strong noun/verb separation was common, and databases merely extended this viewpoint by creating "noun machines" (databases) to complement verb machines, which were, more or less, application programming languages. But when OOP and, to a lesser degree, logic programming came to the table, this distinction didn't sit well with them. OO and LP don't "like" heavy partitioning along noun/verb lines, and tend to have a problem with the concept of "database". But whether DB's are "good" or not is separate from the existence of the concept, which assumes the existence of the partition, or at least that partitioning is performed.

As another "test", a bunch of expressions in ProLog are not a "database", but could be considered "facts". When we consider what a database is, we also have to consider what a database is not. "Facts" doesn't exclude much and doesn't exclude ProLog statements.

However, I agree that using English to create a "hard" line between what is and what is not a database will be difficult or impossible, and this is ripe for LaynesLaw. There are many different ways to represent the same information. Further, a rigorous definition may not match "street" usage and vice versa (TermUsageVersusRigor). --top

You are attacking a StrawMan, TopMind, because the word "fact" was not the only relevant word in the title. The word "representer" is also there, easily legible and equally important. An expert system doesn't qualify as a database because it doesn't represent facts - that is, it does not make representations of facts (nor management of said representations) directly available to users of the service.

Well, I hate to say it, but "represent" is also open-ended. AI systems do indeed "represent" facts. Our brain represents facts via chemical signals/states.

Now you're wasting our time with frivolities: our brains do not provide APIs for accessing representations of facts, nor do they even supply a consistent representation. And the AI point is simply incorrect. In general, AI systems are not representers of facts. Plans, paths, heuristic guesses, etc. are the traditional provinces of AIs, and they do not need to maintain representations even of these things (consider procedural or reactive AIs). Very often an AI references or interacts with a database, but that's just inclusion. Inclusion is not identity -- you aren't going to start arguing that cars are combustion engines, now, are you?

I don't see what having an API or a consistent representation has to do with your definition. They appear irrelevant. And my brain does have an API: my mouth and fists. [regarding AI] Without a clear definition of "facts", it's hard to check this. Let's focus on one kind for now: neural nets. They can indeed represent domain "facts". -t

In English, the 'er' suffix on a verb strongly implies a service, device, or agency that performs a task systematically (and, ideally, 'consistently'). A neural network is not a "representer" of facts. Given communications access (i.e. API, Input/Output) to an arbitrary neural network service, it is (in general) impossible to ask it to systematically represent facts, much less manage which facts will be represented (as would be required for a 'database management service'). You could create a highly specialized neural network that can serve as a database, of course, but that would be quite an accomplishment. Your alleged confusion with the definition of "facts" is a RedHerring here, but if you are genuinely interested you should review the above discussion that clarifies 'facts' as logical propositions, declarative assertions, and similar well-defined concepts.

Your "er" statement makes no sense to me. As for the last part, they are all interchangeable. "Logic proposition" is merely an "implementation" of a fact, one could say. There are multiple ways to represent the same information. And the fact that a neural network may have a difficult interface is moot. I'm tempted to call it a RedHerring.

It seems you need a remedial (elementary school level) lesson in English. Do you know the difference between 'parse' and 'parser', 'compile' and 'compiler', 'print' and 'printer'? It's verb and noun. Verb and noun. Verb and noun. Further, in each case the 'noun' is providing a service that performs the associated verb systematically, consistently, and usually on demand. The same is true of 'represent' vs. 'representer'. Any given program (be it an AI, neural network, etc.) stores a few facts internally, giving these facts some internal representation, but most programs are not 'representers': they do not provide a service of systematically and consistently 'representing' things. If you look above, this was already clarified for you where I first said: "it doesn't represent facts - that is, it does not make representations of facts (nor management of said representations) directly available to users of the service." Your suggestion that the definition of 'fact' is relevant to whether a neural network is a database is the RedHerring, TopMind. It doesn't matter what a "fact" is because neural networks and AIs are not (in general) "representers" of anything whatsoever, much less a specialized subset of that 'anything' that we might decide to call 'facts'.

You are still making no sense. NNs need information to do their jobs, and that information can easily be called "facts". You provided no CLEAR way to tell the difference between information and "facts". How exactly does one distinguish information that is "facts" from information that is not? "I know it when I see it" is not good enough. (Note that probabilistic facts are still facts.) Also note that I have an urge to retaliate for your social rudeness, but I am resisting. I cannot promise that will last.

In any case, we don't need a formal definition of database... we need only a definition good enough to distinguish database services (and the associated database management systems) from other common services (like parsers, compilers, pretty printers, network transport, transaction managers, pure persistence, etc.).

Fine, we have "notions", but this topic is about definitions. If you are not providing a fairly clear one, then the topic is close to being pointless (or at least raises more questions than it answers).

DatabaseIsRepresenterOfFacts is a fine definition, and reasonably clear. It leaves the field just as open as it needs to be: facts can be represented in many ways (logical propositions, functional programming assertions, heuristic rules, etc.) and can be systematically represented to users in many ways (including relational, navigational tree, association graphs). If things seem 'unclear' to you, perhaps you are the problem: if the words are unclear, go see an eye doctor; if the concepts are unclear, go get an education.

It's too open-ended; see below as far as "kinds of facts". Also, I seek a definition that is clear to practitioners, not just academic types. If that requires clarifying words such as "fact", then so be it. I know you want this wiki to be only for academics, but it's not. Ward has implied such many times.

[You only see it as "too open-ended" because of your lack of understanding. Please do not assume every practitioner, or even many practitioners, are as anti-education as you are. I'd be hard-pressed to think of any practitioner I know who would have any difficulty with DatabaseIsRepresenterOfFacts, and indeed ChrisDate delivers presentations on precisely that basis to groups of practitioners who grasp it without difficulty.]

If they are potentially that clear, then turn them into "mechanical" rules instead of the indirect, roundabout, obtuse crap language you use. Usually if they cannot be turned into a clear set of rules for inclusion and exclusion, then the claimer is full of shit or has an incomplete notion they need to work out. You repeatedly make lame excuses to avoid clarity and commitment in the NAME OF academia. It's an intellectual cop-out. Why do you have such a hard time with old-fashioned western reductionism and StepwiseRefinement? Everything you are involved with seems to "deserve" an exemption from reductionism. Coincidence? Or bullshit? I believe it's the latter. -t

[Huh? This page is abundantly clear, the sort of thing the majority of first-year database students grasp without difficulty. We can presume the majority of practitioners are university graduates. What does your difficulty with DatabaseIsRepresenterOfFacts say about you?]

Bullshit. I graduated as an A student. Your "textbook" is simply shitty. Perhaps you are an academic genius, I don't know, but I do know that if you wrote textbooks, they'd be the worst around. And you seem to have no desire to fix that weakness. You seem completely unable to grasp the StepwiseRefinement documentation style and western reductionism. Don't project the fact that you are a crappy communicator onto me by insulting me. The first step to improvement is admitting you have a problem.

[I've no intention of revising my communication style to suit only you, especially as no one but you seems to have a problem with it. If you find the material presented here too difficult, go read a textbook. I recommend "The Essence of Databases" by Fred Rolland. It's intended for rank beginners. Does the phrase "diploma mill" mean anything to you?]

Almost nobody reads your shit here. You think it's all dandy and nice, but in fact people find it nebulous and obtuse, and thus ignore it. I already know the "essence". We don't need essences, we need clear rules for what is and what isn't.

[Really? What do you base that on? Who are these "people" that find my writing nebulous and obtuse? That would be you and... Who else?]

I believe via experience I have a pretty good feel for the writing style that people like and don't like. Further, almost nobody participates in your discussions when you slip into that obtuse style.


The term "facts", and the broad term "database", are both potentially misleading to the uninitiated, as evidenced by some of the debates above. It would be less contentious, and perhaps slightly more accurate, to state that a relational database is an explicit representation of true propositions under the ClosedWorldAssumption. All databases are (at least) implicit representers of facts (insert arm-waving about the definition of "fact" here), for what purpose would a collection of non-facts, i.e., things we do not believe to be true, serve? Relational databases, however, manifest this explicitly: a tuple in a relation is, by definition, a proposition that evaluates to true.
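To make the closed-world reading concrete, here is a minimal sketch in Python. It assumes a relation can be modeled as a plain set of tuples with an attached predicate; the relation EMPLOYEE, its predicate text, the function names `holds` and `as_proposition`, and the sample rows are all hypothetical. A tuple's presence makes its proposition true, and under the ClosedWorldAssumption its absence makes the proposition false rather than unknown.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """One tuple of a hypothetical relation EMPLOYEE(name, dept).

    Assumed predicate: "name works in department dept".
    """
    name: str
    dept: str


# The relation body: the set of propositions currently believed true.
EMPLOYEE: frozenset[Employee] = frozenset({
    Employee("Alice", "Sales"),
    Employee("Bob", "Engineering"),
})


def holds(fact: Employee) -> bool:
    """Closed-world evaluation: a proposition is true if and only if
    its tuple is present; absence means false, not unknown."""
    return fact in EMPLOYEE


def as_proposition(fact: Employee) -> str:
    """Render a tuple as the English proposition it stands for."""
    return f"{fact.name} works in department {fact.dept}."


print(holds(Employee("Alice", "Sales")))        # True: stated fact
print(holds(Employee("Alice", "Engineering")))  # False: not stated, so false under CWA
for row in sorted(EMPLOYEE):
    print(as_proposition(row))
```

Under an open-world reading, the second query would instead be "unknown"; the sketch deliberately has no way to express that, which is the point of the closed-world convention.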

But a fact can be something like, "All eagles have wings". This is not the kind of information normally kept in a "database".

{Sure it is, especially in association graphs and various classes of database aimed at supporting AI or automated learning. Admittedly, this sort of fact is typically not represented in a relational database. Any given basis for representation will have limits on which sorts of facts it can directly represent, and relational is more limited than many. For example, relational cannot readily represent CNF or DNF clauses, as in "((X AND Y) OR (Y AND Z) OR (X AND Z)) is true". Relational is limited in which facts it can easily represent in order to simplify the RelationalAlgebra and RelationalCalculus. Perhaps you should learn more about database technologies other than relational, TopMind, if you don't want to come across as a biased ignoramus.}
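As a rough illustration of the DNF point, the Python sketch below shows one workaround, chosen here for illustration and not something relational systems do for you. A single tuple can only assert a conjunction of attribute values, so a disjunctive fact has to be spread out as a relation of the truth assignments that satisfy it (a "possible worlds" encoding). The variables X, Y, Z come from the example above; `clause` and `SATISFYING` are hypothetical names. The number of candidate rows grows as 2^n in the number of variables.

```python
from itertools import product

VARIABLES = ("X", "Y", "Z")


def clause(x: bool, y: bool, z: bool) -> bool:
    """The disjunctive fact: (X AND Y) OR (Y AND Z) OR (X AND Z)."""
    return (x and y) or (y and z) or (x and z)


# Each row is one truth assignment consistent with the fact.
# The fact itself is not any single row; it is the whole relation,
# read as "the actual world is one of these rows".
SATISFYING = [
    dict(zip(VARIABLES, values))
    for values in product((False, True), repeat=len(VARIABLES))
    if clause(*values)
]

for row in SATISFYING:
    print(row)
print(len(SATISFYING), "of", 2 ** len(VARIABLES), "assignments satisfy the clause")
```

Note that reading the relation this way flips the usual convention: an ordinary relation is a conjunction of its tuples, while this one has to be read as a disjunction, which is exactly the kind of fact the RelationalAlgebra does not handle directly.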

Further, a list of tasks/steps to perform can be kept in a database in which case it acts similar to what an app program may do. And an app-program may have tons of set/gets that could instead be attributes in a database. The potential overlap and re-mapping of the same info is great, if not unbounded.
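The overlap described here can be sketched in a few lines of Python. Assume a hypothetical table STEPS(step_no, op, arg) whose rows are nothing but attribute values. A small interpreter, also hypothetical, gives those rows the behavior an application program would otherwise hard-code. Whether the rows count as "facts" or as "code" depends on how they are read.

```python
# Hypothetical STEPS table: each row is (step_no, op, arg).
STEPS = [
    (1, "set", 10),
    (2, "add", 5),
    (3, "mul", 3),
    (4, "add", -1),
]


def run(steps: list[tuple[int, str, int]]) -> int:
    """Interpret the rows in step_no order, as a program would."""
    acc = 0
    for _, op, arg in sorted(steps):  # tuples sort by step_no first
        if op == "set":
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown op: {op!r}")
    return acc


print(run(STEPS))  # 44
```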

I doubt there is any hard boundary condition(s) that sets databases apart. Life is rarely that simple. But, we can look at tendencies, and the common element I see is "attributes". Databases are optimized to store and process attributes, not logic expressions and not large text and not binary blobs. (They can and do, but it is not their forte. If you need a lot of them, then you use a different tool.) They are mass attribute management and processing machines. I invite you to similarly study the tendencies and typical features. What patterns do you see? -t

Elsewhere, I've suggested that a suitable definition for "database" might be "a persistent collection of values" and that a DBMS is a mechanism for setting and retrieving those values. So, I don't object to your definition of "database" in general, but arguably the attribute values are... Facts. See above re "what purpose would be a collection of non-facts?" After all, databases are organised for a purpose, and that purpose is ostensibly always to collect facts, even if the fact is "here is a random number". This page, however, seems biased toward relational databases in particular, which are explicitly representations of true propositions, aka facts. Regarding your "all eagles have wings" example, note that typical databases are not intended to record all facts, but are intended to record facts meeting given predicates.
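To illustrate the distinction in the last sentence, here is a hedged Python sketch in the style of a deductive database. The relation names IS_EAGLE and HAS_WINGS_STATED, the function `has_wings`, and the sample rows are hypothetical. Each base relation records only facts that match its predicate. The general statement "all eagles have wings" is not stored as a tuple at all but expressed as a rule, from which instance facts are derived on demand.

```python
# Base relation, predicate: "animal a is an eagle".
IS_EAGLE: set[tuple[str]] = {("Sam",), ("Aquila",)}

# Base relation, predicate: "animal a has wings" (stated directly).
HAS_WINGS_STATED: set[tuple[str]] = {("Tweety",)}


def has_wings() -> set[tuple[str]]:
    """Derived relation: stated facts plus those deduced from the
    rule "for every a, if a is an eagle then a has wings"."""
    derived = {(a,) for (a,) in IS_EAGLE}  # apply the rule to each eagle
    return HAS_WINGS_STATED | derived


print(sorted(has_wings()))  # [('Aquila',), ('Sam',), ('Tweety',)]
```

The rule itself lives in code (or, in a real deductive system, in a rule base), not in any relation, which matches the observation that such general facts are not what a typical relational database records.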

You seem to be agreeing that "facts" by itself is too open-ended, and you are starting to make a distinction between kinds or formats of "facts". That's my point. I'm trying to find a way to describe the kinds of facts that databases are geared toward. And "predicates" is perhaps too relational-centric, since navigational DBs relied on them less. -t

Huh? You seem to have completely misunderstood my point. How did you arrive at "'facts' by itself is too open-ended"? I was merely pointing out that your example was poor, at least in the context of relational databases, and seemed reflective of your own misunderstanding. You appeared to have confused "a database is a representer of facts" (true) with "all databases are a representer of all facts" (false).

Let's back up here. Is "Database Is Representer Of Facts" an attempt to define "database", or merely a statement about databases?


See also DenormalizationIsOk, EntityRelationshipDiagram, DatabaseIsRepresenterOfEntities

The latter page is claimed to be bunk; see CostinCozianu's page (for lack of a better place).


"Facts" are in the head of the perceiver. They don't exist outside of brains. Their utility is defined by the utilizer. Thus, they are subjective. You will not find an objective math, formula, or algorithm that determines fact versus non-fact, at least not without cracking open some noggins. Perhaps some definition such as, "facts are information that can be used to make predictive models" is the closest one may get to something objectively testable (model matching reality), but most would consider it too narrow. Related: WhatIsIntent. -t

Uhm, math is in the head too. So what? WhatsYourPoint?? There is no specific three-line algorithm that determines "math". Therefore, is math a scam or a UsefulLie that we should basically abandon in favor of "psychology" and "feeling"? Also, if there were no humans on the planet, there would still be animals saying to themselves "the fact is, plants do exist, and I can go and eat them. Boolean=true." The animal wouldn't speak English, but it would do boolean logic in order to survive, without knowing it is using a boolean. Will I get killed if the lion chases me? Boolean=true. So the animal runs away and survives. Our survival depends on logic and reasoning. --TongueInCheek?

I'm not sure what your point is. I didn't say math was useless. If you can use math to determine objectively what is a "fact" and what isn't, then do it! If it's not objective, then admit it.

Can you prove objectively what math is, using something else, like science or poetry? Why would you be so stupid as to waste time trying to prove things like 2+2=4? If EverythingIsRelative then 2+2 does NOT equal 4, it could equal anything, because EverythingIsRelative. Your arguments are childish and are CopOuts. I could build a computer program that makes 2+2 equal some other number, and flip the numbers around (change the notation). 8 now becomes 6, and 5 now becomes 3. Because EverythingIsRelative. What you need to do is start thinking about UsefulTruth and stop twisting the truth to your own crankery (quackery).

I am not (intentionally) twisting anything. What math does is create internally-consistent models. You can have objective truths within such models. The problems usually come about when somebody tries to attach or map a given model to the real world and the fit is questionable. Spoken language has proven difficult as a tool to find models that everyone agrees on (LaynesLaw), or even that most agree on.

We don't have to have a perfect-fit model to come to an agreement, we just have to agree on the model, and then use that model to draw further conclusions. Thus, objectivity is not a necessary ingredient to solve specific conflicts. Over-focusing on EverythingIsRelative can distract from other solutions to disagreements. We can kick different models around, find their weaknesses, tune them, etc., and they grow better over time. The side-tag typing model is a good model with which to compare and analyze some programming languages. No, it's not perfect, no, it doesn't cover some aspects we want it to cover, but it's better than the alternatives so far. A Yugo is often better than a bicycle. --top

The side tag typing model is not a model - a side tag is one way to temporarily store information about the type (a label). That is not a model, it is a feature or implementation detail. An MP3 file can still be an MP3 file if it has no file extension (tag) - you could tag it using the ".mp3" file extension if you wanted. The mp3 file extension is not a model, it's just one tiny detail in the file system. It's like calling a file system a file extension. A file system is not a file extension. A file extension is just a tag for the file system. Side tags are a GrossOverSimplification and a complete misunderstanding of Types. I believe TypesAreNotTypesTheyAreTypes had some more details on this complete ignorance.

Why is it an "implementation"? And an MP3 file with no extension may still be an MP3 if you "know" what it is, but that's because the tag is in your head (instead of in a file name).

The computer can parse the MP3 file and find out it is an mp3 file based on its binary pattern. The type of file (mp3) still remains the same, despite the mp3 not having any side tag (file extension). You seem to want a side-tag-free programming language - how is it a good thing that you lose information (a label) about the type? How would it benefit humanity? Apples and oranges also do not have side tags - they have certain features and properties that make them the type of fruit they are. They could optionally have a sticker on them (tag) to help humans identify them more easily. The tag on the apple is not a type model - the sticker on the apple is just a useful label like a file extension.
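The claim that a program can recognize an MP3 from its bytes rather than from its name can be sketched in Python as follows. This is a simplified heuristic, assuming only two common signatures: an ID3v2 tag header (the ASCII bytes "ID3") or an MPEG audio frame-sync pattern (the first 11 bits of a frame set). Real-world sniffers check far more, and the frame-sync test alone can also match other MPEG audio streams. The function names `looks_like_mp3` and `type_from_name` are hypothetical.

```python
from typing import Optional


def looks_like_mp3(data: bytes) -> bool:
    """Guess whether data is MP3 audio from its content alone.

    Simplified heuristic: accepts an ID3v2 tag header ("ID3") or an
    MPEG audio frame sync (first 11 bits set) at offset 0.
    """
    if data[:3] == b"ID3":
        return True
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def type_from_name(filename: str) -> Optional[str]:
    """The 'side tag' approach: trust the file extension, if any."""
    _, dot, ext = filename.rpartition(".")
    return ext.lower() if dot else None


sample = b"ID3\x04\x00\x00\x00\x00\x00\x00"  # 10-byte ID3v2.4 header
print(type_from_name("song"))   # None: no tag in the name
print(looks_like_mp3(sample))   # True: recognized from content
```

The two functions answer the same question from different evidence: one from a label attached beside the data, the other from the data itself.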


See Also: SemanticMapping, DatabaseDefinition, DatabaseIsRepresenterOfBs


CategoryDatabase, CategoryDefinition


DecemberZeroNine


Last edited March 15, 2012