Standard definitions for commonly used terms about databases.
Database - Definitions from Heavily-Referenced Books
- "An updatable storage of information of an application's world and managing software that conceals from the user the physical aspects of information storage and information representation." Dr. Naphtali Rishe '92 : Database Design: The Semantic Modelling Approach, see SemanticBinaryModel
- A database is a collection of persistent data that is used by the application systems of a given enterprise. C.J. Date AnIntroductionToDatabaseSystems
- It must be noted that, unlike Rishe, Date quite strenuously objects to calling the managing software 'a database'.
- Nit picking here, but I'd drop the "enterprise" since a database can be used by a single small person not part of a large company (big undertaking). Also, databases are not always persistent - if one is munging around data it may be temporary.
- Now, usually when picking nits with words you jump right to an English dictionary. Have you bothered looking up 'enterprise' yet?
- Yes, and it matches my definition: an industrious large undertaking, etc. A big undertaking is not what requires a database. In many cases databases can be used without the situation being a "big industrious undertaking", e.g. a big company vs. a small person munging around small data he mined from the internet or similar.
- The word "large" doesn't appear in my dictionary's definition of 'enterprise', nor does any synonym.
- Mine says Big, Bold undertaking and similar words. Google define also says so. Oxford also says especially bold.
- {EnterpriseApplication offers some suggested definitions.}
- Oxford also says Enterprise can be a company or business as one of the selections. I see that Enterprise may also be a BuzzWord, tee hee hee. I didn't state that, but EnterpriseApplication does.
- "...used by the application systems" implies that the information is shared by multiple applications. I point this out because the implications of that single "s" may be missed.
Short & simple:
Dictionary definitions:
- a mass of data in a computer, arranged for rapid expansion, updating, and retrieval.
- a structured set of data held in a computer.
Pet Definitions from WikiZens:
- A tool or tool-set that packages common attribute processing and attribute management idioms. (I don't claim there's a water-tight definition of "attribute".) --top
These definitions are actually quite nice, because a mass or structured set of data does not imply a particular bias, such as an enterprise. A small person or anyone can store a mass of data for retrieval, expansion, and updating. The data is obviously arranged.
However I'm not sure if computer should be part of the definition since one could store data via a card system or offline storage possibly without a computer - but then would it be a database?
Is the human brain a special form of data base?
Typical Services Provided by Database Management Systems
- Persistence
- Query languages or query ability (standardized collection-handling idioms, see DatabaseVerbs)
- Metadata repository
- State management
- Multi-user contention management and concurrency (locks, transactions, rollbacks, etc.)
- Backup and replication of data
- Access security
- Data computation/processing (such as aggregation and cross-referencing)
- Data rule enforcement or validation
- Data export and import utilities
- Multi-programming-language and multi-application data sharing
- Data change and access logging
- Automated "search path" optimization (user focuses on what, not on how)
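A few of the services listed above can be seen in a small sketch. It assumes Python's standard sqlite3 module as a stand-in for "a DBMS"; the file name and the table and column names (account, owner, balance) are made up for illustration. SQLite is only one convenient example and does not cover every item in the list (it has no user-level access security, for instance).

```python
import os
import sqlite3
import tempfile

# Hypothetical file location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Persistence plus data rule enforcement (a CHECK constraint).
conn = sqlite3.connect(path)
conn.execute(
    "CREATE TABLE account ("
    " owner TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('ann', 100), ('bob', 50)")
conn.commit()

# Transactions and rollback: a failed transfer leaves no partial update.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance + 500 WHERE owner = 'bob'")
        conn.execute("UPDATE account SET balance = balance - 500 WHERE owner = 'ann'")
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejected ann's negative balance
conn.close()

# Reopen the file: committed data survived, the failed transfer did not.
conn = sqlite3.connect(path)
balances = dict(conn.execute("SELECT owner, balance FROM account"))

# Data computation (aggregation): the query says what is wanted, not how to scan.
total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
conn.close()
```

Whether moving the CHECK rule out of the schema and into application code would leave this "still a database" is exactly the kind of question argued further down the page.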
Database System, Database Management System: A general-purpose software system which can manage databases for a very large class of the possible application worlds. -Rishe '92
AnIntroductionToDatabaseSystems defines these functions/area of responsibility for a DBMS:
<quote type=approximative>
- Data definition. A DBMS must be able to accept data definitions in source form (a Data Definition Language) and convert them to the appropriate object form
- Data manipulation. A DBMS must be able to handle requests to retrieve, update, or delete existing data in the database or add new data in the database. Generally this is done through a Data Manipulation Language
- Data security and integrity. The DBMS must monitor user requests and reject any attempts to violate the security and integrity constraints defined by the DBA.
- Data dictionary. The DBMS must provide a data dictionary function. The dictionary contains data about data (metadata or descriptors) (DataDictionary)
- Performance. It goes without saying that the DBMS should perform all the tasks identified above as efficiently as possible
</quote>
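As a rough illustration of the areas of responsibility quoted above (not Date's own example), here is a minimal sketch assuming Python's standard sqlite3 module. The table name "part" and its columns are hypothetical. Security is not shown, since SQLite has no notion of users; performance is not demonstrated either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: the DDL arrives in source form and the DBMS compiles it.
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Data manipulation: add, update, delete, retrieve.
conn.execute("INSERT INTO part (id, name) VALUES (1, 'bolt'), (2, 'nut')")
conn.execute("UPDATE part SET name = 'hex nut' WHERE id = 2")
conn.execute("DELETE FROM part WHERE id = 1")
names = [row[0] for row in conn.execute("SELECT name FROM part ORDER BY id")]

# Integrity: a request that violates a declared constraint is rejected.
try:
    conn.execute("INSERT INTO part (id, name) VALUES (3, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Data dictionary: the metadata is itself queryable (SQLite's sqlite_master table).
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
conn.close()
```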
Re: "The DBMS must monitor user requests and reject any attempts to violate the security and integrity..."
I disagree with this as an absolute requirement. One can have a large and usable database [DBMS] without any formal validation or checking. If one turns off all checking in a given database, and moves that functionality to applications, the DB is still usable. One may say the "quality went down", but being a database and being a good database may be different criteria.
Certainly, and we can also call it a DBMS if it randomly forgets data, or if it randomly corrupts or transforms or invents data, or even if it randomly refuses 95% of all service requests. Maybe not a 'good' DBMS, but still a DBMS...
[Explicit constraints (including security) and higher data integrity essentially distinguish a DBMS from a file system.]
I disagree. A file system is not a database because you cannot add new attributes. If you put constraints on a file system, that would not make it a database.
Database Model: A convention for specifying the concepts of the real world in a form understandable by a DBMS. -Rishe 92
Examples: relational model, network model, hierarchical model, ODMG object model, semantic binary model, deductive database model, etc.
Why do we need standard definitions? Because people very often forget the basics of the most commonly used concepts, and substitute metaphors for definitions.
Related to the above definitions, there are some principles that need to be taken into consideration:
Some Definitions from People that Haven't Written Highly Referenced Books:
A database can also be considered a master, general-purpose (or semi-general-purpose) AbstractDataType.
TopMind's definition: They are "Attribute managers". They pre-package many common attribute and collection management idioms/abstractions into a single tool.
- "A database management system (DBMS) is a collection and attribute management tool that standardizes or centralizes common attribute- and collection-processing and management idioms in an efficient and scalable manner. A database is the information related to a particular organization that is processed and stored by a DBMS." -- TopMind
- [At best, you've paraphrased Rishe and made the mistake of conflating "DBMS" with "database", and at worst you're wrong. By your definition, neither ExBase or MicrosoftAccess are a "database" because they're not scalable, for any (assumed) definition of "scalable".]
- I've made adjustments based on your critique. As far as "scalable", that is a continuous concept, not a Boolean property. (Note that MS-Access is more scalable than the DB's that existed in the 60's when the term was coined.) Perhaps that makes DBMS also a continuous concept. I see no problem with that, for life is inherently fuzzy.
- [Life is inherently fuzzy in your own mind. Please do not impose your lack of focus on us. MS-Access might well be more scalable than, say, Jim Button's PC-File. It is not more scalable than any well-known production database systems from the 1960's or even earlier. By the way, it is no longer the 1960s. Please stop quibbling in order to defend a bad component to your definition. You don't lose face giving up "scalable" and you don't retain it by hanging onto "scalable".]
- If it's not fuzzy, then please state the clear rules, such as, "It must be capable of storing at least 123,456,890 bytes" or the like. And why would the definition change over the ages? And, I doubt the average DB in the 60's stored more than a few Megs at best. Perhaps there were some FBI, CIA, or military DB's that went into the Gig range, but these were not common. And they didn't have the feature set nor flexibility of desktop-DBs. They were usually tightly tuned for the hardware of the time. I've worked with a post-60's version of IBM's IMS, and it could be a bear. I imagine the 60's versions were even more particular. (Although, they were probably more reliable once settled.) Most would agree it's NOT about "size" anyhow, but about what the tool is meant to do and/or is used for.
- [You're quibbling, and your quibble only serves to highlight the fact that scalability is irrelevant to the definition of "database".]
- I was not the one who made scalability an issue.
- {You lie. Your words (regarding DBMS): "collection-processing and management idioms in an efficient and scalable manner". You then define database in terms of a DBMS.}
- Let me clarify the context: I was not the one who made scalability of ExBase and Access an issue. I was not the one who introduced the "quibble" about those products. The discussion about the 60's etc. above was addressing the statement, "By your definition, neither ExBase or MicrosoftAccess are a "database" because they're not scalable...". In practice, scalability is a continuous concept, not Boolean. I think most would agree that infinite scalability is not a requirement of a DBMS. I doubt a discrete definition of "database" can be created (and still be a good def) and I will not draw an arbitrary size line in the sand. Perhaps some kind of weight-based definition can be built. -t
- {It is good to not draw arbitrary lines in the sand. It is bad to include 'scalability' in the DatabaseDefinition, however, if it cannot be applied to distinguish databases from non-databases. One might reasonably say a database needs to manage a variable (and usually growing) number of facts (or attributes), and that the ideal range of that variable is everything between zero and infinity, and thus scalability is an ideal property for a database (along with reliability, persistence, performance, multi-user concurrency, and so on). But you aren't doing yourself or your definition any favors by insisting that scalability be directly included in the definition.}
- It is a factor surrounding the concept. See "Attributes and Weighted Definitions" below.
- [You could remove "idioms in an efficient and scalable manner", "common" and "related to a particular organization" without losing anything, and you'd gain accuracy. I.e., "A database management system (DBMS) is a collection and attribute management tool that standardizes or centralizes attribute- and collection-processing. A database is the information that is processed and stored by a DBMS."]
- I was trying to convey that the "standardization" is beyond the organization's idioms. Aggregation (such as GROUP BY), for example, is an idiom beyond the scope of any given organization. It is common to many organizations, not just one. Otherwise a biz-specific collection idiom processor would fit. Perhaps I didn't word it well, but it still needs to be specified somehow. It's missing from the other defs also. -- top
- [There are no operators, operations or idioms common to all database systems, except (from an external point of view) UPDATE and RETRIEVE. Many implementations of certain database models - OO, network and hierarchical, for example - do not implement aggregation.]
- I didn't mean common to all. I meant beyond a specific application or company. After thinking about it, there are indeed domain-specific DBMS that have domain-specific idioms anyhow. A text search engine is a possible example. The indexed text could be considered a "database". The common theme seems to be that it is a scope beyond one application. The DB provides an *interface* that can be used by multiple applications. I'm trying to isolate the essence as specifically as possible and test it against various scenarios. The common theme so far seems to be sharability, such as sharing info and idioms across *multiple* applications, languages, and users. Perhaps it is like the DefinitionOfLife: no single sharable feature may qualify it alone, but if there is enough sharability, then it qualifies. Will ponder more...
- [I've seen a number of textbook definitions of "database" that include "shared". In that case, I suppose a non-shared repository with all the same characteristics as a (shared) database can be called a repository, data store, etc. Such a definition, however, begs the case where precisely the same application and data are used for both non-shared (single-user) deployment and shared deployment. Do we call the, uh, "database" different things depending on whether it's shared or not? Ah, but maybe the "database" definition still holds because it can be shared? And so on. It opens a can of worms - easier to leave "shared" out and concentrate on the universal characteristics of the concepts we like to call "database" or "DBMS", because we obviously do informally apply those terms when the construct is neither shared nor shareable.]
- I suppose the definition we favor will be a kind of Rorschach test: we see in it what we want to. -- top
- [I see no reason why a simple yet fairly rigorous definition can't be found. Personally, I prefer to define it as follows: A database is a collection of attribute values. A database management system (DBMS) is that which provides update and retrieval access to the database.]
- That would describe a file system because the attribute name is the file name and the attribute value is the file contents. I would agree that a file system is on the borderline, though. -- top
- [Under my definition, a file system is a type of database.]
- It would also fit an array.
- [Fine. How about "A database is a non-volatile collection of data. A database management system (DBMS) is that which provides update and retrieval access to the database."]
- So if the array is non-volatile, then it's a "database"? And I'd exclude a file system from being a "database", based on common usage. At the least, it's a borderline case. As I describe in DatabaseIsRepresenterOfFacts, attributes are what set it apart from a file system or "expression engine". -t
- {To avoid hypocrisy on your part, "you need to provide, clear, precise rules/algorithms/formulas" to clarify exactly what "attribute" means.}
- See below.
- [Oracle, makers of the Berkeley DB, which is a mechanism for maintaining persistent associative arrays, would argue that it is a "database". Based on common usage, many would argue that a file system is a database. However, to step back a bit, the problem here is that we're dealing with a descriptive definition, which is inherently subject to varying viewpoints. As such, a scientifically rigorous definition is rather pointless. A reasonably rigorous definition is fine, for all reasonable purposes. Aside from (perhaps) political issues, e.g., "I'm not working on <x>, because I'm the 'database guy' and <x> isn't a database", I can think of no useful purpose for having a precisely "correct" definition that precisely includes all the right things and excludes all the wrong things. It is sufficient to have a reasonable definition that generally identifies the category of things we wish to deal with.]
- Okay then. My definition is "reasonable", and I believe it to be at least as good as the others.
- {I prefer: DatabaseIsRepresenterOfFacts, with a 'fact' being any logical proposition and an attributed truth value (be it boolean or from MultiValuedLogic) for that proposition. That's it. Sharing doesn't matter. The only support a Database needs to provide is a mechanism to represent facts in a manner that doesn't preclude access to these representations. (A 'good' database would need to represent facts in a manner that makes access easy, but not every database is 'good' - e.g. the HierarchicalDatabase makes access more difficult than a RelationalDatabase.) An immutable database (e.g. on a CDROM) is still a database, so update or mutability isn't a requirement. A HierarchicalDatabase is still a database, so the organization of this representation isn't particularly relevant. A filesystem is not a database, but only because files don't possess semantics as representing facts about anything (except, possibly, their own state - which doesn't count). The problem with 'attribute' is (in the normal sense of the term) that attributes need to be attributed to some entity, which will necessarily possess an identifier - whereas 'fact' or 'proposition' is a concept that is above that of an entity (statements of 'attribute' are a small subset of all possible propositions, related to BundleSubstanceMismatch). RelationalDatabase isn't capable of representing all possible facts - it lacks the ability to represent higher-order facts, truth-values other than 'true', and 'or'-style propositions - e.g. "P or X" is true, or "A if B" (= (not B) or A) is true - but it is still a database because it represents facts: you don't need to represent all possible propositions or truth values to be a Database. A Database does need to have a lifetime exceeding 'instantaneous', but doesn't need to be 'persistent'. 
A good DBMS allows one to query the 'pool' of facts to both learn things you already know and to learn new things (see DataManipulation), and to deprecate certain facts that no longer reflect reality, and to add new facts based upon percept and sensory analysis, and to support integrity (i.e. so your database remains 'consistent', logically). A good DBMS allows mutability and sharing and is application-independent so it can be used by many different projects and organizations. A good DBMS allows one to do all this very efficiently, and without jumping through hurdles and needing to know or remember underlying structure or representation when posing the query. But that's the DBMS. The database is much simpler: a DatabaseIsRepresenterOfFacts, and any representer of facts is a database. Period.}
- [As a conceptual definition, I'm wholly in agreement. I prefer mine as an operational definition, but I agree that 'attribute' was a poor choice of term. I shall revise it thusly: "A database is a collection of retrievable values. A database management system (DBMS) provides mechanisms to allow retrieval from a database, and may provide mechanisms to update a database."]
- {Can you give me a scenario where your operational definition would be of greater use than the 'conceptual' definition I offer? I suppose it avoids the need to identify a semantics processor that treats them as logical propositions (which would often be part of the query language support), but I've never had difficulty in practice identifying both the state and the semantics support - defining both together is what systems like the 'RelationalModel' does as a matter of course.}
- [I believe my operational definition would be of greater use any time it is necessary to precisely and accurately identify databases (or DBMSes) - say to establish a scope of discourse as part of some scholarly activity - without the risk of invoking philosophical debate over the definition and meaning of "fact". See below for a (sadly) typical example.]
- {'Fact' - especially 'logical proposition with an attributed truth-value' - is well enough defined for scholarly discourse, only requiring a choice of logic (monadic? intuitionist? bayesian? epistemological? first order predicate?) to make the abstraction concrete. It's just that top is utterly incapable of scholarly discourse on any subject. He 'thumbs his nose' at academia, presents his own opinions without support, and attempts to dodge any argument to the contrary by taking the 'psychology' approach to HumptyDumpty-style equivocation and logical inconsistency. See below for a (sadly) typical example. If it were him, your operational definition wouldn't help you one bit. He'd just say: "EverythingIsa collection of values if you bend it that way."... by which he would mean to bend his mind to a different view, inconsistent with every prior view (logical inconsistency doesn't bother people with an EverythingIsRelative mindset), each time presented with something new to examine. People like top would only be happy with a scholarly fellow arguing that "'Database' is fuzzy. You can call anything you want a database. You could even call my cat a database, and I couldn't prove you were wrong even though I might be disinclined to agree with you. There is no right or wrong; EverythingIsRelative." Anyhow, dropping that tangent: supposing you were dealing with people who actually cared about precise and accurate definitions. {See DefinitionDiscussions.} I still can't think of any examples where the operational definition you offer would be better than the one I described, which ultimately trims down to: "A database is a collection of values representing logical propositions with their attributed truth-values."}
- [All things being equal, are there any examples where your conceptual definition would be superior to my operational definition?]
- {No examples are required - or, more accurately, a more specialized definition is superior for every example if the definition still happens to encompass all ostensive examples and can still be applied in practice. The utility of any definition is derived from its ability to encompass all that is needed (no unacceptable exclusions) while being as specialized as possible (so you can derive and predict more properties based upon knowing something qualifies for that definition) and also being operative in practice (so you can examine a bundle-of-properties and determine whether it qualifies for the definition) - see GoodDefinition. I don't believe identifying the logic-semantics support as part of the database is problematic (it would typically be in the query language, algebra/calculus for deriving new facts, and even in the DataManipulation language for deprecating and updating facts). As such, it is, to my understanding, as 'operative' as your own. DatabaseIsRepresenterOfFacts is also clearly more specialized: not just any retrievable string or value-collection or persistence-layer qualifies as a database. It is therefore of superior utility if it also encompasses all that is needed - and you've offered no objections or ostensive examples to that end (and, indeed, some agreement as a 'conceptual' definition). The def you offer does make every filesystem into a database, as you mentioned above, but you've certainly offered no opinions indicating you believe that is a good thing.}
- [I agree that a more specialized conceptual definition is superior to a less specialized conceptual definition, but an operational definition that encompasses all ostensive examples and is simple is always preferable to one that is more complex - and I'd argue that the introduction of "fact" into an operational definition is redundant complexity.]
- {I disagree vehemently. "An operational definition that encompasses all ostensive examples and is simple" is NOT "always preferable to one that is more complex". A potential operational definition is: "EverythingIsa DataBase". This is very simple. This, by nature, encompasses all ostensive examples. And this is NOT preferable. The introduction of "fact" is EssentialComplexity; there is nothing redundant about it - 'value' by nature has only intrinsic identity and projects no meaning unto the world in which you found it - but you're free to argue otherwise.}
- [Furthermore, I'd argue that any retrievable string or value-collection or persistence-layer or filesystem should fall within the definition of "database" (and DBMS as appropriate), because all such things exist within a continuum of common characteristics we usually consider only when talking about databases, database-ish things (like filesystems), and database management systems and their kin, and that generally exhibit a rough interchangeability and/or interconnectedness. E.g., a filesystem may be implemented on top of a SQL database or raw blocks; a true relational system may be implemented on top of the Berkeley DB or a SQL system, a typical SQL DBMS uses a filesystem, and so on. Such a view raises some interesting questions, e.g., where transaction handling should really be located, or where application functionality should be located in a distributed system - as well as providing (IMHO) an interesting comparative perspective.]
- {First, I wholly reject the "implemented atop of" argument. Just because any natural number can be represented as (implemented on top of) a sequence of digits doesn't mean that natural numbers are sequences of digits. You must always separate meaning from presentation (including representation); if you don't, then everything is a continuum starting with your choice of physical matter or percept. Second, the 'continuum of common characteristics' seems doubtful to me - perhaps a PowerSet of discrete features and properties, but no continuum. A filesystem has very discrete different properties than a notebook handwritten with a pen, a string of arbitrary characters possessing no context, or the opportunity to physically measure a temporal distance between two signals or the state of a transistor... but all of these are retrievable values. Property PowerSets only look continuous from the thirty-thousand foot view. Down at the logical level, they allow for clear distinctions - as many as we need or find useful. I don't plan on calling filesystems and volatile memory and textbooks and fanfiction and expert knowledge systems 'databases' simply because they carry a great many retrievable values and fall on some fuzzy 'continuum' one can only see while squinting or 'from a distance'. I'd prefer to be a bit more discerning.}
- If your definition only meant something to formal linguists, it will mostly be ignored anyhow. It would become a WalledGarden definition.
- [Only to "formal linguists???" Nonsense.]
- "Facts" is too vague and open-ended in my opinion. Then again, one could probably complain about "attribute" also. -- top
- {"Logical proposition" is not too vague or open-ended, and nobody cares about top's opinion unless top can justify it.}
- Almost EverythingIsa logical proposition if you bend it that way.
- {That isn't true even a little bit. Protocols, communications, patterns, predicates, abstractions, cats, dogs, force, energy, models, ideas, concepts, ideals, values, goals, motivations, integers, strings, types, mechanisms, strategies, heuristics, queries, authority, security, privacy, logic itself, and a great many more things aren't logical propositions. Among the set of all things, those that are or represent logical propositions form an infinitesimal fraction. And the set of logical propositions represented in a physical medium along with their attributed truth-values is smaller yet.}
- Are you absolutely certain that "cat" cannot be turned into a logical proposition? Do you want to bet money on it?
- {Neither the string "cat" nor the abstraction identified by the English word 'cat' is a logical proposition. If you created a special language or context just for it, you could say that the string "cat" is some sort of code representing a logical proposition in that language or context, but the "in that language or context" would be a critical point: you still could not say that the string "cat" is or represents a logical proposition on its own. Stronger, not even the string "The neighbor's cat is sitting on the front lawn." is or represents a logical proposition - not on its own, not before you are told to parse it as an English sentence. It is a string - simply a value with intrinsic identity, associated with a few string-operators, and some underlying representation (e.g. ASCII or UTF-16). A (string,parsing-context) pair is not the same as just a string. I'd put money and time and projects on the line for this if that is how a hypocrite like 'TopMind' (who doesn't even stake his own reputation or name on his own claims) wishes to measure certainty, but a wager would not be an effective means of convincing you: as a motivator, it would only encourage you to pull the wool even further over your own eyes so you can wave your hands wildly and make ridiculous claims with outlandish scenarios to avoid giving up money in addition to (for an ego like TopMind's, almost worse) admitting to error. Are you willing to stake your time and (admittedly, already in-the-gutter) reputation upon your ability to defend your own claim: "Almost EverythingIsa logical proposition if you bend it that way."? Can you find one consistent and non-equivocal view and context where every instance of every item I listed above (every protocol, every communication, every pattern, every 'cat', every string...) is a logical proposition? Give it a try. 
Personally, I expect your planned argument depends on massive amounts of internal inconsistency and equivocation, jumping from one context to another for every new instance just so you can view that instance (to the exclusion of everything else) as representing a logical proposition, but I'm willing to be surprised.}
- You cannot honestly rule it out. There may be a formula to describe "cat". In fact, maybe cat DNA is a formula, a very complicated one. For example, "if the animal's DNA fits this profile, it can be considered a cat." As far as "outlandish scenarios", outlandishness is all relative. Describing a "cat" by DNA would have been an "outlandish" idea 100 years ago. Further, unless you are the owner, your job is to help the customer manage their UsefulLie more efficiently. Our customers decide what "reality" we model, not us. I helped one lady in the marketing department process stats for marketing codes. The codes seemed arbitrary and forced to me, but it was not my scope to "fix" that. I focused instead on doing to the codes what she wanted done: deliver her UsefulLie processing engine. The "facts" are whatever the hell our paycheck givers tell us they are. --top
- I'm trying to diet with regard to drawn-out definition battles. Maybe another month. -- top
- {That is probably wise for all of us. Revisit this issue in JuneZeroEight, then.}
- Why June? "Another month" is not necessarily equivalent to "next month" any more than "another day" means tomorrow.
- {How long do you plan on hedging, exactly?}
- What, are you my mother?
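For reference, the operational definition offered in the thread above ("A database is a collection of retrievable values. A DBMS provides mechanisms to allow retrieval from a database, and may provide mechanisms to update a database.") can be sketched in code. The interface and class names here are hypothetical and come from no textbook; the sketch only shows why the objection "it would also fit an array" goes through under that wording.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OperationalDBMS(Protocol):
    """Hypothetical interface: retrieval is the only required mechanism."""

    def retrieve(self, key: Any) -> Any: ...


class ArrayStore:
    """A plain in-memory list, wrapped so that it offers retrieval and update."""

    def __init__(self, values: list) -> None:
        self._values = list(values)

    def retrieve(self, key: Any) -> Any:
        return self._values[key]

    def update(self, key: Any, value: Any) -> None:
        # Update is optional under the quoted definition.
        self._values[key] = value


store = ArrayStore(["cat", "dog"])
store.update(1, "emu")

# The structural check passes: under the operational wording, a wrapped
# array counts as a DBMS over "a collection of retrievable values".
fits_definition = isinstance(store, OperationalDBMS)
```

Nothing in the interface asks what the values mean. That is the gap the DatabaseIsRepresenterOfFacts side of the thread objects to, and the breadth the other side is willing to accept.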
Shared-ness
This is to explore the shared-ness idea raised above. Under this, a DBMS is a tool that shares attribute-handling and collection-oriented idioms among:
- Applications - Not application-specific
- Languages - It is not tied to any one application language
- Multiple Users - Different users can use the database at the same time
As a test definition, let's assume that two out of three of these must be true to qualify.
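The two-out-of-three test can be written down directly. This is only a sketch of the test definition as stated; the field names and the example ratings are hypothetical and say nothing about how any real product should be scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sharedness:
    """The three criteria listed above, as yes/no answers for one tool."""

    across_applications: bool  # not application-specific
    across_languages: bool     # not tied to one application language
    across_users: bool         # usable by different users at the same time


def qualifies(s: Sharedness, threshold: int = 2) -> bool:
    """Test definition: at least `threshold` of the three criteria hold."""
    score = sum([s.across_applications, s.across_languages, s.across_users])
    return score >= threshold


# Hypothetical ratings, for illustration only.
multi_user_server = Sharedness(True, True, True)
single_user_file_db = Sharedness(True, True, False)
in_process_array = Sharedness(False, False, False)

results = {
    "multi_user_server": qualifies(multi_user_server),
    "single_user_file_db": qualifies(single_user_file_db),
    "in_process_array": qualifies(in_process_array),
}
```

Leaving the threshold as a parameter keeps the "two" visibly arbitrary, in line with the reluctance elsewhere on this page to draw lines in the sand.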
Attributes and Weighted Definitions
Re: "To avoid hypocrisy on your part, "you need to provide, clear, precise rules/algorithms/formulas" to clarify exactly what "attribute" means."
I don't think I can provide such. A WeightedDefinition may be more appropriate, including perhaps for "database" itself. "Value", "Data", "Fact", "Information", etc. will likely have similar problems. --top
Provide a clear rule/algorithm/formula for precisely determining 'attribute-ness', then. Or is WeightedDefinition a page you created in an attempt to justify your hypocrisy?
I never implied it was clear, unlike you and "fact".
"Facts", used in DatabaseIsRepresenterOfFacts, was clarified in the first paragraph on the page ("a set of propositions that are believed to be true"). And, TopMind, I have never insisted that a rule/algorithm/formula is needed (or even appropriate) for a 'clear' definition. You aren't a hypocrite for consistently failing to meet my standards.
{A "fact" is a proposition that evaluates to 'true'. How much clearer can you get than that?}
- Probably just about anything can be written or re-presented to "fit" that. It doesn't exclude enough to be useful, and perhaps excludes nothing if one is clever enough to provide the conversion/re-representation. One may argue that re-presenting is "cheating", however almost every tool offers some form of "formatting" or "packaging" for output. Thus, transformations are only a matter of degree. Physical representations are implementations and logical representations are in our head and can be "held" any way one pleases as long as their mental model passes whatever objective tests are available. And, a few conceptual "flaws" may not get in the way of practice. -t
RDBMSs don't typically store expressions, but attributes. Sure, one can view or re-write them as propositions, but this is true of just about any information, and thus provides an insufficient falsification test for "fact", leaving it too wide open. -t
RDBMSs generally store relationships, not attributes. You've married your DatabaseDefinition to the concept of 'entity' - something that can have attributes (as per EntityAttributeValue and EntityRelationshipDiagram). You also seem intent on marrying DatabaseDefinition to RDBMS, rather than simply ensuring that DatabaseDefinition includes RDBMS.
- I see nowhere where I am being relational-centric.
- In a paragraph immediately prior, you objected to a definition of "fact" on the basis that RDBMSs don't store expressions, but attributes. How is that not being RDBMS-centric?
- That's an inherent characteristic of databases, not just relational ones. See below.
- Okay, so it wasn't RDBMS centrism. This is ignorance on your part. What a surprise.
- Shove your rudeness up your ass with a fire-orange cattle prod. I shouldn't have to put up with such repeated and blatant personal attacks.
- You've worked in a sheltered little niche, TopMind, and arrogantly insisting on premature generalizations about what is an "inherent characteristic of databases" is earning you plenty of rudeness.
- Always childish excuses to make excess personal insults.
The broader world of databases - of which TopMind is apparently ignorant (though he may instead be a relational zealot in active denial) - allows users to manage and store constraints (X is between Y and Z, or X is the same as Y though Y is unknown), contingencies (X is true if Y is true), definitions (ancestor is parent or ancestor of parent), abstractions and heuristics (most X are Y, some X are Y, all X are Y, if X and Z then likely Y), even fuzzy propositions (X is like a chair), and so on - any sort of proposition that you might imagine to be true and wish to manage. Any given DBMS needn't manage all facts of all sorts; rather, any given DBMS will be managing some facts of some sorts. Perhaps TopMind is opposed to 'DatabaseIsRepresenterOfFacts' because he incorrectly reads it as 'DatabaseIsRepresenterOfAllFacts'. It seems silly to me that TopMind would assume a database must be omniscient. Any sane person would assume 'DatabaseIsRepresenterOfSomeFacts', which does not require supporting all sorts of facts.
- What you describe is closer to an "expert system", not a "database", by most usage. Perhaps usage wobbles enough that you could force-fit it, but my working definition, which I feel matches typical usage better than yours or at least equal, excludes "expression-bases". The term "database" has "data" in it, and when most people think of data, they usually think of attributes such as name, title, price, etc., not expressions. If IT people see that there are mostly expressions, they think ExpertSystem, not "database". Or, at least something from the AI world, such as "rules metabase" or "logic engine" or "knowledge base". -t
- No, TopMind, it isn't closer to an ExpertSystem. An "expert system" provides advice or makes predictions. A "database" faithfully represents facts. Databases don't need to derive new facts from old ones. Databases don't need to probabilistically predict facts from incomplete information. Databases do not need to create plans or contingencies. A database is likely to support an expert system or AI in doing these things (as an implementation detail), but that does not mean the database is an expert system. Is this getting through to you at all? And, seriously, you should never use the phrase "most people think" - you're HandWaving again, TopMind, arrogantly projecting your beliefs onto 'most people'.
- Definitions are generally considered to be defined by usage, not a panel of self-elected elites. I'm just the messenger. And having extra features does not necessarily exclude something from qualifying.
- Sure, definitions for English are defined by usage. If I had never seen 'database' used repeatedly to describe collections of 'facts' even of the more flexible sorts described above, perhaps I would be more apt to agree with your sheltered view about how 'database' is used. It seems you're the guy attempting to define a pair of scissors as a tool used in one's right hand, on the basis that most people use scissors in their right hand.
- Why should a reader "just trust" that your experience is non-sheltered or normal? With people skills like yours, I imagine you end up in pretty sheltered niches.
- There is no need to trust me to see what other people consider databases. The first line of wikipedia on 'Knowledge base' is "a special kind of database for knowledge management", and has been so for just under five years now (starting 20040102). You're inconsistent, TopMind: You want to push a name other than 'database' onto these other databases because they don't fit your personal definition, and you argue for this based on "usage". However, rather than recognizing and acknowledging how 'database' is 'used', you are rejecting its usage and instead attempting to change it. Slick idea that may be, but it's just an idea unless you can achieve adoption.
- You are too quick to use WikiPedia as a formal, literal, and careful source. The other kinds don't say "special". What's the difference between kinds of databases and "special" kinds of databases? If one was drawing a classification tree, how is "special" noted and different from other branches? If you obtained more usage data points you may have a worthy case.
- I used Wikipedia as an informal source, which is precisely what is needed if the goal is to see how informal people view things, is it not? If I went to a formal source, you'd have complained about "self-elected elites" and dismissed the source. If you distrust Wikipedia as a reasonable source of common 'usage', you are free to search for a source of equal merit that denies the point. Your frivolous complaints about inclusion of the word 'special' don't dignify a response.
- You are correct about that. I stand corrected. However, the criticism about insufficient sample points and odd wording still stands. The "Foo-Base" section below may be related to domain-specific application of the term. -t
In any case, it seems, TopMind, as though you are frustrated about irrelevant distinctions between 'information' and 'facts', while stupidly ignoring various relevant distinctions (as between 'fact' and 'proposition', or 'fact' and 'DomainValue', or 'fact' and 'management service'). It only takes one relevant distinction to 'falsify' a definition, TopMind. For your elucidation, 'information' is pretty much synonymous with 'data' excepting its contextual connotation: information (deriving from "inform", to teach, shape the mind) connotes something communicated, whereas 'data' (deriving from datum - a 'given') connotes something held or stored. The difference between information and data was never in the substance, but rather in where you found it. In English, you can find like differences between 'asteroid' vs. 'meteor' or 'lava' vs. 'magma'.
If it's "all about the context", then the definition becomes such a complicated case-by-case Sherlock Puzzle that it grows useless to most people. Don't tell me, it will eventually rely on the definition of "intent" if we probe enough. All roads lead to your own personal Rome. My definition gets one 90% of the way there with 1/100th the complexity of yours. I'm sure you are going to argue that if one wants to "do it right" and get 91% instead, they'll have to read 80 boring academic books written by your buddies, who are probably their only audience. -t
It isn't "all about the context". DatabaseIsRepresenterOfFacts does not depend on a distinction between 'information' and 'facts'. If you were literate, you'd have read the first sentence in the prior paragraph which described this as an irrelevant distinction.
When it comes to YOUR writing, I am not literate, but rather dumb as a drunk 1st grader. Your writing sucks that bad.
[Top, I wouldn't be too quick to dismiss the possibility that his writing is just fine, and that you either have serious comprehension problems or you aren't very bright. I'm betting it's a combination of both.]
You don't try very hard to be clear. You just throw it up on the wall as-is and patronize anybody who cannot decipher your convoluted internal mental model that tries to be little else beyond merely parsable as English.
[Go read AnIntroductionToDatabaseSystems cover to cover. Come back when you're done. Is that clear enough for you?]
I don't see where it attempts to state a semi-formal and compact definition. I have the 6th edition. Is there somewhere specific you wish to reference?
[In the 8th edition, section 1.3 provides the following definition: "A database is a collection of persistent data that is used by the application systems of some given enterprise." That's both semi-formal and compact. However, my point was actually a bit more subtle, which is that the whole book, in a sense, defines "database". Unless you wish to establish a definition to support a particular academic argument -- which inherently means your definition will be highly constrained and specific -- it is unlikely you will be able to arrive at a general definition that everyone can agree upon, because there are varying informal views of what a "database" is, and there is no body to legislate one definition as correct and the others as wrong. Therefore, multiple, possibly contradictory, detailed definitions for "database" will be equally valid. Thus, trying to arrive at a single, semi-formal, and compact definition that is superior to Date's is a rather pointless endeavour.]
In my opinion it's a good exercise to find and document the similarities and the differences. There is value in documenting disagreement. It may fuel a better def in the future. Wouldn't you like to see the feature tradeoff decisions and mental juggling that the authors of your favorite or shop language performed? This is a nice feature of C2 that the encyclopedia style desired by some WikiZens wouldn't cover well. -t
Foo-Base
Here's a working model of the definition to explore. We have things like "databases" and "knowledgebases" where the atoms and operations are different. Perhaps we can define a broader concept of an "ia-base" where "i" is "idiom" and "a" is "atom". "Data" generally means the idiom of atoms called "attributes". If they had called it "attribute-base", managers wouldn't purchase it. But generally "attribute" is the better word, in my opinion.
Generally an ia-base at least facilitates the storage and retrieval of the idiom atoms as-is, meaning that you can get out what you put in without too much hassle, and in its original form, assuming another user didn't change it. Common features associated with storage and retrieval are often included, such as concurrency management (ex: 2+ people want to change the same idiom at the same time).
But in addition, it also handles common processing idioms or features associated with the atoms. (Processing idioms are a subset of idioms, here). For example, generally there will be some form of aggregation, such as counts. Counting the atoms in some way is almost always a necessity regardless of the domain of the atoms.
However, different super-domains may have different forms of aggregation. For a KnowledgeBase (AI), an operation like "sum" may not be appropriate, but there may be other common aggregation or aggregation-like operations that a KnowledgeBase will typically need, like maybe a probability of a truth test. Generally these common processing idioms will be common to multiple applications, or systems that use similar atoms. A GIS-base would likely support things like searching within a radius, closest-to, and filtering by objects within one or more bounding polygons because these are common processing idioms to GIS and not just to a specific GIS app; they are something that a GIS-er will likely eventually need regardless of which company or area they are working in.
Of course the line between "super" domain processing idioms and narrow domain idioms can be blurry. For example, many RDBMS have financial functions. Perhaps the derivatives industry wants even more of these and many that are specific to derivatives. Such would be shifting away from an "attribute-base" (database) and toward a "financial-base". (Some argue that existing RDBMS are a bit business-centric and thus are really business-bases and not data-bases.) If one particular derivatives shop wanted proprietary or single-company functions just for their particular company, it's shifting toward being an application and less a something-base. It begins to blur the line.
We could draw a hard line saying if it handles something that two or more applications would otherwise have to implement, then it's a thing-base instead of an application. But this may be premature dicing and trigger definition battles over "application". Two or more companies may also be a boundary to consider.
In short, a foo-base is a system that handles the storage and retrieval of foo's atoms, plus some commonly-used processing features or operations of foos. Commonly-used generally means cross-application or inter-application.
--top
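{A rough sketch of the foo-base summary above, offered only as one possible reading. The names FooBase, put, get, count, register_idiom, and run are hypothetical, and the "within_radius" idiom uses plain planar distance rather than real GIS math. The idea shown: storage and retrieval of atoms as-is, a count that applies to any kind of atom, and processing idioms registered per super-domain.}

```python
import math
from typing import Any, Callable, Dict, Hashable, List


class FooBase:
    """Hypothetical foo-base: stores atoms as-is and hosts shared processing idioms."""

    def __init__(self) -> None:
        self._atoms: Dict[Hashable, Any] = {}
        self._idioms: Dict[str, Callable[..., Any]] = {}

    def put(self, key: Hashable, atom: Any) -> None:
        # Storage: keep the atom in its original form.
        self._atoms[key] = atom

    def get(self, key: Hashable) -> Any:
        # Retrieval: you get out what you put in.
        return self._atoms[key]

    def count(self) -> int:
        # Counting is assumed useful whatever the domain of the atoms.
        return len(self._atoms)

    def register_idiom(self, name: str, fn: Callable[..., Any]) -> None:
        # A processing idiom shared by every application using this base.
        self._idioms[name] = fn

    def run(self, name: str, *args: Any) -> Any:
        # Each idiom receives the list of stored atoms plus its own arguments.
        return self._idioms[name](list(self._atoms.values()), *args)


def within_radius(points: List[tuple], center: tuple, radius: float) -> List[tuple]:
    # Simplified planar radius search (an assumption, not real geodesy).
    return [p for p in points
            if math.hypot(p[0] - center[0], p[1] - center[1]) <= radius]


# An "attribute-base": sum is a sensible shared idiom for prices.
prices = FooBase()
prices.put("widget", 3.0)
prices.put("gadget", 4.5)
prices.register_idiom("sum", lambda atoms: sum(atoms))

# A "GIS-base": radius search is the shared idiom; sum would make little sense.
places = FooBase()
places.put("home", (0.0, 0.0))
places.put("shop", (3.0, 4.0))
places.put("far", (30.0, 40.0))
places.register_idiom("within_radius", within_radius)

print(prices.run("sum"))
print(places.run("within_radius", (0.0, 0.0), 5.0))
```

{Under this reading, an idiom that only one company would ever register is the point where the thing starts shading from a something-base into an application.}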
[Huh? You don't try very hard to be clear. You just throw it up on the wall as-is... :-)]
If you have a question, just politely ask and I will do my best to politely answer. If you find it poorly-written, I assure you it wasn't my intention. With specific criticism I can apply specific fixes. -t
External Links:
- Something of the concepts and methods used can be found:
- Geographical Database
- SQL
- Concepts and Standards
- Database Management
- Collections of database definitions
CategoryDatabase, CategoryRealData
DecemberZeroNine