Belief Database Example

(Refactoring from AreTablesGeneralPurposeStructures, still out-of-sync as of writing.)

Use case: You wish to represent a collection of beliefs. Each belief is a datum - a proposition held to have a certain truth value in a set of worlds (or, alternatively, a proposition held to have a certain function of world->truth value), where time is one descriptor of world (so the world of the distant past is different than the world of the present or future). In addition to being a datum, a belief is held without absolute confidence; it is relatively easy to represent confidence in a database as a simple realnum ranging from 0.0 to 1.0; it doesn't even require much precision... 8 bits is plenty since quantifying confidence is such a fuzzy and imprecise thing anyway. For now, we'll also simplify worlds and simply give each one a string name (can come back to this if you pass the first challenge). Propositions, however, will be more difficult. A proposition is an abstract logic function over a world... e.g. P(x) is true or false depending on what 'x' happens to be in the world. A proposition can be a predicate (... make it a simple string for now ...) over some (possibly empty) ordered set of identifiers (which are world-dependent variables, and may also be strings for now), which for the moment we'll simplify to at-most-two identifiers - that is: Truth:(PredicateName?,Identifier1,Identifier2) (with some null ID). And:(Proposition,Proposition), Or:(Proposition,Proposition), Not:(Proposition), ExOr?:(Proposition,Proposition) (to allow OnceAndOnlyOnce), Impossible:(Proposition) ('I believe X is impossible', which tends towards the inference 'I believe Not:X'), Possible:(Proposition) (I believe X is possible; does not infer 'I believe X'), Probable:(Proposition,Probability) (I believe that X has Y% chance of being true), Contingent:(Proposition,Proposition) (I believe that if X is true, then Y is true). You have inference rules, too (e.g. 'Not:(Possible:(Not:(X))) => X, with same confidence'), but those needn't be represented in this collection, so don't worry about them overly much (they could be, though... it's perfectly reasonable to have one believe in a certain set of inference rules. However, that adds pattern-matching stuff and meta-level worlds that make the problem remarkably more difficult). However, to use those inference rules efficiently, you need to be able to quickly find propositions that share common substructures in common worlds in order to find: contradictions, beliefs applying to the same targets (identifiers), beliefs applying in the same worlds, beliefs using the same predicates, replications (where two beliefs support one another), abstraction points, and more. My expectations that a typical table's indexing can provide more than a sputtering bit of help on the searches, or that even several would work to well represent the collection (even as well as a straightforward 'set of value objects' or 'list of value objects'), are extremely dismal. The structure needed here needs an enormous variety of indices if it is to meet time-constraint requirements of a modern inference engine.

Is your argument related to "variants", or performance?

Use case: Interactive scene-graph - a collection of object-descriptors to be drawn and animated over a sequence of frames with constraints on ordering and occlusion, plus a feedback requirement requiring that ready access to which objects might be the target of a user's attention when the user points at something on the screen. The collection itself needs only to provide the indexing to make this possible.

I would like to see some sample data. Remember, anything that can be represented as a graph of object pointers can also be represented relationally.

Although I am no Cyc expert, initially, it seems one may be able to represent the above with just one table:

 table: beliefs
 ---------
 beliefID
 operator   // ex: and, or, probable, etc
 weighting  // 0.0 to 1.0
 param1
 param2
 param3
 param4

And I have no problem using generic strings to store parameters which may be numbers in some cases. I prefer using dynamic or type-free languages anyhow such that it does not matter to my app code. If we want to validate numericness for some combinations, we can put in triggers like this:

  TRIGGER ON CHANGE TO TABLE BELEIFS(newrec, oldrec))
     IF newrec.operator = 'FOO' AND NOT IS_NUMERIC_PARSE(newrec.param2)
        RAISE ERROR '2nd parameter must be numeric for op FOO'
     END IF
  END TRIGGER

Where does beliefID come from?'

Let's assume it is an auto-number.

and, more importantly, how do we know which propositions are in our collection of beliefs,

Don't the operators determine that? I am hesitant to make a hard-split between them in order to serve future flexibility.

and which are simply propositions that are part of other beliefs?

Add a many-to-many table

 table: belief_xref
 --------------
 beliefRef1
 beliefRef2 
 relationStrength

And why do propositions that are not directly associated with beliefs have confidence values? propositions don't have confidence values... only beliefs do.

If they are not used, then ignore them.

Do plain old propositions have 'beliefIDs'?

Perhaps I should have called them "suppositions" instead of beliefs. I don't yet see a reason to make two different tables, but am not hard against it. I tend to combine similar entities into one rather than split them for similar reasons that I don't like hierarchical taxonomies. It's worked better for me, but others may disagree. Is there any solid and significant difference between "propositions" and "beliefs"? Can there be something in-between?

What would be the algorithmic cost of querying something simple like, say, all the beliefs with predicates referencing a particular identifier? (I think it will be higher than O(N^2) due to reducing from sub-propositions back up to the belief). Oh, and you lost the world (that one is an easy fix). You've a ways to go yet, and I think you'll quickly violate several "philosophies of relational" that come from Codd (in particular, normalization forms). Your use of triggers to maintain consistency as part of the database is admirable, but I don't recall it being a general property of tables or even of relational databases. I imagine, though, you could say it in a declarative manner.

I will not comment on performance issues, other that saying that if you can roll-your-own Cyc engine then one could also roll their own relational Cyc engine and tune it to need.

As far as triggers, perhaps constraints can do the same thing. The difference between them is mostly a language implimentation issue. And I don't see it violating common normalization because there is no duplication from not using columns. (Maybe in fixed-record implementations there is such, but I am not assuming we're using such.)


Section B

I could go through the text with you and show you where I feel the wording is poor or ambiguious. However, I would rather move on rather than get into a marriage spat. Here is my current model:

 // incarnation 2.0

table: statements --------- statementID operator // ex: and, or, probable, truth, etc weighting // [DEPRECATED] param1 param2 param3

table: group_membership ---------------- groupRef // ID of group statementRef // ID of statement weighting // 0.0 to 1.0 [added 9/1/2007]

I am still not clear on the difference between "beliefs" and "propositions" and "worlds". But the above can put any of the statements in any or all of these groups. In essensce, make different views of reality.

A proposition is a logical statement that evaluates to true or false. For example: "Bob stole the cookie from the cookie jar." However, you can't evaluate a proposition without a world to evaluate it against. Bob, cookies, and cookie jars are all referents to a world. Our present is just one world, but there are many others... a dream constitutes a world, as does a mathematical construct, canon of a fiction book, etc. Bob might have stolen cookies from the cookie jar in some of these worlds but not others. In some worlds there might be no such thing as a cookie jar or Bob, making the proposition nonsensical.

A belief is proposition held by an agent to be true (possibly without complete confidence) with respect to a particular world (or set of worlds... getting into variant levels of abstraction is a bit above this discussion). The agent might be right or wrong, of course. A belief is only attached to truth insofar as the agent attempts to verify beliefs against their respective worlds, and even then it is only inductive. The only way an agent can have truth is by creating the world (e.g. "imagining" it, or creating a mathematically constructed universe).

The above tables are not ideal. Closest to a statement is the proposition. You should probly shift the 'weight' (or confidence) into the group, leaving the propositions in a dedicated table. You also need to find a home for 'world', which (right now) would do best in 'group' (at which point you have 'belief', with each tuple representing: (agent/group (believes) proposition (with) confidence (in) world)). If world is to be extended into its greater form (a possibly infinite set of truths, or an axiomatization) you'll probably need a whole new table just to describe worlds.

The requirements as specified allow you to drop 'agent/group', since you need only represent a collection of beliefs (which implicitly belong to the agent holding the collection), but there is no harm in having it. In reality, it is very, very rare that one agent can actually know what another agent believes, so there is little point in carrying around that sort of information. You very well might have beliefs about what other agents believe (I believe with 90% confidence that Bob believes with 80% confidence that James stole the cookie from the cookie jar...), but each agent has its own collection of actual beliefs, and your beliefs about their beliefs are likely imprecise at best. Now, if you were programming a game AI or something, where one agent (the controller) is modelling a bunch of other agents (the bots), then the domain is somewhat different and you might actually benefit from the 'group' section. The original specifications aren't for that domain. Even so, going more general isn't of any severe cost... even at 32 bytes per belief, the total cost will only be a few hundred megabytes plus index for a medium-sized collection of beliefs.

I don't wish to give the impression that the variant problem has been conquered. The hard part hasn't even been touched yet: access and manipulation. Representation isn't too difficult with variants (though it gets messy when variants themselves contain unbound lists of other variants, which is why I simplified the "Truth" proposition to have at most two referents, whereas the more natural definition has finite-but-unbounded number). The real problems start after you've chosen your crisp and clean representation and actually want to access or manipulate the data... e.g. to find all beliefs involving "Bob" and "Cookies". The requirements stated mention only access, so you can assume that manipulation (removal, addition, modification of confidence) is allowed to be slow (e.g. performed in batch). Access requirements are that you be able to find the -belief- fast... not the propositions, but the higher, top-of-the-tree, beliefs.

Can you write the relational code that tells me all beliefs associating (in any manner) (op:"Truth",param(Any):"Bob") with (another or the same) proposition (op:"Truth", param(Any):"Cookie")? I'd prefer you discover how navigational such a query is for yourself.

      table: statements  // VARIATION "JANE"
      ---------
      statementID
      operator   // ex: and, or, probable, truth, etc
      stmtRef1   // foreign key to this table (null if not applic)
      stmtRef2   // ditto
      param1
      param2
      // (not very pretty by relational standards, but workable)

The navigational nature of access for even the simplest of the domain requirements really demonstrates (... at least after you perform the exercise and sum the algorithmic costs) why even a whole pair of tables turns out no stronger than the humble list. The domain, however, requires more: that you be able to rapidly find beliefs (that means "faster than O(N) - the cost of scanning through the list of ten-million beliefs") that meet certain patterns on their propositions... where the patterns might be two or three 'deep' (e.g. so beliefs that Not:(Possible:(Not:(X))) can be found, or beliefs that Not:(Possible:(X)) can be found, or beliefs that X can be paired with beliefs that Contingent:(structure-similar-to-X,Y). Procedurally, this sort of indexing can be done (after some hassle, but at least a very straightforward hassle) with highly specialized indices that identify certain commonly requested patterns over beliefs. The relational representation or model, however, doesn't allow for this. It indexes keys over one table. There is nary a glimmer of an approach - to find these patterns, you need pre-indexing on what is, essentially, ((beliefs left-join propositions on propositionID) left-join ((propositions on (param1 or param2 or param3 = lhs.propsitionID))) (this allowing to identify patterns of at most 2 levels deep... perhaps add one more recursive self-join on 3 params to get the whole three levels). Even the very most flexible implementations of relational will not allow for indexing patterns that can be identified only through two or three left-joins over the proposition table combined with a left-join from the behaviors table. You'll need to specialize heavily, and whichever implementation you choose (even build-tables-and-query-processor-from-scratch) will fight you every step of the way.

I don't claim to have a suggested implimentation right now, but that does not mean that a better implementation cannot be devised. Lack of creativity on my part is not the same as "impossible" (AbsenceOfEvidenceIsNotEvidenceOfAbsence). If you can mathematically prove that certain relational expressions are implimentationally limited to O(n), then it may deserve an acedemic paper. And, I am also not dismissing a bastardized form of relational to speed up a niche. Many things don't have to be all-or-nothing. I'd probably start by dissecting your proposed alternative, and somehow work its indexing system into the new B-RDBMS. But your complaint about *current* implementations may deserve merit.


EditText of this page (last edited September 11, 2007) or FindPage with title or text search