Different ways to provide different structure "types" in the RelationalModel, and whether it should be done at all.
Things to consider:
- Use one "kind" of structure as another as our needs grow. For example, a given record/node may participate in different structures all at the same time: tree, table, graph, stack, etc. without copying in and out of each. Also supported by structured DomainValues.
- Easily mix collection "kinds" without heavy conversion effort. For example, perform a UNION or JOIN (or ZIP or FOLD or MAP or APPEND) on some or all records/nodes of two or more collection "kinds". (Support for both structured and destructured DomainValues may require column type conversion efforts to do any equivalent of 'UNION' two different kinds under heavy-typed systems.)
- Re: "... under heavy-typed systems" - not really. Under any RelationalModel system you cannot UNION a two-column relation with a three-column relation. The conversion effort would be pretty common here.
- That's going to be true of anything such that it's probably not worth mentioning. -t
- Ah, so then do you recant your mention of "under heavy-typed systems" since "it's probably not worth mentioning"?
- How about a rewording re "be mindful of the cost of conversions". Some things are gonna be a wash either way and some are not.
- Which costs do you mean? Expressing the conversion or running it?
- Data entry - the complexity of getting DomainValues into the database in the first place (can be much more difficult if use of surrogate IDs requires ordered and procedural entry rather than declarative entry).
- Query specification - ability to syntactically supply an embedded tree (for purpose of join, comparison, or other domain math operation) versus having to give the ID, pointer, or name of a previously-defined tree.
- Data maintenance - degree to which the RDBMS can automate sharing and cleanup.
- I posit that the requirement for GarbageCollection serves as a practical, distinguishing property between "DomainValue" and "WhatIsData". That is, if a node is useless without a reference to it from another table, then that node does not represent data about our world or any other. (Note: Needing a reference to the node is fundamentally distinct from the node needing to reference something that exists in another table!) This provides a hard, defining characteristic by which one may judge which 'potential' DomainValues are to be stored "in the cell" vs. which are to be destructured into data tables. That is, if you would need to garbage collect the value, you can put it in a cell. No 'EverythingIsRelative' nonsense is required.
- Please clarify. Whether something is "needed" or not is difficult to determine because it's based on the domain dynamics. For example, in relational things are generally not related by pointers, as they are in navigational DB's. The only required "pointer" to any given fact is being "in" at least one table, one could say. A "fact" becomes important when the customer/user wants that fact, not because it is tied to something (measurable) in the computer. We cannot tell by looking at computer-space whether a fact will be used or needed by domain-space at some future point in time.
- It's the other way around: we can tell by looking at computer-space where and whether garbage collection is occurring, and thus we can tell by looking at computer-space whether an array of bytes/tree/graph/etc. is a "fact" or a "DomainValue". The need for garbage collection is an indicator, an objective measure, a distinguishing property, not a definition. After all, whether a tuple in a database is potentially needed is determined by the customers/users, and this requirement becomes reflected in computer-space. E.g. in MySql very large strings are represented in separate 'string tables' to which references (surrogate keys) are held. When there are no more references to that very large string from customer "facts", the string can be (and must be, given finite computation resources) garbage collected. By so doing, we know that the string is a DomainValue and not a "fact". The same principle can be applied to trees and graphs, and even to integers and doubles which can be of arbitrary width and precision.
- Another issue, perhaps related, is that garbage collection is an implementation issue. If we had infinite time and space, then we wouldn't need garbage collection. From a practical perspective we need it, but we shouldn't dwell on it in our logical model until it's forced upon us by practical concerns.
- The logical model already distinguishes DomainValue from "facts". I agree that, at least for RelationalModel, the logical model doesn't involve GarbageCollection, but when working at the logical level we don't need "practical" distinguishing properties like GarbageCollection, since we have the model itself rather than the implementation. Anyhow, if you were to follow your own advice, you wouldn't destructure complex DomainValues until "forced upon you by practical concerns" (e.g. a need to express a complex structure in SQL, or the need to guarantee certain indexes are present).
- Where am I doing that?
- I assume you mean to ask "where am I not following my own advice?". You are violating your own advice wherever you destructure a complex DomainValue as a first option or choice - that is, when you do so before you first demonstrate that it has been "forced upon you by practical concerns".
- Joins - in RelationalModel a fundamental operation is join over equal DomainValues, so having surrogate IDs requires that you either guarantee by construction that equal trees will result in equal keys (i.e. perform interning by hand), or you must use special relational operators to perform equality tests between DomainValues represented by surrogate keys.
- Uniqueness - again, uniqueness testing becomes an issue if the same DomainValue (structurally) can have two different surrogate keys (when forced into a destructured representation).
- Symmetry and Composition - whether GenericProgramming can be implemented to reuse queries between different classes of DomainValues; clearly violated for joins at least if special operators are required to test for equality between DomainValues.
- Not have to re-implement operators for each new "kind" of collection added. This includes custom (domain-specific) operators as well as generic ones. With structured DomainValues this is done by both GenericProgramming (to share operators) and composing existing operators with conversion functions. With destructured DomainValues, collection operators are maintained by forcing every DomainValue into a particular relational mold ahead of time, and non-relational DomainValue operators are supported without any rewrite by a few generic operations composed with TopMind's HandWaving.
- YOU are handwaving, NOT ME. -t
- Sorry, but I'm still waiting on any concrete examples to show me how to use relational operators designed for use over trees to perform simple equality and comparison tests. Until then, all I have is your word that it can be done. That is HandWaving whether you're in blind denial of it or not.
- And I really loved how you stuck all your bullet points at the top. Classic. -t
- I was just following your fine example.
- Where I have re-ordered an existing list?
- You didn't accuse me of re-ordering a list. You accused me of putting my bullet-points at the top... which is exactly how your "Things to consider" section got to the top of this page.
- It's acceptable wiki practice to put summaries at the top. I removed any conclusions about betterment in order to not be rude. I try not to be inflammatory. It's NOT acceptable to put your bullet items above those already in a bulleted list because you personally feel they are more important. Clear? -t
- Your stated opinion about what is 'acceptable' is noted, but not respected.
- I'll accept consensus wiki norms over your moral system any day. You think odd.
- My "morale" is high despite how demoralizing it is to see one vociferous man spattering his signature across WikiWiki with unsubstantiated opinions, and I believe you're terribly delusional with regards to having consensus.
- I was talking about a consensus on topic summary layout, not my table opinions. And where have you EVER substantiated a non-hardware opinion here? -t
- Technically, what you wrote was not, in any way, a "summary". It didn't summarize anything in the topic.
- Even if true, is that a reason to be rude with it?
- I don't believe it "rude" to add bullet points anywhere in an unordered list.
- It's generally more polite to add to the bottom unless there's a clear reason to do otherwise.
- With regards to your latter allegation: excepting various WishList? pages such as NewOsFeatures and IdealProgrammingLanguage (and sometimes even on those) I offer justification with each technical opinion. Where haven't I? Point some examples out, and I'll be sure to correct them.
- Your justifications are weak. They are either too vague, or too small a factor (AKA "micro-rigor"), or depend on a BookStop. Any rigorous evidence for "x is objectively better" is going to have to address a wide range of factors.
- Nice allegations - just make up some principles then call other people's justifications "weak" for not following them. I can see how you gained your reputation. If I were to do some HandWaving about "macro-rigor" and complain about being pointed to a "BookStop", and take pride in my "working man's ignorance" without any effort to study or better understand the concepts about which I express opinions, people would think I'm an idiot, too. And if I've made an "x is objectively better" claim without a clear and limited scope, it's more likely that you either failed to understand the scope or that you grossly misinterpreted a statement of evidence as such a claim.
- What did I "make up"?
- You made up: the "existence" of "macro-rigor", the "sin" of BookStop, and the "weakness" of "micro-rigor" as a justification.
- This is not the topic to debate the general value of such concepts.
- Indeed it isn't. But until the general value of such concepts are widely accepted, it is also inappropriate for you to accuse others of 'violating' them. By analogy, your arguments are weak because they violate TheFooPrinciple.
- First, "widely accepted" is ArgumentFromAuthority. Second, "macro rigor" is defined as providing a wide range of measurements in a wide range of metrics, as opposed to using a single or few pet metrics (such as only "provably correct" or only "lines of code"). Why does that idea bother you? -t
- First, I should have been less generous: your 'principles' are accepted by only one person, who calls himself TopMind, and thus are unsuitable as accusations. Second, you misrepresent yourself; you describe (in MacroAndMicroRigor) macro-rigor as being used to support "net" advantage. There is no such thing as "net" advantage, and there is always a "narrow" purpose, therefore such thing as "macro" rigor neither exists nor needs to exist. And it isn't the 'idea' of macro-rigor that bothers me; it is your dishonesty and self-deception that bother me.
- You appear to assume the concept is Boolean. Let me reword the prior statement: your evidence does not provide a sufficient quantity and variety of metrics to even approach a thorough analysis from a practical perspective. As far as your "narrow" comment, I don't know what you mean. -t
- Okay, now that you've reworded your statement, it is more acceptable. (And the "always a narrow purpose" comment refers to the fact that there is always a 'purpose', a 'better for what', when it comes to analysis.) Anyhow, now that your wording is more acceptable, I'll simply ask that you defend your claims. So far points have been raised about sharing, queries, data entry, indexing, flexibility, ability to accommodate variations without re-implementing stuff, etc. and to me it seems that the typed approach comes up victorious on some and no worse on all others except for one-time implementation costs. How is this not a sufficient quantity and variety of metrics to approach a thorough analysis from a practical perspective?
- Your "experiments" are largely anecdotal. They are not readily subject to public inspection.
- I think you really don't know the meaning of that word, "anecdotal". And the points and fallacies of your arguments have been repeatedly presented for public inspection, here and on other pages.
- You'd like to believe that crap, wouldn't you?
- Doesn't everybody agree that structures should be flexible? (Except maybe in special niches where security or performance trump all other concerns.) If you can do it in your type-centric world, then please do. And if you demand that I master your academic-speak before you communicate with me, then perhaps you have a mistaken belief that this wiki is for academia only.
- Oh, I agree that flexibility is a good thing right up there with performance, convenience, safety, security, etc. But flexibility doesn't "feel" very flexible if one needs to jump through hoops to use it efficiently, easily, safely, securely, or (in a broad sense) 'effectively'. For example: if you'd willingly trade flexibility for convenience, then you should be HandWaving about the virtues of BLOBs rather than Tables, since BLOBs can represent anything at all (including Tables). Additionally, I'm unconvinced that "structures should be flexible" is an effective approach to achieve a requirement that "the system as a whole shall offer flexibility", though I'll admit to having never considered how to go about sacrificing all other useful properties for 'flexibility'.
- Indeed, conventions (rules) are also useful for humans to navigate and comprehend their models. We depend on them psychologically. Natural selection produced "big messy graph" when presented with the "computing problem" (the brain), and not relational, nor your pet type system I would note. Thus, let's agree we want both flexibility and some sense of order/rules/conventions. -t
- We don't want flexibility and a "sense" of order. We want flexibility and some "assurance" of order. And while making it easier to navigate and comprehend the models is perhaps a benefit, it is not one that has received much formal attention. ExpressivePower doesn't need to appeal to ease of comprehension. Neither do safety, performance, security, portability, composability, etc. Those properties aren't about satisfying our psychological dependency for order. As far as the brain being a "big messy graph", perhaps you should study it; you'll find it is remarkably well ordered. Natural selection didn't favor BLOBs either.
- It doesn't receive much "formal attention" because it requires delving into psychology, which is a softer "science" than what math-heads want to or are capable of dealing with. SovietShoeFactoryPrinciple, the purist's bugaboo. Computer-related studies were traditionally tied to math and technology studies. Perhaps if they were somehow tied with psychology studies, we might finally get something useful from the academics outside of machine speed.
- [Do you spout fiction purely to be argumentative? Go to http://acm.org/dl and search for "psychology".]
- Because they have a fee, it's hard to tell for sure, but I counted only about four that are relevant to our discussion. Four among thousands of "mathy" papers. Obviously it's not a priority.
- Do you really have to shove your insults everywhere?
- Insult? I'm sorry, I was just pointing out what I consider to be a very obvious truth. Pardon me for not considering whether or not you'd be offended by it.
- Well, it's obvious to me that you are a crappy documenter and articulator. Should I thus say it over and over?
- Yes, you should, if both (a) you are willing to defend that point whenever you are called on it, and (b) my arguments are based on an invalid assumption that I'm a great articulator and documenter and yet I keep ignoring your counterpoints to these assumptions. OTOH, the claim that I'm a crappy documenter and articulator doesn't seem to be relevant to this topic, whereas your claim that we can (somehow) use relational tree ops and such to perform equality tests is among the primary assumptions upon which you're basing many of your arguments. It is appropriate to repeat (and repeat, and repeat) the fact that your assumption has no valid basis every time you make an argument that depends upon it. Why? Because it is a valid and logical counterpoint to your argument until such a time as you offer sufficient justification to support the claim.
- I tried hard to be nice and not insult you back, but you wore my patience too thin. -t
- Of course you tried. But your effort would have been much better spent justifying your claims or going through the example problem we offered from CrossToolTypeAndObjectSharing.
- I already gave a descriptive answer near the bottom. If you don't like that answer then complain THERE, not HERE. You scatter insults everywhere like a crow with diarrhea. I hope your type system is not as scattershot as your insult "logic".
- Don't worry, my type systems and insult logic only seem 'scattershot' to people who delude themselves regarding the superiority of their assumptions even in the face of counter-evidence.
- Use/share existing data and indexes if available rather than copy back and forth between "kinds" of collections. For structured DomainValues, this is done using techniques created for FunctionalProgramming and OOP such as Interning and CopyOnWrite. For destructured DomainValues, this is done by sharing surrogate IDs, and optionally some special application-side effort to perform interning and to carefully delete shared data with hand-written garbage collection.
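As a concrete illustration of the Interning mentioned above, here is a minimal sketch in Python. All names (intern_tree, _intern_table) are invented for this example; the point is only that structurally equal trees end up sharing one canonical representation, so whole-tree equality reduces to an identity check and unreferenced values are reclaimed by the host runtime's garbage collector.

```python
# Minimal sketch of interning ("hash-consing") for tree-shaped DomainValues.
# Hypothetical names; not from any real library.

_intern_table = {}

def intern_tree(label, children=()):
    """Return the canonical, shared node for (label, children).

    Children must already be interned, so structural equality of whole
    trees reduces to identity (is) on the canonical nodes."""
    key = (label, tuple(id(c) for c in children))
    node = _intern_table.get(key)
    if node is None:
        node = (label, tuple(children))
        _intern_table[key] = node
    return node

# Two independently built trees with the same structure...
a = intern_tree("root", (intern_tree("x"), intern_tree("y")))
b = intern_tree("root", (intern_tree("x"), intern_tree("y")))

# ...end up as the same shared object: equality is an O(1) identity test.
assert a is b
```

A production version would use a weak-reference table so that entries whose trees are no longer referenced can be collected, which is the GarbageCollection-as-distinguishing-property point discussed earlier on this page.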
Based on a discussion in MagicEverythingMachine and [insert later].
You do agree that tree-oriented relational operators can be added to a relational operation set (perhaps call it "extended relational"), correct? Suppose the tree operations only have to know what the "parent key" is.
And you do agree that the "domain math" can be made separate from the relational system, as long as certain criteria are met and we are not overly concerned about "machine efficiency" (for the sake of discussion). Correct? Thus, there's no problem with comparing imaginary numbers since "imaginaryX = imaginaryY" is processed by the domain math engine and not the relational engine. The relational engine only has to know whether expressions evaluate to True or False (or perhaps whether equality is met).
Thus, there is no logical reason why tree operations cannot be added and keys cannot be imaginary numbers. Agreed?
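For concreteness, the sort of tree operator described above, one that only needs to know which column is the "parent key", can be sketched as a transitive-closure query over an adjacency-list table. This is a rough illustration using Python's sqlite3; the table and function names are invented here, not taken from any system discussed on this page.

```python
# Sketch of an "extended relational" tree operator: a transitive-closure
# (descendants) query over an adjacency-list table. The operator only
# needs to be told which column holds the parent key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "a1")],
)

def descendants(conn, root_id, parent_col="parent_id"):
    """Generic tree operator over any relation with a parent-key column."""
    sql = f"""
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM node WHERE {parent_col} = ?
            UNION ALL
            SELECT n.id FROM node n JOIN sub s ON n.{parent_col} = s.id
        )
        SELECT id FROM sub
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]

print(sorted(descendants(conn, 1)))  # → [2, 3, 4]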
Eh? That is a very NonSequitur argument. I agree that relational can be extended with tree operators (transitive closures and such), and I certainly agree that keys can be imaginary numbers, but you have utterly failed to argue or defend any useful point in defense of your policy that tree values and such 'should' be represented in relations.
- Main benefit: The "auto-inherit-all-collection-operations principle" (AIACOP) mentioned in MagicEverythingMachine. Strict types hinder such. -t
- Sigh. TopMind, your argument above does not argue or defend AIACOP; if you believe it does, then you seriously need to refresh your education in logic and reasoning.
- If one does not have to reinvent the wheel to gain existing operations from other collection "types", that is a good thing, and I don't see why I have to defend that. It's obvious. Nobody wants to reinvent the wheel. Any arguments against it would be that there are down-sides to having such feature that out-weigh the up-sides. -t
- Nobody here is suggesting regular reinvention of the wheel. Functions that transform views of common datatypes (such as lists) need be written OnceAndOnlyOnce. To the contrary, I believe OnceAndOnlyOnce makes a stronger case against your approach of avoiding nesting - an approach that does not readily support composition of queries or of the schema describing the complex structures. For example, one couldn't reuse the queries and schema for a tree-of-integers when one later decides to work with a tree-of-graphs.
- Please clarify example. -t
- One cannot reuse queries and schema for a tree-of-integers when one later decides to work with a tree-of-graphs because (as one concrete example) one must use entirely different classes of operators (simple domain equality vs. the sort of HandWaving magical relational op you've been going on about repeatedly) when performing joins over integers vs. joins over graphs. If graphs could be 'nested' domain types, this would never be a problem - there would be no need for special workarounds and writing new queries to follow the workarounds; one could simply use a few generic queries and perform the join. Is that enough clarification for you?
- Please clarify. What exactly is "entirely different"? Do you mean syntactically? That's a matter of how the operators are designed. There's lots of different options. Nesting creates "hard" barriers between items that reference-based approaches generally don't. And I agree that nesting may simplify some kinds of expressions, but it will also make others more difficult. Again, which is "best" may depend on how many invariants are in the domain. -t
- Syntactically different + Semantically different + Arity different = Entirely different. To take, say, generic relational operators for transitive closure over trees and graphs and somehow "design the operators" so it can fit in as an equality statement between two domain types is, I think, impossible - I suspect the arity of the arguments is fundamentally distinct. But since you are the person making the positive claim, that this is "a matter of how the operators are designed", I invite you to attempt it.
- As far as "nesting creates hard barriers" goes, the barriers aren't any "harder" than the barriers between one relation and another, so I hardly think it a problem. It takes an operation to combine two relations. It takes an operation to combine two domain types... not that these need to be distinct categories, nothing wrong with allowing relations as a domain type. It takes an operation to convert between domain types, including between different relations and from relations to aggregate values.
- I think your approach makes a very few expressions slightly easier at the cost of making the vast majority of them considerably more difficult. But why don't you provide a list of pros and cons for your approach; if you're promoting it as a principle, you should be able to justify it in most cases that way. Here's my list. The cons include: construction of trees, since one needs to associate each node with surrogate IDs; deletion of trees, since such garbage collection becomes the job of application code; any case where a 'tree' or part of one might need to be described as part of a query in the same way we describe integers and strings in queries; comparison, equality, and join operations between tree structures; asymmetry between operating over simple types and working with trees, and the need for workarounds; significant extra complexity to specify that a schema is keyed on the structure of a tree; and probably a great deal more. The pros? All I can think of: you can make your approach work with trees in SQL, and you save exactly one operation when converting a tree to be represented in a relation (and then ONLY when converting to the particular representation or view you chose, and ignoring the extra operations required to construct the tree application-side). So, overall, I think your principle ain't worth a used sheet of toilet paper... it smells like PrematureOptimization driven by working with severely limited tools in the past.
- Would you at least agree to: "All else being equal, easily borrowing operations across collection kinds is a good feature to have". "Easy" meaning without have to change programming code in the collection system. -t
- [Polymorphic container operations are already a feature of many collection libraries.]
- Can you demonstrate this, such as how it would allow relational tables to use tree features and vise versa without copying? (Use same data/state and related indexes in-place.) And note the "reinvent database without knowing it" comment below. -t
- [I meant that various containers share interfaces, such that code changes are minimised when container types are changed. E.g., Tree t = new Tree(); t.insert(...); t.delete(...); Relvar r = new Relvar(...); r.insert(new Tuple(...)); r.delete(tuple); Collection c = t; foreach (Object o: c) ...; c = r; foreach (Object o: c) ...; ... etc...]
- But that's only name sharing, not implementation, data, and index sharing. You haven't answered the questions. Say we have 50 relational operations, some of them custom-made for our domain. Now if we add tree stuff and graph stuff, under my approach these AUTOMATICALLY "inherit" all 50 of these. With dedicated types of structures, for every new structure we add, we have to re-implement potentially all 50 of these if we want them available. Plus, I could UNION columns from one "type" with another without copying into and out of. For example, union a date-stamp and name from a file tree table/view with a traditional relational table. No new code. Similar thing with JOIN. Mixing is more effort with dedicated structure types. And you have not shown how these can share indexes and data without copying into and out of each "kind" of structure. -t
- RE: "With dedicated types of structures [...] re-implement potentially all 50 of these" - that isn't true. If we want them available, it would be sufficient to add a function translate the given type to a relation.
- RE: "No new code [...] mixing is more effort" - that's also untrue. You need to write the expression performing the UNION or JOIN. The need to do so is balanced by a need to write a MAP, APPEND, or FOLD expression over structured DomainValues. There is neither a disadvantage nor an advantage on your side when writing new code to combine domain values. Neither of us have a valid basis for arguing whether this sort of 'mixing' is 'easier' with relational vs. structured DomainValues.
- RE: "share indexes and data without copying into and out of" - sharing data is easy... CopyOnWrite is easy enough in imperative code, and in an RDBMS the programmers don't even need to worry about writes. Interning and GarbageCollection would allow considerable structural sharing to be performed with great efficiency. Implementing shared indexes is a bit more difficult (albeit no less so in RDBMS in general... consider joining multiple "thin" tables), since it requires knowing what should be indexed. Fundamentally it involves keeping a few 'reverse direction' arrows such that searches on component parts can find the structures and relations containing them. One must make tradeoffs between indexing and linear searches based on the anticipated queries, of course, so choice of indexing within DomainValues to, for example, perform rapid pattern-matches and efficient cluster queries, is really the job of DomainSpecificTweaks to an RDBMS. OTOH, for many combinations and operations, the structured data and functions over them offers O(1) access to exactly the pieces of data one needs... the need for indexing of the sort to access or process individual DomainValues is eliminated, which frees up the RDBMS to focus on just those pieces needing support for pattern matching and cluster queries.
- By your definition of "easy" (no changes to existing code) the ability to compose a short bit of new code to, for example, convert a tree to a relation of branches then join on said collection, is already "easy". I would agree that the ability to compose language features without restrictions (SymmetryOfLanguage) is a very good thing. But I think it better if we avoid LanguageIdiomClutter; each basic kind in a TypeSystem should obey the OneResponsibilityRule, so having more than one class of types have 'collection operations' is, IMO, a BadThing (tm).
- Copying back and forth into each "kind" of thing is an option for every technique, and thus not a difference-maker. But it is ugly as far as OnceAndOnlyOnce and may not scale to large info sets. -t
- The ability to iterate across, say, the entries of a C 'struct' can be useful... but only needs to be written once per type, and thus scales quite well because the number of types is invariably smaller than the number of operators over them. I'm going to start deleting every phrase where you use "perhaps" and "may" in an accusation. Seriously, you use those weasel words habitually to make allegations you're unwilling and unable to defend, so you might as well have said nothing at all.
- And, the OneResponsibilityRule is arbitrary, anti-reuse, and restricting. It may make purist pedants happy, but does not make for flexible, powerful, and cheaper systems (except perhaps in domains with lots of invariants). RDBMS are commercially popular largely because they more easily allowed the same info to be used for different, unanticipated purposes. -t
- I have been quite convinced that OneResponsibilityRule is pro-reuse and pro-flexibility via composition. So, to me, it seems you're talking shit about something you don't really understand. That is, I suppose, my typical impression of you, but I'll give you a chance to defend yourself: please clarify exactly how OneResponsibilityRule is "anti-reuse, restricting, and does not make for flexible, powerful systems".
- And it would be less LanguageIdiomClutter, not more, because we don't have to cross-re-invent. -t
- Sorry, not buying it. LanguageIdiomClutter is certainly higher if each primitive in the language is a slightly different way to achieve the same effect.
- Please clarify.
- LanguageIdiomClutter has a definition that favors "emulation" or "cross-re-invention" of features over providing the same features many different ways. Look at the OP sentences.
- As to whether "strict types hinder" AIACOP, that may be true. It's irrelevant, of course (since one may simply apply a function that returns a relation then do collection operations on the returned relation), but I'll agree that most TypeSystems will not support collection operators on every type.
- [Nor should they...]
- As far as "should not support every collection type", I only half agree. Some operations don't make sense on some structures/types/kinds/circumstances, which we all agree on. However, typing is *too* narrow in that regard. One has to explicitly add, perhaps hand-code, any relational operations they do want to borrow that are not in the original ADT's operations. That is not a good thing and a big flaw of type-orientation to me. You seem happy with that limit, but it bothers me. Perhaps your domain has more invariants than mine? -t
- Or, perhaps, we're just (a) familiar with GenericProgramming and know those things can be written OnceAndOnlyOnce for any common class of transforms, and (b) intelligent enough to balance the need to compose new functions against the need in your approach to write new schema and new queries. Between (a) and (b) I see neither a flaw nor any extra effort in the aspects you describe... and, yet, I still can't help but imagine considerable extra effort involved with your policy (avoiding nesting) when comes time to, say, perform joins for structural equality, write up variations on pattern-matching functions, add new trees to the tables, delete trees from the tables, and so on.
- ["One has to explicitly add, perhaps hand-code, any relational operations they do want to borrow that are not in the original ADT's operations." Yes, but only once, ever, if they're designed to be generic. Alternatively, you can use a pre-written library that someone else has written.]
- But what about domain-specific operations? If you have 10 dedicated structure types, then for every new domain-specific collection operation added, you have to re-implement that up to 10 times, while I only have to implement it once. I don't have the delusion that we can forecast all possible collection operators up-front. -t
- Unless we want different behaviors for different DomainValues, we need only to implement it once as well. See PrimitivesAndMeansOfComposition. If we want ten different behaviors, then we would need to implement ten different behaviors, and you'd be in the same boat.
- If they are truly generic and don't need re-copying of data/state to use (share-friendly), then perhaps you've invented a kind of database without knowing it. It may not be "relational", but at least a form of "database", and so we are converging somewhat, but under different names. -t
- Sigh. Perhaps. Perhaps not. I don't want to hear you playing HumptyDumpty with the word "database" just so you can defend this pointless claim of yours. I'll just state my conclusions: I have no reason to believe GenericProgramming implies any sort of "database", relational or otherwise, I believe you're speaking nonsense, and I suspect you're doing so because you're rather fond of your delusion that we're somehow 'converging' with you.
- Projection. You stubbornly hang on to type-heaviness because you invested too much already in trying to make them work.
- [Nonsense. They do work. What doesn't work is trying to create and debug multiple queries involving graph-value comparisons because the graphs are embedded in tables because the DBMS doesn't support user-defined types. With user-defined types, you implement the graph type and its comparison operator once, and subsequently create and debug queries involving graphs that are no more complex than queries involving only integers.]
- You are rambling. Please flesh out your scenario more.
- [I did, in CrossToolTypeAndObjectSharing. I'll repeat it here for your convenience:]
- [If I define a type to represent a complex number, e.g., "Complex", I can define values as Complex(1, 2), Complex(1, 4), etc. If I need, for example, pairs of complex numbers in a database, I might define a table (using TutorialDee syntax) as "VAR mytable REAL RELATION {Comp1 Complex, Comp2 Complex} KEY {Comp1}". I can now easily perform queries, e.g: "mytable WHERE Comp1 = Comp2" or "mytable WHERE Comp1 = Complex(1, 5)" and so on. If, however, I cannot define a new type to represent Complex, then I'm forced to define my table as "VAR mytable REAL RELATION {Comp1Real RATIONAL, Comp1Imaginary RATIONAL, Comp2Real RATIONAL, Comp2Imaginary RATIONAL} KEY {Comp1Real, Comp1Imaginary}". Furthermore, my queries become more complex. E.g., "mytable WHERE Comp1Real=Comp2Real AND Comp1Imaginary=Comp2Imaginary", or "mytable WHERE Comp1Real=1 AND Comp1Imaginary=5". And so on. Which seems more kludgey to you? A complex number is a very simple type; more complex types merely expand the difficulty shown above, complicate the implementation of the queries, obfuscate the intent of the queries, and potentially duplicate functionality throughout the system that could otherwise be represented OnceAndOnlyOnce in a type definition. Can you imagine what these examples would look like if the type was a tree, or other complicated structure? Using a type, they wouldn't change significantly -- only the value selector would be different. Maybe replace "Complex(1, 2)" with "Tree(Node(1, Node(Node(2, 4), 3)))", but the rest of the code would be essentially unchanged. I'm not going to attempt a Tree example on tables consisting only of primitive canonical types -- life is too short.]
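The contrast described above can be sketched concretely. The following is a minimal, hypothetical illustration using Python's sqlite3 type adapters as a stand-in for user-defined types (the COMPLEX declared type and the text encoding are assumptions of this sketch; TutorialDee would support the type natively):

```python
import sqlite3

# Store complex numbers "in the cell" via an adapter/converter pair, rather
# than destructuring each value into Real/Imaginary columns.
sqlite3.register_adapter(complex, lambda c: f"{c.real}:{c.imag}")
sqlite3.register_converter("COMPLEX",
                           lambda b: complex(*map(float, b.split(b":"))))

db = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
db.execute("CREATE TABLE mytable (comp1 COMPLEX PRIMARY KEY, comp2 COMPLEX)")
db.executemany("INSERT INTO mytable VALUES (?, ?)",
               [(1 + 5j, 1 + 5j), (1 + 2j, 3 + 4j)])

# One predicate per comparison, as with integers -- not one per component.
print(db.execute("SELECT * FROM mytable WHERE comp1 = comp2").fetchall())
print(db.execute("SELECT * FROM mytable WHERE comp1 = ?", (1 + 5j,)).fetchall())
```

The queries stay as simple as queries over integers; the per-component `Comp1Real=Comp2Real AND Comp1Imaginary=Comp2Imaginary` predicates disappear into the type definition, written once.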
- Why is this relevant to this topic? I didn't forbid custom user-defined or domain-specific "cell" types. It is non-trivial *structures* inside cells that bothers me. This topic is not about the cross-language sharing issue. -t
- I don't forbid custom user-defined cell types! My customers can have any cell type they want, so long as it's BLOB! (Fixed width BLOB is the only trivial structure.)
- You appear to be mixing up different issues/topics. I did not dictate types or lack of types or flags/tags or lack of flags/tags in THIS topic. -t
- It appears you are failing to grok that arguing against certain kinds of types (i.e. those that you, arbitrarily, decide are "non-trivial structures") is, in fact, a dictation of types and, in particular, of lack and limitation of cell types. If you did grok it, you'd have just now lied to me, but you're just... ignorant of your own behavior, not malicious.
- I do make a stipulation that non-trivial "data structures" use the relational system. But what does that have to do with your "blob" comment? How is it connected with "cell types"? -t
- It's directly analogous to your behavior... you've stipulated some arbitrary limitations on cell types. Modulo your hypocrisy, you should be willing to accept the same sort of arbitrary limitations being stipulated unto you. Can you make any powerful arguments that your decisions for what constitutes "non-trivial *structures*" is non-arbitrary? You did some HandWaving in MagicEverythingMachine in an attempt to defend support for strings, but how are strings - especially variable width strings or Unicode strings - "non-trivial"?
- It is a somewhat arbitrary or situational distinction. I cannot make any hard or fast rules for what to make singular and what to make into structures. It is comparable to the OOP decision of what to make classes and what to make methods. Or whether to make categories per column in a table or use a row-wise AttributeTable. There are rules of thumb, but they are not "hard" rules. Strings are close to the border-line either way. -t
- In OOP it is the programmer, not the OO LanguageDesigner, that chooses what to make into classes and what to make into methods. If this situation is to be comparable, it must be the DBA, not the RDBMS developer, that chooses which structured DomainValues are represented within cells and which (if any) are broken up into global tables. However, I have a strong impression that your "situational distinction" is something that, if you were the lead developer of a SMEQL RDBMS, you would enforce upon DBAs. Is my impression mistaken?
- There'd be no smooth way to stop them if they want to put structures inside "cells". One of SMEQL's goals is to separate the domain math from the query engine. Achieving this goal would give DBAs a lot of room to design whatever "cell-types" they wanted, even stupid ones. -t
- It would be easy to 'stop' them: simply don't support them. Don't support constructors that return top-unapproved types. Force the users into utterly inefficient patterns such as storing complex structures in strings or BLOBs. Require plug-in modules to the RDBMS, one per very specific type, so that programmers need to go through the DBA for each variation on a structured type. Raise TuringTarpits and DiscontinuitySpikes and RedTape? on whatever you consider bad, and people are in all but the most literal sense forced to do what you consider good. So, answer the question: is your attitude here comparable to OOP where the LanguageDesigner supports both methods and objects and leaves the choice to the programmer? Or is it your intention to leave in cold water anyone who would do anything that TopMind believes is 'stupid'?
- I could also use "convert (copy) it to a different collection type to do 'different' math on it" for your join-by-treebranch-equality puzzle. -t
- I did not imply that you cannot convert relations to other relations, but I still don't see how that helps one join over structural equality between even simple trees. I don't find very convincing your tactic of effectively saying "I can do X therefore I can solve this vaguely related problem Y".
In pages such as CrossToolTypeAndObjectSharing, what we asked is the ability to use trees in joins and 'where' clauses by structural equality, and perform other "domain math" on trees. You insist these trees should be represented in relations but, so far, you have consistently and persistently proven unable to demonstrate how one might go about supporting "tree domain math" with "tree-oriented relational operators".
The 'best' argument you've ever offered in defense of your policy - that trees-nested-in-cells can't "play in relational reindeer games" - is bogus: there is no reason that "domain math" operators cannot be extended to return relations in much the same way as relational aggregation operators return non-relations. And that's the most logical argument you've presented, albeit predicated on a clearly invalid assumption that the only tables available to relational operators are those initially in the database.
I'm tired of your fallacy, sophistry, HandWaving, and NonSequitur arguments on this subject. If you had the education and intellectual capacity to recognize valid points when you stumble across them, I'd tell you to come back when you have one. As is, I can't even offer that suggestion, because I strongly suspect I'll get more of your usual piffle.
- Projection. If you are so knowledgeable, then provide the killer examples (practical, not lab toys) that show I'm wrong. That's what smart people can do. Fakers merely claim they are smart.
- Examples have already been provided in pages such as CrossToolTypeAndObjectSharing. That you consider them "lab toys" because they aren't "typical custom biz apps" is your foolishness alone.
[Top, I want to create attribute values that can be complex numbers, polynomials, trees, and graphs. I wish to be able to query and manipulate them with the same ease that I can manipulate integers and strings. This is not achieved with "tree oriented relational operators". This is achieved with an appropriate user-defined type system.]
As far as how to join on the equality value of trees or branches, the "domain type system" could tie into the relational system (having tree operations also) to calculate equality. It would create a dependence between the two systems, but would at least provide the necessary functionality. We'd be sacrificing some independence as our "cost". (And I'd still like to see a real-world scenario for such because there may be a more direct way to achieve it.) -t
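To make that tie-in concrete, here is a minimal, hypothetical Python sketch of structural equality over table-ized trees. The `nodes` schema, `tree_equal`, and the join over root IDs are illustrative stand-ins for the "domain type system" hook described above, not features of any real DBMS:

```python
# Table-ized trees: surrogate node_id -> (value, child_ids).
# Trees rooted at 1 and 4 are structurally identical.
nodes = {
    1: ("a", (2, 3)), 2: ("b", ()), 3: ("c", ()),
    4: ("a", (5, 6)), 5: ("b", ()), 6: ("c", ()),
}
roots = [1, 4]

def tree_equal(x, y):
    """Recursive structural equality, supplied by the 'domain type system'."""
    vx, cx = nodes[x]
    vy, cy = nodes[y]
    return vx == vy and len(cx) == len(cy) and all(
        tree_equal(a, b) for a, b in zip(cx, cy))

# A join on tree-valued "cells" becomes a join on root IDs via tree_equal:
pairs = [(r, s) for r in roots for s in roots if r < s and tree_equal(r, s)]
print(pairs)
```

The dependence the text mentions shows up here directly: the comparison operator must know the node-table layout, so the "domain math" and the relational schema are no longer independent.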
... or we could just support trees as domain types and relations as domain types allowing full independence and functionality but sacrificing Top's pet principle as our "cost".
Hard borders between "types" have downsides, especially in volatile domains. IS-A is less flexible than HAS-A.
[Nonsense. You make this stuff up as you go, don't you?]
Projection again. I've given requirements at the top. If your approach is free of down-sides, then show how it satisfies these.
No, you are indeed speaking nonsense, especially regarding IS-A vs. HAS-A (which is an inheritance relationship issue but not a concern for types or structured DomainValues in general). And the requirements you presented at the top are far from complete. I'll add a few more things to consider.
Re: "Query specification - ability to describe, say, a tree to match against as part of a query vs. entering a new tree into a table and using its surrogate ID as part of a query."
- Suggested rewording: "ability to syntactically supply an embedded tree versus having to give the ID, pointer, or name of a previously-defined tree." -t
- Accepted.
Variables aren't the values they reference, doubly true when mutable. It isn't a matter of perspective.
Thus, how is one measuring this?
The above was explicit on that point: use a query that involves comparing against a tree described for just that query, and see whether it requires entering a new tree into a table and using its surrogate ID as part of the query. If you need an example rather than just an answer on how to measure it, consider:
'myStuff WHERE treeA = tree(1,tree(2,3))'
vs.
'newTreeSurrogateID = makeNewTree("tree(1,tree(2,3))");
myStuff WHERE tops_handwavy_treecomp_op(treeA,'=',newTreeSurrogateID)'
vs. performing 'makeNewTree' procedurally on application side or adding some features to support describing tables with autonum variables built in.
In all the latter cases, one ends up with a bunch of new 'nodes' in the dedicated tree table upon ending the query. This is a significant qualitative difference, and is reasonably among the "things to be considered".
That appears to be almost entirely a linguistic issue. If you note on SMEQL's opening page, it allows virtual tables, including nested virtual tables. Also, it leaves things like comparison operators to the "domain math" interpreter. Thus, the comparison system/syntax is up to the domain expression processor, not SMEQL. One could use BrainFsck in the WHERE clause (or equiv.) if they wanted to. You appear to be mixing up mostly unrelated topics. If one wanted to add shortcut syntax for tree specification, they can. SMEQL doesn't prevent it. (I agree that some kind of name-space management or dummy-name-generation technique may have to be specified.) -t
They aren't unrelated. Why not? Because you violated independence of 'domain math' and 'relational' by forcing complex domain values out of the little 'domain slot' provided by relational. Virtual tables won't help all that much here, especially given you can't get away with reusing those surrogate IDs in a virtual table. Give the problem a real try rather than using words and HandWaving to pretend it away.
Yes, I already admitted that one would have to violate *some* independence already. The heavier we want to integrate, the more we have to abandon independence. That's life. It's a matter of balancing fundamental trade-offs.
No... in this case, "That's top's pet principle demanding sacrifices that we really don't need to make."
You face a similar problem with "types". If they share/integrate well, then they lose some or all of their "type-ness". And I'm not sure what surrogate ID's you are referring to. -t
What the? How does supporting effective integration and composition of types make them lose "type-ness"? As far as "surrogate IDs" I refer to those "foreign keys into the 'TREE_TABLE'" or whatever table you are using to maintain complex tree or graph structures.
I don't know what this is in reference to. -t
Perhaps a line of reasoning would clarify it for you:
- Your approach is to not store "non-trivial *structures*" inside "cells".
- Therefore, assuming we have a non-trivial structure, it must be stored outside a "cell".
- Therefore, there must be a reference from the "cell" to this non-trivial structure which is "outside" the cell.
- You wish to store values in other tables.
- It may be the same table, not necessarily "other".
- Therefore, the reference we need must be a key into another table.
- Options for keys are described in AutoKeysVersusDomainKeys.
- Among those options is a particular class of auto-key: 'surrogate keys' describe something not really for the user to see.
- Given that we're working with DomainValues (i.e. complex structures that, only holistically, describe individual measurements and domain properties) users shouldn't be particularly interested in the keys. So the choice of 'surrogate key' is justifiably appropriate.
- I called these "surrogate IDs". This should not be a point of confusion.
- Next, assume we want to perform a query. It is, after all, something we do often in a relational system.
- And, as part of this query, you want to find all trees structurally identical to a particular other tree... perhaps a tree described by 'tree(1,tree(2,3))', or perhaps something larger.
- Further, for inductive purposes, assume we have one of your magical 'domain math' operators to perform the equality test. I honestly don't believe you can make it work, but I'll humor you since this argument doesn't depend on the presence or absence of that feature.
- This leaves us a problem: we must, somehow, pass into this 'domain math' operator both a description of the query tree and a description of the tree against which to compare it - a tree that happens to be in an external table referenced by a surrogate key.
- Now, I assume we do not wish to implement two versions of the tree-comparison operator: one for comparing trees to a query string and one for comparing and joining trees by surrogate ID. Thus, for symmetry, our domain-math operator either needs two surrogate keys into an implied table, two (table,surrogate key) pairs, or two non-relations (e.g. having us convert a tree into a string by use of a query).
- The first option, using two surrogate keys into an implied table, always requires the addition of a tree to the table just to perform the query. The second option may do so, but one might get around the problem with some suitably flexible domain operators.
- Either way, it is reasonable to issue this concern as a "Thing to consider" even given your elusive 'domain math operators'.
I'm (slowly) working on a pseudo-code illustration of joining on tree equality. But before that, as far as how "embedded" trees could work, my approach would be similar to how TqlColumnTables work: the parameter is really a table, but in different clothing. A "shortcut" syntax is created to generate a dummy or virtual table, sort of like an anonymous function in FP. Or it could be a pre-processor command.
For example, suppose we did this to SQL:
SELECT * FROM $tableShortcut("tree","1,2,(3,4,5)")
Before the SELECT statement executes, the "tableShortcut" pre-processor or "generator" function command "runs" and sends the two parameters to a special function that may resemble:
function tableShortcut(env[], params[]) {
  var indic = params[1];        // process indicator (which structure kind)
  var shortcut = params[2];     // shortcut syntax string
  var rt = env['result_table']; // result temp table name
  if (indic == 'tree') {
    while (row = treeParse(shortcut)) {  // yields one node row per call
      runSql("insert into % values (%,%,%)", rt, row[1], row[2], row[3]);
    } // we assume 3 columns here to simplify the example
  }
}
We could make a dedicated tree command, but this approach is more flexible and can be used with graphs or what-not. If your domain is tree-heavy, then perhaps a dedicated function would be warranted.
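For concreteness, here is a runnable sketch of the pre-processor idea in Python with sqlite3. The shortcut grammar, the `tree_parse` behavior, and the (node_id, parent_id, value) schema are all assumptions of this sketch, not part of SMEQL:

```python
import itertools
import sqlite3

_counter = itertools.count(1)  # stands in for makeTempTableName()/auto-keys

def split_top_level(s):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, cur = [], 0, ""
    for ch in s:
        if ch == "," and depth == 0:
            parts.append(cur)
            cur = ""
        else:
            depth += (ch == "(") - (ch == ")")
            cur += ch
    parts.append(cur)
    return parts

def tree_parse(shortcut, parent=None):
    """Yield (node_id, parent_id, value) rows for a shortcut like '1,2,(3,4,5)'."""
    for tok in split_top_level(shortcut):
        nid = next(_counter)
        if tok.startswith("("):
            yield (nid, parent, None)              # interior node
            yield from tree_parse(tok[1:-1], nid)  # recurse into the group
        else:
            yield (nid, parent, tok)               # leaf node

def table_shortcut(db, indic, shortcut):
    """Expand $tableShortcut(...) into a generated temp table; return its name."""
    rt = f"tmp_{next(_counter)}"
    db.execute(f"CREATE TEMP TABLE {rt} (node_id, parent_id, value)")
    if indic == "tree":
        db.executemany(f"INSERT INTO {rt} VALUES (?,?,?)", tree_parse(shortcut))
    return rt

db = sqlite3.connect(":memory:")
rt = table_shortcut(db, "tree", "1,2,(3,4,5)")
print(db.execute(f"SELECT count(*) FROM {rt}").fetchone()[0])  # 6 node rows
```

The generated name returned by `table_shortcut` plays the role of 'rt' above: the outer SELECT would be rewritten to reference it before executing.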
Can you clarify the target schema above, where the surrogate keys are produced, and how they are forwarded to other parts of the tree? Perhaps use a slightly less trivial input example as well. Also, it is unclear from the code how 'rt' is returned; is that name special in this MyFavoriteLanguage pre-processor?
As far as "rt", I've used different languages that returned system-generated temporary file names. I don't know how they worked, I just know that such technology exists. Incrementing a sequential number is one approach. In this case it could keep track of the temp names and clean them up after the end of the query or query set. (Since they are virtual, there may not be a disk-cleanup step anyhow.) I'm assuming our "kit" provides that service, but we could have a dedicated function:
rt = makeTempTableName();
As far as how to turn a string expression into a data-oriented tree, there are many ways to do it and I don't think we need to get into that here. Get a compiler/interpreter book (BookStop, he he).
Saying "there are many ways to do it" still leaves among the "things to consider" which approaches can be achieved effectively and the development of a language to actually achieve it. However, I'll humor you on the issue for now; I prefer you focus on demonstrating the join for tree equality.
Parsing text to tree structures is a "solved" problem. I don't know why that step seems to bother you. Is it how the parent-IDs are computed?
Keep in mind that we're doing a comparative analysis here, not a capability analysis. You say parsing text to tree structures can be done, and that's true, and relevant, but it is only a partial answer. There are a number of properties that make big differences in terms of performance and convenience. These include: (a) how are fresh surrogate keys obtained and associated with nodes?
- Please clarify. Do you mean how an Ess-Expression-like string is turned into a tree?
- No, I mean: how are surrogate keys obtained from the database? and how are they associated with nodes such that the children refer to the parent or vice versa, or perhaps more complex structures in a graph?
(b) can nodes with fresh surrogate keys be entered all at once or must they be entered one at a time?
- Please clarify. Data-entry utilities/UI's are a side-issue as far as I can tell.
- The costs and message-passing associated with data entry are not a side-issue. The ability to define new trees declaratively without concern for order-of-entry within a query vs. procedurally and concerned about order-of-entry within a query is not a side-issue. This is part of your so-called "macro-rigor". You're trying to ignore it in favor of your own "pet metrics."
- So macro-rigor *is* a useful concept after all??? See below about order of entry.
- No, it isn't, which is why I describe it as 'so-called'. I consider costs and message-passing and such "micro-rigor" issues, and I've always been paying attention to a wide array of such issues that are relevant to overall performance, convenience, flexibility, security, safety, etc.
(c) do nodes need to be entered in any particular order?
- I don't see why they would need to be. If the branches are ordered, then some sequence number would probably be needed. The utility function that converts strings to table-based trees can do it any way it pleases. (Although sticking to certain conventions is recommended for cross-matching utilities.)
- Relevant to your recent statement: do you need to re-implement these utility functions once per "kind" of collection object? Or, worse, once per "type" where a tree of integers may need a different parser than a tree of graphs?
- Graphs would use references (graph IDs). We wouldn't need to create a different kind of tree thing. Anyhow, your example seems more and more stretched as far as matching real-world problems, at least the kind I encounter. Maybe you like asking questions in a nested way because you think nested, but for the most part the real world is only lightly nested at best, and rarely deeply nested in practice.
- I'll agree that you probably don't encounter these problems. That's because you don't write systems software, do any work in robotics or communications or science, write video-games, etc. For the most part, you don't know the real world, and you shouldn't be saying what it "is" or "isn't".
- Perhaps those industries are just stuck in a non-relational mindset out of historical habit. It's hard to tell without actually trying it. Anyhow, why would business software (allegedly) be immune from such needs? If we answer that, we may learn something important about domains. -t
- Businesses have some really big advantages over these other domains that allow them to get away with a lot: (1) businesses often don't run continuously, so the RDBMS can often be shut down to perform nightly processing.
- Many existing RDBMS do not require a nightly shutdown. True, it may help things when you can do it, but is not a requirement.
- I did not suggest it was a requirement (there is a difference between "can" and "must"). It is, rather, an advantage of the domain that one "can" shut down to perform nightly processing. In many domains, forbidding nightly shutdown is a hard requirement. Thus, there is no option to ever do any 'batch processing', or to freeze things so one can clean up garbage from maintaining complex values in dedicated tables.
- (2) businesses can centralize their data, often assuming a system that won't involve disruption and distributed ownership. These together offer a lot of freedom to get away with a patchwork of duct tape solutions when it comes to everything else, such as representing complex values in strings. They aren't immune, though; businesses would also benefit from reducing the patchwork nature of their current systems and making the RDBMS do more for them. Anyhow, I can guarantee we 'try' to use relational in the other domains, but the existing RDBMSs are insufficient, and RDBMSs are too complex to create for any single project. What we need: (a) subscribe to a query (realtime insert/update/delete messages; DeltaIsolation), (b) represent communications structures and derived information (commands, requests, conditionals, sensory data, permissions, certificates, guesses, confidence) in an RDBMS in an ad-hoc manner and the ability to perform complex functions to break them down, join them, and relate them, (c) realtime operations, including realtime cleanup of garbage and fast and predictable data-entry costs, (d) cross-RDBMS queries for data fusion and to support disruption tolerance.
- Anyhow, rather than pushing this off to the utility function, answer: if the user were defining the utility function, do nodes need to be entered in any particular order?
- I generally would want to store the order of entry/specification just "in case" we needed it, and then simply ignore the "sequence" value if we don't want it.
- Sorry, you're answering a different question. The question is whether the nodes need to be loaded into the database in any particular order. Storing information about the order in which the data was originally described (sequence numbers and such) is a separate issue.
- On a large scale, this wouldn't be very efficient, but on a large scale nobody is going to enter trees via long ess-expression-like strings. This is why the practical world matters. The probability of having to stretch things in the way your lab-toy problems do is often small, because practical realities usually offer or suggest a different approach when things go too far in one direction.
- On a large scale, people won't enter trees via long EssExpression strings by hand, I agree. What computers produce and work with is an entirely different issue. The practical world is a lot bigger than custom business applications.
- Then show them. Anyhow, we can have an option flag that tells whether to "keep" order info or ignore order info. Done.
- Sigh. You're still solving a different problem than the one I stated. It isn't about keeping order information or not, it's about assigning 'node id' and 'parent id' in particular order or not. And show "what" exactly?
- We don't need to assign node-id and parent-id in any particular order.
- That might be true if you relax sane consistency rules, e.g. that 'parent_id' always refer to an existing 'node_id'.
- Sometimes we want to delay checking and sometimes we don't. It's nice to have that flexibility. After all, would you want your text editor to try to keep fully checking your code before you even finish typing your module?
- I suspect that, even with checking delayed, you'd have ordering problems. If node_ids are assigned an auto-number, inserting a child node before a parent node means you'll need to go back and fix up the parent_id to point to the parent by hand, which additionally implies that your algorithm needs to hear about and track the node_ids for each node. Either way, you're still under entry constraints and performance limitations that simply don't exist when dealing with structured values. As far as a text editor checking code... well, yes, ZeroButtonTesting would be nice.
- If it's based on an ess-expression-like string, then we probably should parse through the string first, count nodes, generate an intermediate structure, and then table-ize it. That way we SeparateIoFromCalculation.
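A sketch of that two-pass approach in Python; `flatten` and the row layout are hypothetical illustrations of "generate an intermediate structure, then table-ize it":

```python
def flatten(tree):
    """Parse first, assigning every ID in an intermediate structure, then emit
    (node_id, parent_id, value) rows as one batch. Because every parent_id is
    known before any row is loaded, rows can be inserted in any order."""
    rows = []

    def walk(t, parent):
        nid = len(rows) + 1  # IDs pre-assigned from the intermediate structure
        rows.append((nid, parent, None if isinstance(t, tuple) else t))
        if isinstance(t, tuple):
            for child in t:
                walk(child, nid)

    walk(tree, None)
    return rows

rows = flatten((1, (2, 3)))
print(rows)
# Even loading the batch in reverse order, every parent_id still resolves:
assert all(p is None or p in {r[0] for r in rows} for _, p, _ in reversed(rows))
```

This separates the I/O (one batched insert) from the calculation (ID assignment), which is the SeparateIoFromCalculation point being made above.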
- This particular section was discussing data entry, as in "data" - values being related to with some persistence from other tables - not creation of a temporary tree for "treeEqual" and such. I don't believe there is any RDBMS language that allows you to add a table-ized structure to another without risking conflicting IDs, unless a bunch of advance work is done to at least reserve some IDs, which means extra message passing.
- I didn't say the end-user would have to do all that. You misunderstood. The utility operation would do it (if necessary, for there may be a count-based-index mode and an auto-index mode). Besides, most trees would probably come from some data-entry UI or screen widget, not ess-expressions. I would judge the utility's merits based mostly on this assumption because it's what I actually see in the field. I would not optimize it for ess-expression entry. -t
- Who do you mean to call the end-user of an RDBMS? The programmer, or the person using the program? I would say the latter, but I doubt programmers will often be entering trees via a CRUD screen. I suspect in your approach, the programmer (user of the RDBMS) will usually need to provide these utility operations. Perhaps you have a different perspective because you work mostly with CBA, whereas I primarily do systems software and so see a different user as primary.
- What do you mean by "adding" here? You seem to be envisioning something odd.
- Say I have two tree tables, treeTable1 and treeTable2. treeTable2 was produced by "table-izing" a particular EssExpression via a utility function, and has its own set of node_ids numbered from 1..N for N nodes. treeTable1 is a permanent tree table, carries about ten thousand trees, and has node_ids numbered from 1..~100k. Now I wish to update treeTable1 to contain the tree described in treeTable2. That is what I mean by "adding". The difficulty in doing so involves mucking around with node numbers.
- Most of our tree operations, such as "treeEqual", shouldn't care whether we are using auto-number IDs or pre-assigned IDs. In different situations, one may be preferable over the other in terms of convenience or performance. This is flexibility at work. You have the same issues. You cannot create a RAM pointer to a parent that does not exist without goofy place-holders. Plus, what happens when you want to write your trees to disk? Do you reinvent a different node saver that is custom to trees and another that is custom to stacks and another that is custom to lists? That's not reuse. This is a rather nitty issue anyhow. I want to focus on software engineering issues such as productivity and maintenance effort, not how RAM pointers are managed. Further, an RDBMS could use RAM address space as unique IDs if it wanted to, but most have found this problematic with non-transient info. It would be silly in general, but might make an interesting *option* for speed-sensitive apps. In short, it could play the same game you are playing. -t
- For the most part, when it comes to value structures, the parents point to the children rather than the other way around. More relevantly, if the structures are supported by the RDBMS, then the user need take no special action at all - no setup, no teardown, no inefficient workarounds. And for persistence, the problem isn't as great as you believe; one doesn't use RAM pointers; one uses pointers that represent either areas on the disk or an indirection thereof (i.e. representing files or arenas and an offset into them), plus memory-mapped IO. Use of memory-mapped IO allows one to get the speed of RAM+virtual memory, and an address space larger than any existing RAID disk. Finally, with user-defined types the language knows how to serialize all of the UDTs by their construction; nothing is reinvented per type. It is my impression that you're making accusations from ignorance. Anyhow, complexity of data entry is a productivity issue ("parse through the string first, count nodes, generate an intermediate structure, then table-ize it" is considerably more complex for the user than simply handing the RDBMS the EssExpression-like string). That complexity in turn introduces maintenance issues. Other things I have raised, such as garbage management, are also productivity and maintenance issues. If you want to focus on those things, then do so, but stop pretending that I haven't been.
- The "parse" comparison is misleading, as already stated. It's not the end user that does all that, but the library. More on this at RelationalTreesAndGraphsDiscussionTwo.
- If your fav type-centric languages do all this and have all the abilities we expect of a DBMS (I doubt it), then why not just use those langs as-is or with minor tweaks? -t
- Most languages don't have the necessary combination of support for RelationalModel programming, concurrency, persistence, AtomicConsistentIsolatedDurable, distribution, and security to act as RDBMS replacements. I would like to see that change, but I don't believe it's a matter of minor tweaks. For example, memory-mapped IO is a fine way to implement the RDBMS persistence, but is a SecondClass facility in most languages; making the whole language use it is very much not minor. A dedicated RDBMS tool capable of readily supporting a wider variety of domains without the complex hacks and workarounds you promote so vociferously would still leave us better off than we are today.
- And if they did have them, I'll bet you'll start to see why nesting is problematic. Nesting does not scale well because tree-ness does not scale well.
- Structured values, including hierarchies and graphs, scale well enough. Relational, unfortunately, does not (it's really difficult to reconcile RelationalModel with DistributedProgramming), but I still favor relational as a data model. Anyhow, I wouldn't use structured values for information. I'd use them only for DomainValues, which I can distinguish by need for GarbageCollection or by noting where introducing ObjectIdentity is entirely artificial.
- I would be interested to hear more about alleged faults in relational with regard to DistributedProgramming, but probably not in this topic.
- Re: "with user-defined types the language knows how to serialize all of the UDTs by their construction" - That sounds odd. Their creation technique/syntax shouldn't generally affect their output layout. Sounds like arbitrary binding of input and output. Please clarify. -t
- UDTs in an RDBMS would necessarily be described as part of the DataManipulation+Data Definition language, similar to how SQL has ways to describe new tables. The physical layout in the RDBMS persistent storage would be entirely up to the RDBMS and its optimizer, but would still be determined based on the UDT definition in the DML. Translation and serialization of these structures in communications between systems would also be determined automatically from the UDT description in the DML, possibly with modifications by the communications protocol (e.g. to allow lazy transmission of large values in a query). When I say "the language knows", what I really mean is "all protocols and APIs based on the DML/DDL" can automate the serialization, persistence, etc. of these values without any additional support from the user.
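The "determined automatically from the UDT description" claim can be illustrated with a toy sketch (the descriptor format and function names below are invented, not from any real DBMS): a single DDL-like type description drives both serialization and deserialization, so no per-type code is written:

```python
import json

# A UDT is described once, in a DDL-like descriptor (this format is
# invented for illustration).  Both directions of serialization are
# derived mechanically from that one description.
POINT_UDT = {"name": "point", "fields": [("x", int), ("y", int)]}

def serialize(udt, value):
    # Driven entirely by the type description, not by per-type code.
    return json.dumps({name: value[name] for name, _ in udt["fields"]})

def deserialize(udt, text):
    raw = json.loads(text)
    return {name: typ(raw[name]) for name, typ in udt["fields"]}

point = {"x": 3, "y": 4}
round_tripped = deserialize(POINT_UDT, serialize(POINT_UDT, point))
```

Any protocol or API holding the same descriptor can perform the same mechanical translation, which is the sense in which "the language knows".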
(d) how are reference loops handled? (e) how many message trips between application and database does it take to enter a tree? (f) if a tree is temporary to a query, does it dirty the database (leave 'data' behind if not explicitly deleted)? (g) how much needs to be rewritten per type, or per variation on a structured DomainValue type, to support the parsing? (h) do you support variables in structured values, filled as part of the query from a nested query? (j) how will errors be detected and reported?
Now, I keep bringing this point up because I believe this is a point of severe, almost crippling, weakness in your approach relative to supporting structured DomainValues in a FirstClass manner. Data entry is required on a very regular basis as part of queries, for updates, for inserts, and for deletes, so there is no denying its relevance. I feel you keep trying to sweep this issue under the rug called TuringCompleteness (which is completely unfair unless you also sweep your presumed advantages under the same rug), so I keep pulling it back out from under that rug, dusting it off, and making certain it is clearly visible.
That said, I'm willing to put that big issue aside for the moment while you work on another big issue I have with your approach: joins for structural equality between DomainValues.
I'm gonna take my sweet time just to make you keep asking.
Another entry for ObjectiveEvidenceAgainstTopDiscussion?
Finally maybe you found one with merit ;-P
(Based on list above)
Here's my list: the cons include:
1. Construction of trees since one needs to associate each node with surrogate IDs, the cons include deletion of trees since such garbage collection becomes the job of application code.
One can add a tree deleter to the "tree kit". Whether it's part of the DB or not is mostly a political/economic decision, not a root paradigm/philosophical issue. What does this have to do with "surrogate ID's"?
Regarding your first point: adding a tree deleter doesn't make the problem go away. The problem is figuring out when to apply such a device and what can safely be deleted, especially when dealing with "sharing" of nodes. If you attempt to solve this problem generally, you'll learn you need to mark some tables as garbage-collected and others as 'root' - or at least that was my experience. This division of tables into collected and root only reinforces the idea that GC is a dividing "practical" property/requirement between DomainValue and WhatIsData.
As far as "what does this have to do with surrogate IDs": well, there were two separate 'cons'. The only reason you're confused is that you butchered the original format of the list (by somehow merging the first two entries and the second two entries) when reformatting it for your response. The first 'con' with your approach is that you need to associate each node in the tree with a surrogate ID, by which I refer to the typically auto-numbered 'key' for each node. The second con is that you need to explicitly manage garbage collection. The third con is "any case where a 'tree' or part of one might need to be described as part of a query...". The fourth con is "comparison and equality and join operations between tree structures". I haven't a clue how you kludged it so badly.
- You munged lots of fragmented thoughts together here. I suggest you isolate them and flesh them out some more. Perhaps hard encapsulation would simplify "garbage collection", I don't know. But probably at the expense of easy sharing. Generally I model things with two kinds of tables, which I will call "domain tables" and "temp tables". Domain tables generally match the "real world" one-for-one: there's one record for each real-world equivalent. If it disappears from the real world or you don't want to track it anymore, then delete that record. Temp tables on the other hand are transient copies to simplify a particular process and disappear upon task completion or deadline. I don't know why you are getting caught up in "garbage collection". -t
- Blaming on me your inability to dissect a simple symmetric list where each item clearly starts with "the cons include" is quite pitiful. And the "temp tables" are solving an entirely different problem. If you think otherwise, then try applying them in a situation where the 'temp tables' need to carry and relate trees/graphs to trees/graphs in the 'domain tables', when adding/removing items from domain tables that relate trees, etc.
- Your writing style is poor. If you want to create an example, then create an example. Just do it all the way or not at all. These half-ass examples are half-ass (surprisingly). -t
- If you want more examples, finish what's on your plate first.
- You ain't my mommy, and your lab-rat food is not fit for real humans.
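For concreteness, the two 'cons' in dispute can be sketched in a few lines (Python with SQLite; all table and column names here are invented, loosely following the 'tree_node' naming used later in this page): nodes must be entered bottom-up so surrogate IDs can be wired in, and unreferenced nodes must be garbage-collected by application code:

```python
import sqlite3

# Invented schema: tree nodes carry auto-numbered surrogate IDs, and
# rows of my_table reference a tree by its root node's ID.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tree_node (id INTEGER PRIMARY KEY,"
           " value TEXT, left_id INTEGER, right_id INTEGER)")
db.execute("CREATE TABLE my_table (field1 INTEGER, field2 TEXT)")

def insert_node(value, left=None, right=None):
    cur = db.execute("INSERT INTO tree_node (value, left_id, right_id)"
                     " VALUES (?, ?, ?)", (value, left, right))
    return cur.lastrowid  # the surrogate ID, fetched procedurally

# Ordered, bottom-up entry: children must exist before their parent.
leaf_a = insert_node("a")
leaf_b = insert_node("b")
root = insert_node("+", leaf_a, leaf_b)
db.execute("INSERT INTO my_table VALUES (?, ?)", (root, "row1"))
nodes_before = db.execute("SELECT COUNT(*) FROM tree_node").fetchone()[0]

def gc_nodes():
    # Application-side garbage collection: repeatedly delete any node
    # that neither my_table nor another node still references.
    while db.execute(
        "DELETE FROM tree_node"
        " WHERE id NOT IN (SELECT field1 FROM my_table)"
        " AND id NOT IN (SELECT left_id FROM tree_node"
        "                WHERE left_id IS NOT NULL)"
        " AND id NOT IN (SELECT right_id FROM tree_node"
        "                WHERE right_id IS NOT NULL)").rowcount:
        pass

db.execute("DELETE FROM my_table")  # drop the only reference to the tree
gc_nodes()
nodes_after = db.execute("SELECT COUNT(*) FROM tree_node").fetchone()[0]
```

Note that the gc_nodes sweep must know which tables are 'root' (here, my_table) and which are collected (tree_node), which is exactly the root/collected division of tables mentioned above; shared nodes are kept as long as any parent or row still references them.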
2. Any case where a 'tree' or part of one might need to be described as part of a query in the same way we describe integers and strings in queries, the cons include comparison and equality and join operations between tree structures.
I will agree that fixing such mucks up the "separation of domain math from relational engine" ideal, but giving up that ideal to make this possible will not result in anything significantly less than what you propose.
- Please justify that claim.
(I agree a tree-only DB may make expressing some such things easier, but the flip side is that you don't get to integrate them with relational tools/operations without reinventing the wheel. It's an inescapable trade-off between specialization and generalization. Specialized things do specialized things better and generalized things do generalized things better. That should be a no-brainer.)
Nobody here has argued in favor of a "tree-only DB", so the rest of your argument in that line seems to be a StrawMan.
As to whether a specialized system might do a specialized thing better, that seems intuitive but intuition is often incorrect. Related: InventorsParadox. Example: the recent re-implementation of OCaml concurrency synchronization primitives via a generic message-passing facility in Haskell (the result was more flexible, more composable, and far more efficient).
3. the cons include asymmetry between operating over simple types and working with trees and the need for workarounds.
Relational has DiscontinuitySpikes in the dichotomy between "cells" and relations. That I will agree to. But ADT's have it with regard to sharing information inside the types between types. It's a trade-off.
DomainValue types in Relational may not be mutable. As I understand it, only when working with 'mutable' abstract data types does the dichotomy you speak of exist.
4. the cons include significant extra complexity to specify that a schema is keyed on the structure of a tree, the cons probably include a great deal more.
Example?
I'll give an example problem:
CREATE TABLE my_table (
// field1 TREE - oh, bleep, that doesn't work
field1 INT, // foreign key into 'tree_node' table
field2 STRING,
PRIMARY KEY (field1) // oh, bleep, that isn't quite enough!
// I need a relation where the primary key and indices are
// over the structure of a tree, not its surrogate ID (field1)
);
Compare:
CREATE TABLE my_table {
field1 {type T: tree(left:T right:T node:INT) | leaf}
field2 STRING
PRIMARY KEY (field1);
}
If you believe your approach doesn't have significant extra complexity, then make your example (the top one) as simple as mine.
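For what it's worth, the usual workaround for the first schema's limitation is to key the table on a canonical serialization of the tree, so that structurally equal trees collide exactly as the uniqueness constraint intends. A minimal sketch of that workaround (the tuple encoding and names are invented; this is not a claim that either side endorses it):

```python
import sqlite3

def canon(tree):
    # tree is either None (a leaf) or a (left, right, node_value) tuple;
    # structurally equal trees always produce identical text.
    if tree is None:
        return "leaf"
    left, right, node = tree
    return "(%s %s %d)" % (canon(left), canon(right), node)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE my_table (field1 TEXT PRIMARY KEY, field2 TEXT)")

t1 = ((None, None, 1), (None, None, 2), 3)
t2 = ((None, None, 1), (None, None, 2), 3)  # structurally equal to t1
db.execute("INSERT INTO my_table VALUES (?, ?)", (canon(t1), "first"))
try:
    db.execute("INSERT INTO my_table VALUES (?, ?)", (canon(t2), "second"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The catch, of course, is that every application touching the table must agree on the canonicalization, which is the kind of extra "usage" machinery the second declaration would make unnecessary.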
I focus more on usage simplicity, not declaration simplicity. If you crank the "definitional simplicity" weighting all the way up and all the other factors all the way down, then perhaps what you suggest here would be simpler.
- I'm not presenting this as the "one" victory point. I think that, with the possible exception of "implementation simplicity", all other factors weigh in favor of types. That is why I have five other 'cons' in that list, many of them deriding the "usage simplicity" of your approach. Need for workarounds, application-side garbage collection, describing structures as parts of queries, describing join and equality operations, need to fetch surrogate IDs, etc. - these are areas where your approach makes "usage" quite painful.
But to achieve usage simplicity, "tree" is simply one of many possible views of a "structure". I want a relativity engine, not hard IS-A's. Encapsulation hurts sharing, plain and simple. You have not been able to escape that.
- Hmmm... well, I'll agree that "encapsulation hurts sharing" (that's what encapsulation is supposed to do, prevent sharing), but it isn't relevant. We aren't dealing with mutable structures, and we require intrinsic identity, so there is no encapsulation, so your line of argument about encapsulation hurting sharing here is a StrawMan... 'plain and simple'. There is no need to "escape" your arguments when you aren't aiming them properly.
And in general, nothing stops one from putting pretty syntax on top of a relational system to simplify common domain-specific things. If you do X often, then create a sub-language or API that makes X easier and shorter.
- Nothing stops one from putting pretty syntax on top of SnuspLanguage to simplify relational things, either. Appealing to completeness is hardly germane.
Further, here's a declaration in DynamicRelational:
// declaration follows
There, nothing, nada, zilch. Assignments create columns and tables automatically. Dynamism doesn't require definitions. Thus, if you really want to reduce definition verbiage, then go dyno.
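The "dynamic" style being described can be mimicked in a few lines (a toy sketch, not an actual DynamicRelational implementation): inserting a row with a new column simply creates that column, and absent columns read back as empty:

```python
# A schema-free "table": rows are dicts, and a column exists as soon as
# some row assigns to it.  Missing cells read back as empty strings.
class DynoTable:
    def __init__(self):
        self.rows = []

    def insert(self, **fields):
        self.rows.append(dict(fields))

    def select(self, column):
        return [row.get(column, "") for row in self.rows]

t = DynoTable()
t.insert(name="alpha")            # the 'name' column springs into being
t.insert(name="beta", weight=42)  # and so does 'weight'
```

Note this only demonstrates declaration-free entry; it says nothing about keying or indexing on structure, which is the objection raised in the next reply.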
So, you failed to accomplish every single task I set forth ('key' on tree structure, 'index' on tree structure) and called it a victory for DynamicRelational? Well, you certainly have an impressive imagination. When you return to reality, perhaps you can argue something that is less delusional, and show me how to index on tree structure in DynamicRelational.
I don't even know 80% of what the hell you are talking about for the last few days. Your tongue took a trip to the Bahamas. As far as dynamic goes, YOU gave "simplicity" as your criterion. As best I could interpret your little metric, I bettered your example. So what's the problem?
Perhaps you should focus on understanding. Instead, you toss a few insults then offer opinions when you, by your own admission, don't understand 80% of what was said. Offering an opinion on any subject before you know the subject is an act of idiocy.
- Perhaps you should focus on learning to write clearly. Otherwise, we'll spend an hour per word.
- I'd learn quickly enough how to make things clear to uneducated laymen like yourself if they were interested in understanding rather than pushing an agenda.
- Oh, so you don't insult here? If you think that, you are delusional.
- I didn't say I didn't insult. Indeed, I think insults are about the only thing you do understand, since you don't know, don't get, and aren't seriously interested in understanding any of the "technical" stuff.
- Lame excuse for poor half-ass writing.
- I'm interested in whether it's my writing that's the problem or your technical illiteracy and AggressiveListening. I do make an effort, and I recognize my writing is far from perfect, but I believe your confusion is more often your failure than my own. I can't trust your biased opinion; I'll need opinions from neutral third parties.
Let's collect a dozen or so semi-realistic, semi-real-world examples and test it against those. Your examples tend to be contrived, seemingly as tools of exaggeration to magnify some real or perceived weakness. While that is perhaps a useful way to probe things, it's often a poor test of the tool/idea in reality. -t
How about you prove you can solve the 'contrived' example that has been pared down to its simplest form before you take on anything more challenging? Your attitude seems to be "if it ain't CBA, it ain't realistic", which I find petty and narrow-minded.
However, a few realistic examples: (a) a database tracking SCTP messages allowing queries over sub-chunk data, (b) MetaModelling databases where one is representing both data and inference rules (this is an especially big field), (c) a database of databases where each inner database represents a possible future (used in AI), (d) mathematics number-crunching databases where one wishes to perform joins over pairs of matrices or graphs that meet particular combined functional properties, (e) sets as identifiers (e.g. a set of short strings) while still needing to join on sub-features like matching all but one element, (f) associative memory databases, where indexing and key access on graph structures is critical, (g) problem-solution tables in functional memoization, which requires that the key be a problem consisting of a function and some arbitrary values, (h) behaviors databases for units in a video game where one needs variants, conditionals, and structure, (i) source transforms and equality descriptions for patterns in data-driven optimizers, (j) a hardware database supporting drivers that needs to describe and classify hardware in terms of IO formats and the memory structure for DMI, (k) a hardware database that needs to contain complex-valued modifiers to optimizations or usage rules for specific pieces of hardware - putting these in a string is an option, but makes ad-hoc queries more difficult.
- How about we look deeper into the video game one. The others seem to have big domain knowledge learning curves.
- You'll be disappointed if you think "behavior databases" have a short learning curve. How about you look into the so-called "contrived" examples which have no domain learning curve at all?
- So we have the ugly trade-off between realism and brevity in set-up? If we take the contrived ones, then how can we test usability in the real world? I don't seek to maximize ability to handle lab-toy problems.
- I'm not asking you "maximize" anything; you're still falling on your face when it comes to "solving" a simple problem. Attempts to solve or optimize for a variety of "practical" cases without also solving and optimizing for "simple, lab-toy" cases generally result in MentalMasturbation, HandWaving, BrochureTalk, and WishfulThinking. Your attempts are not an exception.
- I did NOT suggest skipping lab toy examinations.
- Hmm? Well, I must have seriously mistaken your repeated complaints and whining here and elsewhere about 'contrived' and 'lab toy' examples. You won't go back to complaining about those before you solve them again, will you? We can look at the video game behaviors database after solving the lab toy problems... i.e. without "skipping" them.
- My complaint is about over-emphasizing their importance. We won't get into the meat of issues, the real stuff, until we do.
That's a short list of semi-realistic semi-real-world examples, many of them quite realistic (seeing as I've needed them but had to go other routes like OOP). Regardless, until you can solve a simple, contrived example, I am seriously not going to believe you'll solve anything more complex. My impression right now is that TopMind is looking for any excuse to close his eyes and ignore the white elephant already stomping all over his ideals. Well, that 'contrived' white elephant is still there, smashing your arguments to pieces. It doesn't need any help from the ten elephants (a)-(k) listed above.
I've actually compared trees before by serializing them into text and then comparing the text for equality. Fancy relational or type systems are not needed to do that. I agree it may not work in all cases, especially large trees, but for a specific little side problem, it did the job. -t
Arguments of that form, where you appeal to completeness and complex workarounds as though it were a point in favor of your current system, are remarkably unconvincing.
Complex? It depends on what tools are already available. That was not a reason to build a full tree kit/DB from scratch. I already had a serializer in that case.
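The serialize-and-compare technique described above can be sketched as follows (the tuple representation is invented for the example). It carries the caveat already admitted: it works only when the serializer is canonical, i.e. the same tree always yields exactly the same text:

```python
def serialize(tree):
    # tree is a (value, children_list) pair -- a made-up representation.
    value, children = tree
    return "(%s%s)" % (value, "".join(serialize(c) for c in children))

a = ("root", [("x", []), ("y", [("z", [])])])
b = ("root", [("x", []), ("y", [("z", [])])])
c = ("root", [("y", [("z", [])]), ("x", [])])  # same nodes, reordered

trees_equal = serialize(a) == serialize(b)
order_sensitive = serialize(a) != serialize(c)
```

If child order is not meaningful in the domain, the serializer must first sort children into a canonical order, which is where large trees and shared substructure start to hurt - one reason it "may not work in all cases".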
Continued at RelationalTreeEqualityExample and RelationalTreesAndGraphsDiscussionTwo.
See Also: RelationalAndTrees
MarchZeroNine