Continued from MultiParadigmDatabase
(TRDBMS = Traditional RDBMS, AKA Oracle and MS-SQL-Server style)
"Type" enforcement is generally through trigger-like constraints and validation. For example, an integer could be enforced using a rule that allows only digits and no decimal point. (Pre-defined types generally tend to favor specific languages, which we want to avoid. But for optimization purposes, how stuff is stored internally does not change the basic qualities of a MultiParadigmDatabase. If a column is designated an "integer", then it may indeed be stored as binary. This is just an implementation/optimization issue and does not change what the DB user sees. Essentially such a transformation is considered "hidden compression".) Related "type" info: TagFreeTypingRoadMap.
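As a toy illustration (Python; the rule and function names are invented for this sketch, not part of any spec), such a trigger-like "integer" constraint might be:

```python
import re

# Hypothetical sketch: "integer" enforced as a validation rule over
# string values rather than as a storage type.
INTEGER_RULE = re.compile(r"^-?\d+$")  # digits only, optional sign, no decimal point

def check_integer(value: str) -> bool:
    """Trigger-like constraint: accept only digit strings."""
    return bool(INTEGER_RULE.match(value))

assert check_integer("42")
assert check_integer("-7")
assert not check_integer("3.14")
assert not check_integer("12a")
```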
- In other words, your "types" are going to work just like every popular programming language's types? Or are you planning to make the type constraints and validation unusually awkward?
- No. More like Perl's and CF's type system, not Php's and JS's. And operators like "+" won't be overloaded, like they are in CF. (Storing a tag byte (or bytes) with every attribute not only takes up more storage, but also creates the awkward situation where a given attribute (by name) may be one type in one row but a different type in another row of the same table (or equivalent). Thus, it's best to skip tags. Any type "determination" will be done by parsing the value.)
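One way to picture the Perl-style approach described above is to let the operator, not the value, determine interpretation. A toy sketch in Python (function names invented; real Perl uses "+" vs. "." for this):

```python
# Hypothetical sketch of non-overloaded operators in a tag-free design:
# every stored value is a string, and each operator has exactly one meaning,
# so no per-value type tag is needed to decide what "+" does.
def num_add(a: str, b: str) -> str:
    # the "plus" operator is always numeric addition
    return str(int(a) + int(b))  # (a real impl would also handle decimals)

def concat(a: str, b: str) -> str:
    # a distinct operator (Perl's ".") is always concatenation
    return a + b

assert num_add("2", "3") == "5"
assert concat("2", "3") == "23"
```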
- Addendum: HiddenTypeTags are always possible as a performance aid. It could require more resources during writing though.
- Historically, DBMSs have avoided dynamic typing for two reasons: 1. Parsing values to determine types on a row-by-row basis represents a potentially unacceptable per-row processing performance hit. 2. The possibility of type errors at run-time -- such as discovering that the numeric value in attribute 'Quantity' on row 2,392,381,029,381 of 34,982,039,482,039,042,389 has an invalid character in it after two hours of processing -- is considered an unacceptable risk.
- Regarding 1, typical queries don't need to do such parsing. For example, we are not overloading operators, so "+" doesn't have to parse its operands to determine whether it's going to add or concatenate. Point 2 is part of the typical trade-offs between static and dynamic typing. It's not news. Further, MPD as described does allow optional constraints. Thus, it can be "type anal" if we tune it to be. (If you want a lot of that feature for your domain/project, then a traditional RDBMS is probably a better fit. PickTheRightToolForTheJob. MPD makes dynamic easy and TypeSafety possible, but not necessarily the easiest.) -t
- Operators like '=', '>', etc. are usually overloaded. Some DBMSes optimise these by treating all values, for comparison operator purposes, as bit strings, but a dynamically-typed DBMS that supports collation sequences could wind up doing per-row type checking for something as simple as SELECT * FROM blah WHERE x > "blat"
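The per-row checking this bullet describes can be illustrated with a toy filter (Python; a sketch of the problem itself, not of any particular DBMS's implementation):

```python
# Sketch of why WHERE x > 'blat' can mean per-row work under an
# overloaded comparison operator: each stored value must be examined
# to choose numeric vs. string collation.
def gt(x: str, y: str) -> bool:
    try:
        return float(x) > float(y)  # numeric comparison if both parse
    except ValueError:
        return x > y                # else fall back to string collation

rows = ["10", "9", "blaze", "blot"]
assert [v for v in rows if gt(v, "blat")] == ["blaze", "blot"]
```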
- Not in Perl. I don't agree with Perl's comparison syntax, but the principle is the best known way to avoid overloading. Overloading is a bad habit of many scripting languages and the authors should all be spanked for following stupid tradition. I agree certain numeric-centric computations may be more expensive, depending on the implementation trade-offs chosen (see below), but it's not because of overloading. Overloading can almost entirely be avoided by thoughtfully designed operators. Related: ComparingDynamicVariables. -t
- {Regarding 1, the point about the performance of parsing may be true, but dynamically and statically typed DBMS are very different animals, and performance will be better in some areas, worse in others. I used to doubt that a dynamic DBMS would ever match the performance of SQL, simply because of the incredible amount of work that has been done, and continues to be done, to improve SQL databases. But there are examples of dynamic databases that appear to remove that doubt - Intersystems and Openinsight, both proprietary, claim to outperform SQL, even while emulating SQL in the case of Intersystems. Regarding 2, it appears to assume that dynamic typing is somehow more dangerous than static typing because there is no type-safety enforced by the DBMS. You use a hypothetical error situation as an example. An error in either is still an error.} --PeterLynch
- Beware of conflating significant architectural differences, or inevitable performance differences between products, with dynamic typing vs static typing. Intersystems' Caché is an ODBMS and Revelation Software's OpenInsight is a multi-valued DBMS. That means there are significant architectural differences that might explain significant performance differences from the usual gaggle of SQL implementations -- many of which don't have a primary focus on performance -- independent of whether they use static typing or dynamic typing.
- What is notable about static vs dynamic typing in a DBMS, is that static typing in most DBMSes (whether SQL or otherwise) means type checking is only performed when a query is parsed. In a DBMS with dynamic typing, type checking often needs to be done whilst queries run, which can mean row-by-row type checking. My point #2 is not that "dynamic typing is somehow more dangerous than static typing". It isn't. What dynamic typing in a DBMS means is that a type error resulting from data stored in the database may not be discovered until processing has been running for a significant and unpredictable length of time, which often requires that the error be corrected and the process re-started. The real problem here is time and possibly resources, not safety. Whilst type errors can cause exactly the same behaviour in dynamically typed programs, generally the belief is that once a bug in the code has been fixed, it's fixed for good. The general perception about DBMSes is that type errors can conceivably re-appear any time the data in the database is changed. For many purposes, in many shops, that is inherently and inescapably unacceptable. Furthermore, there's a general belief throughout the database community that not only does dynamic typing exhibit the problems mentioned above -- at least in DBMSes -- it offers no benefit that makes it worth enduring the problems.
- It depends on how you define "checking", and also depends on how values are stored. For example, a value that contains only digits could be "compressed" into binary integers, along with a marker at the front of the byte set indicating it's compressed. If an operation requires numbers, only the marker has to be checked, not the entire byte set. (This is similar to a type "tag", but it's not exposed to the user in any way. It's mainly a compression technique.) It's true this would slow down saves in order to speed up read-side numeric computations, but CPU processing is less likely to be a bottleneck than disk or retrieval I/O. I won't address complaints about the "evils" of dynamic types here in terms of run-time checking outside of performance issues. They are similar to the complaints about dynamic languages in general, of which there are plenty of topics already, yet people still want to use them for a wide variety of tasks. -t
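The marker-byte "hidden compression" idea described above might look something like this (Python sketch; the marker values and the fixed 8-byte integer width are arbitrary assumptions for illustration):

```python
# Hypothetical sketch of "hidden compression": digit-only values are
# stored as a marker byte plus a binary integer, so a numeric operation
# checks one byte per row instead of re-parsing the whole value.
MARKER_INT = 0x01  # illustrative marker values, not from any spec
MARKER_STR = 0x00

def store(value: str) -> bytes:
    if value and value.lstrip("-").isdigit():
        n = int(value)
        return bytes([MARKER_INT]) + n.to_bytes(8, "big", signed=True)
    return bytes([MARKER_STR]) + value.encode("utf-8")

def as_number(stored: bytes) -> int:
    if stored[0] != MARKER_INT:  # the one-byte check per row
        raise ValueError("value is not numeric")
    return int.from_bytes(stored[1:], "big", signed=True)

assert as_number(store("123")) == 123
assert as_number(store("-5")) == -5
```

Note the write-side cost the text mentions: `store` does the full parse once, so later reads pay only the marker check.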
- Even if you're only checking a simple type reference like a single-byte marker, that's more processing overhead than is required in a DBMS with static typing. One issue with dynamically-typed DBMSes is more significant than with dynamic programming languages: the fact that type errors resulting from data in a table may not be discovered until significant time has passed and processing effort spent. That does not occur in statically-typed DBMSes or dynamically-typed general-purpose programming languages.
- I don't see how checking one byte per row per math-applied column is significant overhead. Yes, it is slightly more processing than a static DB would have, and I readily admit that. Dynamicness will usually require at least some efficiency/performance sacrifices. I'm not arguing otherwise. If you find a relatively common case where the sacrifice would be more than say 30% without an easy fix/work-around, then I'd like to see such. Otherwise I'm not interested in 20% margins. If it's a high-volume and high-math app or domain, then go with a TRDBMS. Problem solved. (Note that I generally expect the MPD to require more CPU than a TRDBMS and one should purchase equipment based on this.)
- What's a "math-applied column"? Checking one byte per row is significant overhead when dealing with repeated queries against a table with a significant number of rows, where "significant" means a sufficient number of rows to result in processing overhead that causes undesirable cost -- such as energy consumption or CPU load. If you want a "dynamic database", use the usual business tool for solving that "problem": a spreadsheet.
- As far as the second half of your statement, generally type issues are gradually "locked down" as the system/app matures. Type flexibility gives us the nimbleness wanted by many domains/customers, and as the app/system settles down into "a rhythm", constraints can gradually be added. It's less pressure for BigDesignUpFront. Further, often the data entry and validation for any given table is through a single app, and as long as only that app touches the new data, it's already been scrubbed by the app. Granted, this is not the ideal, but in practice it can keep the data clean.
- In industrial settings, that's largely nonsense. "Gradual 'lock down'" and other such ad-hoc development is typical of end-user computing and/or casual application development; it's otherwise needless. Your so-called "nimbleness" is a myth borne of poor IT practice. Far greater "nimbleness" can be achieved with reasonable TypeSystems, without loss of TypeSafety.
- Also keep in mind that every row automatically gets a non-reused (by default) "row ID", which should be the most common way to cross-reference (join) rows. This ID is read-only to apps and query users such that it's always numeric (or at least cannot be twiddled with by outside apps or casual DBA typos).
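A minimal sketch (Python; class and method names assumed, not from any spec) of the automatic, non-reused row ID behaviour:

```python
import itertools

# Toy model of the automatic "row ID": the counter never goes backwards,
# so deleted IDs are not recycled, and apps cannot set or change the ID.
class Table:
    def __init__(self):
        self._next_id = itertools.count(1)
        self.rows = {}
    def insert(self, row: dict) -> int:
        rid = next(self._next_id)
        self.rows[rid] = dict(row)  # row_id kept outside the row's own data
        return rid

t = Table()
a = t.insert({"name": "widget"})
t.rows.pop(a)                      # delete the row...
b = t.insert({"name": "gadget"})
assert b == a + 1                  # ...its ID is not reused
```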
- Why would I "keep in mind" such a needless and pointlessly arbitrary requirement? Again, you appear to be re-inventing the spreadsheet.
- I beg your pardon; I don't propose features for zero reasons. Spreadsheets are popular, actually; especially in the early or draft stages, before requirements settle enough to impose "stiff" regularity and structure. I have even built production spreadsheet/database hybrids of sorts where the requirements straddle both tools. Many existing RDBMS automatically generate/use an internal (intended-to-be-unique) identifier anyhow, regardless of whether "outside" users make use of it. We might as well piggy-back on that feature. (The internal ones may be recycled, which is probably not a default behavior we'd want; although doing away with recycling may impose performance overhead.) If we are talking about a sub-set or special case of a MultiParadigmDatabase, the DynamicRelational database, then there is less need for automatic ID's, at least as a default feature. The spectrum is generally this:
<--- Requirements are Fuzzy ----- Medium ----- Requirements are Settled --->
Spreadsheet...MultiParadigmDatabase...DynamicRelational....Traditional RDBMS
.
- Since neither MultiParadigmDatabase nor DynamicRelational has been implemented (nor does anyone appear to be rushing to fill that apparent market gap), I suspect the industry will continue to use the following, as it has for the past three or so decades:
<--- Requirements are Fuzzy and/or low-volume ---------- Requirements are Settled and/or high-volume --->
Spreadsheet.............................................................................Traditional RDBMS
.
- The fact that they are speculative products is made clear on the parent topics. (If it's not clear, that's the place to adjust rather than use repetition as a (poor) solution.) I see no reason to repeat that information here. In my opinion, there is a need for something in between. For one, spreadsheets have weak or no concurrency handling. And the fact that there have been few if any similar attempts is not a reason not to propose a project. Past attempts appeared too different from existing DB's to relate to. My strategy is to make it closely resemble existing RDBMS except where changes are needed to provide dynamism. In other words, keep existing RDBMS idioms except where doing such would conflict with dynamism. Other attempts didn't appear to have this goal, but rather started with different data structures or data "shapes" altogether and had unfamiliar query languages.
- The fact that they are speculative products, of largely-unspecified design and theoretical foundation, makes discussing them pointless. As for concurrency handling in spreadsheets, have you tried Google Drive's spreadsheets? It might make you rethink the "weak or no concurrency handling" claim.
- What's "largely unspecified"? Ask questions if you need clarification; I cannot read your mind to know where there are gaps in your understanding or interpretation.
- I see nine bullet-points of design attributes on MultiParadigmDatabase. I see no language specification, no interface design, no data model, no architecture, no sample database schema designs, no anything that gives me the slightest indication of what it would be like to use a MultiParadigmDatabase. I see one code example (heading: "Example constraint/type dialog") but the context isn't clear. Is a MultiParadigmDatabase standard SQL where the only type is String? Or is it something entirely different with only a glancing resemblance to SQL? The nine bullet-points don't tell. I'm sure you have a rich picture of a MultiParadigmDatabase in your imagination, but until it makes it onto a WikiPage it stays clear to you and a mystery to the rest of us. It took DateAndDarwen multiple books and a language specification (TutorialDee) to explain the well-understood RelationalModel with StaticTyping; what makes you think a MultiParadigmDatabase (the RelationalModel with DynamicTyping?) can be understood from nine brief bullet-points?
- See MultiParadigmDatabaseQuestions for responses. Keep in mind TutorialDee started out as a rough idea and gradually was refined and implemented over time. Why create an impatient double standard for other database draft suggestions? It makes it appear you personally don't like me.
- Prior to TutorialDee, TheThirdManifesto was a detailed paper on desirable characteristics of a faithful implementation of the RelationalModel. That was followed by a book that provided a complete grammar, definitions, and examples of TutorialDee in sufficient detail to implement it. It was used to develop the first version of the RelProject. There was never a "rough idea" of TutorialDee that wasn't sufficiently complete to understand it in great detail, so there is no "impatient double standard". What you've provided on MultiParadigmDatabase is a rough idea in insufficient detail to implement it unless, as I've suggested, a MultiParadigmDatabase is a DynamicallyTyped implementation of the RelationalModel, e.g., a SQL implementation where the only column type is String.
- Okay, I may be mistaken about the history of TutorialDee. I know you are not satisfied with the level of detail I've given so far, but to keep complaining about it repeatedly and redundantly in multiple places is unnecessary and grows quite annoying, such that I feel like calling you some really nasty names right now.
- Why do you react emotionally to having the clear inadequacies of your exposition highlighted? Don't you wish to improve your descriptions?
- I suspect your repetition is also an emotional response of sorts. A good Vulcan wouldn't do that. We are both flawed beings.
- I agree with your last statement.
- Regarding "SQL implementation where the only column type is String", that would not satisfy the on-the-fly-object-creation without some additional changes.
- I see no mention of "on-the-fly-object-creation" (which is what, exactly?) in the nine bullet-points at the top of MultiParadigmDatabase.
- From bullet list at top of DynamicRelational: "these features are kept [from MultiParadigmDatabase]...Dynamic schema via "Create-on-Write" - Columns and tables can be created just by putting or updating data in them." - I agree I need to clarify this at the MultiParadigmDatabase topic itself, but the statement "Each row is a relatively independent map" in MultiParadigmDatabase pretty much dictates such behavior. I don't know of an alternative internally consistent interpretation besides on-the-fly column creation. I'd be interested to know what alternative interpretation your brain supplied you when reading that.
- I didn't interpret it at all. It didn't tell me anything useful.
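The "Create-on-Write" behaviour quoted from DynamicRelational above, where each row is an independent map and columns appear simply by writing to them, might be sketched as follows (Python; all names hypothetical):

```python
# Toy model of "Create-on-Write": no schema is declared up front;
# writing to a column that doesn't exist simply creates it.
class DynamicTable:
    def __init__(self):
        self.rows = []
    def put(self, **attrs):
        self.rows.append(dict(attrs))  # each row is an independent map
    def columns(self):
        # the "schema" is just the union of attribute names seen so far
        return set().union(*self.rows)

t = DynamicTable()
t.put(name="widget", qty="3")
t.put(name="gadget", color="red")      # new column appears on write
assert t.columns() == {"name", "qty", "color"}
```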
- As far as "theoretical foundation", that has no objectively measurable meaning, and I will treat it as a personal opinion. Further, most new tools come about because of a practical need, not because of a "theoretical foundation" (including the spreadsheet). And Google Drive spreadsheets don't supply SQL or SQL-like querying that I know of. -t
- SQL databases use the RelationalModel as a theoretical foundation. Typical expression evaluation in popular imperative programming languages uses simple algebra as a theoretical foundation. SETL uses SetTheory as a theoretical foundation. Without either a clear specification or a theoretical foundation, the reader is forced to guess what a MultiParadigmDatabase may or may not include. For example, does a MultiParadigmDatabase require SQL or SQL-like querying, or would something akin to MicrosoftExcel's filters be sufficient?
- I agree that some tools are based on a so-called "theoretical foundation"; however, that does not mean that all are, and many, if not most, are not. Thus, it is not a prerequisite. And various SQL- or SQL-like examples are given in both the MPD- and DR-related topics. If they are not sufficient to address specific issues or questions you may have, ask with explicitness. And I'm not sure what you mean by "require". I have no military to force anybody to do anything. One can take any OSS product and pull SQL or anything they want out of a copy/fork and they won't go to jail. Further, it does not appear that dynamism conflicts with relational theory (any more than existing RDBMS). If you know of a specific violation, please point it out. Vague innuendos and vague accusations are unhelpful and make you look like an arrogant dick.
- A theoretical foundation, if referenced, can be a substitute for a certain amount of explanation. Stating that a language has a SetTheory basis (like SETL), for example, removes the need -- at least in concept -- for detailed descriptions of what is otherwise an arbitrary collection of distinct operators and structures. In the absence of such a theoretical foundation, a detailed description is needed. See my comment above re "I see nine bullet-points ..."
- I suspect all practical implementations are going to require lots of nitty-gritty description to completely describe the syntax and commands. If you implement "raw math" you don't get anything usable and competitive for field production by normal developers. Lisp is probably the closest anyone ever came to such while still being almost practical.
- That's certainly true of the specifics of practical implementations, but identifying the theoretical foundation can shortcut some lengthy descriptions about the idea. Stating that it's (for example) "the RelationalModel with DynamicTyping" says a lot.
- TupleDefinitionDiscussion has a discussion on whether "dynamic tuples" violate relational theory or not. I don't see any clear-cut violation identified. (Although a PageAnchor seems to have gotten misplaced. Hmmm.) In other words, I don't see where somebody proves relational must have Characteristic X and then proves that dynamic rows lack this Characteristic X. (The auto "row_id" makes them unique, so uniqueness is not an issue.)
- Did anyone claim that "dynamic tuples violate relational"? By the way, what is "relational"? Do you mean the RelationalModel? What's not clear is whether your DynamicDatabase?/MultiParadigmDatabase uses "dynamic tuples" or not, or what "dynamic tuples" mean in the context of a DynamicDatabase? or MultiParadigmDatabase. As often seems to be the case with your ideas, your definitions are distributed over various otherwise-unrelated pages and embedded in various debates. You may be able to keep track of them, but the reader cannot. Without a definitive WikiPage to define DynamicDatabase?/MultiParadigmDatabase, it seems unreasonable to expect the reader to divine their characteristics from various debates and documentary research.
- I have already agreed the documentation could be better. However, it's premature for a complete specification because that may limit implementation options. For example, suppose Martha wanted to make a DynamicRelational database using mostly parts from the MySql code-base because she was familiar with it. If I used mostly conventions from PostgreSql instead (outside of "dynamic" features), it would probably be much more difficult for Martha such that she may abandon the idea. I don't have a free army of elves to build it to my every whim. Thus, pre-dictating things such as the character used for string concatenation would unnecessarily limit one's implementation options. I'm trying to establish the general gist of MPDMS/DRDMS without premature explicitness that limits implementation choices. Perhaps I've failed at that goal thus far, but your criticisms are not giving me a decent alternative to use as a documentation style reference example, in large part due to the premature-explicitness problem. Language explicitness would solve one problem but create another. -t
- As far as what to formally call/classify the "data structure", I believe DynamicRelationalQueryExamples gives enough info to either propose a formal description, or to be able to ask questions to narrow down the intent.
- And note that by incrementally adding constraints/types, a MPDB can morph into something pretty close to a traditional ("static") RDBMS, although its dynamic roots may impose performance penalties versus a direct RDBMS. As I've said elsewhere, one would typically only use a MPDB if flexibility is deemed more important than performance concerns and/or hardware costs. BigDesignUpFront is not always possible and/or practical. -t
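The incremental "lock down" described above could be sketched as follows (Python; a toy model with invented names, not a design commitment -- a real implementation would differ):

```python
# Toy model of incrementally adding constraints to a dynamic table:
# the table starts with no checks, and constraints are bolted on later
# as requirements settle, rather than declared up front.
class Table:
    def __init__(self):
        self.rows = []
        self.constraints = []          # starts with none: anything goes
    def add_constraint(self, check):
        self.constraints.append(check)
    def insert(self, row: dict):
        for check in self.constraints:
            if not check(row):
                raise ValueError("constraint violated")
        self.rows.append(row)

t = Table()
t.insert({"qty": "abc"})               # accepted while still dynamic
t.add_constraint(lambda r: r.get("qty", "0").lstrip("-").isdigit())
t.insert({"qty": "5"})                 # still accepted
try:
    t.insert({"qty": "xyz"})           # now rejected
    assert False
except ValueError:
    pass
assert len(t.rows) == 2
```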
CategoryDatabase, CategoryPlanning, CategorySpeculative, CategoryCriticism