One of my complaints about types and objects is that they are difficult to share across different tools and languages, at least without some kind of pre-defined conventions or standards. The more type-ish or OOP-ish it is, the more difficult and involved the sharing becomes.
I'd welcome suggestions for getting around these limits, or perhaps agreement that they are an inherent part of types and objects. Markup languages and delimited tables are usually just easier to share (helped in part by standards such as ODBC and HTTP).
[Eh, "sharing across different tools and languages" is difficult for ANY system without using pre-defined conventions or standards - you can't even share "plain text" across tools without conventions or standards. Markup languages and delimited tables ARE standards for serialization of structured information. If you like 'standards' for serializaton of object-structured information, consider YamlAintMarkupLanguage or JSON as a serialization medium. If your goal is service sharing, such that the object must remain at its home, then you would need something like CORBA just as you currently use ODBC for DBMSs... though HTTP is also well enough suited for the service sharing between object systems (each object = URI; one can get/update/delete/etc. the object).]
I don't know that OOP makes this more difficult than anything else. My LISP code won't work in Javascript, COBOL won't share with PHP, Pascal and VB don't see eye-to-eye, and so on. CORBA (particularly for objects) and standardised WebServices are perhaps a step in the right direction, and XML was briefly (and naively) touted as the universal solution. Currently, DBMSes (and their standard interfaces like ODBC) using canonical primitive types do tend to provide a common point of contact -- at least for many enterprise business applications.
In terms of advanced type support in DBMSes, DateAndDarwensTypeSystem implies the potential for being shared across languages -- particularly because of the explicit elimination of any notion of "object identity" (there are only values) -- but this doesn't diminish the inherent complexity of implementing the system in each host language or environment. This is also an issue for CORBA and objects. Of course, once it has been implemented on a given platform, the issue essentially goes away.
The success of CSV, HTML/XML, and HTTP is largely due to their being text-based and type-light.
[I think this claim unjustified. You certainly lack evidence to attribute their success to their being 'type-light', and the fact that there are many text-based and type-light structured data formats that haven't achieved 'success' suggests that being text-based and type-light isn't 'largely' a cause for success. As far as what 'type-light' means when working with XML Schema and such (which are pretty heavy on formalized structure and validation), I've no real clue.]
It's an opinion. But let's approach it the other way and look at the top successes of type-heavy and object-oriented sharing techniques and standards. I don't see much success in this area. CORBA is probably the best try, but has been a yawner.
- What rigorous, quantified metric is represented by "has been a yawner"? Actually, CORBA was and still is very successful by any measure, and is only now giving way to ServiceOrientedArchitecture largely based on WebServices. These are unquestionably typeful, being based on standardised XML schemas.
- I have no rigor, but neither do the type enthusiasts. So there! And CORBA is considered a joke by most, except expensive vendors defending their buzzword-filled bloatware.
- Who would be this "most" that considers CORBA a joke?
[I dunno... in my opinion mime-types, codecs, XML with their schema, Yaml resolved type-tags, WSDL, etc. all seem to indicate that type-heavy approaches are alive and part of the present and future state-of-the-art. Data types are by nature assertions that data has certain structure and properties. Validation of these structures and properties at runtime is essentially TypeChecking. I believe your implied assertion that 'XML' is 'type-light' was already in error. Indeed, I suspect the 'success' of CSV/HTML/XML is in part because they were MORE type-heavy than both unstructured opaque and ad-hoc structured plain text alternatives.]
- They ['They' = XML Schema?] may provide for such spec-wise, but in practice such features are not used much.
- What are you referring to? There is no question that CSV/HTML/XML are used heavily; as stated above: much more than "unstructured opaque and ad-hoc structured plain text alternatives". Do you have statistics to back up "not used much"?
- [I suspect he's discussing XML Schema in particular. As far as "not used much", he's just waving his hands and pulling statistics out of his arse as usual (just as he did when claiming the success was 'largely' due to CSV/HTML/XML being 'type-light'). I can't speak for the world, but in my own work I see XML schema "used" plenty often. It is used to both specify the 'type' of XML data required by a code unit, and to help direct the writing of code to process that type (often via AutomatedCodeGeneration).]
- This is an informal discussion, so please stop accusing me of "waving my hands" just because I don't provide peer-reviewed journals. Your counter-claims are not Nobel material either, bub. Otherwise, I'll wave my middle finger up your snake nostrils. Tell me when I am NOT allegedly "waving my hands" or otherwise being bad; that would compress your insults and accusations. (And code generation is often a sign that you are using a bloated coding style. See CodeGenerationIsaDesignSmell.) --top
- [I don't even state my claims as though they were strong enough to be 'Nobel material', whereas you just speak out of your ass and invent convenient statistics by habit whenever you feel it's an informal discussion. This is still an intellectual discussion; such dishonest behavior isn't acceptable, and fully deserves a mention of "hand-waving". (And I agree: CodeGenerationIsaDesignSmell. I use the code-generation DesignSmell because the OperatingSystem has a MissingFeatureSmell and doesn't support OnceAndOnlyOnce representation of the 'integrate-a-parser/serializer' pattern. The only way to get away without integrating a parser is literally to have data presented to me in pre-parsed form... i.e. as would be offered by a typed FileSystem and language integration.)]
- I didn't "invent statistics". That is a fscking false. I am tempted to call you a "liar", but will give you the benefit of the doubt and hope it is just gross slothfulness on your part.
- [You didn't invent any statistics? Then please provide your sources for the following: "such features are not used much", "success of CSV, HTML/XML, and HTTP are largely because...", "considered a joke by most". These are statements about statistics. If you didn't invent them, where did they come from?]
[Today there is some awkwardness because there is no 'intervening' TypeSystem between tools. Support for well-typed FileSystems is not at all impossible. Today's requirement is that you work with rather opaque strings, forcing every application to share or build a library for serializing to and from structured files, using unreliable dot-extensions (.ext) as type-tags in the filenames; the end result is awkward, inefficient, ad-hoc, and painful... but apparently still better than unstructured, unvalidated, insecure input. The desire for typed filesystems with structured data clearly exists; it is why dot-extensions and XML and such are so successful and pervasive. I fully expect that, at some point in the future, the FileSystem will store structured data - not opaque binary, but rather be reflective and fully apprised of this structure - allowing rapid access and manipulation of well-typed attributes that can be coerced to strings and back for viewing and hand-editing (as well as supporting transactions and versioning and similar features). A typed FileSystem would in turn pave the way towards supporting typed communications (typed FIFOs, for example), typed process IO, and well-typed workflows (so, when you issue a command-line piped workflow, it can be checked for safety at that time), allowing for far more optimal workflows implemented with far fewer serialization and parsing pains between processes.]
- File "types" are a whole other subject, especially when versioning and feature-support-sets are involved.
- [To the contrary. For "Cross Tool" operations, file "types" - i.e. files with known and required structure - are FULLY HALF the subject (the other half being typed protocol). You brought them up with CSV, HTML, XML, etc. Please think about it carefully before dismissing it out of hand. Components within a strongly typed language communicate via passing and/or referencing of data items of known structure, verifying and implicitly coercing this structure either at need during runtime (DynamicTyping) or upon constructing the program (CompileTime StaticTyping). Components in a strongly typed operating system (tools/programs/etc.) would, therefore, do the same, but would do so using the 'data items' of the OperatingSystem - these being 'files' in today's systems. The equivalent of DynamicTyping in such a system would be implicit verification-at-need, and the equivalent of StaticTyping would essentially be an OperatingSystem that can reject command-line programs (commands and pipelines) because they are ill typed.]
- [I won't deny that other FileSystem features (like versioning, transactions, FIFOs, distribution, encryption, security, efficiency and such) are also important, in much the same way as they are useful features in programming languages. However, I believe the importance of these features does not diminish the relevance of file types to the subject of CrossToolTypeAndObjectSharing.]
- File "types" are "flat" types. That is they they are not generally nested or interweaving. They are not a system or network of types/objects, but rather *base* info necessary for communication (without AI to guess format of contents).
- [I can think of a number of exceptions. Examples include AVI (audio-visual interleave packages up multiple base types (named codecs) together in a well-defined structure), the <OBJECT> fields in HTML (stating other types and files to embed/include within a page, using MIME-types), XML (each schema is itself a system of types, and schema can reference other schema by URI to compose larger systems), Yaml, etc. I'll admit that support for nesting is more the exception than the rule. I certainly agree that the system of file types in use today is far from formalized, lacks a global standard mechanism for type composition, is somewhat unreliable, and is rather awkward in its application. It could be better. But to claim they aren't 'interweaved' or don't form a 'system' of types - that is, not even admitting to an 'inconsistent' and 'at-best semi-coherent' system - seems to me an error.]
- [So what makes you bring up type-composition as though it were a critical feature in this discussion? Do you believe it a prerequisite for the 'type-heavy' adjective? In your mind, is a system with a thousand distinct 'flat' types (as is common to files after removing the exceptions) more or less type-heavy than a system with just three composable types (string,map,sequence)?]
[Over time, cross-tool types will become ever more pervasive and less awkward. Programmers have made clear their desire for types since they started using dot-extensions. Dealing with opaque strings is painful and ad-hoc, as is repeating the IO effort between each application, so programmers want to do much less of it and automate as much as possible, creating standards like XML and YAML to carry their structured data, and dedicated libraries to work with it. They'd prefer to be even lazier yet, and not need to include libraries just to perform basic typed input and output. They'd prefer some greater optimization for space and speed, especially in pipelined workflows where the output is serialized to string just to be parsed again by the next process. Besides, type-safe command lines would just be darn cool, doubly so if the use of types was integrated into the command predictions and tab-completion (so 'mplayer <TAB>' lists only the files mplayer can accept). The reason it hasn't happened yet has a lot to do with inertia: almost every FileSystem tool would need to be rewritten; language libraries would need updating if they are to treat files as more than BLOBs; programs would need to specify the sort of IO they expect. Also, a TypeSystem would need to be chosen - which is likely to create something of a battle between the FunctionalProgramming guys (including me) and the ObjectOrientedProgramming guys (including the majority of everyone else). Nonetheless, while the effort to make it really work is staggering, it will happen. Someday. But, until then, we'll just limp along with structured text, schemas, mime-types, codecs, and unreliable '.ext' type tags embedded in filenames.]
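[As a sketch of what 'checked for safety at that time' could mean - assuming hypothetical per-tool type declarations; no shell today does this, and every name below is invented:]

 # Hypothetical: each tool declares what it consumes and produces, so an
 # OS shell could reject an ill-typed pipeline before anything runs.
 TOOL_TYPES = {
     'extract_frames': ('Mpeg4Stream',   'FrameSequence'),
     'resize_frames':  ('FrameSequence', 'FrameSequence'),
     'sort_lines':     ('TextLines',     'TextLines'),
 }

 def check_pipeline(tools):
     """Return the pipeline's output type, or raise on a stage mismatch."""
     current = None
     for name in tools:
         consumes, produces = TOOL_TYPES[name]
         if current is not None and current != consumes:
             raise TypeError("%s expects %s, gets %s" % (name, consumes, current))
         current = produces
     return current

 print(check_pipeline(['extract_frames', 'resize_frames']))  # FrameSequence
 try:
     check_pipeline(['extract_frames', 'sort_lines'])
 except TypeError as e:
     print("rejected before execution:", e)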
There's also the type-light/free camp that fights the bloat and over-engineering they see/perceive from the OO/type-heavy camps. We will fight for the PowerOfPlainText and dynamism. (Perhaps such techniques are industry-specific or app-specific, such that each industry will settle on the best match. I won't necessarily dispute that, I just don't want another zealot to stomp out my favorites without 100% proof.) -- top
A "zealot to stomp out my favorites ..." Huh? Have you ever had a favourite stamped out?
[Hmmm? Sounds like a fool's battle to me. You may believe you're somehow resisting bloat by fighting for PowerOfPlainText and dynamism. In truth, you are actually fighting to maintain waste and bloat by resisting OnceAndOnlyOnce factoring of structured data management into the space between utilities. The actual result of your favored camp's approach is kludgy, awkward, inefficient, and chaotic. The need for structure is part of the EssentialComplexity of computation, and forcing tools to use SimplySimplistic types (like strings or BLOBs) just forces programmers to 'hide' the structure inside the simpler type. This has the following consequences, all of which can readily be observed in our history and in present day:
- each tool ends up creating or at least integrating components to parse these simplistic types into structure then later serialize it for the next tool in the chain. Not only does this bloat the system (costing a great deal of memory and time); it also costs programmer time (to learn and integrate the necessary component) for every tool made.
- I don't know what you are mentally comparing here. Do you mean different standards for representation? A proliferation of standards, such as XML, CSV, etc. is an independent issue. If everyone wants to standardize on FlirtDataTextFormat, that would be great as far as I'm concerned. Further, for the most part each app or tool has different needs such that they are not trying to generate a big generic thing for everyone, but rather specific info for a specific consumer (app). If you want a central big-picture, then use a database. --top
- I literally mean programs integrating different program components (e.g. XML libraries) to parse and serialize these structures. This is not 'independent' of the proliferation of standards. If everyone has their own standard, and you need to 'share' with ten different tools (as a producer, consumer, or both), then you'll need to integrate ten different serialization and/or parsing components. Integrating these parsers and serializers is AccidentalComplexity. It is perfectly reasonable that you could tell the OperatingSystem to do something like 'open_file("filename", as_type:<type_descriptor>)'. Instead of having all these parsers and serializers centralized and integrated OnceAndOnlyOnce at a place in the system everyone uses anyway, every individual tool needs to integrate them, thereby introducing inefficiency, bloat, and a great deal of extra work for the programmers. It's almost a perfect example of SimplySimplistic. (And this has nothing at all to do with centralizing the data itself or databases.)
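To make that 'open_file' idea concrete, a minimal sketch in Python; the registry and the open_as_type function are hypothetical stand-ins for what would live at the OS layer:

 import csv, io, json

 # Hypothetical central registry: parsers integrated OnceAndOnlyOnce at
 # the system layer, rather than re-integrated into every single tool.
 PARSERS = {
     'json': json.loads,
     'csv':  lambda text: list(csv.reader(io.StringIO(text))),
 }

 def open_as_type(filename, as_type):
     """Hand the caller structured data; no tool bundles its own parser."""
     with open(filename) as f:
         return PARSERS[as_type](f.read())

 # A tool would then request structure instead of integrating a library:
 #   rows = open_as_type('report.csv', as_type='csv')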
- I'd rather stay off the subject of file systems. If you wish to discuss it, be my guest, but I won't necessarily participate further.
- {If you wish to avoid the subject of file systems, in order to discuss CrossToolTypeAndObjectSharing -- emphasis on the sharing -- you must either be discussing DBMSes (which are subject to essentially the same type and object sharing issues as file systems) or you must be discussing other mechanisms like Berkeley sockets, HTTP-based systems, Web services, CORBA, .NET remoting, Java RMI, RPC, etc. Is that what you intend?}
- FileSystems are the common medium by which CrossToolTypeAndObjectSharing occurs. Even network links are designed to emulate the FIFOs found earlier in FileSystems. Saying you'd rather stay off the subject is like jumping into the ExtremeProgramming page and saying "I'd rather stay off the subject of YagNi and DTSTTCPW."
- because the hidden structure initially lacks conventions, a large number of incompatible ad-hoc structures and formats will be created - e.g. a new input and output format for each tool... even each iteration of a tool. This has the effect of making it almost impossible to integrate tools. Even when programmers eventually realize they need standards, they'll just create a lot of incompatible standards (like HTML/XML/Yaml/JSON... though JSON is compatible with Yaml 1.2).
- Example? What force is stopping "conventions"?
- Examples: GCC error output, Unix configuration files, Microsoft Word file formats. No "force" is stopping "conventions". But, since you don't start with any, everyone makes their own. Eventually they start attempting to collaborate because they realize their tools need to integrate, and it's easier if not every tool has its own 'standard' (one that it might not even maintain from version to version). Unfortunately, the longer they waited the harder change becomes (due to coupling, explained below).
- Again, the existence or lack of conventions/standards is orthogonal to the issues of concern here.
- Again, your stated belief is incorrect. The existence or lack of conventions/standards is of significant concern for both CrossToolTypeAndObjectSharing and for the realities of how systems develop (organically vs. structured). Dismissing it as orthogonal is just naive.
- until standard components and serialization forms are in place (like XML/Yaml/JSON), tools in a workflow (such as a pipeline) will need to be designed to parse the output formats of the previous tools in the workflow, and (if not the final tool in the chain) will also need to be designed to output as the next tool in the workflow expects. This, not incidentally, 'couples' the tools that are used in common toolchains and consequently strongly resists switching tools to use the new input and output standards. This is in addition to inertia from the sheer mass of TechnicalDebt, especially since it is rare that any single group is capable of or interested in maintaining every tool in the toolchain. The end result of this is that you can't easily get rid of these "old" conventions and ad-hoc structured texts... not without hitting the big do-over button and essentially starting a new OS from scratch. This sort of coupling is common in Unix/Linux.
- Tools that interact with multiple other tools end up carrying and integrating parser and/or serialization components for each of those they interact with, essentially multiplying bloat and programmer effort.
- Example please.
- Example: Trivially, a script that reads a file and produces output formatted for another tool will need to carry a parser for the former and a serializer for the latter (2 components). If someone later requests output to another tool or input from a differently structured file (e.g. XML in addition to the original), then that same script will carry integration and bindings for a total of 4 components (2 parsers, 2 serializers). This isn't that unusual; I've worked on several 'organically grown' applications that needed as many as 6-10 parsers and serializers each, and where more than half of the code-bulk and programming effort is just in integrating XML utilities and creating the ad-hoc parsers for the other ones.
- I am still not getting what you are trying to convey. The example is not specific enough to point out the flaws in assumptions. I don't have enough info to evaluate the decisions made by the alleged clowns who dumped a mess on you one rainy day in November.
- The problem of working with multiple software tools is directly analogous to interacting with multiple hardware tools... see Martian Headsets: http://www.joelonsoftware.com/items/2008/03/17.html . It's a combinatorial problem. There are lots of pictures there, and Joel is much more skilled (and more determined) than I at dumbing stuff like this down.
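The arithmetic behind that combinatorial claim, worked out with an illustrative tool count:

 # N tools, each with its own format: direct sharing needs a converter
 # per ordered pair. One agreed-upon hub format needs only a parser and
 # a serializer per tool.
 n = 10
 print(n * (n - 1))  # 90 pairwise converters
 print(2 * n)        # 20 components via a shared format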
- No, I mean specific to exchange formats. I am not proposing no standards and I am not proposing bad standards.
- Exchange formats are interfaces. Standards are just exchange formats that people agree upon and describe in an RFC somewhere. Why do you need something more specific? Are you looking for a DifferenceThatMakesNoDifference? What matters is that exchange formats have these problems, and that they are exacerbated (multiplying bloat and integration effort) when one tool interacts with several others.
- But unlike (modern) hardware, software has a high potential to 'adapt' automatically - e.g. open_as_type(file|stream|socket, type-descriptor). The OperatingSystem can 'know' several type-coercions (and have more added at need), and these would become OnceAndOnlyOnce relative to the application & tool layer (i.e. equivalent to having a universal adaptor that can, at need, be taught to adapt new things), thus dramatically simplifying the applications and tools. Instead of each application integrating dozens of formats, they just integrate the few 'types' they use for communication. And this also has a large number of other advantages: the OS can tell you which tools won't work together, you could create tools that -receive- the type-descriptor and adapt to it (for display/query/etc.), and the OS can optimize communications because one doesn't need to serialize to plain-text and back between each tool (indeed, given a common type system and shared-memory communications, it would be possible for tool-to-tool comms to be nearly as efficient as within-application communications). Of course, all this has as a prerequisite that the type and structure aren't opaque to the operating system.
- as the 'hidden' structure of SimplySimplistic types is opaque to the medium for these types (e.g. the OperatingSystem, the FileSystem, and the common tools to access these), it is impossible to create utilities that can (by observing the type) automatically display, edit, access, implicitly type-coerce/convert, etc. this data in structured form. Fortunately for strings (but not BLOBs) you CAN take advantage of the PowerOfPlainText for display and editing and searching purposes. But even so, tools to edit the structure as text are (with some exceptions, such as Emacs modes) 'dumb' to the structure and offer limited additional utility (e.g. collapsing views, refactoring on structure, querying for every component with a certain pair of properties, removing or inserting whole 'items', data functions applied to files, justification and indenting, etc.).
- You are imagining something odd and weird, not something I do. Compiler heads won't be happy unless they compile the whole entire damned world first to make sure there are no detectable errors. However, nobody can move for 300 years until it's finished. You don't need dynamism when nobody is allowed to move. --top
- Just because you don't imagine it doesn't make it "odd and weird", Top. And "compiler heads" are perfectly happy to just compile the programs that are presented to them, which they often can do in far less time than it takes to notice that a compile occurred. Admittedly, that includes command-line statements where each statement is essentially a micro-program. In any case, types are not about compilers. Data types are about knowing, i.e. reflecting, the properties of the data, and (closely related) data representation is about finding and accessing the important properties in a conveniently efficient manner. Compilers take advantage of types to achieve TypeSafety, but that is not at all the only advantage of types. In many ways, knowing the type and thereby having access to the underlying structure for purposes of queries/manipulations/display/editing/coercion/etc. is truly of greater value. You appeal often to the PowerOfPlainText because it's easy to view and edit - which it is ONLY because the 'type' - e.g. ASCII - is already known to the users or assumed/guessed by the tools. Typed data can be made even easier to view and edit and otherwise work with... doing so only requires that the tools that accept the file also be able to reflect on the type and structure of the file. --AnonymousDonor
- I don't want to hear another TypeSafety-cures-cancer-and-saves-puppies speech again. It's not an issue that we will settle here, probably never. --top
- That's fine; I've never made a TypeSafety-cures-cancer-and-saves-puppies speech before, and don't plan to make one in the future.
[In any case, you're free to 'fight' this if you wish... all you need to do is find solutions to the problems that types solve without using equivalent types. If you don't, your entire camp's resistance is hardly going to be felt relative to the massive inertia from such systems as Linux and Windows today and the GoodEnough patchwork solutions (e.g. XML). We - and I can only guess, but I'd bet that most systems engineers and OS designers ARE in the camp opposite yours - will just keep on pushing in the direction we see the greater advantage... albeit at a sedate pace.]
[Anyhow, you needn't fear; your "favorites" will be there in that brave new world of typed FileSystems. If you desire to zealously stick your structure inside 'plain text' and parse it back out, we won't coerce you. Instead, we'll coerce the type. Personally, I can't imagine why you'd want to (except for those circumstances where you're serializing across the network or to a tool not designed for the new FileSystem), but I wouldn't stop you from doing so.]
You seem to suggest I am for parsing. I am not. Ideally the "atoms" would be clearly delineated such that no parsing would be required. In practice there is no such standard yet. ODBC is probably something closer to what I have in mind, but it lacks some dynamism. Text is merely the low-hanging solution right now. --top
[Databases only help you because they already support the more complex structure. E.g. every relational database supports (at least) a map of sets of maps of strings... and even that 'minimal' relational database rarely avoids the need to perform parsing (e.g. the moment it comes time to add integers together or check to see if one date is larger than another), and occasionally introduces a bunch of extra complexity when you attempt to avoid parsing (example: if you want tree-values or set-values in a column domain, you could represent them by identifier reference to a table... but then equality testing and comparisons of tree-values become a royal pain, as does cleanup).]
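[Rendered as a literal, that 'minimal' shape looks like this (table and column names invented for illustration):]

 # A minimal relational database: a map (by table name) of sets of rows,
 # each row a map from column name to string. (A list stands in for the
 # set here, since Python dicts aren't hashable.)
 minimal_db = {
     'contacts': [
         {'name': 'Alice', 'phone': '(555) 867-5309', 'hired': '2008-03-17'},
         {'name': 'Bob',   'phone': '(555) 123-4567', 'hired': '2007-11-02'},
     ],
 }
 # The moment you must compare dates or extract area codes, you are
 # parsing those strings again - the structure was only hidden.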
- To a point. If you want good and implementable sharability, then a standard/convention that allows some kind of "data structure(s)" that separates atoms without concerns over parsing is helpful. It is my opinion that the "type" information should use the existing structure rather than add additional complexities to get it. For example, a data dictionary can be stored in a relational fashion, or at least as part of the database. The "cells" themselves don't need "types" separate from the existing structure. I suppose a standard could be set up where each cell can store type information (a "side flag" ;-), but if we don't want to hard-wire the requirement that each cell have only one type indicator, then the multiplicity starts to resemble either a messy graph or a database. Rather than put a type database-like-structure *inside* a data database-like-structure, it's best to keep them the "same thing" and dispense with a dark hidden type "underworld". If, once we make it flexible enough, the type underworld looks 80% like the data structure used for the data values themselves, we might as well bite the bullet and consolidate them to avoid reinventing the wheel and/or making hard partitions that force arbitrary choices or paradigms (such as a forced data-versus-type dichotomy). --top
- [Hey, I'm all for getting rid of the dark hidden type "underworld" (aka the so-called plain text). I believe that structure and type should be readily exposed instead of buried in a string. And if by "keep them the 'same thing'" you mean to support data cells containing full databases (i.e. supporting a record of relations, as opposed to 'database-like-structure'), then I'm all for that, too. Indeed, that sort of SymmetryOfLanguage is a feature I find very appealing. However, that doesn't really get rid of the need for types which, in such a world, would be 'schema'.]
- I am not sure what you mean by "plain text" in this context.
- [I mean plain text. As in the stuff they talk about in PowerOfPlainText. As in strings that embed data. As in text that has dark, hidden, underworld structure. I'm curious: what ambiguous possibility came to your mind?]
- As far as nesting stuff in "cells", I find that a no-no. I'd rather use explicit (named) references than physical nesting. If you want to convert to hierarchies internally for the app, that's fine, but excess nesting is a pain for exchange purposes. Perhaps this is another relational-versus-navigational fight. And it brings more consistency if references are used throughout rather than a mix of nesting and references. Explicit references make the accounting and trouble-shooting easier. For example, if there is cyclical nesting, it might blow the stack of the loader. A relational approach won't have this problem because any nesting is via references, which are not going to blow a stack during the initial load stage. Traversing a screwed-up tree is then a separate act, app-specific, and can be dealt with at that level. It's a form of SeparateIoFromCalculation.
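A quick sketch of that loading point (rows invented for illustration): flat reference rows load without recursion, and the cycle check is a separate, post-load, app-level pass.

 # Flat parent references: loading these rows cannot blow any stack.
 rows = [{'id': 1, 'parent': 3},
         {'id': 2, 'parent': 1},
         {'id': 3, 'parent': 2}]   # cyclical: 1 -> 3 -> 2 -> 1
 parent_of = {r['id']: r['parent'] for r in rows}

 def find_cycle(start):
     """Iterative traversal done AFTER the load, as a separate act."""
     seen, node = set(), start
     while node is not None and node not in seen:
         seen.add(node)
         node = parent_of.get(node)
     return node   # None means a clean chain; otherwise the repeated node

 print(find_cycle(1))   # -> 1: reported gracefully, not a stack overflow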
- [I find it "brings more consistency" if references are used to identify entities about which statements can be made (e.g. a particular book) rather than being used to identify both entities and complex domain values (e.g. a set value, a tree-value, or even a big-messy-cyclic-graph-value). This has nothing to do with relational-versus-navigational; this is about doing relational the way it was defined instead of using hacked, inconsistent approaches to domain values. If data exchange is a concern, that can be solved OnceAndOnlyOnce at the serialization and connectivity layer rather than bloating up the applications and queries with the logic to solve the problem.]
- I don't like the idea of trying to turn the exchange system into an artificial intelligence database or "grand world modeler" as a default or assumption behind it. That is a different animal with different goals.
- [That would be a different goal indeed... in fact, it's so different that it's off-topic. Nothing written above requires the exchange system (serialization and connectivity) become an AI or world modeler.]
- Relational is against nesting in my book. I don't know what brand of relational you assume, but embedding creates a dark underground second world, forcing the dichotomy issue again.
- [Your 'book' is probably a fantasy novel. Get a better one... DateAndDarwen's, perhaps, or maybe EfCodd's papers that describe the mathematical basis for relational, neither of which are "against nesting" (for domain values). It is true that it is often inappropriate to use complex types - in particular, when you're doing so in order to represent a conjunction of facts - but that is a problem to be solved by use of education and 'best practices', not by BondageAndDiscipline. You only end up forcing people to jump through hoops in order to use complex domain values: they're forced to embed them in 'plain text' or use artificial tables just for storing structure, both of which are horrible for queries and data management. I don't know which particular fantasy world your 'book' describes, but at a guess it's a SimplySimplistic one where EssentialComplexity can be reduced through sheer and stubborn ignorance of it.]
- Again, "plain text" is not an issue here that I see. What are examples of "artificial tables"? If we have some stuff that is embedded but some that is not embedded, then we have a forced dichotomy. If a "structure browser" can assist in converting a view of a low-level ("artificial"?) table to something that looks like an embedded table/object/type to you, the viewer, then we don't need to hard-code this view into the standard or elementary structure. The low-level table can still represent the thing. This allows the *view* to show you want to see rather than some arbitrary table-versus-embedded choice forced on one by the structure designer. You have not shown a need to have embedded versus non-embedded things in the absolute sense. I say, if the viewer can transform it to our pet view, then let it rather than the other way around. It's a cleaner underlying model. Don't make presentation issues become hard-coded or arbitrary design issues unless necessary. If we can find a way to have a uniform representation structure, let's not squander it because we want to think of it as embedded on Tuesdays. --top
- Top seems to contradict his words above in CantHideFromNulls: "The further you deviate from the user's view, the more effort/complexity/repetition to translate back and forth. It's that sample (almost). I'm not giving up simplicity for somebody's purity obsession. --top" [Adjusted quoting for clarification]
- My reply is in CantHideFromNulls.
- The RelationalModel admits both complex types and nested relations, on the basis that a given tuple/relation heading should be able to reference any type, including tuple and relation types. It is only SQL and other less-than-relational DBMSes and table-based systems that are mostly limited to primitive types in tables. Date and Darwen have clearly defined the semantics for nested relations and tuples, and are in the process of further refining the semantics of their type model. Implementations of these will end the need to create awkward and complex database schemas to represent typeful concepts. The net result will be simplicity, not complexity, because representing complex types through schema definitions and accessing them through complex queries is far more complex (almost by definition) than manipulating them through effective type systems. By way of analogy, manipulating complex types strictly via schemas is like manipulating integers and strings strictly via bit-level operations. While I can appreciate your notion that complex types could (apparently) be represented internally as tables in order to provide varying user views, I can't imagine why a user would want to see some alternate internal representation of (say) a complex number, polynomial expression, geometric shape, time, date, geographic location, or other commonly-used non-primitive type. However, alternate "views" or external user-space representations of types -- called "possreps" (for "possible representation") -- are defined within DateAndDarwensTypeSystem, but do not require any user exposure to their internal representation, nor would they benefit from it. You might wish to use possreps to define a geographical location that can either be expressed in terms of latitude & longitude or in terms of postcode; or represent a temperature as degrees Kelvin, Celsius or Fahrenheit, but in neither case do you need to know how these are represented internally. The semantics of a geographic location, temperature, complex number, polynomial expression, or whatever, can be entirely defined by its operations, without reference to its internal representation. If there is an apparent need to access the internal representation of a type (outside of the type definition itself), it is a limitation of the type definition and not a justification to deprecate complex types.
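For instance, here is a rough Python rendering of the possrep idea (the Temperature type and its operator names are invented for this sketch, and this is not DateAndDarwen's TutorialDee syntax): two external possreps, one hidden internal representation, semantics carried entirely by the operators.

 class Temperature:
     """Two possreps (Celsius, Fahrenheit); the internal rep stays hidden."""
     def __init__(self, kelvin):
         self._kelvin = kelvin                       # internal representation

     @classmethod
     def from_celsius(cls, c):
         return cls(c + 273.15)

     @classmethod
     def from_fahrenheit(cls, f):
         return cls((f - 32) * 5.0 / 9.0 + 273.15)

     def as_celsius(self):
         return self._kelvin - 273.15

     def __eq__(self, other):
         return abs(self._kelvin - other._kelvin) < 1e-9

 # Either possrep selects the same value; no caller touches the Kelvin rep.
 print(Temperature.from_celsius(100) == Temperature.from_fahrenheit(212))  # True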
- [Nice analogies. This is a better written response than my own.]
- Thanks!
- Re: "because representing complex types through schema definitions and accessing them through complex queries is far more complex (almost by definition) than manipulating them through effective type systems. By way of analogy, manipulating complex types strictly via schemas is like manipulating integers and strings strictly via bit-level operations." - Hogwash! One can use wrappers or custom accessors to provide a custom view. But anyhow, its NOT the job of the exchange system to become on OOP/type language itself with complex TuringComplete operators. That is overkill. You are trying to force-merge your pet type system into a place it doesn't belong. Who needs a language when the exchange system is a language? It's like trying to turn reg-ex's into a complete language parser because somebody rational hadn't stopped you, saying "enough!". While it may be worth while to add such to a high-end RDBMS, which is what Date et al. were focusing on, its NOT for the exchange system/format. --top
- [Uhhh.... Top, he isn't talking about 'exchange system/format'. In case you've totally lost your bearings, we've been on the subject of Databases in the past section ("ODBC is probably something closer to what I have in mind..." -> "Databases only help you because..." -> "As far as nesting stuff in "cells"..." -> "The RelationalModel admits both complex types and nested relations" -> "worth while to add such to a high-end RDBMS, (...) its NOT for the exchange system/format." -> "Uhhh.... Top, and he said it was for the exchange system/format where, exactly?"). As far as where those "complex TuringComplete operators" would apply: query and data manipulation expressions as part of the DML (to support whatever manipulations the domain demands). Exchange (e.g. serialization, connectivity, delivering a query result) is far simpler, and certainly doesn't require a highly complex language (not even for a system that supports a variety of composable types), though some extra complexity could be favored for optimizations (e.g. use of cursors, or pipelining promises to fetch larger values on demand).]
- If you've wandered off the topic of this wiki topic, then perhaps it's time to create a different page.
- [And maybe it is. However, this topic is still of direct relevance to CrossToolTypeAndObjectSharing. Communication always happens in a medium, and it is perfectly reasonable to consider such common media as FileSystems and DataBases.]
- If file systems and database systems are off-topic, what is on topic?
- [And the need to produce tons of wrappers and custom accessors and custom views seems to be AccidentalComplexity from trying to avoid support for more complex structured values. It smells like AddingEpicycles.]
- Indeed. I find Top's argument here almost completely baffling. The overhead of learning a user-defined-type syntax is almost invariably lower than the difficulty and error-proneness of writing and using kludgey type-checks on strings, tables, or whatever, throughout a system. Of course you can use "wrappers or custom accessors" to hand-craft your own types out of language primitives -- with tables, strings, integers, booleans or whatevers as the internal representation -- but why would you want to go to all that work? Wouldn't you rather have a clean syntax that makes the process easier?
- You keep saying something is bad and kludgey, but are not clear on what exactly is bad and kludgey. Until you come up with a clean gravity formula/model like Newton did, Epicycles are the best game in town. (And they actually are a form of regression, which is a useful technique.)
- Let's take a trivial example: If I define a type to represent a complex number, e.g., "Complex", I can define values as Complex(1, 2), Complex(1, 4), etc. If I need, for example, pairs of complex numbers in a database, I might define a table (using TutorialDee syntax) as "VAR mytable REAL RELATION {Comp1 Complex, Comp2 Complex} KEY {Comp1}". I can now easily perform queries, e.g: "mytable WHERE Comp1 = Comp2" or "mytable WHERE Comp1 = Complex(1, 5)" and so on. If, however, I cannot define a new type to represent Complex, then I'm forced to define my table as "VAR mytable REAL RELATION {Comp1Real RATIONAL, Comp1Imaginary RATIONAL, Comp2Real RATIONAL, Comp2Imaginary RATIONAL} KEY {Comp1Real, Comp1Imaginary}". Furthermore, my queries become more complex. E.g., "mytable WHERE Comp1Real=Comp2Real AND Comp1Imaginary=Comp2Imaginary", or "mytable WHERE Comp1Real=1 AND Comp1Imaginary=5". And so on. Which seems more kludgey to you? A complex number is a very simple type; more complex types merely expand the difficulty shown above, complicate the implementation of the queries, obfuscate the intent of the queries, and potentially duplicate functionality throughout the system that could otherwise be represented OnceAndOnlyOnce in a type definition. Can you imagine what these examples would look like if the type was a tree, or other complicated structure? Using a type, they wouldn't change significantly -- only the value selector would be different. Maybe replace "Complex(1, 2)" with "Tree(Node(1, Node(Node(2, 4), 3)))", but the rest of the code would be essentially unchanged. I'm not going to attempt a Tree example on tables consisting only of primitive canonical types -- life is too short.
- Reply at Page-Anchor Compound_Element_Types.
- (RE top) [When I say 'artificial table' in a DBMS, I mean a table that doesn't store any data. As an example, in MySQL there are tables dedicated to storing and managing LARGE_TEXT, essentially using garbage-collected references to values under the hood (~2000 bytes per row, IIRC). But at least for those tables the system is providing support. A need to maintain an 'artificial' table above the hood (as is demanded when representing trees or graphs by use of references) is very smelly - essentially a LanguageSmell caused by poor design decisions in the DML or DBMS implementation. Doing so forces users of the DBMS to essentially perform garbage-collection (by hand) and also blows up queries to even do something as simple as a join on the complex value. As a general test: the need to use references to values (which, by nature, possess intrinsic identity) is a serious LanguageSmell - be it in a DML or elsewhere.]
- MySql's internal tables are an internal issue caused by internal chunkification needed for efficiency. I don't see how that is relevant to what the exchange format user sees.
- [Having them under the hood is fine. And the reason it is 'relevant' is because you asked for an example of an 'artificial table', and it's the first one that came to mind that was readily recognizable.]
- [And "plain text" was raised as a valid issue. All sorts of structure can be 'buried' in plain text, and doing so is essentially one of the ways people will 'dodge' those BondageAndDiscipline limitations you naively believe are saving people from themselves. For escaping these SimplySimplistic limitations of which you're so foolishly fond, one has at least two options: (1) plain-text (structure in a string), (2) explicit artificial tables (using references to values). Both of them are horrible.]
- You still need to be more specific. I know that one "shouldn't do bad things with text", but that is not very usable as is.
- [Specific reasons why it is a 'bad thing' to put structure into text: (a) It's opaque; that is, nobody else knows about the structure. (b) Attempting to tell them about the structure for purposes of validation or attribute access requires some formal language in which to describe it... aka 'type-descriptors'. (c) If you need to re-state that description with each query, you violate OnceAndOnlyOnce (and also introduce inefficiencies). (d) Since the structure is undeclared, it is not possible to create generic programs that can examine the type-descriptor to usefully display/edit/create/query any structure. (e) Notice that by using plain text you didn't actually avoid the need for types; all you gained was creating extra work for yourself and everyone else, as is common for most SimplySimplistic approaches.]
- You really need to provide specific examples. I disagree with your generalities as stated.
- [Disagreeing with a statement merely because you don't like how it is stated is a fallacy. If you disagree, you should provide valid reasons for it. And I've named plenty of examples of structure embedded in text (XML, Yaml, JSON, CSV, HTML) and you've seen plenty of others ("((tree,values),'in a string')", "(555) 867-5309", etc.). The reason I state things as generalities is because they apply generally, so use that brain of yours and find counter-examples if you feel I'm being over-general - that, at least, would be logical.]
- [RE: "I say, if the viewer can transform it to our pet view, then let rather than the other way around. It's a cleaner underlying model." - I say a 'cleaner model' needs to be 'clean' for far more than just 'views'. Your choice of a 'cleaner' model is seriously smelly when comes time to do data manipulation and ad-hoc queries.]
- By what measure of smelly?
- [Among many others: (a) inefficiencies, (b) inconsistencies with other values, (c) 'artificial tables' make for extremely difficult queries, esp. for equality and update over parts of the complex value. Even equality between tree-values is a pain, and it's among the easier of them. (d) all the normal smells of working with the representation instead of the thing represented - a lot like working with integers at the bit level.]
- Slogans, no details found.
- [Those are details; you asked for measures of smelly, and I provided a detail list. If you want more specific details, ask more specific questions.]
- Further, if you mix references and nesting, then it may not match what the app needs. The app may want references for parts you nested, and nesting for parts you referenced. If everything is references, then that arbitrary choice is removed and one just uses standard/generic/typical reference-to-tree converters if one wants an internal tree. It's a form of standardization of representation. Standards are usually simpler if you remove unnecessary artificial overlapping choices from them.
- [If you want to allow applications to request references to values (which by nature possess intrinsic identity) then that mechanism should be consistent across ALL values, even integers and strings, as part of the query language. I can see how such a decision would be useful in optimizing data transfer, and could even be combined with connectivity to later fetch the full value by its reference. But the system should be consistent about how such features are applied.]
['Parsing' is gathering structure from a simpler type. If you are to avoid parsing, then the data must be presented to you (when you receive it) as the more complex type. I don't think you can have it both ways: you can't avoid both types and parsing unless your domain is full of very simple problems. Attempting to avoid both parsing and types is a PipeDream. In many domains such SimplySimplistic notions will bite you in the ass and tear a few chunks out. Anyhow, going back to the FileSystem: many people think the FileSystem should be replaced with a DataBase. I don't really disagree. But either way, you'll be: (a) doing parsing (integrating parsers/serializers with every tool) OR (b) doing types (inside those atoms & across communications) OR (c) both OR (d) finding yourself a job that doesn't involve programming. Your choice.]
Sounds like we may be on the verge of LaynesLaw-ing on "types" again.
[Not likely. We aren't engaged in a philosophical battle over definitions. Neither of us believes that being the one to define 'WhatAreTypes' would be significant to the argument. The notion of supporting values of declared structure in filesystems and databases is well distinguished from the notion of supporting nothing more complicated than strings.]
Strawman: "just strings". That is misleading. Validation techniques and other attributes-about-values are perfectly possible, and without adding complicated syntax/requirements, but rather ReUse? of the existing schema system for consistency and concept conservation.
- What is "concept conservation"?
- [My guess (NearestFittingContext) is that he was referring in general to avoiding LanguageIdiomClutter and keeping the set of language primitives to a minimum (these 'primitives' being 'concepts' about which you must be 'conservative'... as opposed to the more obvious interpretation of conserving concepts so they don't get lost (e.g. via keeping a concept database... or never DeleteWiki)). A laudable goal avoiding clutter may be, but Top's designs toward achieving it (such as resisting support for types in databases) are too often simplistic and introduce other LanguageSmells (e.g. violations of OnceAndOnlyOnce essentially duplicating the equality operator for the artificial 'Complex' type the moment it is used in two or three different join-queries).]
[I didn't say "just strings". And I agree, validation techniques
can be applied... but there is a critical point to make: you need to validate
against something that tells the validator what a 'valid' string looks like. That 'something' has all the necessary properties to be a type-descriptor. The ability to perform validation requires types. And you won't magically escape the need for the normal requirements of types in order to use such techniques... though I don't believe "complicated syntax" is a prerequisite for using types - that's just you being a type pessimist.]
Re: "The ability to perform validation requires types." - That's a rather bold claim. Care to justify it in another topic? --top
PageAnchor: Compound_Element_Types
Regarding the complex-number issue, I will agree there is somewhat of a need for "compound element" types/values; and that using ID's to reference and represent such compound values as tables can be a bit obnoxious. Let's see if we can work something out. For the sake of representation, let's represent them with curly braces. Ex: {123, "foo"}. The individual elements (let's call them sub-elements) can then be specified and constrained just like "regular" types/columns. (Later I may work up an example of how these could be specified in a standard way using FlirtDataTextFormat.)
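As a rough sketch of the kind of sub-element spec I have in mind, rendered here as a Python structure rather than any committed FlirtDataTextFormat syntax (every name below is a placeholder):

 # Hypothetical spec for a compound element such as {123, "foo"}: each
 # sub-element gets only basic optional validation (kind and size).
 pair_spec = [
     {'name': 'num',   'kind': 'number'},
     {'name': 'label', 'kind': 'string', 'max_size': 20},
 ]

 def validate(value, spec):
     """Basic optional validation only: element kind and size."""
     kinds = {'number': lambda v: isinstance(v, (int, float)),
              'string': lambda v: isinstance(v, str)}
     assert len(value) == len(spec), "wrong number of sub-elements"
     for v, s in zip(value, spec):
         assert kinds[s['kind']](v), "bad kind for " + s['name']
         assert len(str(v)) <= s.get('max_size', 9999), s['name'] + " too big"

 validate((123, "foo"), pair_spec)   # passes quietly; no engine required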
Where the exchange format/system ends its obligation, however, is at using expressions and/or a TuringComplete "engine" to define operators or validations. (One can add that as an extension or add-on, but it should not be a core standard.) There should only be basic optional validation such as range, size, required status, and perhaps basic character sets.
--top
[You've recognized that rejecting structure leads to much kludge and OnceAndOnlyOnce violations, but aiming your solution at such a trivial example as complex numbers has produced a simplistic half-measure. Composite 'flat' types as you propose do extend the set of domains in which a Relational Database can conveniently be applied, but said solution would still end up repeatedly punishing everyone who needs deep-structured values.]
Example?
My Complex number was intended as a mere starting point, a trivial example that simply illustrated the kind of problems inherent in exposing the internal representation of a type via tables in a schema. What about types represented as a tree, graph, or lattice? What about types that require an internal representation involving varying numbers of elements, varying types of elements, or varying relationships between the elements on a value-by-value basis? What about complex (possibly procedural) constraints? What about distinct types that share a common internal representation, but that are distinguished by their operators and/or constraints?
Data exchange formats are not a problem -- we already have those, e.g., XML, YAML, CSV, etc. and assorted ad-hoc representations. The problem lies in accurately reconstructing type definitions at the various communication end-points of a system. This is not solved by simplistic data exchange formats. As noted above, a type may be more than just the data structure used to represent a given value. This is only solved by creating type definition standards that (a) do not lose any information about the type definition when they are shared; and (b) do not unnecessarily expose users to internal representations that may (quite appropriately) vary depending on their location in a system.
- Re: "The problem lies in accurately reconstructing type definitions at the various communication end-points of a system." - You have to remember that different kinds of languages and tools will be using such info. You seem to be suggesting that the only way to achieve uniformity is to make the exchange system *be* an application language itself (with involved types and operators), or something very close. That's an Apple view of operating systems: if you control the hardware, then you don't have to worry about OS compatibility: its all "integrated". While it is an option to consider, it is far beyond the goal of "sharing", per title. I smell scope creep. --top
- The only way to achieve full type sharing among various languages and tools would be to use a portable, standardised type-definition language. Otherwise, crucial information about type definitions will be lost when transported from system to system. Remember, a type is not just its internal data structure! The SQL standard, CORBA's IDL and various XML schemata are already a step in that direction -- but only in terms of achieving cross-system portability for some (largely) non type-portability purpose. I see considerable value in creating a sublanguage specifically for defining and sharing types, and creating libraries to parse and manipulate types -- much like the (e.g.) general-purpose XML libraries that are nearly ubiquitous now. However, I have no idea what this has to do with "an Apple view of operating systems." It's far closer in concept to existing portable standards (for other purposes) like HTML, XML, IDL, SQL, CSS, and so forth. As such, it is hardly "far beyond the goal of 'sharing'". Rather than "scope creep," it is precisely about completely implementing type-sharing without any loss of crucial information. Isn't that exactly what this page is about?
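To illustrate what 'without any loss of crucial information' might require, a toy round-trip of a type definition through a neutral wire form (the descriptor fields below are invented for this sketch; an actual standard would need to pin them down):

 import json

 # A type is more than its stored structure: possreps, constraints, and
 # operator signatures must survive the trip between systems, or the
 # receiver reconstructs only a bare record shape.
 temperature_type = {
     'name':        'Temperature',
     'possreps':    {'Celsius': ['degrees'], 'Fahrenheit': ['degrees']},
     'constraints': ['degrees (Celsius) >= -273.15'],
     'operators':   ['as_celsius() -> RATIONAL'],
 }

 wire = json.dumps(temperature_type)           # shared across tools...
 assert json.loads(wire) == temperature_type   # ...with nothing lost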
[Good points, as usual. But Top, I'm discovering, is incapable of abstract thought or he'd already know all of this from prior discussion. After all, you had even named types represented as trees in the very same paragraph he was responding to, and yet he can't even think of it; he needs an 'Example'.]
[Top, consider ordered tree-values used as a key column in the same situation where complex numbers were used. What I want to be able to do is say: "myTable WHERE tree1 = tree2" or "myTable WHERE tree1 = Tree(Node(1, Node(Node(2, 4), 3)))" or "INSERT INTO myTable(tree1,tree2) VALUES (Tree(Node(7,22,Node(11))), Tree(1,Node(2,4),3))". I also want access to component operators: "myTable WHERE contains_pattern(tree1,Node(*,Node(2,*),3))". What I want to be able to do is define trees and their operators OnceAndOnlyOnce so that I don't need to build them into each query. However, in a DBMS not supporting types, my options are limited. One way to obtain what I want is to put every tree-value into a string & use parsing for every operator; this is a solution with plenty of its own problems (esp. when it comes to operators over structure), but it is not the solution that you'd promote. What you promote (based on discussion here and how you've responded to past challenges) is more analogous to breaking down 'Complex' into two columns: essentially creating a separate 'Nodes' table... perhaps:
 TABLE Nodes                               TABLE Nodes
 --------------------                      -------------------
 ID Integer Autonum                        ID Integer Autonum
 Value1 Integer                            Type Char         // 'N' for Node; 'L' for leaf
 Value2 Integer                     OR     IDParent Integer  // parent node
 Value3 Integer                            Position Integer  // ordered position in parent (for ordered trees)
 Value1Type Char // e.g. 'N' for node      Value Integer     // for leaf
 Value2Type Char // 'I' for integer        PrimaryKey (ID)
 Value3Type Char // 'x' for unused         Unique(IDParent,Position)
 PrimaryKey(ID)
[Each of these solutions has disadvantages. Option 1 requires building the tree procedurally from the bottom up (so you have the Node IDs for Value1, Value2, Value3), while option 2 requires a top-down approach (so you have the IDParent). Option 1 severely limits the number of children per node (just three), but potentially allows for sharing structure (e.g. Node(1,2,3) could be shared among many trees). Option 2 will generally require a unique structure for each tree (since the structure must be unique to IDParent). But enough about the trees. In this case, you'd use:
TABLE myTable
---------------
tree1 Integer
tree2 Integer
ForeignKey(tree1 into Nodes)
ForeignKey(tree2 into Nodes)
PrimaryKey( ... actual tree value of tree1 ... ?)
[Hmmm... I've already encountered a problem. I can't find a reasonable way to state my PrimaryKey as being unique based on the tree structure rather than on the tree identifier. That's somewhat upsetting, but since I know that Top doesn't give a damn about 'protection' like that, I'll let it slide for now. What matters to me is OnceAndOnlyOnce, SeparateIoFromCalculation, and otherwise avoiding kludge. 'Pet Views' aren't the only thing that matters to me; just as important are the difficulty of forming queries to insert, update, delete, and request data based on the tree-value, and the ability to include the 'Tree' type as a foreign key into other tables (should the need arise).]
[Top, take any one of the tasks I wish to perform - e.g. "myTable WHERE tree1 = Tree(Node(1, Node(Node(2, 4), 3)))" - and find a solution that is just as elegant and reusable as this typeful one, except where your solution uses a Nodes table instead of 'Tree' typed cells. You can make your own Nodes table if you wish. Can you do this?]
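(For concreteness, here is a rough sketch -- in Python, purely for illustration -- of what tree-equality tends to look like when the tree lives in a Nodes table like option 2 above. The helpers 'fetch_node' and 'fetch_children' are hypothetical stand-ins for SELECTs against the Nodes table.)

 # Hypothetical client-side equality test for two stored trees, assuming the
 # parent-link Nodes schema (ID, Type, IDParent, Position, Value) sketched above.
 # fetch_children(conn, node_id) stands in for something like:
 #   SELECT ID, Type, Value FROM Nodes WHERE IDParent = ? ORDER BY Position
 def trees_equal(conn, id_a, id_b):
     row_a, row_b = fetch_node(conn, id_a), fetch_node(conn, id_b)
     if row_a.Type != row_b.Type:
         return False
     if row_a.Type == 'L':                     # leaves compare by value
         return row_a.Value == row_b.Value
     kids_a = fetch_children(conn, id_a)       # one round trip per interior node...
     kids_b = fetch_children(conn, id_b)
     if len(kids_a) != len(kids_b):
         return False
     return all(trees_equal(conn, a.ID, b.ID)  # ...plus a recursive descent
                for a, b in zip(kids_a, kids_b))

Any query needing tree-equality either repeats this logic application-side or round-trips node rows out of the DBMS, which is the OnceAndOnlyOnce and SeparateIoFromCalculation complaint being raised here.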
Are we talking about exchange systems or RDBMS? I have no problems with the idea of adding traversal and graph-node operations to a relational query engine. I've even proposed some SMEQL-like operations of my own.
This section is specifically about representing types in table/relation/relvar-based systems, which certainly applies to DBMSes but could equally apply to other systems as well. The underlying concept -- the undue complexity forced by lack of appropriate user-defined type support -- is applicable to any language. While adding traversal operators and graph-node operations to a relational query engine would certainly improve the query engine, it does not completely address the issues that are introduced by a lack of true type-definition facilities. Merely having (tree?) traversal and graph-node operators does not address the issue of, for example, a type that requires a large number of varying elements that happen not to be a tree or graph. Only true type-definition facilities can eliminate (for example) the complexity of duplicating type operations (such as testing for equality) in every query against a collection of table attributes (or even whole tables) that represent a single type.
[Apparently we wrote responses at the same time again. Yours is as thorough and accurate as any I could offer, and I agree with your statements. Mine, below, is also to Top.]
[Both RDBMS and Exchange, Top. Look carefully at the example: sending 'Tree(Node(1, Node(Node(2,4), 3)))' as part of an 'insert' is exchange. Using it for the query is more relevant to the RDBMS. I don't believe the two can be so easily divided as you seem to believe they can.]
[I don't mind if you add traversal and graph-node operations to a relational query engine (after all, I'm asking for solutions)... BUT I feel like you're just assuming (cue visual of Top waving hands) that such a utility would somehow save the day and make everything better. I don't believe it. My intuition is that actually trying to use these graph-node operations to perform a simple equality operation will still be kludgy, will still violate OnceAndOnlyOnce across queries (repeated syntax), and will probably be difficult to share or optimize, too. Please show me how it would help out, since you believe it would. You like concrete examples, and I offered you concrete example problems. Can you show me how this solution you suggest solves them?]
This appears to be a case where the domain needs a tree-oriented query language to communicate tree updates between them. It also smells like a "lab example", i.e. somewhat artificial. Yes, I do want an example, meaning something realistic from the real world. If that makes me bad somehow, so be it. I am a bad person who wants an example to test the practicality of this.
[Tree and graph values are useful for representing composite identifiers in almost any system (especially if you want to follow the ZeroOneInfinity rule), feature associations for a fuzzy memory engine, problem-solution pairs for memoizations, pattern-transforms for data-driven optimizers, component-features and functions in an RDBMS-based GUI engine, and so on. If you were to consider whichever forces introduce a 'complex number' problem, you'd quickly see that vectors and matrices and sets are all useful types for comparisons. Trees in particular are just one instance of the more general problem you're seeking to ignore by calling it a "lab example".]
[Why don't you tell me more about how this "tree-oriented query language" of yours will magically solve all the problems? I've given the subject some thought myself, and while I believe traversal operators are a fine idea, I'm still convinced they leave unsolved the problem presented above. Why don't you show me your version of "myTable WHERE tree1 = Tree(Node(1, Node(Node(2, 4), 3)))" using this "tree-oriented query language" of yours?]
In practice, there may be some tree-ness to a given app-to-app transfer, but it may be specific enough that we don't need a general-purpose tree-query-language and can make a little domain-specific sub-language (or sub-convention) to do a somewhat specific update/transfer. For example, updating some branches in file folder directories usually does not need a general-purpose tree query language. A dummy file name generator plus a DOS script generator could do the trick. Tree-oriented query languages have been re-invented multiple times, but have never really caught on because the need is not that common. I've seen a lot of biz apps over the years; I've been in the industry since the mid-80s, before PCs took over as the dominant biz platform. [This topic is TooBigToEdit; my spellchecker stopped working on it.] --top
[Yes, I know all about how you'd prefer to repeatedly create a complex system of scripts and applications to solve problems that could be solved OnceAndOnlyOnce in the DBMS. Hey, I've got an idea in the same vein: we don't really need a DBMS... let's just use flat files to store the data and DOS batchfiles to update them! Never mind that we'll need to re-invent this solution multiple times. And people like DosMind will be there to fight tooth and nail to prevent more general solutions from entering the field saying things like: "show me an application that needs a generic solution and can't get by with batchfiles and flat files!" and "it is simpler to transport flat files around; we really need to keep exchange systems simple!"]
If you have a general solution to a common problem, then show about 5 to 10 realistic examples that demonstrate it is common, and then present the solution. You are only claiming it, not showing it. Claims are the easy part. As far as showing up DOS-only advocates, I'd have them cross-reference (join) a million records, and their solution would either be too slow or take up lots more code than SQL. I wouldn't need indirect round-about mumbo-jumbo justification; rather, speed or code volume would be there for them to actually witness with their own eyes. (If they question the need to join a million, I'd give them actual scenarios from my time as a marketing research query writer for a cable company with a million-plus customers.) If by chance they find a way to get DOS to do such things easily and quickly, I'll pat them on the back and say, "Well done. If you personally like it, go with it." --top
[And what I'd do is ask them how good their transaction support happens to be. In any case, why don't you handle just the problem in front of you before asking that I cook up 5 to 10 more.]
You haven't presented a realistic non-sys-soft domain context. It's a text-book puzzle at this point.
[I do not need to do so. We have goals that communications and storage media utilities like FileSystems, OperatingSystems, and DatabaseManagementSystems be domain generic while avoiding LanguageIdiomClutter. In this context, we also wish to provide as much CrossToolTypeAndObjectSharing as possible for both manipulation and IO. Given these three goals, a general-purpose feature like a standardized TypeSystem that can be used to obtain the desirable effects in different domains is inherently better than producing a ton of domain-specific solutions that cannot interact (or be shared) because they are modular and unaware of one another. Because of this, I do not need to prove the system better in every domain; I only need to have reason to believe that it is no worse than the existing solutions in most domains while providing useful features in at least one domain. Proving no worse is trivial by simply ensuring the types in use by existing solutions (e.g. Blobs, strings, dates, integers, etc.) are available. Proving the useful features in at least one domain is the reason that the tree-example is provided.]
[In any case, if you're like most people you should be able to learn a lot by actually working through a "text-book puzzle". I'd appreciate it if you spent at least half as much effort mentally applying yourself to the example as you do in seeking excuses to avoid it.]
I'm a practical guy and I don't think this is a fault. Oftentimes "lab-toy" examples exaggerate the need or usefulness of certain techniques by making a series of unrealistic assumptions. They can be fun to play with, but I tend to focus on the practicality of things more as I get older. There is plenty of work to do to discover better practical tools such that I don't need to seek out artificial problems to keep my curiosity satisfied. I've seen volumes of dusty IT research journals at my local university, and was appalled at the money and time being wasted on silly lab toys. (Perhaps 1 in 5,000 will result in the next big breakthrough, but I'm not here to play lottery.) Thus, I'd like a realistic industry scenario before I consider this worthy of a practitioner's time. The academic puzzle-lovers can thus take over at this spot. -t
As someone who works in academia, I'll be the first to admit that many published papers are rubbish, and that a significant proportion of "research" (I use the term loosely) is little more than a way to retain employment and/or gain promotion and/or avoid teaching and/or obtain funding for office & lab toys. Yet, without the 5000 efforts, there wouldn't even be one big breakthrough.
- I suspect that a lot of things we use would have eventually been "discovered" organically. Even IBM-card processing machines kind of resembled relational and DB operations, such that one machine may filter, another machine group and sum, another join, another union, another sort (sort and join may have used the same machine), etc. But without a parallel world to test on, it's only speculation for either party. It's my opinion that academics over-emphasize their importance in IT. Perhaps it's human nature to magnify our role, regardless of what it is. -t
- [I suspect a lot of IT people think like you do and would encourage technologies to stagnate, every single one of them being unwilling to spend the money to try something new until someone else proves it is better. I can't decide whether they're simply stingy or they're afraid of learning new things. Fortunately, the managers above the IT guys have the exact opposite interest... they need to make changes in order to justify their own existence. This sometimes causes industry to fall for a silly fad, but it does prevent stagnation.]
By the way, realistic industry scenarios were given above -- e.g., "feature associations for a fuzzy memory engine, problem-solution pairs for memoizations, pattern-transforms for data-driven optimizers, component-features and functions in an RDBMS-based GUI engine [...]" Unfortunately, I think the problem is that these aren't realistic scenarios for the industry you work in (or you don't recognise that they are), so the impact is entirely lost on you. Fortunately, it isn't lost on those of us for whom such scenarios are relevant and cogent.
- That may be the case. A "generic" cross-industry sharing device may be elusive if different domains have different needs. Otherwise, I hope someone finds a "typical" biz scenario to illustrate their allegedly generic Share-A-Tron. -t
- [The tools you have today are optimized for "typical" biz scenarios. They simply don't support sharing across the enormous range of other domains, and often fail to support 'atypical' biz scenarios.]
That being said, let's digress from the topic of "sharing standards" and continue at RelationalTreesAndGraphsDiscussion.
I'll provide the requested five (actually, six) examples, as commonly-used business-oriented types. Imagine two equivalent DBMS systems. System A provides user-defined type definition support. System B does not. Assume that for each of the commonly-used, business-oriented types that I will list below, I've already used System A to define the types for you. Assume that values of each type can be selected via a typical operator-invocation or function-invocation syntax. E.g., to instantiate a value of type Money as $12.34USD, I would use Money(12.34, "USD"). Assume equality and ordinality test operators are provided, so Money(12.12, "USD") = Money(12.12, "USD") returns true, Money(14.13, "USD") = Money(12.12, "USD") returns false, Money(15.01, "USD") > Money(15.00, "USD") returns true, and so on. Issues of currency conversion and the like may be ignored for the sake of this illustration. Here are the types:
- Money
- Date
- Time
- GeographicLocation
- ComplexNumber
- Distance (the sixth, returned by the Distance operator described below)
Now let's assume the existence of the following RelVar (think "table", if you like) in System A:
VAR myvar REAL RELATION {attr1 Money, attr2 Date, attr3 Time, attr4 GeographicLocation, attr5 ComplexNumber};
Let's assume we wish to perform the following queries, in TutorialDee syntax:
INSERT myvar RELATION {TUPLE {attr1 Money(15.23, "CDN"), attr2 Date("12 Jan 2003"), attr3 Time("23:13"), attr4 GeographicLocation(55, 130), attr5 ComplexNumber(3, 2)}};
myvar WHERE attr1 = Money(12.34, "USD")
AND attr2 >= Date("Last Tuesday")
AND attr3 = Time("12:22PM")
AND Miles(Distance(attr4, GeographicLocation(57, 63))) < 4
AND ComplexNumber(5, 6) = attr5 * ComplexNumber(3, 4)
Assume an operator Distance(x, y) that returns a value of type Distance that represents the distance between two GeographicLocationS x and y, and an operator Miles(d) that converts a Distance d to an integer number of miles. Assume the '*' operator has been appropriately overridden for ComplexNumber.
Now, create equivalent or simpler queries using System B that are as expressive and intuitive as the above. You may presume any representation you like. Pay particular attention to attr2 and attr5...
Once you've done that, we'll proceed with the discussion.
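(To make the System A assumptions concrete, here is a minimal sketch of a user-defined Money type in a host language -- Python, chosen only for illustration -- with the equality and ordering behaviour assumed above. The class and its details are invented for this example.)

 from dataclasses import dataclass
 from decimal import Decimal

 @dataclass(frozen=True)    # frozen: Money is a pure value; equality is field-wise
 class Money:
     amount: Decimal
     currency: str

     def __lt__(self, other):
         # ordering is only defined within a single currency, per the
         # "ignore currency conversion" simplification above
         if self.currency != other.currency:
             raise ValueError("cannot order across currencies")
         return self.amount < other.amount

 assert Money(Decimal("12.12"), "USD") == Money(Decimal("12.12"), "USD")
 assert Money(Decimal("15.01"), "USD") > Money(Decimal("15.00"), "USD")

The point of "System A" is that this definition lives in one place; every query and every client can then treat Money values as atomically as integers.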
Are we talking about a data exchange format/standard or a query language? If your needs require a query language instead of mere data exchange, then I am not against user-defined types in a query or DB system (see DoesRelationalRequireTypes). Data exchange and query service are two different issues as far as I can see. You seem to be mixing up the two. If the implication is that query languages can replace data exchange formats; well, that's a different issue that we can address separately. --top
The two are not as separable as you appear to claim. How does the query get to the DBMS in the first place? How do the results get back to the client?
FlirtDataTextFormat with Compound_Element_Types. Done. Now we can go home.
- [Does FlirtDataTextFormat include support to query the DBMS in the first place?]
- FLIRT is not a query language. I suppose it could be used to transfer a query, but it would just be passing the query text from point-A to point-B as-is. Thus, the issue of overloading the asterisk above is not its concern. The scope does not include defining operators (on types), query languages, or TuringComplete languages.
- Actually, TutorialDee is a TuringComplete language, but that can be disregarded, if you like, for the sake of this exercise. I'm not sure why you're mentioning the overridden asterisk in this context, and the data transfer format is completely irrelevant. This exercise is about types and type representations when using a query language, as DBMSes are commonly used to facilitate data sharing.
- I agree that using a query language is one way to facilitate information exchange. However, I didn't come here to talk about query languages. We can leave that to topics about query languages since the issues don't appear to be specific to data-exchange. I've already agreed that user-defined "domain maths" should be addable to relational engines, at least one trying to target more domains. (But IMO a functional style add-on would be preferable to an OO style.) --top
- Hmmm... I would have thought your fondness for database-oriented solutions would make this example particularly appealing.
- Maybe in a different context. I'd have to switch mental hats to think about query language design.
- But... This isn't about query language design. It's about implementing types in schemas vs using a type system.
- Please elaborate on "implementing". We need to clarify the scope here before we end up on a wild goose chase.
- ["Implementing types in schemas" means using extra columns or even whole auxiliary tables to support domain types. For example, if you need 'tree'-structured values, you'll create auxiliary tables for the node structure. Comparatively, "using a type system" would refer to simply supporting structured tree values as a column/attribute domain, thus avoiding the need for auxiliary tables.]
- [We're focusing on DBMS in particular because it's an easy place to show you how your decision affects upstream and downstream users. Unfortunately, you've not paid attention to these effects (and have naively been calling them 'orthogonal') because you are too fixated on transport representation (which is a solved problem, even for complex structured values. XML and YAML are readily available, among many other possibilities.) We have been attempting to open your eyes to some of the AccidentalDifficulty your SimplySimplistic approaches inflict upon others. But you don't care... you've an agenda to defend your pet FLIRT format for transport representation, and FLIRT can't handle anything more complex than 'Compound_Element_Type'. So you put on your blinders and claim this AccidentalDifficulty to be somebody else's problem. And you're right, it is somebody else's problem: the DBMS customers', and that of anyone interested in CrossToolTypeAndObjectSharing.]
- [I notice that you balked at the idea of FLIRT usefully carrying a query. Do you seriously believe that with real type sharing we'd be delivering flat query strings? Well, we probably would be... but we'd also, for certain, support semantic structured query values. Why? Because it would be the natural thing to do in a CrossToolTypeAndObjectSharing world, because it would offer greater performance, reduced complexity, and DBMS and LINQ integration by avoidance of parsing and serialization, better support for application-side query optimization and debugging, greater security through reduction of ambiguity, and so on. Even this is a solved problem; it would be trivial to describe queries in YAML or XML if RDBMSs were willing to accept it that way.]
- Something like modified ODBC may be closer to what you seem to be envisioning. I don't know what the ODBC standard provides in terms of custom types. If you are interested, then I'd suggest obtaining a copy of the ODBC standard and see if it can be modified if it does not support such.
- [Yes, if CrossToolTypeAndObjectSharing is to be solved then things like ODBC (and their associated libraries) would, of course, be modified to accommodate. For transport-layer solutions, it could be as simple as merging in YAML (which is already designed to work with languages), and perhaps supporting a few optimization paths (e.g. aliased values that can be pipelined in later to avoid passing in deep structure when it isn't immediately necessary). But this isn't so much "what I'm envisioning" as it is a simple matter of course; it isn't as though I was imagining no changes are needed. 'CrossToolTypeAndObjectSharing' implies a lot more than transport. Even SQL would need to be updated to recognize new operators, schema types, etc. And the DBMSs would need to be updated in order to optimize storage of structured types (they can use auxiliary tables under the hood). Etc. Modified ODBC would just be part of the transport protocol, and doesn't solve the problem on its own.]
- That's why I suggest that type-ness and object-ness makes sharing more difficult.
- They may indeed, if your sole focus is on, say, getting data from point A to point B so you can generate a printed report. However, taking the requirements of an entire system into account, by pushing down complexity in your area (such as by using only strings), you may be pushing it up in every other area -- such as at the point of data entry validation, where processing is needed, and when data needs to be stored and retrieved -- when the raw strings need to be converted to and from some computable form, such as a date, a complex number, or a tree.
- That might be true, but complicated standards are also an impediment to sharing. Rather than invent a complicated standard that tries to anticipate everything up-front, perhaps something like XML would be more appropriate if you want to share info on complex types. It lets you define your own atoms in your own way more or less. And I did not propose that "just strings" be used. That is an exaggeration.
- No intent to exaggerate; "just strings" is an example, which is why I wrote "such as by using only strings" instead of "by using only strings". I'm not sure how "complicated standards" would be an impediment to sharing if values are appropriately (and hopefully invisibly) marshalled and unmarshalled at the endpoints of communication, as is done with, say, CORBA. Furthermore, I would expect a language standard for transmitting values and types to be simpler, due to its specificity, than some generic XML solution. The TutorialDee syntax for DateAndDarwensTypeSystem, for example, is considerably easier to read (by humans) than its XML equivalent.
- The idea I had in mind with FLIRT is DAG-based type definitions, with the compound feature I mentioned, where the final leaves must be one of the standard base types: string, num, int, date/time. It would not define operators nor a query language, at least not as part of the standard. I'll post a small example soon.
- That seems reasonably appropriate for sharing values, but how would type definitions be represented if you do not support operators and/or constraints? Given a FLIRT data file about which we have no prior knowledge or external metadata, how, for example, would you distinguish a complex number with an internal representation of (r float, i float) from a geographic coordinate represented by (x float, y float)? If this page is about type and object sharing, I see nothing yet that would support either type or object definitions; so far, FLIRT only appears to support internal representations of values.
- I am not sure what you mean by "distinguish". Are you proposing a universal official repository of types? Otherwise, if one shop calls the type "complex" and another "cmplx", I know of no magic way to say they are the same thing other than a guess-a-tron.
- [In this particular case, one could simply apply semantics-bearing tags to the values, accessible for pattern-based function dispatching. So geographic coordinates become: coordinate:(float,float) while complex numbers become complex:(float,float). An operator like + might be overloaded (complex:(a,b) + complex:(c,d) = complex:(a+c,b+d)) but for coordinates it might not make any sense... perhaps one adds a heading and distance to a coordinate in order to get another coordinate. In a column that only accepted 'complex' values, the complex tag could be stored in zero bits... so storage issues are not a concern. However, things are a bit more complicated if you decide you want something more complex than semantic-representations, such as sets (need to ignore ordering) or cyclic values (e.g. a tuple that can contain itself), etc. This is discussed below somewhat under page anchor can_of_worms. (A small sketch of the tag-based dispatch idea appears just after this thread.)]
- Defining operators should be outside the scope of this in my opinion. For one, some languages may not use "+" or may not allow overloading of it. I will limit my participation to excluding definitions of operators. If you are going to make the transfer protocol into a language, you might as well use the app language to begin with as the transfer protocol. Why go 90% there and then stop?
- [You misunderstand. The point above is not about "defining operators". It is about ensuring mechanisms exist to avoid loss of semantic information that happens to be critical for dispatching to the correct operators.]
- [That said, functions are values too. In my opinion, the ability to transport and store (and type-validate) function-values is perfectly well within the scope of this discussion.]
- Well, we'll have to AgreeToDisagree. It's my opinion you are over-engineering this thing.
- Over-engineered trumps under-capable. For any given mechanism, it's better to have facilities you don't need than to require capabilities you don't have.
- I disagree for a standard. Complicated and convoluted standards are less likely to be accepted.
- Incomplete and inadequate standards are unlikely to be accepted either. There may have been a time when a new CrossToolTypeAndObjectSharing standard could have become accepted despite only admitting primitive and simple composite type values without constraints, type definitions, operator definitions, or support for complex types. These days, such features are so commonplace -- and the need for sharing them so prevalent -- that I can't imagine a new data exchange standard being given a second glance without them. This is especially true given the number of alternatives already in existence.
- But why invent 90% of an app language when a few extra features will make it 100%, and then dump the prior app language (for new projects at least). It doesn't make sense. It's a duck minus the beak. Or pull a JSON: take your app language, and remove some of the execution features (by convention or deletion), leaving only or mostly the definition portion.
- [Answer at page_anchor: 90%]
- [Perhaps after you're finished taking your "sweet time" solving the problems you've been presented and come up with good answers as to exactly why this extra engineering is still a bad tradeoff even given the complexity of those solutions, I'll AgreeToDisagree. Until then, I can't consider your opinion well-informed, and I'll just have to plain old 'disagree'.]
- [As for the "official repository of types" idea: requiring all types be official would create political problems, but it isn't bad to have a shared repository of common types (that can be named by URI). I favor simply using structural types rather than nominative ones (see NominativeAndStructuralTyping) because structural types can be shared quite easily by describing them with plain-old values from the same system. One also gets storage and such (e.g. DataDictionary) for free.]
Huh? You wrote, "if you have a general solution to a common problem, then show about 5 to 10 realistic examples..." I have provided five specific examples illustrative of the problem of type sharing (particularly in the context of DBMSes) that has formed the bulk of discussion on this page, and that will illustrate the distinction between user-defined type support in general -- i.e., regardless of context, whether communicated or in a DBMS or language -- and its lack. Do you have a problem with that? Furthermore, take a closer look at my examples -- are mere Compound_Element_Types sufficient?
Where do you see a potential problem spot? We can focus on that first.
Please, please work through the example, otherwise this will most likely turn into another lengthy exercise in futile rhetoric. I think my point will be much more clearly understood by discovering it, rather than having me explain it.
Fine, but I'll take my sweet time.
Good! I hope you'll find it an interesting (and maybe even enlightening) process. I know it was for me -- skeptical as I was at the time, coming from a mid-80's BigIron business system background -- when the necessities of certain projects drove me in the direction of typeful programming.
page anchor: can_of_worms
RE: Ideally, such tables can store arbitrary data types from the language, or at least a significant fraction of them, in order to avoid all the AccidentalDifficulty associated with your choice of either parsing/serialization or value composition/decomposition/collection when interacting with the tables.
If a DB typing system must be able to match the language, then it would have to have a super-set of type system abilities and thus risk turning into a monstrosity.
There are always development risks. It is true that you can err on the side of overgeneralization of the TypeSystem and introduce extra complexities into the language library. But if one errs in the other direction, one risks turning the applications themselves into monstrosities as they perform complex workarounds, and one risks later requiring a major revision to the TypeSystem to support the type. Fortunately, one can design the original type and value system with the possibility of upgrade firmly in mind, and thus design to reduce the cost of upgrade. Also fortunately, the closer one can get to representing the 'ideal' type, the less translation effort is required. So, like I said, the ideal is to support every type in the language, but supporting a significant fraction of them is still better than supporting only a few.
For example, although many languages seem to use trees or DAGs for type references, it's possible that a language may allow cyclical definitions. So what happens when you go to import a cyclical language type into a DB type system that only accepts DAGs?
I'm assuming you're really asking what happens when you suddenly need to support cyclic values when you previously supported only DAG values. There is a difference. A tree is a cyclic type (Tree X = leaf:(X) | node:(Tree X, Tree X)), but a tree doesn't necessarily support a cyclic value. An example of a cyclic value would be a set that contains itself as an element.
In this particular case, you'd need to work around until the TypeSystem and transport system could be upgraded to your requirements. As noted above, the closer you can get the better. For a workaround, what you'd essentially do is use a few tags to represent the semantics for interpreting the value (e.g. 'fixpoint:(name,value)' to say that name=value within value, or 'recursive_bind((name1=value1,name2=value2),value)' for a more letrec-style binding). Then, for every application that uses this type you'd need to integrate an extra library layer that can translate between cyclic values at your language layer and these 'semantic' values for the transport and storage layer. This translation is AccidentalDifficulty, as is the need to distribute your library and integrate it as an intermediate in your applications.
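(A rough sketch of that tagged workaround, assuming Python as the host language; the 'fixpoint'/'ref' encoding and the decoder are invented for illustration.)

 # A cyclic value -- say a list L = [1, 2, L] -- cannot be serialized naively,
 # but can be encoded with a fixpoint tag meaning "bind 'L' to the whole value
 # within the value":
 encoded = ('fixpoint', 'L', ('list', (1, 2, ('ref', 'L'))))

 def decode(value):
     # hypothetical library layer translating the tagged transport encoding
     # back into a real cyclic host-language value
     assert value[0] == 'fixpoint'
     _, name, (_, items) = value
     result = []                        # created first so the self-reference can be tied
     for item in items:
         if isinstance(item, tuple) and item[:2] == ('ref', name):
             result.append(result)      # tie the knot: the list contains itself
         else:
             result.append(item)
     return result

 L = decode(encoded)
 assert L[2] is L                       # the cycle survived the round trip

Every application touching such values must link in this extra translation layer; that is the AccidentalDifficulty in question.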
These sorts of workarounds are bad for CrossToolTypeAndObjectSharing for a number of reasons:
- tools designed to interact with the DBMS that don't know about these values (i.e. don't have the library) won't understand their semantics when it comes time to test for equality and such. Functions won't work on them naturally. Etc.
- if you need to compose two special value semantics within the same values, your libraries may cause problems.
- it is likely that if you needed it, then someone else will too, and you won't know about each other. This will result in reinventions of the same solution, but using slightly different naming conventions, becoming incompatible, and resulting in extra sharing complexities such as translation layers.
- After you do standardize it, you could still be stuck with the legacy translation stuff for some time.
To avoid these problems, a little bit of BigDesignUpFront is appropriate here, just as with language design of other sorts. One should at least support the values one can see being passed around today. But one can still err on the side of not being general enough, so long as one designs for upgrades in the future. When you notice people reinventing things, it is time to seriously consider refactoring it into the official standard. Perhaps ThreeStrikesAndYouRefactor should be applied more globally.
One can design the type system and transport layer for extension. As a simple example for an extensible transport and representation layer, you could reserve all 'semantic value' tags starting with the letter '%' for purposes of this upgrade path. This would allow you to add such things as '%set' and '%lambda' and '%recursive_bind' and so on as the need arises, without worries about colliding with someone's homebrew semantic value types. When it comes time to add a new semantic type, one simply finds a way to marshall and unmarshall to this type for each language library. Since it was invented before (as above), you've probably got a darn good idea how to do it for the popular languages. The TypeSystem itself doesn't need values annotated with these special tags (plain old tags work for type-descriptors), but it will need to have any new type-descriptors integrated with the validators.
It is worth noting that YAML already has such an upgrade path. YAML is a fine example of engineering that could be applied well to this purpose.
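(A sketch of that reserved-'%' upgrade path in Python; the registry and tag names are hypothetical, and YAML's '!!' tags work along roughly similar lines.)

 # Tags beginning with '%' are reserved for standardized semantic types;
 # homebrew tags can therefore never collide with a future standard.
 STANDARD_UNMARSHALLERS = {}

 def standard_type(tag):
     assert tag.startswith('%'), "reserved tags must start with '%'"
     def register(fn):
         STANDARD_UNMARSHALLERS[tag] = fn
         return fn
     return register

 @standard_type('%set')
 def unmarshal_set(items):
     return frozenset(items)          # set semantics: ordering deliberately discarded

 def unmarshal(tag, payload):
     if tag.startswith('%'):
         return STANDARD_UNMARSHALLERS[tag](payload)   # standardized semantics
     return (tag, payload)            # homebrew tags pass through untouched

 assert unmarshal('%set', [3, 1, 2]) == frozenset({1, 2, 3})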
page_anchor: 90%
But why invent 90% of an app language when a few extra features will make it 100%, and then dump the prior app language (for new projects at least). It doesn't make sense.
As you aren't a language designer, I'll accept that "it doesn't make sense" to you. But that extra '10%' (a statistic that is more hyperbole than fact) can make enormous differences on how 'sharable' something is between systems.
Sharability in a computation system might be loosely measured in terms of such things as:
- stability - How long can the value usefully be persisted? References into a running program (like C pointers or microsoft HWND values) are pretty darn volatile. They might even become invalid before the program is finished running. It helps for sharability if values don't expire.
- accessibility - how globally useful is it? Not all values are globally useful. For example, if I gave you the local-net address http://192.168.1.1/byte/me, it isn't particularly useful if you are outside my local network. It helps for sharability if values don't require your computer be hooked into a particular network.
- completeness - can it be fully understood without context? Incomplete values require you keep around extra context to know how to use the value. For example, the pointer 0xC001BABE is incomplete without also knowing the process address space from which it came. Sharability is helped if values are complete.
- semantic preservation - is the shared thing the same on other people's system as it is on yours? If you share an object, and they mutate the object on their system, will you see the mutation on your own? Sharability is all about semantic preservation (in some senses you can't really claim you've shared the 'object' unless it has the same semantics on their system as it did on yours).
- anonymity and security - can the thing be shared and forwarded even under constraints of anonymity and security? Sharability in general is higher if something can be shared even under such constraints.
- performance and maintenance effort - how much computational effort is required to share in the first place? how much space overhead? can the sharing system be used in embedded systems? can it be made realtime? how much effort is required to maintain the sharing, especially of mutable objects? how much handshaking is involved? The system itself is more sharable if they don't impose high or unacceptable costs, or if one can control the costs well enough for embedded and realtime use.
- exclusivity - can the thing be shared by more than one user at a time? or do you time-share instead? Sharability is helped if one isn't restricted to time-shares. It is quite difficult to usefully store a time-shared object in a database, for example.
- special efforts and AccidentalDifficulty - Is the programmer required to go to special efforts to share while preserving semantics? If so, then that thing isn't sharable with systems that didn't include that special effort. How much special effort is required? Writing specialized parsers and serializers, composing and decomposing something by hand, etc. can all take a great deal of programmer effort that makes sharing more expensive than it needs to be.
Application languages often contain values, types, and objects that involve a great deal of interaction with their environment - things like closures that access mutable state, lambdas with free variables, and pointers. Other values have the potential to mutate as you use them, such as stateful objects; these are especially difficult to store or share, as doing so essentially requires either continuous cache maintenance or time-sharing... either of which is difficult to achieve if one wishes to persist the values. Other values imply special dependencies for their use - e.g. scripts require embedded interpreters, filenames require one be on the right filesystem, BLOBs require codecs, anything that requires explicit post-processing (e.g. parsing or translation). These dependencies and interactions can make these values and types far less easy to usefully share.
A LanguageDesigner aiming for CrossToolTypeAndObjectSharing, even if it is for just between tools written in one language, will make a trade: giving up features dependent on the source context or environment and obtaining features that reduce dependencies on the recipient's environment. Sharing lambdas is much better than sharing scripts. Sharing structured values is much better than sharing BLOBs. Etc. A LanguageDesigner aiming for an application language, however, can make some different tradeoffs in order to buy performance or take advantage of known features of the environment. This really isn't a problem for sharing so long as one can automate the translation.
Obtaining sharability may require rejecting that 'extra 10%'. So be it. If one can take a powerful sharing language, intelligently add 10% to it without touching the value or type-descriptor semantics, layer in some convenient programmer syntax, add some standard libraries, and end up with a powerful and high-performance application language, I'd say that's a good thing - a proof, I would argue, of the long-term viability of the sharing language. But I doubt even in that case you'd be able to readily share all objects in this new language, much less the actively running applications. "Why invent 90% of an app language when a few extra features will make it 100%"? Because JustIsaDangerousWord, even when you're talking about 'just' adding 10% more.
I'm not sure how your list quite relates to this topic. Let's approach it from this perspective: given a relatively full-featured app language, which features would you want *removed* to make it satisfactory for sharing?
That is a fair question. Here are some of the top things I would target for removal:
- syntax: you don't need a fixed syntax for sharing, and sharing syntax is a little difficult to do usefully (you can only share it with humans, really). Leave representation to the implementations, and focus on semantics.
- semantic volatile state: references to inherently volatile state, such as data held on a stack, lack stability and are thus unsuitable for sharing. If the optimizer says otherwise, that's fine.
- second class ANYTHING: second class types, second class functions, the stack, etc. If you can't create it at runtime, then it can't be shared from others and it needs to go... or be made FirstClass (but I'm talking about removing things here).
- nominative types: types are only available by name in a scope local to the application code, and thus fail the accessibility criterion. If typing is to be included, then structural typing shall be favored.
- variable typed data: in the C language, as an (*ahem*) 'feature', everything from characters to integers can vary in width willy-nilly at the whim of the implementation. Similarly, little-endian vs. big-endian is just AccidentalDifficulty waiting to happen (as with UTF-16LE vs. UTF-16BE). If there is an issue of serialization, it needs to be well defined.
- representation typed data: no need for int16, int32, byte, etc. Just 'Integer' will do for sharing. If you need to limit it, use a constrained type - something like 'Integer x # (0 <= x <= 65535)'. As with syntax, let the implementations choose an appropriate representation, thus avoiding AccidentalDifficulty and allowing greater optimizations. (A small sketch of such descriptors follows after this list.)
- united states: if the language supports stateful objects at all, then that state should be semantically divided from those objects, which would then become value objects. The state is replaced with global references to external cells, which contain one value each (which could, in turn, again be a value object referencing state). This results in a system where you have value-objects, cells containing one value each, and nothing else. At this point, the objects can be stored in databases, shared arbitrarily, etc., and they operate essentially as stateless services. Semantics are preserved so long as they don't share cells (and exciting new possibilities of composing mashups by uniting cells become available). The cells can be accessed by their global references when the data contained by them is utilized, and a distributed state service could easily track these cells as simple tuples: (URI, <value>, security, type-descriptor, <other meta-data>). Usefully, this service can provide automatic cache management, transaction support, and garbage collection should the language benefit from them.
- IO Monad: it's rather difficult to obtain semantic preservation when sharing something that represents your whole world but contains nothing well defined. ;-) Of course, only one language has it. But it needs to go.
- impure functions: functions with side-effects are difficult to share for reasons of security and anonymity. When you share an impure function, and semantics are preserved, it calls home to do such things as print. Ugh. This can be resolved by having functions calculate up new programs or procedures to later execute in the application language.
- synchronized message passing: even on a local machine there are hidden costs for synchronization (with regards to complexity of delegations and potentially cyclic callbacks). The performance hit of attempting to achieve synchronized message passing rises dramatically for objects shared between processes and machines, so don't bother.
- threads: I know how to share a stateless service (a value object) with asynchronous message passing that might reference a few cells. But threads are beyond me. By nature, you can only transmit static sequences of bits over a wire, and you can only store static things to a file. Threads aren't static. You can't share them. They break semantic preservation.
- locks and mutexes: if you need concurrency management, switch to optimistic SoftwareTransactionalMemory and/or support a workflow language. Locks and mutexes are nigh unsharable: they're too easy to break. I'll refer you back to the security and anonymity constraint here. Mutexes are just insecure.
I'd also add the 'any' type, for where you don't care. A lot of languages don't have one.
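(To illustrate a few of the surviving features -- structural type descriptors expressed as plain values, representation-free constrained integers, and an 'any' type -- here is a hedged Python sketch; the descriptor notation is invented for illustration.)

 # Structural type descriptors as plain old data: storable, shareable,
 # and meaningful without any application-local nominative scope.
 UINT16  = ('integer', ('range', 0, 65535))     # 'Integer x # (0 <= x <= 65535)'
 ANY     = ('any',)
 COMPLEX = ('record', (('re', ('number',)), ('im', ('number',))))

 def check(value, descriptor):
     # hypothetical structural validator: matches shape, never names
     kind = descriptor[0]
     if kind == 'any':
         return True
     if kind == 'integer':
         if not isinstance(value, int):
             return False
         if len(descriptor) > 1:
             _, (_, lo, hi) = descriptor
             return lo <= value <= hi
         return True
     if kind == 'number':
         return isinstance(value, (int, float))
     if kind == 'record':
         return (isinstance(value, dict) and
                 all(name in value and check(value[name], d)
                     for name, d in descriptor[1]))
     return False

 assert check(65535, UINT16) and not check(70000, UINT16)
 assert check({'re': 3.0, 'im': 2.0}, COMPLEX)
 assert check("anything at all", ANY)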
Here's a draft of increasing "levels" of complexity of information transferred. Note that some are not necessarily pre-requisites of those lower on the list, as it's only an approximate hierarchy.
- Data values
- Basic schema (column name, base-type)
- Basic validation (size, range, reg-ex, etc.)
- Custom types
- Compound types (like Complex)
- Type operator definitions
- Type constraints
- Type operator implementations
I'd suggest ending at either compound types or operator definitions. Beyond that requires expression and code execution/evaluation.
--top
[You honestly believe that 'reg-ex' doesn't require some fairly arbitrary evaluation? Based on this list, my suspicion is that you think 'simple' that with which you're familiar and 'complex' that with which you are not. There is no clear technical reasoning behind your hierarchy.]
You can make your own hierarchy or feature list if you want. Implementing type operators is a huge ramp-up in complexity to most rational people.
Really? I find that surprising, to the point that I suspect we're thinking of different things when you mention "type operators". What do you mean?
Here is a schema that can represent types up to "compound types" in the above list. (RunTimeEngineSchema may suggest ways to represent operators and parameters). Text representation (serialization) is not considered here.
types
-------
typeName
sequence // integer
parentType // either another "typeName" or base type
notes
// primary key: typeName + sequence
typeAttributes
--------
typeRef // foreign key to "types" table
sequenceRef // foreign key to "types" table (together with typeRef)
attribName // examples: maxLength, lowRange
attribBaseType // text, number, integer, dateTime
attribValue
The "typeAttributes" table allows somewhat open-ended attributes. Some may argue that creating tables as needed is the "proper" way to do it, but many shop arrangements don't make that task very easy. Thus, an
AttributeTable is assumed instead.
Base types are: text, number, integer, dateTime. The parentType must be either another "typeName" or a base type. Circular references are not allowed. If attributes appear multiple times in a tree/DAG path, then the lowest-level one takes precedence. Here's an example of a compound type:
typeName...sequence...parentType
--------------------------------
coordinate.....1......number
coordinate.....2......number
(Dots to prevent TabMunging)
If you are manually assigning sequences, then perhaps increments of 100 makes it easier to insert new ones later. (Remember the BASIC line-number days?)
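(A hedged illustration of where "level list" validation attributes might land in the typeAttributes table for the compound coordinate type above, reading its two components as latitude and longitude purely for illustration; highRange is an invented counterpart to the lowRange example.)

 typeRef......sequenceRef...attribName...attribBaseType...attribValue
 --------------------------------------------------------------------
 coordinate........1........lowRange.......number...........-90
 coordinate........1........highRange......number............90
 coordinate........2........lowRange.......number..........-180
 coordinate........2........highRange......number...........180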
I'm curious how and where the semantics of the type attributes, e.g., maxLength, lowRange, etc., will be defined. Are these intended to be canonical, agreed by the endpoints of the system, or something else? Also, is the above model (and that on RunTimeEngineSchema) intended purely to be illustrative, or to be implemented?
A standard set could be defined, similar to reserved words, but would not prevent custom ones. As far as illustrated versus implemented, I would consider it illustrative at this point. But if you see something that would prevent implementation, please point it out.
For the custom type attributes, how and where would you define their semantics?
What is an example? I stopped short of defining type operators. A "note" column is available for a longer description. I suppose we can add a note column for attributes also.
Let's imagine you've defined an attribute called oddNumbersOnly, intended to limit values to odd integers. If it's strictly an attribute name (or described via human language in a 'note' column), it might be meaningful to a human but not to a machine. Therefore, where would the semantics of this constraint be defined, thus permitting values of this type to automatically be treated differently from ordinary integers? Or, would this not be automatic, the presumption being that communication endpoints must have pre-agreed, already-in-place machinery to recognise and appropriately handle "oddNumbersOnly"?
- Again, I purposely stopped at a certain point in the "level list" above and made no attempt to have a formal place to define and implement operators or constraints.
- How, then, does your approach improve upon existing mechanisms for transmitting values and simple composite type definitions?
- XML is not very relational-friendly. (Nor does it have a standard for defining type operators etc. I would note.) My proposal can also be XML-ized if need be.
- How is XML not relational-friendly? Or, conversely, how is FLIRT (which I presume you're suggesting, as the above table model is obviously not a data-exchange format in and of itself) more relational-friendly than XML, given that the conceptual elements of FLIRT can be trivially represented in XML and leverage the numerous pre-existing XML parser libraries? No need, therefore, to write any FLIRT parsers. And, should "relational-friendly" be a priority in a format intended (presumably) for general-purpose data exchange? Indeed, I am not aware of any XML standard for exchanging rich type definition information, but your approach doesn't either, so I think we can consider them equal on that count.
- Your "rich types" are a little too rich, bordering on an app language. You essentially want an app language with a few sandbox restrictions. Why not just take something like Java, Eiffel, or Ada, and chop out the portions you don't want to create a type-heavy exchange system? Ada-Express anyone?
- What does "too rich" mean? A language derived from a general-purpose programming language, but altered/culled to particularly suit data-exchange purposes without loss of semantic information, sounds like a good idea.
- I already gave my opinion on this above. You are welcome to propose and promote the idea and see if it catches on. My bet is that shops not using the base language that its based on will ignore it, preferring something like XML instead.
- When I next need such a thing for my own purposes, I will build it and release it. If other developers need similar capabilities, maybe they'll use it too. Or maybe not; that's not my concern. I don't intend to limit the capabilities of my solution in order to suit the limited needs -- or the limited vision -- of others. That said, I don't expect the typical developer would even see the underlying language (any more than the typical CORBA developer examines IIOP) as its use would presumably be hidden behind APIs.
- I'd rather get such info from/as relational tables than gazillion set/get API's, which is what most of it would amount to through API's since most of it is declarative in nature. Massive set/get is a smell in my opinion.
- [Violations of OnceAndOnlyOnce when writing query strings is a smell in my opinion. Massive application-side query processing, such as parsers or procedures necessary to complete a query or ensure consistency, is a smell in my opinion. Needing to modify the database just to query it is a smell in my opinion. Perhaps you, with your head buried deep in the proverbial CBA sands, remain obstinately and blissfully ignorant of how much existing RDBMS's stink for applications in systems programming, sciences, arts, games, and mathematics, where we need databases that catalog and index everything from sparse matrices and enzyme geometries to behaviors and beliefs. Whatever smell you're imagining doesn't even compare. The only way to make you understand it would be to force you to efficiently handle arbitrary-width strings (say, up to one billion characters) in a database that supports only finite-width bit arrays... or, perhaps, to have you solve a few 'text book' problems about which you're taking your "sweet time".]
- You are envisioning some odd stuff. Why would it need to be modified? How about some examples?
- [You haven't done your homework. Supposing you wish to handle the "myTable where myTree = Tree(1,Tree(2,3,Tree(4),5),6)" problem, and you happen to represent tree structures by folding them into the schema and representing a bunch of flat 'nodes', then you essentially have two options: move the node table to the application for processing (possibly iteratively, to filter as much as possible), or copy the tree value into the node table so you can then use the same solution as you use in solving "myTable join myTable as myTable2 where myTable.myTree = myTable2.myTree". Essentially, you need to pick between "massive application-side query processing AND violate OnceAndOnlyOnce" or "perform updates just to issue a query". Now, can you please start solving those problems you said you'd take your "sweet time" solving? Or was saying you'd get to them an outright lie?]
- Again, I'm not here to define/demonstrate a query language.
- I see nothing here about defining or demonstrating a query language. It's merely used as an example; the issues in question could be demonstrated with a general-purpose programming language, but a query language makes them particularly evident, and query languages (and by implication, DBMSes) are currently the primary means by which (at least) business data is shared. Thus, it is directly relevant to CrossToolTypeAndObjectSharing. And, Correspondent #2 raises a good point: You did appear to agree to work through the exercise provided. Do you not intend to do it?
- As already discussed, you seem to need a hierarchy-friendly query system, which is beyond the goal of a data exchange system in my opinion. I see no need to revisit this.
- [The only hierarchical type mentioned is 'trees', which are illustrative but are not the only complex type mentioned. Relations between flat collections, sets, simple inductive types (e.g. unions), and simple coinductive types (e.g. records) and various combinations of these things also qualify, and many also suffer from the above problem. Tree-based query operators won't help for the majority of these problems. And, importantly, they don't help with the above problem regarding updating a database just to query it. Support for structured values, on the other hand, do solve the problem.]
- [As far as: "[exchange of complex data values] is beyond the goal of a data exchange system in my opinion", how do you justify this claim? Keep in mind that the above Tree(...) structure is intended for use as a value, just like an integer or a string. It is only the complex workarounds you require by insisting that values shouldn't be nested that push these structures into the schema. If exchange of values is "beyond the goal of a data exchange system in [your] opinion", then what IS the goal of a data exchange system, in your opinion?]
- I am generally against nesting complex structures directly within the nodes of another complex structure. (Although it can be virtualized, as already described.)
- The benefit of type-definition mechanisms is that complex value structure can be encapsulated, so that it appears homologous to a simple, primitive value. Thus, you can nest a complex structure within another complex structure, but it appears -- in every respect -- to be a simple structure nested within a complex structure. Without appropriate type-definition facilities, complex values can only be represented by un-encapsulated complex structures consisting of primitive values. Thus, you still must deal with all the complexity of nesting complex structure within complex structure, even though you've attempted (sort of) to avoid it! With appropriate type-definition facilities, complexity is localised to the type definition itself; everywhere else, it appears to be a simple type. Without these, you are forced to deal with complexity at all points in the system. While it is possible, in principle, to "virtualize" (?) types, you are in effect manufacturing type-definition mechanisms. It is obviously preferable, in that case, to have appropriate type-definition mechanisms in the first place.
- Encapsulation generally assumes an absolute view of things. In my experience, relativism of view is preferred, at least in my domain. Abstraction is obtained by applying need-specific transformations, not the absolute "this is your one single interface" approach that encapsulation gives. Now it may be true that a more hierarchical view (composition) of abstraction is preferable in other domains, and those domains probably want support for it. However, cross-tool behaviorism is tricky to pull off without inventing an entire TuringComplete language to support it.
- Certain type systems support "relativism of view". This has already been discussed. See sections with references to DateAndDarwensTypeSystem on this page.
- [You refer to the 'possreps'. These do allow useful user views for data.]
- [The encapsulation discussed above does not preclude arbitrary views of content. It, instead, captures representation. Believe this: in your experience, relativism of view is NOT preferred for representation. If you preferred it, you'd spend your time doing shit along the lines of setting up 16 boolean columns to represent 16-bit integers. There is a huge complexity penalty for doing so, but you do gain two things: (a) your RDBMS is easier (since you don't need to support integers), (b) you can now 'view' integers in terms of subcomponents of one of its infinite representations.]
- [Seriously, you're a total hypocrite: you should be arguing that RDBMSs should reject support for integers, and especially strings. Strings, after all, "hard-nest" characters. Either you should be rejecting strings, or you should be recognizing that support for relative 'viewing' of one of infinite representations is generally not worth the complexity, even to you. In truth, I expect that 99.9% of the time you'd prefer to not need to care about internal representation. That same preference to avoid complexity and concern for internal representation applies to people in other domains. Being forced to 'virtualize' types by inventing and maintaining representations for them is really, really not worth it.]
- I said for *complex structures*. Strings are not complex structures, or at least haven't been parsed into computer-readable complex structures. Thus, I am not the "hypocrite" you, in a fit of excess testosterone, claimed I am.
- [If strings - which are arbitrary-length lists of characters, which in turn are integers represented by complex sequences of bits (often of variant length for unicode) - are not complex structures, then neither should be arbitrary length lists of strings, or lists of lists of strings... or even whole trees of strings, since those can be formed from lists of lists. "Strings are not complex structures" in your mind only because you haven't spent much time studying the complexities of their storage and transport... because the tools and libraries available to you make them easy to work with. I'd bet money that if you grew up with RDBMSs that made it easy to work with sequences, sets, and trees of values then you'd be sitting there shouting: "Trees are not *complex structures*! When I say *complex* I mean like graphs, lambdas, and stream values!" Further, FLIRT (intended to be a transport between RDBMSs) would have been designed from the start to accommodate these 'simple' structures. It seems you rationalize what you work with and tirade against everything outside your comfort zone without any real application of logic or reason to the whole decision process.]
- Strings are not "parsed out" into structures such as lists, trees, etc. In other words, they are not "atomized" into nodes and relationships between nodes (links). I'm not saying they don't have the potential to represent complex structures, only that before they are atomized they are not complex structures for computerized usage.
- [Huh? Strings are 'atomized' into characters. Strings are, by nature, just as complex as sequences of integers. If you're willing to accept lists of integers, why not values that are sequences of other "simple" values? You've said "strings" are simple. Why not sequences of strings?]
- (Replies from this point cut and moved to page_anchor: top's book)
- If we use them heavily in their atomized (node-ified?) form, then they should be part of the table structure rather than treated as nested components.
- Have you read nothing of what we've written about the problems inherent in representing complex values as "part of the table structure rather than treated as nested components"? Have you even attempted the exercises I wrote?
- [He's probably taking his "sweet time" still, just as he'll be doing next week, the week after, and so on until he's dead. It's more convenient for him to completely miss the mark as he attempts to counter our reasoning from a position of ignorance than it is for him to handle a few homework problems.]
- First we need to settle on the scope. This is an open bugaboo. I am not implementing operators, for example.
- There's nothing to settle. Scope is not the issue. Simply answer the following: "Now, create equivalent or simpler queries using System B that are as expressive and intuitive as the above. You may presume any representation you like. Pay particular attention to attr2 and attr5..." You needn't define operators; you may assume -- if you wish -- that they exist. Merely show us the queries.
- I am not here to define/invent/document a query language for reasons already stated. --top
- WHAT? How did you arrive at that? Use any query language you like; it doesn't matter.
- [And just a thought: given that there are always infinite ways to represent or 'virtualize' any non-trivial value structure, the chances of such a solution being sharable with an independently developed solution with minimal translation effort will not be high. 'Virtualized' types are, I imagine, a serious blow to sharability.]
- So... You'd rather construct a query string, execute it, obtain a resultset, and get an attribute from its first row than simply issue object.getAttribute()?
- [Based on the solutions he's advocated in the past, Top would rather construct a query string, execute it, obtain a result set, process the result-set procedurally to test properties outside the query language's capabilities, construct another query string, execute it, obtain another result set, process it procedurally (likely a second procedure), then return the desired attribute... and do all this repeatedly and in slight variations... than use 'object.getAttribute()'.]
- In some cases we'd filter out specific stuff, in others we'd obtain the entire schema. It depends on the situation. It's good to have both options when needed.
- [I wasn't precluding the possibility of filtering. Are you proudly confirming what I wrote above, or attempting to raise a counterpoint?]
- I'm not sure what you are trying to achieve, so I don't know what is being compared. Nor are they necessarily mutually exclusive. If one wants to put an OOP wrapper around a schema, they can. If this is going to turn into another HolyWar about whether querying is better than object traversal, then we should probably end this now.
- [This isn't about querying being 'better than' structured value traversal or vice versa. It's about them not being viable replacements for one another. You can put an OOP wrapper around a schema... but it won't give you OOP unless you proliferate schemas like a madman, implement a garbage collector, and pay out the arse in terms of performance. The other direction is true, too: you can put a relational wrapper around an object, but the complexity of doing so, indexing, handling updates to the object and its indices, etc. is no picnic.]
At a glance, I don't see anything that would prevent implementation of a system that provides a relational reflection of the run-time core of a programming language, with a relational repository for its source code. Indeed, at work I've internally published a similar idea as a possible applied thesis topic for students (no takers, so far) -- with a particular focus on exploring its value in terms of refactoring, maintenance, and run-time manipulation/observation (such as for debugging purposes) of the executing environment. However, I intuitively suspect ExtendedSetTheory may prove more flexible than the RelationalModel while providing the same benefits (if any), but at this point, it's pure (though rather enjoyable) speculation.
page_anchor: top's book
Generally strings are treated as a unit. If and when we want to do "listy" things with them, we can convert them into a more formal data structure. I agree the distinction may be somewhat arbitrary or usage-specific, but trees cross the line in my book. Lists can go either way, but trees are into the structure camp.
also: Relational is against nesting in my book.
[It seems your "book" makes several assumptions with which I disagree:
- you assume that trees, sets, measurements, geometries, graphs, and other structured values aren't used as units. They might not be used often in CBA, but other domains need to use these as units often enough. This includes the need to compare for equality, join, update, and delete... as a unit. Often complex structured values represent point measurements or identifiers.
- you assume that value accessors are 'special' operations. Technically, an accessor is a non-predicate unary function with a single return value, no different from preparing "(32 + (x * 1.8))" as part of a query. This is the strongest distinction you can make while remaining consistent with the use of accessors in immutable value systems.
- you probably assume that relational systems can't effectively handle multi-valued returns. This is simply untrue, and you'd know it to be untrue if you ever used a logic language (like Prolog or Mercury). For example, square-root(9) can return both 3 and -3. Representation of this in a query is simple: one tuple in the query result for each return value (see the sketch after this list). Having the query language support operators with multiple return values will resolve every technical complaint you've ever made with regards to supporting 'views' of the data.
- you assume that there is some sort of advantage to breaking apart a structured value for the purpose of 'views' over it. This assumption is contradicted: representing a structured value in this manner actually hinders arbitrary views of structure because one cannot readily break the structure into a new organization of schema without a complicated series of joins. Between using unary 'possreps' and multi-valued function returns, you can obtain an arbitrary view of a value without performing any joins: you can focus strictly on how you need to view the data without worries as to how it is represented.
- you assume that 'general' is good enough to inflict a hard rule with forced engineering tradeoffs. Technically, you are very inconsistent with several of your professed principles on this matter. That is, you are demanding that users always trade for an (assumed and alleged) benefit in flexible 'views' of data even at massive costs to the complexity of updates, deletions, and equality comparisons. (It's made worse by the fact that this 'alleged' flexibility benefit is an illusion without any real substance, as described above.)
- you assume that operators can readily be defined to handle destructured values. You have asserted several times that you are not opposed to operators to support tree-structured data... as though this is supposed to make a difference. There are reasons to believe you're incorrect: even with such operators, you'll still end up violating OnceAndOnlyOnce, repeating the naming of tables and columns and complex functions like 'equality' from one query to another.
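To make the multi-valued-returns point above concrete, here is a small, hedged sketch using the Haskell list monad (the function names are mine, not anyone's standard): each extra return value simply fans out into an extra result tuple, exactly as a logic language would produce one solution per binding.

```haskell
-- Sketch: a multi-valued operator yields one result row per return value.
squareRoots :: Double -> [Double]
squareRoots x
  | x < 0     = []              -- no real roots: zero rows
  | x == 0    = [0]
  | otherwise = [r, negate r]   -- both roots: two rows
  where r = sqrt x

-- A toy "query" over a table of numbers, one row per (input, root) pair:
queryRoots :: [Double] -> [(Double, Double)]
queryRoots table = [ (n, root) | n <- table, root <- squareRoots n ]

main :: IO ()
main = print (queryRoots [9, 16])
-- [(9.0,3.0),(9.0,-3.0),(16.0,4.0),(16.0,-4.0)]
```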
[Unless you're practicing a religion, you should seriously test your assumptions. We've tested them. We have expressed disagreement and reasoning with many of your stated and unstated assumptions, often with examples, but it seems you just aren't open-minded enough to seriously entertain the possibility your assumptions could be in error.]
[It isn't as though promoting support for rich structure means I am "against relational". There are both technical and semantic points that can rigorously be applied in determining where relations are warranted (regarding such things as mutability, measurements, predicate disjunction, normal forms, nature of data (inferred model vs. sensory), etc.). But your naive "book" doesn't consider these points, and instead favors a one-size-fits-all approach with you declaring "trees are into the structure camp" and that they shouldn't be nested within a cell. Worse, your approach isn't defended on any technical basis, which leaves it, essentially, unjustified.]
If you want a stated rule, then: "Lists can either be embedded in the atom or broken out as a structure (table) based on usage patterns, but anything more complicated should be a formal structure (table)."
What is your rational basis for defining this rule? Do you really prefer the complexity of schema definitions and obtuse, complicated queries to the inherent simplicity of using complex values in exactly the same manner as a simple, primitive value?
The definition for a structure has to be *somewhere*. I'm just using conservation of conventions/rules to avoid reinventing one from scratch for each "kind" of data structure.
[I think you've got it backwards. Embedding structure back into the schema is what requires users reinvent types (trees, sets, lists, etc.) from scratch for each "kind" of data structure. How many times do you think 'trees in SQL' have been reinvented?]
As far as queries, we are not defining a query language, at least not something I will participate in.
[I agree: we are not defining a query language. However, ignoring query languages as a representative tool with which types and objects are utilized and shared would be an act of rather monumental stupidity. What good is CrossToolTypeAndObjectSharing if it can't work with one of the more common tools for sharing data? So we are using queries as a framework for proof-of-concept for CrossToolTypeAndObjectSharing. If a solution fails for integration with RDBMS and queries, it fails: end of investigation... time to look for a better solution.]
[As to your latter problem: I might suggest Firefox. My spellchecker works just fine. I also suggest forcing yourself to learn to spell correctly every word that you mistype. E.g. domonent -> dominant; just type 'dominant' as fast as possible ten or so times each time you mess it up (including for each mess-up while typing it ten times). Burn correct spelling into muscle memory... it, on average, shouldn't take more than a minute of your lifetime per word. Funny. I'd have thought a person who rejects support from the compiler for catching errors would have a bit more self-discipline and less reliance on a spellchecker.]
I am using Firefox, and it reaches a limit. And I have more interesting skills to work on than spelling. Long topics should be divided for other reasons anyhow. The spellchecker croaking is just a reminder of the size.
[You're probably not working on any skills while arguing with me... or doing anything else productive or useful (like sleeping), even if you should be, so don't try to rationalize your laziness with that excuse.]
And what is your excuse against producing a realistic biz domain example? I suspect it's not that you are lazy, but that you are afraid of open scrutiny.
[Doing so is not logically required of me. That's a very good excuse. If you feel the existence of types inhibit you from doing the biz-domain work you're doing right now, then please provide an example that proves the rule. Otherwise you're just whining because your personal HobbyHorse isn't in the limelight.]
Your "logic" tends to use faulty givens. If you don't want to sell your pet techniques to possibly the largest niche on the planet and cater just to narrow niches, be my guest.
[Of course I'd sell my techniques to the largest 'niche' on the planet. Businesses would (presumably) benefit from the sharing described by the title of this page (along with the reflection and the greater simplicity of configuration management that are implied by it). Businesses gain the potential to access data that was, before, tied up inside system software or unavailable for queries or data-mining due to need for complex joins between non-integrated or specialized DBMS products. There's a lot of icing on the domain-generic cake. But it simply isn't logically relevant to my arguments. Whether such things are proven or not isn't critical to claims of 'better'-ness, which are based on RDBMS being domain generic software and the ability to practically service more domains. While it may be fun to speculate, I consider doing so a distraction from the relevant arguments.]
You say my logic uses "faulty givens". I'll write my givens here. With which of them do you disagree, if any? Please give reasoning.
- RDBMS should be domain generic.
- Systems Software, Mathematics, Geology, Biology, Sonar and Radar and Video, Arts, Games, etc. all qualify as domains.
- Some domains require or benefit from relations over or indexing of structured values such as sets, trees, geometries, code, behavior and beliefs, sparse matrices, etc.
- You have not identified a need to "hard nest" complex structures inside relational "cells".
- [What about polynomial expressions, source code, nested arrays, geometric shapes, graphs, and other static, complex values that are common in a variety of domains? It is not so much that there is a need to "hard nest" (?) these as type values in relational cells -- you can, indeed, awkwardly force them into a relational schema. The problem is that doing so -- and then indexing and querying these -- is far, far more complex, awkward, and inclined to result in duplication than using appropriate type definition mechanisms, as has been amply demonstrated in various places on this page.]
- Indeed. There is no need to "hard nest". You're free to sacrifice simplicity, performance, OnceAndOnlyOnce, semantic purity, declarative consistency, independence of representation, and CrossToolTypeAndObjectSharing... in order to avoid "hard nesting". On the other side, the implementation is a little easier. Fair trade?
- Your demonstrations were for existing RDBMS. As I mentioned below, they are limited for user-defined types regardless of which approach is taken. Being pro-relational is not the same as being pro-status-quo. Your demonstrations pointed out problems with existing implementations outside of biz, but did not point out universal faults of relational itself. --top
- [Sorry, I'm not following you here. I don't see any intent (or justification) on this page, or any other, to point out "universal faults of relational itself". The biggest "fault" of the RelationalModel is that true implementations thereof -- other than in-house products -- can be counted on one hand, and none are yet complete. Another fault is the inclination to regard the RelationalModel as a fundamental paradigm or overarching approach -- which it is not -- rather than a flexible, composable, and effective means for manipulating collections of values within the context of other paradigms and/or approaches. It is worth noting that the RelationalModel does not deprecate rich types; quite the opposite, in fact. However, defining complex types by shoe-horning primitive values into schemas is considered an unfortunate (but sometimes necessary, as in many SQL implementations) work-around, not a feature.]
- I agree here. Neither correspondent #1 nor myself has a problem with the RelationalModel itself. The problem is those convoluted workarounds you deal with when you don't have the types your project demands. It seems Top is thinking we are anti-Relational simply because we are pro-type.
- I'm not sure what you mean by "have" in this context. If you have examples of "convoluted workarounds" that are inherent to the relational model and not just existing vendor products, please show them. --top
- [This has already been covered. See, for example, the portion of this page starting with "Top, consider ordered tree-values used as primary keys ..." for an example of a convoluted workaround, in which storing a tree value in a schema -- obviously a workaround for appropriate user-defined type capabilities -- adds needless complexity.]
- You are asking about how a query system would work, no? This is not about query systems. If you want to start a topic about type-friendly query systems, be my guest.
- No, I'm hoping you'll explore some of the problems and issues surrounding rich type-definition capabilities vs a lack of rich type-definition capabilities, in a context which (a) you should find resonant and interesting, as the author of TopsQueryLanguage and proponent of TableOrientedProgramming, supporter of extensive use of DBMSes, and advocate of query languages; and (b) is directly relevant to CrossToolTypeAndObjectSharing, given that DBMSes are the primary means by which business (and other) data is managed for the express purpose of being shared.
- I've already agreed that a flexible query language/system could find lots of use in some domains. But to get the flexibility and power you want appears to require a complex and sophisticated query language; one that has graph-oriented features for tree and graph operations and user-defined types. I'm not against that. I'm just not going to participate in such right now. By the way, for your tree example, why not make a LISP server to provide a query language that is a subset of LISP? Your trees are close enough to EssExpressions, and I'm sure there are libraries that provide EssExpression comparisons. (Or a language similar to Lisp.)
- [We don't require the sophisticated query language with graph operations. In fact, we don't even believe that those sophisticated query features would actually solve the problems being raised in the tree example above. But, since you insist that sophisticated query features would solve the problem, we do ask that you show us how using the tree example... because we don't believe you. What 'appears' to be the solution is support for structure-rich domain values, such that you can stick a complete tree-description into a single attribute cell so you can perform equality tests as easily as 'columnA = columnB' or 'columnA = Tree(1,2,Tree(3,Tree(4)),5)'. And you have claimed to be against that ("relational is against nesting in my book," you say). As to your Lisp question, I'll answer with more questions: (1) how can a small or medium-sized project justify the expenses of reinventing, implementing, optimizing, and maintaining a database, a relational query language, and ODBC in Lisp just to get proper support for tree values? shouldn't that cost be divided among all developers? (2) how does doing so help accomplish CrossToolTypeAndObjectSharing, such as cross-database joins and access to the data via auto-gen CRUD screens?]
- There are three basic options for storing structured values:
- Store structured values in a format opaque to the DBMS, such as a string or a BLOB.
- Store structured values by folding them back into the schema:
- simple coinductive types (records, tuples, etc.) may be represented as multiple columns in a schema, such as '(a_real, a_imaginary)'.
- simple inductive types (unions or tagged unions, e.g. (integer | string | boolean)) may be represented as multiple columns with all but one group of columns being NULL, or by falling back to the 'opaque format' (e.g. representing the booleans and integers as strings). Inductive types can be recognized by the function that most naturally processes them being a big pattern-matching case or if-then statement.
- arbitrary-length collections (sets, vectors, matrices) receive a surrogate identifier and a separate schema where they are broken down into constituent parts. For example, a set of (integer|string) in myTable might be broken down into TABLE myTable(mySetID,attr1,attr2) and TABLE setDescriptors(setID,anInteger,aString). A vector or matrix might also receive a positional argument or two, e.g. vectorDescriptor(vectorID,nPos,anInteger).
- recursive inductive typed values (aka fold values... trees and lists, for example) are broken down into simple coinductive types with more surrogate identifiers. Doing so usually requires another schema.
- function values aren't handled at all, except by falling back to BLOBs or strings. Identity over such values must be processed by applications.
- recursive coinductive typed values (aka unfold values... streams of fibonacci numbers, for example) are not handled at all because functions are not handled at all, and functions are necessary to describe unfold values. (As a note, recursive coinductive types handle the vast majority of pure functional object-oriented programming.)
- graphs and isomorphic-identity values aren't handled at all. One can fall back to sets, but points end up requiring labels and no attempt to support isomorphisms is implied.
- value semantics can be handled by keeping an extra column to represent the meaning of a value (e.g. as a tag, used in EntityAttributeValue), or by using the column/attribute name to apply semantics.
- Store structured values as domain types, such that a column can handle records, tuples, tagged unions, sets, vectors, matrices, fold values like trees, function values, unfold values like streams, etc.
- When storing in an opaque structure, such as a string or BLOB: This structure can be processed application-side by use of parsers and libraries. Arbitrary applications cannot share this data in any meaningful way because they do not know how to interpret it. It is difficult to join on any particular features of the data, meaning the DBMS acts more as a persistence layer for this opaque data than an actual database. Validating complex strings will either occur application-side or require some rather arbitrary evaluations from a regular expression. Testing string equality is easy, but tests of semantic equality (e.g. because of a difference in spacing) will either require passing some complex functions as part of the query or performing application-side processing.
- When folding values back into the schema, one will encounter a great many smells. Simple coinductive values force violations of OnceAndOnlyOnce as soon as you need to use them twice. As an example, the use of the complex value folded into the schema requires repeating parameter groups in queries such as (a_real,a_imaginary), in addition to repeating these structures in different schema and repeating them in query operations (a_real == b_real AND a_imaginary == b_imaginary). Needing to deal with surrogate identifiers on values is even worse, since you must perform very complex queries to test equality. Testing equality between two lists or two vectors is difficult enough... testing equality between two sets is far more difficult. These are complex enough that I can't easily provide a solution, and so I leave them as exercises to the reader (a sketch of the surrogate-identifier problem appears after this list). The real stink is that the programmer must hand-implement the solution every time a set or vector is used, and that the processing takes place application-side.
- Violations of OnceAndOnlyOnce are a high programming stink. The need to transport large amounts of data for application-side processing is a high complexity, representation coupling, sharing, AND performance stink. The above solutions smell very, very bad for the users.
- When storing structured values as domain types, the above smells aren't encountered by the users. The implementor of the RDBMS, the query language, and the transport language may have a slightly tougher time supporting these types, but the users of the system do not. Among other things, users who are focused on CBA and only need strings, integers, and datetimes will still have these sorts of values available to them.
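As a concrete, hedged sketch of the contrast drawn above (Haskell, with hypothetical names): as a domain value, the tree gets structural equality OnceAndOnlyOnce via its definition, while the surrogate-identifier encoding forces a hand-written recursive comparison over node rows, which must be re-implemented for every such schema.

```haskell
-- As a structured domain value, equality is defined OnceAndOnlyOnce:
data Tree = Leaf Int | Node Tree Tree deriving (Eq, Show)

-- Destructured into a "node table", every node needs a surrogate id,
-- and equality becomes a hand-rolled recursive walk over the rows:
data Row = LeafRow Int Int      -- (nodeId, value)
         | NodeRow Int Int Int  -- (nodeId, leftChildId, rightChildId)

lookupRow :: [Row] -> Int -> Maybe Row
lookupRow rows k = case filter matches rows of
    (r:_) -> Just r
    []    -> Nothing
  where
    matches (LeafRow i _)   = i == k
    matches (NodeRow i _ _) = i == k

rowTreesEqual :: [Row] -> Int -> [Row] -> Int -> Bool
rowTreesEqual ta ra tb rb =
  case (lookupRow ta ra, lookupRow tb rb) of
    (Just (LeafRow _ x), Just (LeafRow _ y))         -> x == y
    (Just (NodeRow _ l1 r1), Just (NodeRow _ l2 r2)) ->
      rowTreesEqual ta l1 tb l2 && rowTreesEqual ta r1 tb r2
    _ -> False

main :: IO ()
main = do
  print (Node (Leaf 1) (Leaf 2) == Node (Leaf 1) (Leaf 2))  -- True, for free
  let t = [NodeRow 0 1 2, LeafRow 1 1, LeafRow 2 2]
  print (rowTreesEqual t 0 t 0)                             -- True, hand-rolled
```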
- You seem to be focusing on the fact that most existing RDBMS don't support user-defined or compound types. While it is problematic, one does not have to use an existing RDBMS for my suggestion. It may gum up implementation, but that is true no matter which route is taken for user-defined types. --top
- The problem certainly exists today because existing RDBMSs, filesystems, et al. have not solved it before today. And compound types were hardly the only thing on the above list. Those places requiring surrogate identifiers on values (e.g. to represent trees) also earned plenty of my ire and focus, above. Your approach to compound types helps in small part, but doesn't touch the majority of cases where structured values would be beneficial.
Your main objection to types seems to be that you want to keep the RDBMS, query language, and FLIRT as trim as possible. I.e. you're focused on the implementation, not on the users of the system. If there is a "faulty given" here, I think it is your belief that the job of the implementor should be to fob off as much 'stink' onto the users as is necessary to make their own jobs easier unless, of course, the domain is CBA, where you make a big, entirely self-centered exception (I didn't say "just strings", I want dates and other CBA types, too!).
I think you are doing the opposite: trying to sneak your pet paradigm or Grand Unified Type Machine into different shops under the guise of a data exchange system.
Indeed, instead of selfishly starting a HolyWar over every RDBMS or data exchange feature that might not directly pertain to my needs, I'm doing the opposite. I believe there is great value in a framework that simultaneously meets domain needs for types and allows these shops to readily automate sharing of data. The ability to share is a straight up feature for data storage and exchange. I'm not being particularly 'sneaky' about it.
I am not sure what you mean by CBA exception. We need *some* kind of base types. Most tools already support strings, numbers, and dates; and that's why I used them as base types. If you are envisioning another set, let's see it. Put your cards on the table if you don't like mine. --top
You could get by with just strings. Or even just finite-width bit arrays. Or even just bits if you fold bit sequences into the schema with (col_x_bit_0, col_x_bit_1, ...). If you're going for type-light, why do you feel you should make CBA types 'special'? because they were there first?
I'd suggest the following as a starting point: unit, bit, codepoint (aka unicode 'character'), point (point = opaque surrogate identifier), semantically tagged value, tuple, record, unions, collection, set, recursive inductive types, lambdas, recursive coinductive types. And then I'd ensure the language can be extended just in case I missed a few.
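Transcribed into Haskell declarations purely for illustration (this is a speculative reading of the list above, not a defined standard; every constructor name is hypothetical), that starting point might look like:

```haskell
-- Speculative sketch of the suggested value system. Recursive inductive
-- values (trees, lists) fall out of the recursion in 'Value'; lambdas
-- carry a variable name and a body for substitution.
type Tag = String

data Value = Unit
           | Bit Bool
           | Codepoint Char
           | Point Integer              -- opaque surrogate identifier
           | Tagged Tag Value           -- semantically tagged value
           | Tuple [Value]
           | Record [(String, Value)]
           | Union Tag Value            -- tagged union
           | Collection [Value]
           | Set [Value]                -- set semantics enforced by operators
           | Lambda String Value        -- substitution function (var, body)
  deriving (Eq, Show)
```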
It looks like we'll forever disagree about the scope of this "project". You want the whole kitchen sink in it such that it is an application programming language or mega-query-language in itself with a few "sandbox" restrictions. We seem to be going in circles. Hopefully we raised some good questions.
You don't comprehend the actual "scope of this project" because you've been too busy using circular logic to defend your delusion that issues of usage are "far beyond the goal of 'sharing', per title". Here is the circular logic: You believe that use and exchange are orthogonal. You are wrong. We present examples that demonstrate you are wrong. You refuse to seriously examine these examples. Why? Because: You believe that use and exchange are orthogonal. Circle complete. We've run that circle at least a few times already. If you'd like to break out of the circle, you must be open-minded enough to allow a contest of that belief of yours, and you must be willing to go to the effort to present evidence that we are incorrect. But either approach requires that you seriously confront the examples we have provided. After all, it is those sorts of examples that convinced us that you are incorrect, and our belief on that issue will not change by you merely waving your hands and insisting otherwise.
As far as your belief about my own goals: I have not suggested a full query language, but I am aware enough to recognize that any solution to this "project" must integrate effectively with filesystems and query languages. After all, CrossToolTypeAndObjectSharing really ain't worth much if databases and filesystems are outside the set of tools with which the types and objects may be shared. And so it is worth considering the "types and objects" in the context of filesystems and databases in addition to operating systems and other tools, many of which have, indeed, been explored above.
If you can pull it off without inventing a programming or query language to do it, I welcome your attempt. If not, and the only known way to achieve such is to invent a programming or query language, then that's that. I was hoping/trying for a declarative solution that didn't require expressions and algorithms; but if that's not do-able or won't satisfy enough domains, then that's that. --top
I do agree that keeping types as declarative as possible is a fine idea.
So far it seems impossible to get types as rich as you want without using non-declarative techniques. Perhaps "declarative" needs to be defined better for our context. For example, a functional language can be TuringComplete but still considered "declarative" by some.
Declarative doesn't mean "not TuringComplete", and I would consider it reasonable to avoid TuringComplete definitions - but we can have much richer types than we currently have without resorting to TuringComplete type definitions. ML and Haskell don't have TuringComplete types (but they are TuringComplete languages).
Because you seem confused, I'll note that we can also have a much richer value system without having TuringComplete values, and you could look at Charity language (http://en.wikipedia.org/wiki/Charity_(programming_language)) for an example. However, I am not so certain that the engineering tradeoff of doing so is worthwhile for CrossToolTypeAndObjectSharing.
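For instance, here is a hedged Haskell sketch of what "richer types without TuringComplete type definitions" looks like: the type declarations themselves are purely declarative data descriptions, even though the language around them is TuringComplete. The names are illustrative only.

```haskell
-- The definitions below are declarative descriptions, not programs:
data Complex  = Complex { real :: Double, imaginary :: Double }  -- record
data Shape    = Circle Double | Rect Double Double               -- tagged union
data List a   = Nil | Cons a (List a)                            -- recursive inductive
data Stream a = Stream a (Stream a)                              -- recursive coinductive

-- An 'unfold' value: the infinite fibonacci stream, well-defined
-- without any TuringComplete machinery in the type itself.
fibs :: Stream Integer
fibs = go 0 1 where go a b = Stream a (go b (a + b))

takeS :: Int -> Stream a -> [a]
takeS n (Stream x xs) = if n <= 0 then [] else x : takeS (n - 1) xs

main :: IO ()
main = print (takeS 8 fibs)   -- [0,1,1,2,3,5,8,13]
```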
In any case, independent of the TuringComplete issue, I think that procedural and mechanical (including function-evaluation) definitions of types and constraints (e.g. trigger-based constraints) are a bad idea. I also object to building values piece-by-piece in this mechanical manner, though I don't object to values containing a function.
It is worth noting that my objection to procedural description of values is one of the (great many) reasons I find your 'break structures into schema' approach to be abhorrent: the need for surrogate identifiers to build up a tree (or set, or list, or any 'nested' construct) node-by-node requires a procedural or mechanical approach to value descriptions. As usual, I think this is a case where you say one thing (you want declarative values) but, because you're operating on wrong assumptions, you favor approaches that accomplish the exact opposite of what you believe they'll accomplish.
I suspect your example is not realistic enough to illustrate that trees are not really trees.
I'm not attempting to illustrate that "trees are not really trees". I suspect that you are still assuming that values should be broken down into structures if they're used for more than equality. I stated above which assumptions of yours I reject, and that is among them. The example, in fact, is meant to challenge your assumptions. Therefore you are missing the point (and, I'll repeat, resorting to circular logic) when you reject the example because of one of your assumptions.
But anyhow, it seems a semi-custom system may be more appropriate to make every domain and personal preference happy. Rather than provide the One Grand Language, perhaps what is needed is kits that allow one to roll-their-own database without having to start from square one.
Ah, everyone will roll-their-own database without using a standard language for rolling it... what a fantastic idea for CrossToolTypeAndObjectSharing. Why, with that idea even databases can't share types or objects. Even better, it will be nearly impossible to write generic tools that can view, manage, and update databases. Oh! Oh! And you DBA guys will get to squash bugs in both the database and the schema! (And you'll get to learn a new, non-standard language each time! How exciting!) Rather than a small group of experts getting the database implemented mostly right with a few learnable quirks, we can let thousands of small groups each get it wrong in their own way! You'll never have a shortage of new and interesting stories for jobs you wish you could automate.
Are you trying to sabotage CrossToolTypeAndObjectSharing? and reliability? and maintainability? and overall simplicity?
Why do you object to a standard language for type and value descriptions? You obviously don't object to standards for communication in general (like SQL, ODBC, Sockets, filesystems, XML, HTTP, TCP/IP, etc.). Do you have any valid technical objections (i.e. excluding any based on unvalidated assumptions), or do you just fear the idea of learning something new in order to repair the messes people make of new features? (It isn't as though they won't make messes of the existing features, but perhaps that is the devil-you-know.)
I'm skeptical that a generic cross-domain query language can be made that doesn't become a giant ball of crud that only a committee could love.
I asked about "a standard language for type and value descriptions" which is not the same thing as a "a generic cross-domain query language". Producing a system of types and values is a smaller task than producing a query language. Every query language must have a system of types and values (even if it is 'EverythingIsa bit') in addition to a query semantics, and possibly a data-manipulation semantics (update, insert, delete, etc.). For "generic cross-domain query language", I find your skepticism reasonable - it is difficult to figure out who needs what queries (clustering?) and which optimizations. But for "standard language for type and value descriptions" your skepticism is unwarranted: we already know a great deal about what works for types and values across domains. The set of common GeneralPurposeProgrammingLanguage types (records, sets, ordered collections, graphs, etc.) is common for a reason. We even have examples of working structured value systems in the form of JSON and YAML, and for simple type-descriptors with XML Schema.
It's tough enough to standardize *within* a domain. SQL, ODBC, FTP, etc. are the lucky few that "clicked" among many dozens if not hundreds of dead or lingering attempts, and they assume rather basic base "types". My suggestion is to walk before we run by focusing on a non-expression-based declarative approach to type sharing. You gotta learn to get to orbit before you land on the moon; you don't try for the moon on the first attempt. When that is perfected and road-tested, then add operator implementations etc. You are risking another never-ending XanaduProject by trying for all-or-nothing. HTML and the Web succeeded where Xanadu failed because it only bit off what it could chew and swallow. --top
I do agree that an intelligently designed evolutionary approach (i.e. one designed to accomplish backwards compatibility while resisting TIMTOWTDI and LanguageIdiomClutter) is a good idea. One doesn't wish to bite off more than one can chew, but one also doesn't want to deprecate features and receive internal competition between product versions, or to clutter up the system with features or multi-version maintenance as tends to happen if you don't deprecate features while adding new ones. When it comes to shared interface standards such as APIs, languages, and CrossToolTypeAndObjectSharing, taking 'big bites', or BigDesignUpFront, is the only practical way to end up with a simple and complete solution. And the less you can get to up front, the more you need to 'big design' support for future versions.
Anyhow, I don't believe the "walk before we run" analogy is particularly apt... not unless, on this evolutionary scale, you rate modern SQL as "sliming along under a heavy burden". One of the major reasons that many "dead or lingering attempts [that] assume rather basic 'types'" exist is because they don't sufficiently advance state-of-the-art. To succeed, any project must offer enough to offset the perceived pain of change, which is subjectively measured in the perceived (guesstimated) costs of paying for the new technology, learning to use the new technology, and deprecating the existing technology (for an in-depth study, read The Change Function by Pip Coburn). Incremental improvements to non-disposable technologies will usually fail unless they provide backwards compatibility. Contrary to your apparent expectations, those "rather basic 'types'" you mention are an indicator that the system will die or fail... because 'basic types' isn't sufficiently above and beyond the existing systems (XML/YAML/JSON/etc.) to motivate change. In order to succeed, a new technology will need to enter the world-at-large "running" better than the other technologies and will probably need an evolutionary path leading towards locomotion and unmanned flight.
In your tree example, you had a query that asked for tree equality. Do you expect this operation to be implemented in the sharing system? Or, merely defined as existing?
Answered below.
You asked, "What I want to be able to do is define trees and their operators OnceAndOnlyOnce so that I don't need to build them into each query." If the goal is not related to building a query system, why are you asking me this? --top
I'll try to simplify it: what you seem to be hearing is that I expect you to implement a query system, whereas what I've been actually saying is that I believe (for reasons given) that your destructure-the-values approach sucks for CrossToolTypeAndObjectSharing and almost every other purpose. Further, I'm saying that I am utterly skeptical of your claim that a few new query operators will fix the problems. If you believe a few new operators will help, I want you to show me. And if you can't show me, I want you to EatYourOwnDogFood and see what it tastes like. You keep giving advice on how tree values should be destructured and how doing so provides alleged benefits. Now you try it.
Seriously. Follow your own advice. Destructure a domain-value that happens to be a tree, then actually use it as a domain-value - as an identifier with equality-queries, access queries, joins, inserts, deletes. We're allowing you the latitude to choose your own query operators so long as they are clearly implementable and aren't schema-specific. We just, honestly, don't believe they'll help as much as you seem to believe they will.
And to avoid the impression that I'm claiming the tree-structured domain-values are used for nothing but equality and joins, go ahead and handle a few tree operations (merge, contains-subtree, partial pattern match, views, etc.) that you believe your approach handles more effectively. Demonstrate those benefits you have regularly asserted are there to be found relative to structured domain-values. If your examples correspond to uses of domain values (i.e. excluding mutation because domain-values cannot be mutated) we'll give you our versions of those operations should you challenge us on them.
Values and Operators and Tools and Sharing
In your tree example, you had a query that asked for tree equality. Do you expect this operation to be implemented in the sharing system? Or, merely defined as existing? --top
Value equality would be well defined as part of the standard. If an implementation provides an equality operator to programmers, it would be expected to adhere to the same semantics that everyone else has been told to use as standard for equality.
Whether a given tool implements or provides access to the equality operation (e.g. via a scripting language or query language or library API) would be up to the tool. Some tools, such as socket transport for values, might be unconcerned with equality comparisons. Of course, one would expect most tools to end up using a common shared library that implements these things, and I imagine such a library would be very likely to provide the equality comparison operator even to tools that don't need it (at least prior to dead-code-elimination optimizations).
Similarly, you could expect other primitive operators (e.g. projection on a record or tuple, set unions and intersections, cartesian joins, etc.) to be well defined. It is these sort of operators that define the value, providing its intrinsic semantics and differentiating it from other values... these operators are part of the EssentialDifficulty of any value system, in the same sense that 'integers' are defined in terms of successor and predecessor and that 'strings' are constructed of ordered sequences of characters. Also, if type-descriptors are part of the standard, one would expect a standard definition for whether a given value is a member of a given type-descriptor (but not all tools would be concerned with types).
Tools would be able to provide arbitrary higher operators, such as pattern-matching, definitions for joins, functions, etc. Tools would provide the extrinsic semantics for values - e.g. interpreting a given record as a command or query or identifier or physical address or statement of truth. Not all tools (e.g. transport and storage) are interested in such extrinsic semantics, and only need to concern themselves with intrinsic semantics. If standards are to be placed on these extrinsic semantics, that can be performed with a different set of standards, likely standardized specific to the domain (e.g. physics) or the class of tools (e.g. DBMSs).
Unfortunately for your goal of externalizing as much as possible, having multiple such standards is directly counterproductive to CrossToolTypeAndObjectSharing, because it prevents sharing between tools of different classes and domains, and prevents sharing with cross-domain utilities (like the DBMS or object-editor). Essentially one is 'parsing' or 'interpreting' the value for the given domain. In an extreme case, one could make it so 'EverythingIsa BinaryLargeObject' with each domain providing its own implementation libraries and semantics, and we'd be even worse off than we are today. Sharing between these tools would require extremely complicated translation efforts. Note that these problems of extrinsic semantics are inherent; they exist whether you are putting everything into BLOBs, into strings (plain text), or destructured into relational schema. Other tools won't usually know how to interpret a given 'node table'. The greater the degree to which semantics are externalized, the more duplication of effort and semantic divergence will occur, and the more expensive sharing becomes.
Thus one must, by nature, strike an engineering balance between simple cross-tool sharing and what you call LanguageIdiomClutter. Engineering tradeoffs are for anticipated costs and benefits justified by use of analogy, model, example, and prototype. And before you raise the 'DisciplineEnvy' flag, you'd do well to realize that even professional engineers rarely understand the full ramifications of any given design decision, especially in RDT&E (Research, Development, Test, and Evaluation). In this sense, SoftwareEngineering and language design are no worse off than other RDT&E programs. There are always risks when trying something new. Real engineers recognize this and attempt to control and compensate for risk.
I'm certainly with you on the goal for language minimalism (or 'concept preservation' as you call it), but I do see considerably more need for structured value support than you see. I would not be surprised if this is due to the different domains in which we work. You write business reports, CRUD screens, help companies organize their data, and other CBA apps. I do robotics, operator control units (soft realtime GUI with commands, planning, video media, remote camera manipulation, overview maps, world models integrating sensor payloads from different platforms, etc.), configuration support (languages, reactive programming, scripting languages, domain object models), mission planning (AI and heuristics support, DeltaIsolation, knowledge management), command and control protocols, and distributed systems - and that's just for my job. In my minimal spare time I study languages and I'm implementing one. I'd like to think I know a bit about the real demands for CrossToolTypeAndObjectSharing.
There are some rules of thumb one might follow in order to help achieve this balance between sharability and clutter. One example is ThreeStrikesAndYouRefactor. Essentially, if you see the same basic 'concept' being reinvented in three different domains, it is time to look at finding a way to fold it into the standard. Not that I recommend starting from nothing and refactoring from there: it is easy to paint oneself into a corner when it comes to concerns for backwards compatibility and language minimalism. Languages, APIs, protocols, and other interface standards are notoriously difficult to extend and especially to shrink or refactor unless great care is taken, so some BDUF is warranted even if it is minimally to prepare the extension mechanisms. Comparatively, fixed approaches like "lists are okay, trees are not, and I'm going to completely ignore sets" will not be able to meet real engineering demands.
I know for certain that we systems, math, and language programmers would benefit from easy CrossToolTypeAndObjectSharing of: unit, strings, sequences (lists), sets, unlabeled graphs (with isomorphic equality), structured commands and messages (can be represented as records of these things), and unions or tagged unions.
I'll admit the apparent need for sharing 'lambda' substitution functions and 'unfold' values (which require lambdas) is less than the need for the value structures named above. These evaluative forms are useful for the transport of predicates and guards, transforms, triggers, accessors, infinite concepts like the fibonacci series or the sequence of twin prime numbers, etc. But they aren't used so often that JSON or YAML bothered supporting them. These are a place where I could just ensure there is a way to add them later without breaking existing tools, then leave it at that. On the other hand, these also aren't complex or difficult to implement. When unburdened by syntax and optimization concerns, implementing lambdas, lambda application, coinductive unfolds, and structural equality can easily be done in fewer than one hundred lines of C code. And transport can look a lot like other tagged records, simply using reserved names.
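To give a feel for how small this really is, here is a hedged sketch (in Haskell rather than the C estimated above; a C version would be longer but similarly straightforward) of lambda terms, application by substitution, and structural equality:

```haskell
-- Untyped lambda terms with derived structural equality.
data Term = Var String
          | Lam String Term
          | App Term Term
          deriving (Eq, Show)

-- Naive substitution (no capture avoidance): adequate for a sketch of
-- transported closed terms, not for a production evaluator.
subst :: String -> Term -> Term -> Term
subst x v (Var y)   = if x == y then v else Var y
subst x v (Lam y b) = if x == y then Lam y b else Lam y (subst x v b)
subst x v (App f a) = App (subst x v f) (subst x v a)

-- Call-by-value application.
eval :: Term -> Term
eval (App f a) = case eval f of
                   Lam x b -> eval (subst x (eval a) b)
                   f'      -> App f' (eval a)
eval t = t

main :: IO ()
main = print (eval (App (Lam "x" (Var "x")) (Lam "y" (Var "y"))))
-- Lam "y" (Var "y")
```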
I found my reply repeating things I already said, so I junked it. I suggest we re-focus on a smaller goal here: transferring data between different apps in a "static" fashion: that is, packaged in a file. Let's put aside query-oriented solutions for now. How do we transfer object and type-heavy data?
If you paid attention, you'd note that these questions have already been answered AND that nothing in the above 10 paragraphs involves "query-oriented solutions" making your entire comment here a non-sequitur, and making it quite clear that you didn't bother reading before replying. Anyhow, I went into a thorough answer to your specific question in the section entitled Type Heavy FileSystem. This time, try to avoid ListeningWithYourAnswerRunning.
Stop projecting your poor documentation skills as the sin of ListeningWithYourAnswerRunning on my part. If it smells, tastes, looks, and feels like a query language, then for all intents and purposes it's a query language, even if you call it something else.
I'm not projecting. You really are an awful listener, and you are extremely rude to ask questions, not read the answers, then reply to what you assumed the answer would be. And you just did it again (twice, here and below), so don't pretend this is my fault. Nothing written in the section entitled Values and Operators and Tools and Sharing "smells, tastes, looks, and feels like a query language"; indeed, stuff written in that section applies to query languages, but it also applies to interpreters, filesystems, sockets, fifos, and so on.
"Extremely rude"? You are being a drama queeen again. Extremely rude would be calling you a meandering befuddled run-on idealistic exaggerating detached patronizing blowhard. See the difference?
I do see a difference. Actions speak much louder than words. I consider name-calling to be childish, and a much lesser form of rudeness than such things as intellectual dishonesty, sophistry, hypocrisy, ShiftingTheBurdenOfProof, or asking questions then not listening to the answers. Indeed, the fact that you'd waste forum time and space exclusively to diverging on the off-topic and subjective semantics surrounding "extremely rude" speaks much worse of your behavior than does calling me a 'drama queen'. If you're going to waste a little time in meta-discussion, you could at least spend an equal amount of time on content relevant to CrossToolTypeAndObjectSharing in order to ensure forum progress.
Your sense of "justice" is as convoluted and twisted as your writing style. Good riddance.
I prefer to call it 'developing and maturing', but I can see how it might look 'convoluted and twisted' to someone whose understanding of justice seems to have reached its full development in grade school. Hmm... I guess this presents something of a challenge to you: can you stand by your "good riddance" statement and resist attempting to achieve the last insult? Or will you belie your assertion and reply? Time will tell.
Type Heavy FileSystem [perhaps move to TypeHeavyFileSystem??]
Q: How should we package type-heavy or structured data in a file for both persistence and sharing between processes?
A: We should take the straightforward approach: we should make it so files can package type-heavy or structured data for persistence and sharing between processes. Doing so requires upgrading the FileSystem (one might call the result a Type Heavy FileSystem). We need the FileSystem to support structured values as structured values - both in the sense of persisting structured values and of providing structured-value FIFOs and such.
FileSystems today support structured values as octet streams, which is remarkably inefficient and awful for CrossToolTypeAndObjectSharing. The modern approach to sharing structured data is to serialize YAML or JSON or XML or CSV or whatever into an octet stream using an interpretive layer of structure (objects, attributes, sequences, etc.) atop an interpretive layer of characters (typically ASCII, UTF-8, or UTF-16) atop the octet stream. Alternatively, we serialize structure directly atop the octet stream (and call it a 'binary' format, like MP3 or H264).
But this modern approach has a significant set of engineering disadvantages that disqualify it as how we should do it:
- every tool must duplicate code and effort for parsing octet streams into structured data (decodec)
- every tool must duplicate code and effort for serializing structured data into octet streams (encodec)
- coding is inefficient because programmers must continuously repeat the effort of integrating encodecs and decodecs with each new tool (a sketch of this duplication appears after this list).
- coding is inefficient because programmers must continuously play 'gatekeeper', testing for and handling cases of poorly formed input.
- message passing and file sharing between tools is inefficient due to the serialization and parsing and gatekeeper efforts
- message passing and file sharing between tools is inefficient due to the need for hardware memory protections that could be eliminated given support for typed FileSystem pipes, fifos, and files.
- the OperatingSystem is incapable of detecting or raising errors if a tool will receive values that it cannot process.
- proliferation of standards (YAML and JSON are just two among many) largely precludes the creation of common utilities for processing these files except as octet streams (or possibly as 'text' by assuming ASCII, UTF-8, UTF-16, EBCDIC, etc.).
- lack of support for views (i.e. 'open_as_type' or 'view_as'): supporting views intrinsically requires translating between structures which, in turn, requires that knowledge of the original structure be available... which is not the case if everything is an OctetStream. Tools that process many different structured values (such as media players supporting MP3 and Ogg) must deal with great complexity in these 'view as WAV' translations, especially to support a streaming model, and there is no automatic means to support new structured values that can be viewed in older structures. It would be convenient if views could simply be installed into the OperatingSystem, such that every tool that processes, say, audio can simply ask 'open_as(filename,type_audio_stream)' and the OS or language lib will grab the appropriate translator.
- the octet stream model for files is only capable of random access in one dimension (seek/jump to octet count); it is impossible for a FileSystem to index for random access into structures such that one can readily just grab the pieces in which one is interested. Additionally, seeking is essentially useless for anything but fixed-sized structures in a flat file.
- the octet stream model for files is only capable of streaming in one dimension (forward); it is impossible for languages or the OperatingSystem to optimize memory usage for other forms of access.
- engineering systems of lightweight tools is discouraged in favor of combining tools into a larger application due to the resource costs and safety issues from the code and time spent loading, saving, parsing and serializing, and gatekeeping. This discouragement introduces pervasive, hidden complexity costs and further duplication between the tools that isn't obvious until one studies alternatives (like NoApplication frameworks and lightweight processes).
- the filesystem can't support many other interesting features that you might not even think to ask for because the existing tools are too limited: cross-indexing files based on shared structures for easy searches, hyperlinking, complex filesystem queries (i.e. other than 'find <regexp>'), joins, etc.
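As a rough illustration of the 'views' point above, consider what OS-installable translators might look like. Everything below (the registry, install_view, open_as) is invented for illustration; it is a sketch of the idea, not a real API:

  # Hypothetical registry of OS-installed 'views' (all names invented).
  translators = {}

  def install_view(src_type, dst_type, fn):
      translators[(src_type, dst_type)] = fn

  def open_as(value, src_type, dst_type):
      if src_type == dst_type:
          return value
      try:
          return translators[(src_type, dst_type)](value)
      except KeyError:
          raise TypeError("no view from %s to %s" % (src_type, dst_type))

  # A media player can ask for 'audio_stream' without knowing each codec:
  install_view("mp3", "audio_stream", lambda v: ("decoded", v))
  stream = open_as(b"...mp3 octets...", "mp3", "audio_stream")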
Instead of supporting structured data inside octet streams, even using some sort of 'standard' like YAML or JSON, we would be much better off to have the FileSystem directly support structured data.
By doing so, all of the above problems diminish. The FileSystem can index for efficient access to parts of a structured value. The OperatingSystem and language standard libraries can provide streaming access into this structure such that if you access a really large file only the parts you're looking at are cached into RAM and the whole file needn't be loaded just to access the few pieces you need. (With polymorphism, this can be done while maintaining the same interface as values produced inside the language runtime.) The OperatingSystem can support views and translations via 'open_as_type(filename,type-descriptor-or-name)'. The need for parsers and serializers and gatekeeping and 'views' is centralized OnceAndOnlyOnce into some mix of the OperatingSystem and its libraries. FileSystem FIFOs, Stacks, Pipes, etc. can be made massively more efficient by avoiding the need for marshalling and unmarshalling when communicating between tools on a single machine. New tools and processes become easy, cheap, and safe to write and integrate via pipelines due to the savings in memory and CPU, the savings in programmer effort, the flexibility of views, and the ability to detect workflow errors (maybe even statically). And it becomes easier to create tools that can usefully access, view, and manipulate any file used by any other tool.
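To see what 'no marshalling' buys, compare an in-process FIFO of structured values. This is only a loose analogy, using Python's standard queue module; the proposal is the OS-level equivalent, not this:

  # Within one runtime, values pass through a FIFO with no serialize/parse step.
  import queue

  fifo = queue.Queue()
  fifo.put({"artist": "Alice", "tracks": [1, 2, 3]})  # a value, not octets
  record = fifo.get()  # arrives as a value: no decodec, no gatekeeping
  print(record["artist"])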
And the line between OperatingSystem and Language blurs even further. LanguagesAreOperatingSystems. I consider this a good and natural thing, but I'll note that some people like the hard division between languages and operating systems... probably because that's the devil they know. Or perhaps they don't like it, but simply don't have the imagination or education to envision anything else.
The above benefits aren't guaranteed, of course. One could trivially thumb one's nose at the typesystem and start passing data around as type:[(bool,bool,bool,bool,bool,bool,bool,bool)] - a sequence of octets - and be no better or worse off than with the existing FileSystem. The idea is to obtain as much optimization and safety, and as many useful features (like automated translations), as possible... balanced against the desire for language minimalism and the need for simple implementation. Striking that balance is an engineering problem.
You may be wondering about transport: how do files get from one implementation of the filesystem to another? But transport of structured data is essentially a solved problem. The only real issue is picking just one or a few of the vast array of solutions (XML, YAML, JSON, EssExpressions, Erlang marshalled values, Mozart marshalled values, or even a new format just for this). It's a solved problem and uninteresting to people who are properly educated in ComputerScience. I don't dwell upon it, but if I were forced to pick just one, I'd favor YAML for the following reasons:
- YAML is already designed to support the vast array of actual, observed needs for structured data in the real world.
- YAML is more extensible than JSON due to its 'type tags' (!type-tag). It also largely supports backwards compatibility, so that a new type-tag doesn't break existing tools.
- YAML is more complete than XML, which is incomplete without a schema; essentially, the design effort that went into YAML would need to be repeated for XML.
- YAML is more text-editor friendly than XML, JSON, or EssExpressions. In particular, it supports scalar text without bracketing (e.g. in quotes). It was designed with this goal from the start. This would be a big issue (I imagine) with regard to backwards compatibility when upgrading to a TypeHeavyFileSystem from the existing situation.
- YAML is already implemented.
- YAML is Unicode friendly. If we do upgrade to a TypeHeavyFileSystem, we might as well support Unicode and drop ASCII as the default encoding.
- YAML has versioning support. In particular, YAML starts with '%YAML 1.X' on a line by itself. Rigorously using this would easily allow the transport layer to be upgraded later via '%ANOTHER_OPTION'. XML does this, too, but (IIRC) JSON does not.
- YAML's top-level structure is a sequence of records separated by '---' on its own line. This makes YAML highly suitable for integration with FIFOs, stacks, and streaming data. (Not that it couldn't be done with another solution... this just makes it convenient; a short sketch follows this list.)
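Here is a short sketch of those last two points, assuming the third-party PyYAML package; safe_load_all yields one structured record per '---'-separated document, lazily, which is what makes the format stream-friendly:

  # Streaming '---'-separated YAML records (assumes the PyYAML package).
  import yaml

  stream = "%YAML 1.1\n---\nartist: Alice\n---\nartist: Bob\n"
  # one structured record per document, produced lazily as the stream arrives
  for record in yaml.safe_load_all(stream):
      print(record["artist"])   # Alice, then Bob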
I'd disfavor building my own solution because this is a non-critical issue. If we need to later support more than YAML supports, well, YAML itself is extensible, or we could upgrade to '%ANOTHER_OPTION' and dispatch to the correct OperatingSystem plugin.
The FileSystem itself wouldn't store structured data as YAML (unless it wanted to). YAML is just a network serialization format. It could also be used to provide 'plain text' access to values and objects, but doing so would be less efficient than keeping the value in value form (e.g. for streaming). Because YAML is 'just a serialization format' instead of 'the official language', one needn't embrace all of YAML's features or types; one could get by with using and supporting only a subset, so long as one supports every feature one uses.
The choice of YAML is reasonable... but, ultimately, boring. People who focus too much on the syntax of transporting and saving values might make some small, incremental improvements, but they'll not be making any significant improvements to CrossToolTypeAndObjectSharing.
- YAML, XML, FlirtDataTextFormat, etc. are all representationally equivalent. Thus, it's basically "my language can beat up your language" kinds of style issues at this point. (This is not the place to debate data exchange text/markup languages, by the way.) --top
- I strongly doubt the validity of your assertion about representational equivalence unless you mean it in the same sense that BLOBs can carry any digital data if you just interpret them correctly - which happens to be a vacuous tautology (even a single bit can carry any digital data if you just interpret it correctly).
- Either way, I've noted above that issues of transport language are uninteresting and can easily be changed later (modulo backwards compatibility issues). There are some engineering considerations that make YAML a better choice than most, but (like I said) one could make do with JSON or XML or a made-up-language-on-the-spot... so long as it is shared by different OperatingSystems. If you truly understand that YAML is just one plug-in choice, and that transport of structured data from one FileSystem to another is effectively a solved problem, then you understand the meat of what was being said. If, however, your 'understanding' is that this has devolved into a "my language can beat up your language" issue, then you completely missed the point, and you'd do well to ask questions to help me clarify it for you.
There are major one-time costs regarding a TypeHeavyFileSystem: implementation, optimization, provision of other features (versioning, distribution, caching, sharing, etc.), and especially its integration with the existing panoply of files and tools. Choosing such a FileSystem would, for an OperatingSystem, be a lot like starting from scratch. Many existing efforts would need significant rewrites to take advantage of the system.
Example APIs:
  // read: obtain the file's state as an immutable, typed value
  Value myFileState = open_file_as( myFileName, aTypeDescriptor )
  // write: replace the file's state with a newly computed value
  save_file( myFileName, expression producing Value )
  // update: mutate fields and elements through a mutable view
  MutableFile myFile = open_file_for_write( myFileName )
  myFile.artist = ... expression producing Value ...
  myFile.element[20] = ... expression producing Value ...
The OperatingSystem would include plugins (selected heuristically) to perform the type conversions automatically (and possibly lazily), and could raise an error if no conversion was available. The 'myFile' object itself would then be accessible in the language... and, again, might be accessed lazily and with indexing (to support very large files) such that 'myFile.artist' can immediately return the artist without loading or searching the file and without concern for the representation of the file. Usefully, the language or libs could also integrate filesystem transactions, versioning, mutable views, language-integrated queries, and such behind the scenes.
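As a rough sketch of that lazy-access idea (the class, the index format, and the loader below are all invented for illustration; a real binding would delegate to FileSystem-maintained indexes):

  # A proxy that fetches only the requested field of a structured file.
  class LazyStructuredFile:
      def __init__(self, index, loader):
          self._index = index    # field name -> location within the file
          self._loader = loader  # reads just that piece from storage

      def __getattr__(self, name):
          if name not in self._index:
              raise AttributeError(name)
          return self._loader(self._index[name])  # whole file never loaded

  # 'myFile.artist' returns at once, regardless of total file size:
  myFile = LazyStructuredFile({"artist": 0x40}, lambda off: "field@%x" % off)
  print(myFile.artist)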
No, I guess I was not clear. File systems are not static. They are a kind of special-purpose hierarchical database. I meant more like the kind of stuff that XML and CSV would be used for. --top
File states are immutable values. That makes them static. The fundamental answer to "how do we transfer object and type-heavy or structured data between different apps in a 'static' fashion" is to support type-heavy or structured immutable values as file states. Which are static. To support structured values as file states requires modifying the FileSystem. I labeled the result 'Type Heavy FileSystem'. And you'd know all of this if you weren't too busy jumping to conclusions to read what was written.
Or, perhaps, you believe that values and structures can somehow be magically separated from the operations you use to access them? In reality, attempting to divide values from their operators is a logical impossibility, a bit like trying to divide wetness from water. This particular detail was covered above, in the section entitled "Values and Operators and Tools and Sharing" at the part discussing intrinsic and extrinsic value semantics of values.
In modern FileSystems, the intrinsic semantics of every file is octet stream (or binary large object), and so the FileSystem API (the operators) only support access to files as though they were BLOBs. In the proposed Type Heavy FileSystem, the intrinsic semantics of every value can allow a variety of types, such as records, unions, sequences, sets, graphs, matrices, numbers, dates and times, geographic locations, other measurements, and so on (subject to engineering tradeoffs). And so, in such a FileSystem, the API (the operators) would provide a richer set of accessors and manipulators, likely via integration with language-objects.
You can't change the intrinsic nature of files without also changing the FileSystem. That's fundamental.
Basically, with a Type Heavy FileSystem, you obtain the advantages of XML and CSV (and YAML and JSON and other structured data representations)... but you also get a whopping massive bundle of additional advantages for optimization, code complexity, OnceAndOnlyOnce, safety, and so on. These additional advantages are described above (mostly as relative disadvantages of the modern EverythingIsa octet stream approach).
And as a side note, you say that FileSystems are a "hierarchical" database. While common, that is completely optional. There is no problem with flat filesystems without folders or locality of reference... or, at least there's no problem if you have some other means of organization (e.g. a system of tags).
Relating FileSystem to DBMS
The issues in a DBMS are similar... just broader. If this confuses you, think in terms of:
TABLE FileSystem {
FILENAME,
FILEVALUE,
AND MANY COLUMNS,
OF SECURITY,
TIME,
VERSIONING,
BRANCHING,
TRANSACTION,
MIRRORING AND CACHING,
AND OTHER FILESYSTEM METADATA,
MAYBE INCLUDING 'TYPE',
MAYBE INCLUDING 'NAMESPACE' (or e.g. folder)
}
Implementing and optimizing this table (and optionally a namespace/folders table) to support block devices and streaming and internal indexing and distribution and transactions and other features is what gets you a filesystem... just as supporting arbitrary schema (plus indexing, streaming/cursors, distribution, transactions, and other features) is what gets you a database management system.
In a regular FileSystem, FILEVALUE is an octet sequence - a BLOB. In a Type Heavy FileSystem, a FILEVALUE is a complex structure with nesting. If you wanted a statically typed filesystem, you could have 'TYPE' be a metadata element or part of the filename. This would not do you much good unless compiled processes provide static info on what types they need for input/output. If you prefer dynamism, you'd simply use the 'any' type. What makes it a TypeHeavyFileSystem is the formally standardized complex structure found in a FILEVALUE, not the use of a manifest 'TYPE'.
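For concreteness, a toy version of that table can be spun up with Python's built-in sqlite3 module. The column choices are illustrative only, and FILEVALUE is still a BLOB here, which is exactly the limitation a TypeHeavyFileSystem would lift:

  # Toy FileSystem-as-table using sqlite3 (columns illustrative only).
  import sqlite3

  db = sqlite3.connect(":memory:")
  db.execute("""CREATE TABLE FileSystem (
      filename  TEXT PRIMARY KEY,
      filevalue BLOB,   -- octet sequence today; structured value in the proposal
      type      TEXT,   -- optional manifest 'TYPE' metadata
      namespace TEXT    -- e.g. folder
  )""")
  db.execute("INSERT INTO FileSystem VALUES (?,?,?,?)",
             ("song.mp3", b"...", "audio_stream", "/music"))
  print(db.execute("SELECT filename, type FROM FileSystem").fetchall())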
Do you at least agree that introducing direct behavior implementation into such a tool/language greatly increases its complexity? --top
It unquestionably increases the amount of developer effort for any given new implementation of such a language, but this is a trivial observation -- having to implement n + 1 features requires more effort than implementing n features. However, this is a one-time amortisable cost which is increasingly negligible over time.
For a user of such a (presumably well-designed and correctly-implemented) language:
- Users that do not require "direct behaviour implementation" need not be aware of its existence in the language. If we measure a user's perception of complexity via some non-negative integer metric c, the increase in complexity is c = 0.
- Users that do require "direct behaviour implementation" will unquestionably find it less complex to simply use it (assuming the language provides it), than to have to implement it themselves via whatever means. Having to implement "direct behaviour implementation" because the language does not provide it adds c = k complexity.
Therefore, using a language that provides "direct behaviour implementation" will, in fact, demonstrate less complexity than using a language that does not possess it.
It's perhaps worth noting that we can replace "direct behaviour implementation" with any useful language feature <x>, and the above remains true. There is a cost to learning to use feature <x>, of course, but this is again a one-time amortisable cost which is increasingly negligible over time. The cost of not having <x>, on the other hand, will be incurred every time <x> is needed.
[In the context of whole language design and implementation, by which I mean to include the language standard libraries, introducing direct behavior implementation of certain features (in particular, KeyLanguageFeatures and features that overcome a MissingFeatureSmell) can actually reduce the net complexity of a language. It can do so because the complexity cost of efficiently and correctly implementing the standard library (i.e. paying attention to NonFunctionalRequirements such as optimization and security and robustness and simplicity of interface and the ability to avoid leaking implementation details) may be greater without this feature than is the total cost of integrating the feature with the language then using it to help implement the standard library.]
- What "standard library"?
- The functions, classes, modules, etc., that are typically supplied with a language implementation. E.g., the C Standard Library, the Java 2 Platform libraries, etc. These are often written using (mainly) the language itself.
- Why are we talking about C etc. here?
- [Languages and programming tools each generally define both a set of primitives and a library of composite components for performing common tasks. Neither the primitives nor the libraries by themselves wholly describe the language (or tool). The library associated with a language is commonly called the "standard library". Each language traditionally has its own "standard library" or libraries, and implementing this standard library is often the majority of work involved in implementing a language. CeeLanguage is one example of a language with a "standard library", but is one among very, very many.]
- But what does this have to do with a sharing utility or language?
- It's becoming a diversion, though it's not inconceivable (and, indeed, quite likely) that a sharing language would have its own standard library. Please read what we've written above in this section, and avoid being distracted by standard library issues.
- To be clear, I am talking about the complexity of the "sharing tool" itself, not net complexity for developers of using it with app languages etc. I've resigned myself to the idea that we've come to agreement on that level.
- The grammar specification for a language with "direct behaviour implementation" would almost undoubtedly contain more rules than one without. So what? It is the net complexity for user developers that is relevant, not the number of grammar rules that need to be implemented in the language parser.
- So then you agree with my original statement in this section? Note I used "its complexity" and NOT "total complexity".
- No, I do not agree that implementing "direct behaviour implementation" in a sharing language greatly increases its own inherent complexity. Having implemented behavioural mechanisms in a variety of domain-specific and other languages, these are relatively trivial and insignificant (in terms of developer effort) compared to implementing the features of the language that justified its existence in the first place. The behavioural aspects of the TutorialDee implementation in my RelProject, for example, were a few evenings worth of coding but I've been working on the whole thing, in my spare time, for four years.
- But, for the most part, the world is not ready to have the DB also be a heavy-duty app language, even if it turns out to be the "logical thing to do". They'd rather have their favorite app language be given DB features rather than the other way around. And most shops use different app languages and tools and want something that integrates well with them all. Complex typing on the DB side will not help this. Your "solution" requires a fundamental rewrite of the industry, not just DB's. -t
- TutorialDee and similar efforts (LINQ, for example) [LanguageIntegratedQueryProject], are about providing database features in general-purpose languages. Arguably, the current dichotomy between database systems and application languages is an artificial one that will (eventually) be addressed by seamlessly integrating the functionality of both. Research work is actively exploring this idea; it is only a matter of time before it becomes mainstream. Interestingly, in many ways, this harks back to ExBase systems like dBaseIII and MicrosoftAccess, which (relatively) smoothly integrated database and application functionality for small "desktop" or workgroup database applications, but were lacking in the levels of scalability, portability, distributability, reliability, support for complex user-defined types, expected "modern" language features, and maintainability that are needed for enterprise-scale systems. However, these are solvable problems that require only developer time and effort to address; there are no theoretical or conceptual hurdles to overcome.
- MicrosoftAccess is a poor example of query-to-language integration in my experience. The simplest things required verbose APIs if one wanted to do anything more than run a macro, especially at the column level. But as far as user-defined types in DBs, what about sharing? If your little app creates a type called "coordinate", what then if a Java app, a Python app, and a CrystalReports project a department away also want that info? Do they each need to re-define the coordinate type? Successful apps almost always need to start sharing info at some stage in their maturity. RDBMS have succeeded in the marketplace largely because they allowed sharing fairly easily. There may not be "theoretical hurdles", but there are certainly practical ones.
- Certainly MicrosoftAccess (and ExBase) were and are fraught with flaws; I mention them only as conceptual steps in what might be the right direction -- at least to the extent that shared constructs (e.g., database schemata, including type definitions) should be centralised, but even purely distributed applications should seem to be centralised whether they are or not. I.e., physical locality should only need to be considered for deployment; it shouldn't be an issue for development. In the long term, I predict that successors to Java, Python, CrystalReports et al. will support canonical user-defined type systems, which will permit sharability the same way that canonical built-in types -- e.g., integer, string, float, etc. -- are handled by mechanisms to support sharing despite the fact that they may have different representations in various languages. Currently, mapping the differing representations of canonical built-in types is a function of ODBC/JDBC drivers and the like. Conceptually, there is nothing that precludes the development of sophisticated user-defined type and value mapping mechanisms, perhaps similar to those (for example) currently employed in CORBA to distribute object definitions.
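One way to picture such a mechanism (the definition table and bind_type below are invented for illustration, though make_dataclass is real Python): a single, language-neutral definition of 'coordinate', generated into each host language by a driver, much as ODBC drivers map built-in types today:

  # Invented sketch: one shared definition of 'coordinate', bound into a
  # host language by an ODBC-like driver instead of redefined per app.
  from dataclasses import make_dataclass

  canonical_types = {
      "coordinate": [("lat", float), ("lon", float)],  # the shared definition
  }

  def bind_type(name):
      # generate a host-language class from the canonical definition
      return make_dataclass(name, canonical_types[name])

  Coordinate = bind_type("coordinate")
  print(Coordinate(lat=51.5, lon=-0.1))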
- As already argued above, a full definition of "types" requires something far more complex than sharing attributes, including a TuringComplete language. It's an order of magnitude in complexity. The industry will not make such a heavy investment without some huge factor to push it. OOP rode in on GUI coattails. Shared types would need a similar "killer need". To embed something like that in each app language/tool is rather excessive. A kind of AbstractionInversion. An underlying principle that affects this all is that attributes are simply and inherently easier to share than types/objects. We'll see flying cars before we ever see shared types. A compromise, described above, is to only share the declarative portion of types/objects. (Note that I'm a big fan of ExBase, at least for certain uses, although I'd change lots of things if given the chance.) -t
- [The concerted efforts by such multi-industry organizations as OMG (the creators of CORBA and DDS) and the W3C (XML), plus individual efforts by Microsoft and IBM and Sun (who have each created systems that allow object sharing: COM, DotNet, Java Enterprise Beans, etc.), indicate that the industry will make heavy investments, and that they do see a "killer need" for sharing structures in a standardized manner. The problem isn't the lack of need, but rather the standardization of solution. Standardization is a complex, political mess... but, without numbers, you cannot claim the standardization of types to be 'more complex' than any other standardization problem. You certainly can't argue that attributes are 'inherently easier' to share, or that types are 'an order of magnitude' more complex. I'll further note that it doesn't take a TuringComplete language to support very powerful "type" definitions that would serve most purposes - see CharityLanguage and TotalFunctionalProgramming for examples. Also, it is no AbstractionInversion to integrate such standards in a language, tool, or standard library... and, further, doing so is as natural as supporting ASCII or Unicode.]
- Developmental complexity (users can obviously choose not to use it, so it doesn't represent complexity from a user's point of view), order of magnitude or not, is not an obstacle. It merely requires time and effort to achieve. Arguably, the "killer need" already exists, as anyone who has tried to develop complex applications in existing tools will attest... As for seeing flying cars before we see shared types, about twenty years ago the IT director of a large firm for which I did some consulting -- a firm staunchly committed to their mainframe -- proudly proclaimed that "there'll be flying cars before we start using PCs." A few weeks later, the CEO fired him and hired a new IT director who replaced the mainframe with a network of PCs. Change can occur quickly, even when the need is only perceived instead of real. And, in this case, the need is real -- you'll feel that need the next time you have to store a collection of discrete trees in a database, and are required look them up via a variety of queries -- whether you recognise the need or not.
- Tree query features and "user-defined-types" are generally two different animals. I'm all for adding graph/tree traversal to RDBMS. And your "types are the new PC" is a stretched analogy at best. "The new Xanadu Project" is also a possible fit.
- You've misunderstood -- I used a tree as an example of a complex type. Replace it with complex number, geographical location, temperature, polynomial, lattice, graph, or any other complex type. It has nothing to do with recursive queries, tree traversals, and the like. And, I would argue that TypefulProgramming is certainly significant and a growth area. Reducing the strength of type systems, i.e., to the point that the only type is a string, for example, garners little interest at present.
- This gets into a messy classification argument and/or argument about how complex the cells/atoms/lego-blocks should be for shareable info. Or perhaps attribute-driven versus behavioral-driven interfaces. We've already been around and around on this. Plus, I'm describing how the market-place is likely to react. Without a "killer app" or catchy demo, it will shun heavy type sharing regardless of any possible inherent merits because most have not been shown how to do it well. You need to produce an effective demo/example if you wish to sell your BronzeHammer.
- Whilst I don't disagree that a catchy demo would grab the eye of stakeholders and sceptics, are you not speaking for yourself rather than "the market-place"? What is "excess" type sharing, anyway? How could it be excessive?
- I hardly see lots of "buzz" about it. I'm just saying that most will ignore it unless you find a better way to "sell" it. There are already too many GoldenHammer and BronzeHammer evangelizers. (And I adjusted the "excessive" comment.)
[So, no, I do not agree that, in general, "introducing direct behavior implementation into such a tool/language greatly increases its complexity" - not even when just considering the implementation of the language itself. This is because, when implementing the standard library, the language implementor is also a language user. And, in general, feature support in the language or its standard library simplifies things for users.]
AugustZeroEight and MarchZeroNine
See Also: PowerOfPlainText, CrossAppLanguageOopIsRough, NaturalEventSyntaxDiscussion
CategoryInfoPackaging, CategoryReuse, CategoryText, CategoryIdealism