Direct language connection with relational database if available.
Not all state is necessarily stored in database (temp vars, etc)
All variables are tables
This tuple is created directly, or by performing relational algebra on available tables.
Methods are dispatched similar to Lisp multi-methods.
Tuple keys can be renamed (follows from relational algebra: 'rename' operation)
(?) Anonymous tables may be common
(?) Arg processing functions
(?) How do we process a method which wants to apply to a table? Does this become part of the formal signature?
(key value) | key | value | (key value value value) | key | value | value | value | ((key value)(key value) (key value)) || key | value || key | value || key | value || ((key value) (key value) (key value)) || key | value | | key | value | | key | value || (tableName (key value value value) (key value value value) (key value value value)) | tableName | | key | value | value | value | | key | value | value | value | | key | value | value | value ||Note the similarity with lisp sexp's.
The Ordering Of Columns/Tuples Specifically Does Not Matter!!!!!
The syntax rotates the tables 90 from how they're usually viewed: a tuple is a column, not a row.
So, how do we deal with foreign keys? Do they need to be explicitly marked? Does making a joined table suffice? (such that the join-table itself maintains the references; as long as the join-table exists (hasn't been garbage collected), the identity remains, and changes to one join key write through to the other) I kind of like the idea of the constraints being applied to the tuples, and not being inherently part of them.
Basically, we realize that we can create any object we like from the tuple containing its state, plus functions which dispatch on that tuple's keys. Not all tables have to be public. They never can be. When I want to update a typical database, the tuple that I provide is private until I issue the 'update' command.
(?) Polymorphism can potentially be handled by the implicit naming of each key. Including if available the specific type of that key, although I don't have reason to believe that this is necessary or desirable.
Other things which come to mind, you get two-way keyword parameter passing, without the hassle of providing the keywords when they're provided implicitly: if the tuple you get matches, you don't need to rename. If you're just passing the parameters through on the other side, you don't need to reference their names.
Relational Theory as Applied By Me :)
Results from relational operations on tuples are to be live.
(?) Duplication of any table will be commonplace, in order to support transactions.
Has this any overlap with TupleSpaces and LindaLanguage?
Not entirely sure... but I don't think so. The TupleSpace page refers to it as a Bag, which basically brings along everything bad about SQL (relational theory is based on Sets of Tuples). I'll look into it a bit more though, thanks for the ref
I don't think it makes it something other than TupleSpace to use sets rather than bags. The same is true of relational databases, after all: people say that SQL/RDBMS bags are bad practice, is all.
Some would say that an 'RDBMS' which uses bags is not a relational database. The theory requires sets in order to have certain properties; if it doesn't, it's something else. Now, you could still potentially say that relational theory is a subset of TupleSpace's; perhaps we finally have a better term for rdbms's which fail to implement relational theory is all. :) -- cwillu
But note that an "RDBMS" which uses bags (which is to say, all of them) nonetheless implements something so close to relational theory that relational theory is widely useful and applied to them; there is not an alternate Bag Theory that pragmatists study instead! (Bags have been studied mathematically, but that's not the point.) This means that the purists basically are full of it if they say anything other than that it is "bad practice"; it's clearly just splitting hairs about angels dancing on pinheads to make a fuss about it "not really being relational", because it's a terminological distinction with no helpful pragmatic applications.
It's the other way around, it's usually easier if bags are allowed, because if they were not, that would put more pressure on SQL programmers and schema designers to really get things right. More often than not on any topic, there's a contrast between "right" and "easy".
And look at the pros and cons argued on the always-use never-use SELECT DISTINCT pages that have been active here this last week. It's not like sticking to sets solves all of your problems, and it's not like it'll make it impossible to get valid work done if you use bags. It's more subtle than that.
Have you ever seen a non-db person (e.g. a salesperson) use a spreadsheet as a lightweight database? They've never even heard of normal forms, right? And if you tell them, they don't care. They want to do what they want to do, not what someone says is the right thing to do. That's one extreme on the topic. The other extreme is much more sparsely represented.
Finally got around to responding to this. I've toyed around with TupleOrientedProgramming too, though I was looking at it from a LindaTupleSpaces perspective. Gave up on it because I couldn't figure out basics like arithmetic and function-calling in a manner that I'd want to use myself. I also tried doing a tables-all-the-way-down TableOrientedProgramming language, but ran into the same problems. Tuples are really annoying when you just want to define '+'.
I'll respond to each point here, because if I did it inline it'd hopelessly ThreadMess this page.
I'm not sure I understand "tuple types" fully. Keys are a property of tables, not tuples. For a one-tuple table, all columns are keys, because they each identify the single member uniquely. A tuple header, as defined by DateAndDarwen, is a set of name/domain pairs. That's probably your best candidate for "tuple type".
Are you familiar with PrologLanguage? This was really the first relational language, and has an incredibly elegant formulation. It doesn't explicitly conform to the RelationalAlgebra, though you can simulate it with it (there're papers on this somewhere on the net). But it has the same MathematicalLogic? + SetTheory base and is declarative, so it comes out as almost an idealization of TupleOrientedProgramming.
Passing as tuples doesn't just facilitate KeywordParameterPassing, it mandates it. I'm not sure this is a good thing. The bootstrap interpreter I've written for my toy language works like that (only because I haven't gotten around to including syntax for defining positional args), and it's really a pain. All the little methods you take for granted - +, -, *, /, let/set, etc. - now need keywords.
Your model does let keywords be specified implicitly when they match the method's formals, but this couples the implementations of several functions together. If you rename a variable in one, you either have to change the tuple definition and every other function that uses it, or you have to change each call site to create a new tuple with the appropriate bindings. You also might run into all the pitfalls of Perl's implicit parameters approach.
Polymorphism of some sort is pretty important. Almost every language offers it for arithmetic operations - otherwise you need separate operators for operating on reals vs. ints. I believe ObjectiveCaml requires this, and its the one stain on an otherwise very clean language.
It looks like you're implementing mutability and sharing. This is tricky, particularly if more than one "derived" relation might be visible at the same time in the program. Read TheThirdManifesto if you haven't already, and see my comments in the middle of DateAndDarwensTypeSystem. It works (heck, most current programming languages do it), but you end up giving up certain guarantees in exchange for aliasing.
LindaTupleSpaces use bags instead of sets because they were initially designed as a concurrency system. It has to be this way, to accommodate various synchronization primitives. Remember, tuples in Linda are "consumed" when a process uses them (well, there's also a "read" primitive that doesn't consume them, but the multithreading aspects are based on this consumption). The number of identical tuples in the tuple space dictates the number of processes that can access a given piece of data in the TupleSpace. If they used sets, you could have only linear computation (it'd be reduced to ContinuationPassingStyle), which eliminates any parallelism and hence their reason for being.
What will basic arithmetic operations look like? One of the strengths of FunctionalProgramming, ImperativeProgramming, and ObjectOrientedProgramming is that basic computation operations fit easily into the framework. I had difficulties with this when I tried doing a pure TableOrientedProgramming language, and even more difficulties with a TupleSpace-based TupleOrientedProgrammingLanguage?. You may succeed where I've failed here, but if you want the language to be "tuples all the way down", it should be able to handle basic computations.
For another nifty feature, how would you represent program code as tables? This would be really cool if you could get it to work - Lisp with the full power of the RelationalModel behind it! But tables aren't well-suited for describing code, because they're meant for holding large quantities of homogeneous data, where the structure (field roles and domains) is more or less the same. Program code is largely heterogenous: any useful language includes functions that take a wide variety of different arguments, composed in different ways. I thought about storing caller/callee relations in relational tables, but there's not enough information to precisely specify. Code naturally ends up tree-structured, and the relational model has a very tough time with that.
-- JonathanTang
A partial answer (and a fairly complete-if-draft write up of one or two sections of what I want to do):
On parameters, primitive operations, and positional vs keyword binding.
...
There are four basic ways to map the values provided to a function to the implementations of that function. [wiki syntax for numbered list intentionally avoided... these terms are referred to by number]
Note that the bindings in (4) need not be identical from invocation to implementation; there must only be a mapping provided from one to the other. This is trivially demonstrated by considering a function which merely delegates to another function, mapping the keywords by hand from the first function's implementation signature to the delegate's invocation signature, effectively mapping the invocation signature of the first to the implementation signature of the delegate.
Any keyword parameter may be converted to a positional keyword (on either the implementation or invocation signatures) by taking a hash of the key, and using it to obtain a total ordering on the keywords. As long as the hash function relies only on a particular key, and not any of the surrounding context (i.e., the ordering of the definitions of the parameters, the time of day, etc), the hash for a given key will be invariant, and therefore sufficient to reliably map keywords to and from their positional versions.
Herein lies the relational theory's objection to positional parameter passing: it by necessity ties invocations to implementations by an accidental detail: the total ordering provided by an arbitrary-but-consistent hash function. This is undesirable on both invocation and implementation sides, as both can be broken by what should be opaque details of the keyword values themselves (the particular construction of the keyword values, or even the order in which keywords are defined in some implementations).
I'd like to emphasize in passing that this can be just as big of a problem to implementations as to invocations. Consider a function (f) which accepts an arbitrary function (g) which itself accepts two keyword parameters (f is therefore a higher-order function). Suppose we have many functions which map to (g), and in particular, at least one (g') which translates the keywords to positional binding on the implementation side. Now we have a case in which it is possible for only some functions to break when function (f) changes (especially in the context of refactoring browsers). The fact that this happens isn't interesting (obvious, really), but it's a case of a function implementation breaking in response to a change in invocation.
The positional-keyword conversion itself can be done such that it doesn't devolve into a bag-based system (where an arbitrary position is required to differentiate otherwise identical values), by effectively translating the keyword to a totally orderable key (i.e., the hash function), and providing a shorthand to represent the 'nth-lowest key' positionally (by position in syntax). This is by necessity dependent on the original keys, and is as such dependent on implementation details, in addition to adding a somewhat inelegant hack feature to the language (the positional shorthand, a single exception in what is otherwise a fairly consistent language). We will deal first with the positional-keyword translation.
The problem can be summed up as an issue caused by the conflict between 'the accidental-if-consistent nature of taking a hash to represent the key' vs 'the wish to avoid repetitive keyword naming, especially for primitive functions'. 'value1=15 + value=2' for addition anyone?
A partial solution is available in the form of requiring all functions to accept a single tuple, and return a single tuple as a result (besides any side-effects caused by the function). Given a lightweight syntax to construct a tuple, this allows the keywords to be implicit in many cases (provided by the results of functions), while still allowing keyword parameters to be provided directly when necessary, without excess overhead.
A second (and orthogonal) solution is possible when we consider the generic nature of many (most?) primitive operations: the two terms in an addition have no real variance in meaning (a+b = b+a), and many cases where this doesn't seem to apply can be converted to a form which does (A/B -> Ax(B^-1) = (B^1)xA). From this we hypothesize that a form of 'positional tuple' can be safely provided, if there is no distinction to be made between parameters (except possibly with regards to efficiency), especially to the function which is to accept it. This implies a couple prerequisites to the use of such a 'positional tuple':
It is my working hypothesis that in the remaining cases, the requirement that keywords be provided will no longer be tedious, rather they will be beneficial to the readability of the resulting code.
Now we turn to the nature of the 'shorthand' mentioned earlier, used to provide 'positional tuples' (hereafter referred to exclusively as sets) to and from functions. It is simplified greatly by the realization that the sets described above (with the constraints placed on them) are sufficient in general to provide lightweight 'primitive' functions. There is however still the matter that it may not be desirable to use the same syntax for constructing these sets as is used to construct tuples, and likewise for tables; but yet it may not be desirable to provide several similar-but-incompatible syntax's to provide several similar-but-different functions, especially if a single syntax can be provided which is general enough to provide all of these functions (if in a less than lightweight manner). Note, that we want an enforced consistency in syntax, and not an enforced limitation of possibilities.
An so (drumroll please), we introduce macros and the like. Yay for whole new balls of wax.
This is still a wide-open area of research for me. What I'm hoping is that a combination of hygienic-macros plus double-evaluated functions is sufficient for my purposes (i.e., new syntax elements).
The intent of this is, that given a general purpose construction syntax which is sufficient to describe Sets, Tuples, Tables and so forth, and rewrite macros, we can create arbitrary lightweight syntax specialized for each of them. Furthermore, definitions provided in such a way maintain no special status in the language, and thereby avoids (hopefully) many (most? all?) of the issues with having many special cases (I wish to avoid the 'executable line-noise' phenomenon as much as possible). In effect, what would otherwise be 'yet another syntax element' becomes merely 'a standard library function'.
...
In other news, support for MultiMethods provides polymorphism in a natural way. Because methods of dispatched by all of the parameter types/keys, conventional polymorphism is achieved by simply create a new specialization (specifically, on the first parameter, which would be the 'favoured' parameter in other languages). There is still the open issue of what to do with ambiguous invocations. Do we have a preferred ordering? (reintroduces the 'favoured' parameters, although slightly more general) Do we require the programmer to disambiguate? (requires more parameter typing, and/or potentially many more method definitions) Or is there some way for the runtime to determine the best operation to run? (i.e, if a method is ambiguous, it's because either one would be correct (including implementation), and therefore is a performance issue?)
In any case, this is approximately how I'm planning on dealing with basic operations (of course, the proof of the pudding is in the eating), does this sound like it has potential?
Some syntax examples would be very helpful. Your parameter passing method seems very different from anything I have experience with, and I'm having trouble wrapping my head around it. I'm familiar with the 4 mechanisms you list above, but you seem to be going beyond those to come up with hybrids. Conceptually I understand the idea of creating an ordering via a hash function of keyword arguments, but I still can't see how this would work for the programmer.
(Also, there're other mechanisms of parameter passing to consider:
I'm not sure you can count on all args being equivalent for those functions you'd want to invoke positionally. You're asking for full commutativity in any "positional" function, which is a pretty hard property to satisfy. It applies for + and *, but there're obvious exceptions with - and /, and even + and * on matrices don't exhibit it. It does eliminate a whole question though, the PreferredOrderOfSrcDestArguments?. These'd have to be specified explicitly by keyword in your model, which prevents errors but can be somewhat inconvenient.
Treating A - B as A + (B^-1add) and A / B as A * (B^-1mult) is interesting. Basically, you're allowing positional arguments only on algebraic fields. Not all fields, though, as you might have ones where even the + and * operators are non-commutative (like matrices). Though in a matrix, A * B = transpose(B * A), so there's at least a transformation you can perform to get back to the original. I wonder how well this would translate to other operations that need shorthand. I suspect that programmers would balk at a requirement to do subtraction as addition of an inverse, but a good macro system may be able to cover this.
Double-evaluated-functions + FirstClassFunctions + compilation are incompatible. There's a brief discussion of this on RuntimeMacro, and a link to a paper which explains why the Lisp community ditched FEXPRs and NLAMBDAs back in the early 80s. Basically, they make it impossible to predict from the code itself whether a given argument will be evaluated, so the decision of whether to evaluate or package up in a thunk can only be taken at runtime. RebolLanguage does exactly this, which is why Rebol is essentially un-compilable.
I'm finding that a common readability pattern is to have one argument be positional, and all the rest be keywords. You then end up with a verb - direct-object - prepositional phrases construction, where the prepositional phrases can be moved around at will. Some examples from my language (none of which is parsable, yet, but I've constructed the parse trees in my head and it's self-consistent):
if (a < b) then: do-something else: do-something-else map (factor * it) over: numbers-to-multiply map over: input result <- process it print result line <- read from: data-fileI'd really encourage you to come up with concrete syntax, even if you don't have an implementation. I found a lot of semantic difficulties when I tried to actually write programs and think through what the syntax means. As language designers (enlightened ones :)), we like to pretend that syntax doesn't matter, but it does. I find that semantic features that make my eventual infix syntax easier make the prefix syntax I'm developing in much clunkier, so it's a tradeoff between ease of bootstrapping the compiler and parser, and ease for the eventual programmer. And some language features just lead to incredibly clunky syntax - if you can't express it, it's not much use.
I'll respond to the question of positional vs. ambiguous differentiation of MultiMethods on whichever page had that discussion. I've changed my mind about that, mostly because of keyword args. -- JonathanTang
Hmm... Following through on the creation of a concrete syntax, that of function creation, and the consideration that a relational view of code might be desirable. When I define a function, I provide a list of operations that I want to happen (I'm thinking imperatively here). I don't want to have to provide line numbers (nightmares of my first basic programs), but operations are not arbitrarily reorderable. This of course doesn't apply in a pure declarative style (where they are reorderable), and pure functional style (where statements which must be ordered are ordered by nesting).
I do note however, that the elements are required to be of the same type (i.e., they're all language elements of some type). Can I get away with loosening it to something else? I'm getting the sense that an autonumbered set (i.e., a list) is common enough to benefit from a specialized syntax for it. Notably, this syntax would always be resolvable to a table with numbered elements. It could be abused, but I can't justify disallowing something for only that reason.
The previous statements would then change:
A function which can accept a List containing a stack and an integer in that order (for instance) must also accept a List containing two stacks, or two integers, or an integer and stack (i.e., reversed ordering). My old working hypothesis is invalidated, and replaced as such.
In some sense, this would avoid the third-manifesto's objections to positional parameters in the same way; I wouldn't imagine someone creating a function accepting 15 parameters that relied on some property of each of the 15 parameters for purely convenience sake (it be more an intentional abuse of it, which I have no issue with: you can abuse anything).
Given this, constructing functions through function calls (via the same table-based parameters) seems now to be possible, possibly even convenient. I now return to my experimentation with more concrete syntax examples...
-- cwillu
Possible (i.e., pre-alpha draft scrap version) syntax
set definition:
{element1 element2 element3 ...} {key1 element1 element2 ... ; key2 element3 element4 ... ; key3 element5 element6 ... ; } [name1 name2 name3 ---- value1 value2 value3]is equivalent to
[ name1 | value1 name2 | value2 name3 | value3 ]is equivalent to
{name1 value1; name2 value2; name3 value3}
[Value Index ---- value1 1 value2 2 value3 3is equivalent to
{value1; value2; value3}
function definition:
define {name functionName; signature {key1; key2; key3}; definition { operation; operation; operation; }} define [name signature definition ---- functionName {key1; key2; key3} { operation; operation; operation; } + {left; right}%{integer} { return {left + right}%{integer} } multiply {left; right}%{integer} { return {left * right}%{integer} } ]'define' is just another function... the implementation of which would add the parameters to the function table.
define {name define; signature {name; signature; definition}; definition { add {table functionTable; tuple signature} }}pending of course the finalization of however we deal with tables. (I'm not unconvinced that normal function calls isn't an appropriate manner to deal with them though)
-- cwillu
It looks really interesting. Function definition in particular seems quite elegant; I'm also making 'define' just an ordinary function, but it seems to map better to tables than it does to pure functions (I'm making everything, even plain old data structures, into functions. CodeIsData because data is code.)
Couple of things:
Lists could be problematic efficiency-wise if implemented as numbered sets. Inserting an element at the head of a list requires that all the other elements be renumbered. Normally you could chalk this up to an "implementation detail" and just use a list behind the scenes anyway. But if lists are "really" tables, they should act like tables, which means that such a renumbering would have observable side-effects. (This is also the big advantages that special-purpose data structures have over tables: you can make compromises on the ways in which you'll use a data structure, allowing certain optimizations.) There are N pieces of observable data that have to be updated, so the minimum worst-case time complexity possible is O(n) (actually, I think it should be omicron(N), as it's a lower-bound and not an upper bound). This - and the difficulty of representing a tree - is one of my big complaints about tables.
Some good test cases might be to write some simple programs with your language and see how easy they are to write. People use a language for what it makes easy; if it doesn't make anything easy, people won't use it. I suspect you'll do marvelously with RDMS access, since that's what this is for anyway. But it should also be able to handle arithmetic and string operations easily. Some sources for test programs:
-- JonathanTang
The semicolons don't seem to be strictly necessary, except for two things: they seem to add readability in some cases, namely when a value is assigned from the result of an operation; and I think ordered set definition might be ambiguous with tuple definition otherwise.
I'm still not entirely sure how I want to deal with tuples. Function calls seem to suffice for selection and the like, but the actual reading/writing operations get a wee bit cumbersome. Of course, I'm also thinking that the function name should also be a keyword. This would clean up a few remaining operations, but we'll see.
In any case, the following is what a SieveOfEratosthenes might look like, as well as a for-loop:
sieveOfEratosthenes {forSize} { create {table table keys {value; isPrime}} for{index index = 2 to n do { add{to table row {value index isPrime unknown}} }} sieveOfEratosthenes table } sieveOfEratosthenes {value; isPrime} { if {unknown isPrime then { set{ value select{from table where equals{ %{element.value value} 0}} = false } set{value isPrime = true} }} } for {index; =; to; do} { add{to parameters column do = 1} for parameters } for {index; =; to; step; do} { rename{in parameters from = to equals} for parameters } for {index; equals; to; step; do} { apply {value {index equals} to do} if {true ={equals to} then { ++ {equals} goto for parameters }} } ]I'm treating 'goto' as an explicit tail-call... equivalent to call/cc if the function being called accepts a function and the caller provides it. 'parameters' is bound to the tuple of parameters; this can be shortened to some arbitrary symbol (you could make ExecutableLineNoise? if you wanted to... please don't make my language your bitch :p).
<...incorrect code snippet removed>
There's a trade-off of implicit function application vs explicit via alternate construction vs explicit via additional syntax (i.e. a 'funcall' operation). I think an implicit use of the function name as a keyword is workable, but at the same time strikes me as a rather odd 'feature' (read 'hack'). I like smalltalk's method (although I wouldn't really call it a distinct method, more like a variant), although it requires the use of separate variable names in order to distinguish a one-arg method from a no-arg method.
Of course, such a 'no-arg method' is actually a one-arg method when one takes into account the object reference, perhaps there is no possible use for a method which takes literally no parameters. If it's doing anything, it'll be doing it to some context; at some point somebody may (and therefore I should assume 'will') want to change that. <some time later...> And looking at the new syntax, I've changed my mind (for now at least).
Interesting note; you can optomize constraints, namely, you can optomize when they're evaluated. From the text of a constraint, it is easily determined when certain actions can have no possible bearing on that constraint to varying degrees of conservativeness. Vaguely in the sense of bound vs free variables in a function.
And this is where I find things get interesting. Consider a unit test. It's a constraint which has no variables at all; the interface is simply 'runTest()', all required state is set up by the setup code, and is ideally sandboxed. This means that the only variable in a test is the implementation of the methods which it calls.
Therefore, you can define constraints which also have no reliance on anything except particular methods. Because they don't reference existing tables, normal operations (including using the methods they test) will have no bearing on them, and so they should be elided even by a conservative optimizer. The only time they need to be checked is when the methods themselves are changed; the only time they need to be checked is when the tables which represent the methods are edited.
Which kinda makes sense when you think about it: unit tests are constraints on the implementation of a method. --WilliamUnderwood
See also: TableOrientedProgramming, MultiParadigmDatabase