Tuple Oriented Programming

Direct language connection with relational database if available.

Not all state is necessarily stored in database (temp vars, etc)

All variables are tables

All variables have types, don't they? Those types are the column names/tuple keys.
If a variable type (and name even) is never used dynamically, it can be optimized to a hash value

Methods receive exactly one tuple for all parameters, and return exactly one tuple.

This tuple is created directly, or by performing relational algebra on available tables.

Methods are dispatched similar to Lisp multi-methods.

Methods are dispatched by method name, and tuple keys (tuple type is error detection, no overloading?)
(?) Methods may be dispatched from an arbitrary (but accessible) table
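
A minimal Python sketch of the dispatch rule above, assuming a tuple is modelled as a dict and a method is selected by its name plus the exact set of keys in its single argument tuple. The names METHODS, define and call are hypothetical; nothing here is a settled design.

```python
# Hypothetical registry: (method name, frozenset of tuple keys) -> implementation.
METHODS = {}

def define(name, keys):
    """Register an implementation for a method name and an exact key set."""
    def register(fn):
        METHODS[(name, frozenset(keys))] = fn
        return fn
    return register

def call(name, tup):
    """Dispatch on the method name and the keys of the single argument tuple."""
    try:
        fn = METHODS[(name, frozenset(tup))]
    except KeyError:
        raise TypeError(f"no method {name!r} for keys {sorted(tup)}") from None
    result = fn(tup)
    if not isinstance(result, dict):
        raise TypeError("a method must return exactly one tuple")
    return result

@define("area", {"width", "height"})
def area_of_rectangle(t):
    return {"area": t["width"] * t["height"]}

@define("area", {"radius"})
def area_of_circle(t):
    return {"area": 3.14159 * t["radius"] ** 2}
```

Calling call("area", {"height": 3, "width": 2}) reaches the rectangle version regardless of the order the keys were written in, and a tuple with an unexpected key set is rejected rather than silently accepted.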

Still supports anonymous methods (lambdas), formal args must match supplied tuple

Tuple keys can be renamed (follows from relational algebra: 'rename' operation)

(?) Anonymous tables may be common

(?) Arg processing functions

- tuple is passed unprocessed to function
- tuple is preprocessed... functions in the tuple are processed first, then the outer function. Do these 'special forms' weaken security?

A method applied to a table (set of tuples) is applied once to each tuple in unspecified order.

(?) How do we process a method which wants to apply to a table? Does this become part of the formal signature?

A function invocation environment is shared when applied to multiple tuples due to it being applied to a table. I.e., a function can share state between applications to multiple tuples in a single invocation.
(?) What syntax should be used for this?
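
One way to picture the shared invocation environment, sketched in Python under the assumption that a table is a list of dicts and that a closure stands in for the environment. apply_to_table and make_running_total are hypothetical names.

```python
def apply_to_table(make_method, table):
    """Apply a method once per tuple; one environment is shared by the whole invocation."""
    method = make_method()                 # fresh environment for this invocation
    return [method(t) for t in table]      # iteration order is not part of the contract

def make_running_total():
    env = {"total": 0}                     # state shared between applications
    def method(t):
        env["total"] += t["amount"]
        return {"amount": t["amount"], "total_so_far": env["total"]}
    return method

table = [{"amount": 5}, {"amount": 7}, {"amount": 1}]
results = apply_to_table(make_running_total, table)
```

Because the order of application is unspecified, only order-independent facts (here, the largest total_so_far) are safe to rely on. A second invocation starts again from an empty environment.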

(key value)
| key | value |

(key value value value)
| key | value | value | value |

((key value)(key value) (key value))
|| key | value || key | value || key | value ||

((key value)
 (key value)
 (key value))
|| key | value |
    | key | value |
    | key | value ||

(tableName 
(key value value value)
(key value value value)
(key value value value))
| tableName |
| key | value | value | value |
| key | value | value | value |
| key | value | value | value ||

Note the similarity with lisp sexp's.

They _aren't_ specified as cons-cell based, although they could be implemented that way (perhaps local variables would be implemented that way).
Cons, car and cdr could be provided, but they could be provided for any collection structure.
- The order that elements would be returned would be undefined, except that all of them would eventually be returned given sufficient calls to cdr.

The Ordering Of Keys Specifically Does Not Matter!!!!!

The Ordering Of Columns/Tuples Specifically Does Not Matter!!!!!
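
A small Python sketch of what "ordering does not matter" means in practice, assuming a tuple is an unordered set of (key, value) pairs and a table is an unordered set of tuples. make_tuple and make_table are hypothetical helpers.

```python
def make_tuple(**pairs):
    """A tuple as an unordered set of (key, value) pairs."""
    return frozenset(pairs.items())

def make_table(*tuples):
    """A table as an unordered set of tuples."""
    return frozenset(tuples)

a = make_tuple(name="ann", age=30)
b = make_tuple(age=30, name="ann")          # same keys, different written order

t1 = make_table(a, make_tuple(name="bob", age=41))
t2 = make_table(make_tuple(age=41, name="bob"), b)
```

Here a equals b and t1 equals t2, and a table built from both a and b holds a single tuple.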

The syntax rotates the tables 90 degrees from how they're usually viewed: a tuple is a column, not a row.

(?) This is arbitrary, and an alternate syntax can probably be provided to accommodate the other view

How do we deal with primary keys?

(anecdotal) All computer issues at work are caused by either indexing errors, duplicate records or...
(anecdotal) Major security flaws / issues are caused by records which contain invalid foreign keys
- A couple stores I do service for have a habit of deleting my user account: "He doesn't work here [on a daily basis], why does he need an account?" This makes correcting many issues a pain, as I have to hack my way through encrypted database files to restore my security levels before I can use any of the purpose built tools, or I have to hack the configuration files by hand. Neither is fun. So, I added an entry to the encrypted password file, which specifies an invalid employee id. The program still accepts the password and associated rights. (Btw, I have the owner's + regional manager's blessing to do this.) And it can't be deleted by any tools other than the raw editors, which aren't stored on disk. I love my job sometimes. :)

A primary key is such that its inclusion in another table as a foreign key includes that actual value: changing the value in either table causes all references to it to change as well.

Can we define a primary key as one which is referred to by other tables, and therefore determine which keys are primary by that condition?
Do we lose any arbitrary-query abilities by defining it as such?

Ahhhhhhh.... after reading CostinCozianu's excellent explanations in SelectDistinctIsaCodeSmell (to the contrary of the page title): assuming that a table is a Set of Tuples as opposed to a Bag of Tuples, primary keys need not be explicitly defined: they are generated implicitly, by the premise that each tuple be unique; and therefore any set of keys which can uniquely identify each tuple could be a primary key.
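
To make the "implicit primary key" idea concrete, here is a Python sketch that searches a table for its minimal uniquely-identifying key sets. It assumes the table is a list of dicts sharing the same keys with no duplicate tuples; candidate_keys and the sample data are made up for illustration.

```python
from itertools import combinations

def candidate_keys(table):
    """Return the minimal key sets whose values identify every tuple uniquely."""
    if not table:
        return []
    columns = sorted(table[0])
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(k) <= set(combo) for k in found):
                continue                      # a smaller key already covers this
            projected = {tuple(t[c] for c in combo) for t in table}
            if len(projected) == len(table):  # no two tuples collide
                found.append(combo)
    return found

employees = [
    {"id": 1, "name": "ann", "dept": "ops"},
    {"id": 2, "name": "bob", "dept": "ops"},
    {"id": 3, "name": "ann", "dept": "dev"},
]
```

For this data the function finds ("id",) and ("dept", "name"). Note that this only reflects the tuples currently present; adding a tuple can invalidate a key that happened to be unique so far.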

So, how do we deal with foreign keys? Do they need to be explicitly marked? Does making a joined table suffice? (such that the join-table itself maintains the references; as long as the join-table exists (hasn't been garbage collected), the identity remains, and changes to one join key write through to the other) I kind of like the idea of the constraints being applied to the tuples, and not being inherently part of them.

Basically, we realize that we can create any object we like from the tuple containing its state, plus functions which dispatch on that tuple's keys. Not all tables have to be public. They never can be. When I want to update a typical database, the tuple that I provide is private until I issue the 'update' command.

(?) Polymorphism can potentially be handled by the implicit naming of each key. Including if available the specific type of that key, although I don't have reason to believe that this is necessary or desirable.

Another thing which comes to mind: you get two-way keyword parameter passing, without the hassle of providing the keywords when they're provided implicitly. If the tuple you get matches, you don't need to rename. If you're just passing the parameters through on the other side, you don't need to reference their names.

: Interesting Sidenote: Ever stop and realize that the method implementation using a bound variable is the flip-side of keyword parameters? Only occurred to me after reading some of the common Perl pitfalls (the formal parameter names in functions, or lack thereof).

(?) Implements FacetPattern cleanly; because a method accepts exactly the tuple it needs to do its job, one nearly has to go out of one's way to give more authority than requested (i.e., pass a tuple which contains more keys than the method uses). Incidentally, this is how optional parameters could be handled, although I'm leaning to an implementation which flags excess keys as an error, and functions with 'optional' parameters handled by having multiple functions with the same name but varying signatures. This would make FacetPattern implicit with every function, making it possible-but-difficult to get around.

Relational Theory as Applied By Me :)

Results from relational operations on tuples are to be live.

Modifications to the original tuples show up in the result sets.
Modifications to the results of an operation show up in the original tables.
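
A sketch of what "live" could mean for the simplest operation, rename. It assumes a table is a mutable list of dicts; RenameView is a hypothetical class, and the view computes its contents on demand instead of copying.

```python
class RenameView:
    """A live view of a table (a list of dicts) with some keys renamed."""

    def __init__(self, source, mapping):
        self.source = source                      # shared, not copied
        self.forward = dict(mapping)              # old key -> new key
        self.backward = {new: old for old, new in mapping.items()}

    def tuples(self):
        # Computed on demand, so changes to the source are always visible.
        return [{self.forward.get(k, k): v for k, v in t.items()}
                for t in self.source]

    def add(self, tup):
        # Translate the keys back and write through to the source table.
        self.source.append({self.backward.get(k, k): v for k, v in tup.items()})

people = [{"name": "ann", "age": 30}]
view = RenameView(people, {"name": "employee"})

people.append({"name": "bob", "age": 41})        # change the original
view.add({"employee": "cyd", "age": 25})         # change the result
```

After both changes the view shows all three tuples under the key "employee", and the original table holds cyd under the key "name".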

Some operations are difficult to do this with;

Rename: Trivial
Product: Requires a division implicitly to resolve changes back to the original tables. Mainly troublesome if they don't divide evenly into the original tables (i.e., adding a single tuple to a non-trivial product). This can be partially handled by making a 'remainder table' accessible, to which tuples are added by default until enough other tuples with appropriate contents are added to make the division possible in a straight-forward fashion. It is important that this be done efficiently, or be avoided in the general case.
Selection: Added tuples are required to pass the predicate which created the selection (This might of itself be troublesome), at which point they're added to the original table (which may as always reject the tuple). Depending on the nature of the supplied/allowable predicate(s), it might be necessary to recompute the entire selection at this point (i.e., a selection which will specify the top ten tuples sorted by some element).
Projection: Trivially impossible: a tuple can only be added to a table if it contains the exact same key set. A projection of a table will not have the same key set as the original table. Hence, no tuple can be added to a projection.
Division: Similar issues (and resolution) to product.
Joins and Computed Relations: Their issues are only ones of efficiency, as they can be derived from the primitive operations already specified
- Join: Product followed by selection by equal keys. Tuples may only be added when they contain all keys from both original tables. If the join operation is properly implemented (a user could implement his own even with one provided), none of the constraints of additions to selections will apply.
- Computed Relations: Trivial, as they are equivalent to (and will likely be implemented as) a join of an already existing table with a newly generated table, with at least one key common between them.
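
The "product followed by selection by equal keys" reading of join can be sketched as below. This is a plain, non-live Python version that assumes tables are lists of dicts; join and can_add are hypothetical names, and the liveness question is left out entirely.

```python
from itertools import product

def join(left, right):
    """Natural join: pair every tuple (product), keep pairs that agree on shared keys."""
    result = []
    for l, r in product(left, right):                 # product
        shared = l.keys() & r.keys()
        if all(l[k] == r[k] for k in shared):         # selection by equal keys
            merged = {**l, **r}
            if merged not in result:                  # keep the result a set
                result.append(merged)
    return result

def can_add(tup, left, right):
    """A tuple may be added to the join only if it carries every key of both tables."""
    needed = set().union(*(t.keys() for t in left + right))
    return set(tup) == needed

staff = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
rooms = [{"id": 1, "room": "A1"}, {"id": 3, "room": "C3"}]
joined = join(staff, rooms)
```

Only ann appears in the result, since hers is the only id present in both tables. A new tuple is acceptable for addition only when it supplies id, name and room together.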

Efficient mutability of tuples: We can 'mutate' any tuple by simply deleting it and adding a replacement tuple. Assuming that operations are implemented lazily where possible (only updating when they are queried directly, including queries by listeners), this is not a terribly inefficient way of guaranteeing the operation tables are updated appropriately. Really, if the tuples support listeners (this should be a language feature), there's nothing requiring even this... they could simply notify their containing tables, which do the rest of the notification work. Also, predicates which don't maintain state in a run (and therefore would never require reprocessing of more than one tuple for any notification) can be detected, and we therefore make some additional efficiency gains when actually reprocessing an operation.

(?) Duplication of any table will be commonplace, in order to support transactions.

Such a cloned table will provide the functionality to apply any changes made to the table from which it was copied. This application of changes is not guaranteed to succeed, but it might be possible to guarantee that such an application would be successful within a finite number of attempts (see papers relating to WaitFreeSynchronization). I'm not sure this is feasible, and there's much to be done before a decision on this must be made.
'Query transactions', where we merely want a consistent state, are supported by simply dropping the duplicated table without attempting to apply it. The data may be stale, but this wouldn't matter to a correct program; any decisions to be made based on this data would have done their modifications by applying the modified table, or would already be relying on querying again at some point (i.e., polling).
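
A sketch of the clone-then-apply idea, assuming the simplest possible conflict rule: an apply fails if the original table has changed since the clone was taken. The Table and Transaction classes and the version counter are assumptions made for illustration, and the code is single-threaded; it says nothing about wait-free guarantees.

```python
class Table:
    """A table as a set of tuples with a version counter (a hypothetical design)."""

    def __init__(self, tuples=()):
        self.tuples = set(tuples)
        self.version = 0

    def clone(self):
        """Take a private working copy that remembers the version it came from."""
        return Transaction(self)

class Transaction:
    def __init__(self, origin):
        self.origin = origin
        self.base_version = origin.version
        self.tuples = set(origin.tuples)      # consistent snapshot

    def apply(self):
        """Try to publish the changes; fail if the origin moved on in the meantime."""
        if self.origin.version != self.base_version:
            return False                      # caller may re-clone and retry
        self.origin.tuples = set(self.tuples)
        self.origin.version += 1
        return True

accounts = Table({("ann", 10)})
first, second = accounts.clone(), accounts.clone()
first.tuples.add(("bob", 5))
second.tuples.add(("cyd", 7))
first_ok = first.apply()
second_ok = second.apply()
```

The first apply succeeds and the second is refused because it was cloned from a now-stale version. A query transaction is just a clone that is read and then dropped without calling apply.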

Re:

Computed relations: not as trivial as I thought, due to the complications of making every table live.
Projection updating: ditto, mainly in the case of a computed relation: it is the source table which would generally fail an addition to a projection. If the projection is removing a computed relation, then the keys will match with the parent table (i.e., 1 + 2 - 2 = 1 <=> table A + computed relation - computed relation = table A).

A function applied to a table can mutate that table, as well as return a table (populated by each tuple returned by applying the function to each tuple in the original table). The returned table is not live with respect to the original table; (?) the relational operators are sufficient to perform any operation one would wish to do to whole tables. Changes made to the returned table can be applied however to the original table, if those additions are compatible with the original table (i.e., the changes must not break any constraints, and in the case of an added tuple, the keys must match).

-- WilliamUnderwood

Has this any overlap with TupleSpaces and LindaLanguage?

Not entirely sure... but I don't think so. The TupleSpace page refers to it as a Bag, which basically brings along everything bad about SQL (relational theory is based on Sets of Tuples). I'll look into it a bit more though, thanks for the ref

I don't think it makes it something other than TupleSpace to use sets rather than bags. The same is true of relational databases, after all: people say that SQL/RDBMS bags are bad practice, is all.

Some would say that an 'RDBMS' which uses bags is not a relational database. The theory requires sets in order to have certain properties; if it doesn't, it's something else. Now, you could still potentially say that relational theory is a subset of TupleSpace's; perhaps we finally have a better term for rdbms's which fail to implement relational theory is all. :) -- cwillu

But note that an "RDBMS" which uses bags (which is to say, all of them) nonetheless implements something so close to relational theory that relational theory is widely useful and applied to them; there is not an alternate Bag Theory that pragmatists study instead! (Bags have been studied mathematically, but that's not the point.) This means that the purists basically are full of it if they say anything other than that it is "bad practice"; it's clearly just splitting hairs about angels dancing on pinheads to make a fuss about it "not really being relational", because it's a terminological distinction with no helpful pragmatic applications.

It's the other way around, it's usually easier if bags are allowed, because if they were not, that would put more pressure on SQL programmers and schema designers to really get things right. More often than not on any topic, there's a contrast between "right" and "easy".

And look at the pros and cons argued on the always-use never-use SELECT DISTINCT pages that have been active here this last week. It's not like sticking to sets solves all of your problems, and it's not like it'll make it impossible to get valid work done if you use bags. It's more subtle than that.

Have you ever seen a non-db person (e.g. a salesperson) use a spreadsheet as a lightweight database? They've never even heard of normal forms, right? And if you tell them, they don't care. They want to do what they want to do, not what someone says is the right thing to do. That's one extreme on the topic. The other extreme is much more sparsely represented.

: I have. They don't care. And it usually comes and bites them in the ass. At this point, they just want me to fix it so it works. Using sets eliminates a class of "It doesn't work!"s. Of course it's more subtle than 'impossible to get valid/invalid'. And most of the 'cons' provided on the various pages about select distinct are either confused about relational theory, or are in complete agreement (joins including candidate keys are automatically 'distinct').
: This bears repeating in different words: If you don't use Select Distinct, then you're either reimplementing it, or creating a bug!!
: Now, there's not necessarily anything wrong with reimplementing it (it may be necessary for any number of reasons), but that doesn't change the fact of it.
: -- WilliamUnderwood

Hey, maybe you could put a summary at the top of those SELECT DISTINCT pages. They were too long for me to wade through completely, but in the back of my mind I wondered if I was missing some interesting argument...I thought I saw stuff that was just outright wrong, but it seemed like I'd have to read slowly and carefully first to make sure that I wasn't repeating what had already been said.

: Already done, see AlwaysUseSelectDistinct. An even shorter summary is as follows: A Join of two Bags of Tuples can have duplicates in exactly two cases: One - there was a duplicate in the original table, and so you shouldn't remove the record. Two - the join is malformed, and so the duplicates may be safely removed. If you follow the constraints provided in relational theory (i.e., sets, not bags), the first case can't happen, and so you can always safely remove the duplicates. -- cwillu
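
A small Python illustration of the first case in the summary above, with bags modelled as lists of dicts. bag_join, distinct and the sample rows are invented for the example; the second case (a malformed join) is not shown.

```python
def bag_join(left, right, on):
    """Join two bags (lists) of dicts on the given keys, keeping duplicates."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in on)]

def distinct(rows):
    """Remove duplicate rows while keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        marker = frozenset(row.items())
        if marker not in seen:
            seen.add(marker)
            out.append(row)
    return out

orders = [{"customer": "ann", "item": "pen"}]

# A bag that already contains the same tuple twice.
customers_bag = [{"customer": "ann", "city": "Oslo"},
                 {"customer": "ann", "city": "Oslo"}]

case_one = bag_join(orders, customers_bag, on=["customer"])
as_sets = bag_join(orders, distinct(customers_bag), on=["customer"])
```

With the bag as input, the join yields two identical rows; once the input is a set, the same join yields one row and there is nothing left for a later distinct step to remove.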

Finally got around to responding to this. I've toyed around with TupleOrientedProgramming too, though I was looking at it from a LindaTupleSpaces perspective. Gave up on it because I couldn't figure out basics like arithmetic and function-calling in a manner that I'd want to use myself. I also tried doing a tables-all-the-way-down TableOrientedProgramming language, but ran into the same problems. Tuples are really annoying when you just want to define '+'.

I'll respond to each point here, because if I did it inline it'd hopelessly ThreadMess this page.

I'm not sure I understand "tuple types" fully. Keys are a property of tables, not tuples. For a one-tuple table, all columns are keys, because they each identify the single member uniquely. A tuple header, as defined by DateAndDarwen, is a set of name/domain pairs. That's probably your best candidate for "tuple type".

Are you familiar with PrologLanguage? This was really the first relational language, and has an incredibly elegant formulation. It doesn't explicitly conform to the RelationalAlgebra, though you can simulate it with it (there're papers on this somewhere on the net). But it has the same MathematicalLogic + SetTheory base and is declarative, so it comes out as almost an idealization of TupleOrientedProgramming.

Passing as tuples doesn't just facilitate KeywordParameterPassing, it mandates it. I'm not sure this is a good thing. The bootstrap interpreter I've written for my toy language works like that (only because I haven't gotten around to including syntax for defining positional args), and it's really a pain. All the little methods you take for granted - +, -, *, /, let/set, etc. - now need keywords.

Your model does let keywords be specified implicitly when they match the method's formals, but this couples the implementations of several functions together. If you rename a variable in one, you either have to change the tuple definition and every other function that uses it, or you have to change each call site to create a new tuple with the appropriate bindings. You also might run into all the pitfalls of Perl's implicit parameters approach.

Polymorphism of some sort is pretty important. Almost every language offers it for arithmetic operations - otherwise you need separate operators for operating on reals vs. ints. I believe ObjectiveCaml requires this, and it's the one stain on an otherwise very clean language.

It looks like you're implementing mutability and sharing. This is tricky, particularly if more than one "derived" relation might be visible at the same time in the program. Read TheThirdManifesto if you haven't already, and see my comments in the middle of DateAndDarwensTypeSystem. It works (heck, most current programming languages do it), but you end up giving up certain guarantees in exchange for aliasing.

LindaTupleSpaces use bags instead of sets because they were initially designed as a concurrency system. It has to be this way, to accommodate various synchronization primitives. Remember, tuples in Linda are "consumed" when a process uses them (well, there's also a "read" primitive that doesn't consume them, but the multithreading aspects are based on this consumption). The number of identical tuples in the tuple space dictates the number of processes that can access a given piece of data in the TupleSpace. If they used sets, you could have only linear computation (it'd be reduced to ContinuationPassingStyle), which eliminates any parallelism and hence their reason for being.
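
To show why consumption wants a bag, here is a toy Python tuple space. It is deliberately simplified: single-threaded, exact-match only, and in_ returns False instead of blocking as Linda's real "in" would. The class and method bodies are a sketch, not Linda's actual API; only the primitive names out, in and rd come from Linda.

```python
from collections import Counter

class TupleSpace:
    """A toy, single-threaded tuple space backed by a bag (multiset)."""

    def __init__(self):
        self.bag = Counter()

    def out(self, tup):
        """Add one copy of a tuple."""
        self.bag[tup] += 1

    def rd(self, tup):
        """Non-consuming read: report whether a matching tuple is present."""
        return self.bag[tup] > 0

    def in_(self, tup):
        """Consuming read: remove one copy, or return False if none is left."""
        if self.bag[tup] == 0:
            return False
        self.bag[tup] -= 1
        return True

space = TupleSpace()
space.out(("token",))
space.out(("token",))                 # a second, identical tuple
taken = [space.in_(("token",)) for _ in range(3)]
```

Two identical tokens allow exactly two consumers to proceed and the third attempt finds nothing. With a set, the second out would have been a no-op and only one consumer could ever take the token.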

What will basic arithmetic operations look like? One of the strengths of FunctionalProgramming, ImperativeProgramming, and ObjectOrientedProgramming is that basic computation operations fit easily into the framework. I had difficulties with this when I tried doing a pure TableOrientedProgramming language, and even more difficulties with a TupleSpace-based TupleOrientedProgrammingLanguage. You may succeed where I've failed here, but if you want the language to be "tuples all the way down", it should be able to handle basic computations.

For another nifty feature, how would you represent program code as tables? This would be really cool if you could get it to work - Lisp with the full power of the RelationalModel behind it! But tables aren't well-suited for describing code, because they're meant for holding large quantities of homogeneous data, where the structure (field roles and domains) is more or less the same. Program code is largely heterogeneous: any useful language includes functions that take a wide variety of different arguments, composed in different ways. I thought about storing caller/callee relations in relational tables, but there's not enough information to precisely specify. Code naturally ends up tree-structured, and the relational model has a very tough time with that.

-- JonathanTang

A partial answer (and a fairly complete-if-draft write up of one or two sections of what I want to do):

On parameters, primitive operations, and positional vs keyword binding.

...

There are four basic ways to map the values provided to a function to the implementations of that function. [wiki syntax for numbered list intentionally avoided... these terms are referred to by number]

(1) Positional binding of invocation parameters to keyword bindings of the implementation.

: (function arg1 arg2) -> (define function (name1 name2) (implementation using name1, name2))
: Commonly used in java, C, Lisp (common simple functions)

(2) Positional binding of invocation parameters to implementation positional bindings

: (function arg1 arg2) -> (define function parameters (implementation using parameters[1], parameters[2]))
: Commonly used in Perl, Lisp (application of a list to a function, with the function defined with REST parameters)

(3) Keyword binding of invocation parameters to implementation positional bindings

: (function (name1 arg1) (name2 arg2)) -> (define function parameters (implementation using parameters[1], parameters[2]))
: Not commonly used, perhaps in functions defined by macros.

(4) Keyword binding of invocation parameters to implementation keyword bindings

: (function (name1 arg1) (name2 arg2)) -> (define function (name1 name2) (implementation using name1, name2))
: Commonly used in SmallTalk, more complex Lisp functions
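
For readers who prefer running code, the four mappings can be imitated in Python. This is only an analogy: the function names are made up, subtraction is chosen because argument order matters for it, and Python indexes from 0 where the notation above uses parameters[1].

```python
# (1) positional invocation -> named formals
def sub1(minuend, subtrahend):
    return minuend - subtrahend

# (2) positional invocation -> positional access
def sub2(*parameters):
    return parameters[0] - parameters[1]

# (3) keyword invocation -> positional access
def sub3(**named):
    parameters = list(named.values())   # position = order written at the call site
    return parameters[0] - parameters[1]

# (4) keyword invocation -> named formals
def sub4(*, minuend, subtrahend):
    return minuend - subtrahend
```

sub4(subtrahend=2, minuend=5) and sub4(minuend=5, subtrahend=2) agree, whereas sub3 gives 3 or -3 depending on the order the keywords were written, which is the kind of accidental dependence on position discussed further down.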

(1) is what most programmers are familiar with. IMO, this is at least partially because of the overhead of providing all the keywords at every invocation point, as well as every implementation point (i.e., polymorphism).

Note that the bindings in (4) need not be identical from invocation to implementation; there must only be a mapping provided from one to the other. This is trivially demonstrated by considering a function which merely delegates to another function, mapping the keywords by hand from the first function's implementation signature to the delegate's invocation signature, effectively mapping the invocation signature of the first to the implementation signature of the delegate.

Invocation Signature: The keywords (and types if necessary) of each parameter that need be included to call a function.
Implementation Signature: The keywords (and types...) of each parameter that need to be included by the implementation to access those parameters (i.e., your typical java method declaration).

Furthermore, the positional binding methods (2) and (3) may be converted to keyword versions, and vice versa ((1) and (4) to positional). The demonstration of this is a bit more complicated than the former 'signature mapping'; note that these conversions are deprecated in the relational model (and specifically outlawed in The Third Manifesto), but an understanding of the method is necessary for further work (i.e., resolving this with the set-based, anti-positional relational theory).

Any keyword parameter may be converted to a positional keyword (on either the implementation or invocation signatures) by taking a hash of the key, and using it to obtain a total ordering on the keywords. As long as the hash function relies only on a particular key, and not any of the surrounding context (i.e., the ordering of the definitions of the parameters, the time of day, etc), the hash for a given key will be invariant, and therefore sufficient to reliably map keywords to and from their positional versions.

Herein lies the relational theory's objection to positional parameter passing: it by necessity ties invocations to implementations by an accidental detail: the total ordering provided by an arbitrary-but-consistent hash function. This is undesirable on both invocation and implementation sides, as both can be broken by what should be opaque details of the keyword values themselves (the particular construction of the keyword values, or even the order in which keywords are defined in some implementations).
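
The hash-based conversion discussed above can be sketched in Python as follows. SHA-256 is an arbitrary choice made for the example; any function of the key alone would do. Python's built-in hash() is avoided because it is salted per process for strings by default, so it would not give the invariant ordering the argument relies on. key_order, to_positional and to_keyword are hypothetical names.

```python
import hashlib

def key_order(keys):
    """Order keywords by a hash that depends only on the keyword itself."""
    return sorted(keys, key=lambda k: hashlib.sha256(k.encode("utf-8")).hexdigest())

def to_positional(tup):
    """Keyword tuple (dict) -> list of values in hash order."""
    return [tup[k] for k in key_order(tup)]

def to_keyword(keys, values):
    """List of values in hash order -> keyword tuple (dict)."""
    return dict(zip(key_order(keys), values))

written_one_way = {"width": 2, "height": 3}
written_another = {"height": 3, "width": 2}
```

Both spellings map to the same positional list, and the list maps back to the original tuple. Which of width or height comes first is decided entirely by the hash, which is exactly the accidental detail objected to above: renaming a key can silently change its position.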

I'd like to emphasize in passing that this can be just as big of a problem to implementations as to invocations. Consider a function (f) which accepts an arbitrary function (g) which itself accepts two keyword parameters (f is therefore a higher-order function). Suppose we have many functions which map to (g), and in particular, at least one (g') which translates the keywords to positional binding on the implementation side. Now we have a case in which it is possible for only some functions to break when function (f) changes (especially in the context of refactoring browsers). The fact that this happens isn't interesting (obvious, really), but it's a case of a function implementation breaking in response to a change in invocation.

The positional-keyword conversion itself can be done such that it doesn't devolve into a bag-based system (where an arbitrary position is required to differentiate otherwise identical values), by effectively translating the keyword to a totally orderable key (i.e., the hash function), and providing a shorthand to represent the 'nth-lowest key' positionally (by position in syntax). This is by necessity dependent on the original keys, and is as such dependent on implementation details, in addition to adding a somewhat inelegant hack feature to the language (the positional shorthand, a single exception in what is otherwise a fairly consistent language). We will deal first with the positional-keyword translation.

The problem can be summed up as an issue caused by the conflict between 'the accidental-if-consistent nature of taking a hash to represent the key' vs 'the wish to avoid repetitive keyword naming, especially for primitive functions'. 'value1=15 + value2=2' for addition anyone?

A partial solution is available in the form of requiring all functions to accept a single tuple, and return a single tuple as a result (besides any side-effects caused by the function). Given a lightweight syntax to construct a tuple, this allows the keywords to be implicit in many cases (provided by the results of functions), while still allowing keyword parameters to be provided directly when necessary, without excess overhead.

A second (and orthogonal) solution is possible when we consider the generic nature of many (most?) primitive operations: the two terms in an addition have no real variance in meaning (a+b = b+a), and many cases where this doesn't seem to apply can be converted to a form which does (A/B -> Ax(B^-1) = (B^-1)xA). From this we hypothesize that a form of 'positional tuple' can be safely provided, if there is no distinction to be made between parameters (except possibly with regards to efficiency), especially to the function which is to accept it. This implies a couple prerequisites to the use of such a 'positional tuple':

1. The 'positionalness' must be symmetrical: there must be no reason for either the invoker or the implementer to require a specific order in order to ensure correctness. One could in theory randomly shuffle the ordering without having any effect on the correctness of the final result.
2. (although implied by 1, this bears repeating explicitly) Every element in the tuple must support the same types, and a function must not depend on the total number of elements of a given type. If a function accepts a 'positional tuple' containing a stack and an integer, it must also accept one containing two stacks, or two integers.

Note that we can simplify the above somewhat by stating that if a function can accept a 'set' of size 'n' containing types from the set 't', then it must accept any set of size 'n' containing any combination of types from set 't'. This renders the 'positional' nature of the set (formerly 'positional tuple') irrelevant, and neatly sidesteps the objections of relational theory to positional based operations. This remains in the spirit of relational theory (even if not the letter of it); tuples are already dealt with as such. The order of tuples in a table is irrelevant to all relational operators: more specifically, selection and projection operate by applying an arbitrary predicate to every tuple in the set, and construct the result from the results of the predicate. Any ordering maintained as well as any knowledge of the nature of the predicate used matters only as an implementation detail for purposes of efficiency.
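
A Python sketch of the symmetry requirement, using exact fractions so that the rewritten division can be checked. add, multiply and inverse are hypothetical primitives; the check simply runs each one over every ordering of its arguments.

```python
from fractions import Fraction
from itertools import permutations

def add(terms):
    """Sum a collection of numbers; the result does not depend on their order."""
    total = Fraction(0)
    for term in terms:
        total += Fraction(term)
    return total

def multiply(factors):
    """Multiply a collection of numbers; again order-independent."""
    result = Fraction(1)
    for factor in factors:
        result *= Fraction(factor)
    return result

def inverse(x):
    return 1 / Fraction(x)

# A / B rewritten as A x (B^-1), so the two arguments become interchangeable.
# A list holds the arguments here so that repeated values survive.
a, b = 6, 4
quotient_forms = {multiply(p) for p in permutations([a, inverse(b)])}
sum_forms = {add(p) for p in permutations([1, 2, 3])}
```

Every ordering produces the same answer (3/2 for the quotient, 6 for the sum), which is the property a function must have before it can be handed its arguments without keywords.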

Tuple: A set of (key, type [optional?], value) pairs.
Positional Tuple: A fancy word for set, with some additional constraints such that the 'positional'ness doesn't really matter.

Note that everything in the above applies to the result tuple of a function as well as the parameter to a function. Invokers of a function which returns a set must be prepared for any combination of valid types in the result.

It is my working hypothesis that in the remaining cases, the requirement that keywords be provided will no longer be tedious, rather they will be beneficial to the readability of the resulting code.

Now we turn to the nature of the 'shorthand' mentioned earlier, used to provide 'positional tuples' (hereafter referred to exclusively as sets) to and from functions. It is simplified greatly by the realization that the sets described above (with the constraints placed on them) are sufficient in general to provide lightweight 'primitive' functions. There is however still the matter that it may not be desirable to use the same syntax for constructing these sets as is used to construct tuples, and likewise for tables; but yet it may not be desirable to provide several similar-but-incompatible syntaxes to provide several similar-but-different functions, especially if a single syntax can be provided which is general enough to provide all of these functions (if in a less than lightweight manner). Note that we want an enforced consistency in syntax, and not an enforced limitation of possibilities.

And so (drumroll please), we introduce macros and the like. Yay for whole new balls of wax.

This is still a wide-open area of research for me. What I'm hoping is that a combination of hygienic-macros plus double-evaluated functions is sufficient for my purposes (i.e., new syntax elements).

(?) hygienic-macros: a rewrite rule applied at read-time to a supplied block of code, such that (among other things) the macro cannot create references to variables not explicitly passed to the macro.
double-evaluated functions: a function which has the ability to avoid execution of any parameters passed to it. Although this could go so far as to provide the code itself to the function (thus becoming equivalent in power to the macros defined above), providing merely the option of executing the code (via a zero-arg lambda) would seem to be preferable. This would allow the generation of control structures without the security issues arising from allowing potentially untrusted code access to potentially sensitive code/data definitions, and at the same time relieves the invoker from needing to wrap parameters in lambdas to make use of function based control structures and relieve paranoia.
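
The zero-arg lambda option can be illustrated in Python, with one caveat: Python makes the caller write the lambdas explicitly, whereas the proposal above would have the language do the wrapping. if_then_else and record are hypothetical names.

```python
def if_then_else(condition, then_branch, else_branch):
    """A control structure as an ordinary function: only one branch is ever run."""
    chosen = then_branch if condition else else_branch
    return chosen()                    # the function decides whether to execute

log = []

def record(name, value):
    """Note which branch actually ran, then pass the value through."""
    log.append(name)
    return value

result = if_then_else(
    True,
    lambda: record("then", "yes"),     # zero-arg lambda delays evaluation
    lambda: record("else", "no"),
)
```

Only the "then" branch leaves a trace in log. The function receives the ability to run each parameter but never sees its source, which is the distinction drawn above between this and a full macro.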

I have a sense that using hygienic macros for code transformation is safer than allowing functions to perform these rewrites, as the macros can be limited in context; the results of an untrusted macro can be checked before it is executed, whereas a function already has control of execution when it performs a rewrite.

The intent of this is that, given a general purpose construction syntax which is sufficient to describe Sets, Tuples, Tables and so forth, plus rewrite macros, we can create arbitrary lightweight syntax specialized for each of them. Furthermore, definitions provided in such a way maintain no special status in the language, and thereby avoid (hopefully) many (most? all?) of the issues with having many special cases (I wish to avoid the 'executable line-noise' phenomenon as much as possible). In effect, what would otherwise be 'yet another syntax element' becomes merely 'a standard library function'.

...

In other news, support for MultiMethods provides polymorphism in a natural way. Because methods are dispatched by all of the parameter types/keys, conventional polymorphism is achieved by simply creating a new specialization (specifically, on the first parameter, which would be the 'favoured' parameter in other languages). There is still the open issue of what to do with ambiguous invocations. Do we have a preferred ordering? (This reintroduces the 'favoured' parameters, although slightly more generally.) Do we require the programmer to disambiguate? (This requires more parameter typing, and/or potentially many more method definitions.) Or is there some way for the runtime to determine the best operation to run? (I.e., if a method is ambiguous, it's because either one would be correct (including implementation), and therefore it is only a performance issue?)
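To make the ambiguity question concrete, here is a rough Python sketch of dispatch on method name plus tuple keys and types. It takes the 'require the programmer to disambiguate' option by raising an error when no single most specific method exists. Everything here (methods, define, dispatch, Shape, Circle, overlap) is a made-up illustration, not part of the proposed language.

```python
# Hypothetical registry: method name -> list of (signature, implementation).
# A signature maps each tuple key to the type required for that key.
methods = {}

def define(name, signature, implementation):
    methods.setdefault(name, []).append((signature, implementation))

def matches(signature, args):
    # The keys must agree exactly, and each value must have the required type.
    return (signature.keys() == args.keys()
            and all(isinstance(args[key], kind) for key, kind in signature.items()))

def at_least_as_specific(a, b):
    # a is at least as specific as b if every type in a is a subclass of b's.
    return all(issubclass(a[key], b[key]) for key in a)

def dispatch(name, args):
    candidates = [entry for entry in methods.get(name, []) if matches(entry[0], args)]
    best = [entry for entry in candidates
            if all(at_least_as_specific(entry[0], other[0]) for other in candidates)]
    if len(best) != 1:
        raise TypeError(f"no unique method for {name} with keys {sorted(args)}")
    return best[0][1](**args)

class Shape: pass
class Circle(Shape): pass

define("overlap", {"left": Shape, "right": Shape}, lambda left, right: "generic")
define("overlap", {"left": Circle, "right": Shape}, lambda left, right: "circle on the left")
define("overlap", {"left": Shape, "right": Circle}, lambda left, right: "circle on the right")
```

A call with a Circle on the left and a plain Shape on the right picks the second definition. A call with two Circles matches all three and none is most specific, so it is rejected as ambiguous.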

In any case, this is approximately how I'm planning on dealing with basic operations (of course, the proof of the pudding is in the eating), does this sound like it has potential?

-- WilliamUnderwood

Some syntax examples would be very helpful. Your parameter passing method seems very different from anything I have experience with, and I'm having trouble wrapping my head around it. I'm familiar with the 4 mechanisms you list above, but you seem to be going beyond those to come up with hybrids. Conceptually I understand the idea of creating an ordering via a hash function of keyword arguments, but I still can't see how this would work for the programmer.

(Also, there're other mechanisms of parameter passing to consider:

PatternMatching including the function name, as in PrologLanguage
PatternMatching without the function name, as in HaskellLanguage, ObjectiveCaml, and ErlangLanguage
PredicateDispatching, as in CecilLanguage. This includes most other methods as special cases, according to the papers.
KeywordParameterPassing where the keyword is part of the method name, as in SmallTalk
Function executes when parameters have values, as in DataFlowProgramming
PrototypeBasedProgramming where a function definition is a prototype that is copied and variables reset for each invocation. I may have invented this, though obviously it's derived from SelfLanguage and similar to the mechanism in BetaLanguage. I discarded it for my own language because it's cumbersome for the programmer and seems inefficient, though it has some nifty advantages, like easy PartialApplication? and Memoization, along with reified ActivationRecords.

Whether or not any of these are applicable to your language is something for you to decide.) [Funny, we've independently invented this then, as it was the way we were going to deal with a couple things in BlueAbyss... oh well, nothing new under the sun, eh? :) -- cwillu]

I'm not sure you can count on all args being equivalent for those functions you'd want to invoke positionally. You're asking for full commutativity in any "positional" function, which is a pretty hard property to satisfy. It applies for + and *, but there're obvious exceptions with - and /, and even * on matrices doesn't exhibit it. It does eliminate a whole question though, the PreferredOrderOfSrcDestArguments?. These'd have to be specified explicitly by keyword in your model, which prevents errors but can be somewhat inconvenient.

Treating A - B as A + (B^-1add) and A / B as A * (B^-1mult) is interesting. Basically, you're allowing positional arguments only on field-like algebraic structures. Not all of them qualify, though, as you might have ones where the * operator is non-commutative (like matrices, which strictly speaking form a ring rather than a field). Though for matrices, A * B = transpose(transpose(B) * transpose(A)), so there's at least a transformation you can perform to get back to the original. I wonder how well this would translate to other operations that need shorthand. I suspect that programmers would balk at a requirement to do subtraction as addition of an inverse, but a good macro system may be able to cover this.
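A small Python sketch of that split, under the assumption that positional arguments arrive as an unordered bag (modelled with collections.Counter) and that order-sensitive operations take keywords and are rewritten in terms of inverses. The function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from math import prod

def add(operands):
    # The operands form an unordered bag, so position cannot matter.
    return sum(Counter(operands).elements())

def multiply(operands):
    return prod(Counter(operands).elements())

def subtract(left, right):
    # Order matters, so the arguments are named; A - B becomes A + (-B).
    return add([left, -right])

def divide(left, right):
    # A / B becomes A * (1 / B), using exact fractions for the inverse.
    return multiply([left, 1 / Fraction(right)])
```

Here add([1, 2, 3]) and add([3, 1, 2]) necessarily agree, while subtract and divide only make sense with their keys spelled out.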

Double-evaluated-functions + FirstClassFunctions + compilation are incompatible. There's a brief discussion of this on RuntimeMacro, and a link to a paper which explains why the Lisp community ditched FEXPRs and NLAMBDAs back in the early 80s. Basically, they make it impossible to predict from the code itself whether a given argument will be evaluated, so the decision of whether to evaluate or package up in a thunk can only be taken at runtime. RebolLanguage does exactly this, which is why Rebol is essentially un-compilable.
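The problem can be shown in a few lines of Python, as a toy model rather than a description of how Lisp or Rebol work internally. The wants_thunk attribute and the call helper are invented for this sketch; the point is only that the call site cannot know which kind of function it has until the function value is in hand.

```python
def ordinary(value):
    # An ordinary function: expects an already-evaluated argument.
    return value * 2
ordinary.wants_thunk = False

def guarded(thunk):
    # A 'double-evaluated' function: receives the thunk and chooses not to run it.
    return "skipped"
guarded.wants_thunk = True

evaluated = []

def argument():
    # Stands in for the argument expression; records when it is evaluated.
    evaluated.append("ran")
    return 21

def call(function, thunk):
    # The evaluate-or-package decision depends on the function *value*,
    # so it can only be taken here, at runtime.
    if function.wants_thunk:
        return function(thunk)
    return function(thunk())

results = [call(f, argument) for f in (ordinary, guarded)]
```

The same call site evaluates the argument for one function and not for the other, which is exactly what a compiler would need to know ahead of time.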

I'm finding that a common readability pattern is to have one argument be positional, and all the rest be keywords. You then end up with a verb - direct-object - prepositional phrases construction, where the prepositional phrases can be moved around at will. Some examples from my language (none of which is parsable, yet, but I've constructed the parse trees in my head and it's self-consistent):

  if (a < b) then:
    do-something
  else:
    do-something-else

  map (factor * it) over: numbers-to-multiply

  map over: input
    result <- process it
    print result

  line <- read from: data-file

I'd really encourage you to come up with concrete syntax, even if you don't have an implementation. I found a lot of semantic difficulties when I tried to actually write programs and think through what the syntax means. As language designers (enlightened ones :)), we like to pretend that syntax doesn't matter, but it does. I find that semantic features that make my eventual infix syntax easier make the prefix syntax I'm developing in much clunkier, so it's a tradeoff between ease of bootstrapping the compiler and parser, and ease for the eventual programmer. And some language features just lead to incredibly clunky syntax - if you can't express it, it's not much use.

I'll respond to the question of positional vs. ambiguous differentiation of MultiMethods on whichever page had that discussion. I've changed my mind about that, mostly because of keyword args. -- JonathanTang

Hmm... Following through on the creation of a concrete syntax, that of function creation, and the consideration that a relational view of code might be desirable. When I define a function, I provide a list of operations that I want to happen (I'm thinking imperatively here). I don't want to have to provide line numbers (nightmares of my first basic programs), but operations are not arbitrarily reorderable. This of course doesn't apply in a pure declarative style (where they are reorderable), and pure functional style (where statements which must be ordered are ordered by nesting).

I do note however, that the elements are required to be of the same type (i.e., they're all language elements of some type). Can I get away with loosening it to something else? I'm getting the sense that an autonumbered set (i.e., a list) is common enough to benefit from a specialized syntax for it. Notably, this syntax would always be resolvable to a table with numbered elements. It could be abused, but I can't justify disallowing something for only that reason.

The previous statements would then change:

A function which can accept a List containing a stack and an integer in that order (for instance) must also accept a List containing two stacks, or two integers, or an integer and stack (i.e., reversed ordering). My old working hypothesis is invalidated, and replaced as such.
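One way to read this, sketched in Python: every element of the List has the same declared type ('stack or integer'), so the function has to cope with any mix in any order. The function total and the modelling of a stack as a Python list are assumptions made for the example.

```python
def total(items):
    # Each item is either an integer or a stack (modelled here as a list of integers).
    result = 0
    for item in items:
        if isinstance(item, list):
            result += sum(item)   # a stack contributes the sum of its contents
        else:
            result += item        # an integer contributes itself
    return result
```

With this definition total([[1, 2], 3]), total([3, [1, 2]]), total([[1], [2]]) and total([1, 2]) are all valid calls; none of them depends on which position holds which kind of value.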

In some sense, this would avoid the third-manifesto's objections to positional parameters in the same way; I wouldn't imagine someone creating a function accepting 15 parameters that relied on some property of each of the 15 parameters purely for convenience's sake (it would be more of an intentional abuse, which I have no issue with: you can abuse anything).

Given this, constructing functions through function calls (via the same table-based parameters) seems now to be possible, possibly even convenient. I now return to my experimentation with more concrete syntax examples...

-- cwillu

Possible (i.e., pre-alpha draft scrap version) syntax

set definition:

{element1 element2 element3 ...}

{key1 element1 element2 ... ;
key2 element3 element4 ... ;
key3 element5 element6 ... ;
}

[name1 name2 name3
----
value1 value2 value3]

is equivalent to

[
name1  | value1
name2  | value2
name3  | value3
]

is equivalent to

{name1 value1; name2 value2; name3 value3}

[Value Index
----
value1 1
value2 2
value3 3
]

is equivalent to

{value1; value2; value3}
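As a sanity check on these equivalences, a Python sketch in which each surface form resolves to the same underlying data. Modelling tuples as dicts and tables as lists of dicts is an assumption of the sketch, as are the helper names.

```python
def from_header(names, values):
    # [name1 name2 name3 ---- value1 value2 value3]
    return dict(zip(names, values))

def from_rows(rows):
    # [name1 | value1 ...] and {name1 value1; name2 value2; ...}
    return {name: value for name, value in rows}

def autonumber(values):
    # {value1; value2; value3} resolves to a Value/Index table, numbered from 1.
    return [{"Value": value, "Index": index}
            for index, value in enumerate(values, start=1)]

header_form = from_header(["name1", "name2", "name3"], ["value1", "value2", "value3"])
row_form = from_rows([("name1", "value1"), ("name2", "value2"), ("name3", "value3")])
numbered = autonumber(["value1", "value2", "value3"])
```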

function definition:

define {name functionName; signature {key1; key2; key3}; definition {
operation;
operation;
operation;
}}

define [name signature definition
----
functionName
{key1; key2; key3}
{
operation;
operation;
operation;
}

+
{left; right}%{integer}
{
return {left + right}%{integer}
}

multiply {left; right}%{integer}
{
return {left * right}%{integer}
}
]

'define' is just another function... the implementation of which would add the parameters to the function table.

define {name define; signature {name; signature; definition}; definition {
add {table functionTable; tuple signature}
}}

pending of course the finalization of however we deal with tables. (I'm not yet convinced that normal function calls aren't an appropriate way to deal with them, though.)
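For what it's worth, the self-hosting part can be mimicked in Python. This is a sketch only: function_table, add and call are stand-ins, and matching a call to a row by name plus key set is an assumption about how lookup would work.

```python
function_table = []   # each row: name, signature (set of keys), definition

def add(table, row):
    table.append(row)

def define(name, signature, definition):
    # 'define' does nothing special: it adds a row to the function table.
    add(function_table, {"name": name,
                         "signature": frozenset(signature),
                         "definition": definition})

def call(name, args):
    # Look up a row whose name and key set match the supplied tuple.
    for row in function_table:
        if row["name"] == name and row["signature"] == frozenset(args):
            return row["definition"](**args)
    raise LookupError(name)

# Bootstrap: 'define' is itself just another row in the table.
define("define", {"name", "signature", "definition"}, define)

# From here on, new functions can be defined through the table itself.
call("define", {"name": "+",
                "signature": {"left", "right"},
                "definition": lambda left, right: left + right})
five = call("+", {"left": 2, "right": 3})
```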

-- cwillu

It looks really interesting. Function definition in particular seems quite elegant; I'm also making 'define' just an ordinary function, but it seems to map better to tables than it does to pure functions (I'm making everything, even plain old data structures, into functions. CodeIsData because data is code.)

Couple of things:

Lists could be problematic efficiency-wise if implemented as numbered sets. Inserting an element at the head of a list requires that all the other elements be renumbered. Normally you could chalk this up to an "implementation detail" and just use a list behind the scenes anyway. But if lists are "really" tables, they should act like tables, which means that such a renumbering would have observable side-effects. (This is also the big advantage that special-purpose data structures have over tables: you can make compromises on the ways in which you'll use a data structure, allowing certain optimizations.) There are N pieces of observable data that have to be updated, so the minimum worst-case time complexity possible is O(N) (actually, I think it should be Omega(N), as it's a lower bound and not an upper bound). This, along with the difficulty of representing a tree, is one of my big complaints about tables.
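The renumbering cost is easy to see in a Python sketch, assuming the list really is stored as a table of Value/Index rows (the representation and the function name are assumptions for illustration).

```python
def insert_at_head(table, value):
    # Every existing row's Index is observable data, so each one must be updated.
    renumbered = 0
    for row in table:
        row["Index"] += 1
        renumbered += 1
    table.insert(0, {"Value": value, "Index": 1})
    return renumbered

table = [{"Value": v, "Index": i} for i, v in enumerate(["a", "b", "c"], start=1)]
touched = insert_at_head(table, "z")
```

Three existing rows means three updates; a linked list behind the scenes would hide that work only if the Index column were not visible.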

Some good test cases might be to write some simple programs with your language and see how easy they are to write. People use a language for what it makes easy; if it doesn't make anything easy, people won't use it. I suspect you'll do marvelously with RDBMS access, since that's what this is for anyway. But it should also be able to handle arithmetic and string operations easily. Some sources for test programs:

-- JonathanTang

The semicolons don't seem to be strictly necessary, except for two things: they seem to add readability in some cases, namely when a value is assigned from the result of an operation; and I think ordered set definition might be ambiguous with tuple definition otherwise.

I'm still not entirely sure how I want to deal with tuples. Function calls seem to suffice for selection and the like, but the actual reading/writing operations get a wee bit cumbersome. Of course, I'm also thinking that the function name should also be a keyword. This would clean up a few remaining operations, but we'll see.

In any case, the following is what a SieveOfEratosthenes might look like, as well as a for-loop:

 sieveOfEratosthenes {forSize} {
   create {table table keys {value; isPrime}}
   for{index index = 2 to forSize do {
     add{to table row {value index isPrime unknown}}
   }}

   sieveOfEratosthenes table
 }

 sieveOfEratosthenes {value; isPrime} {
   if {unknown isPrime then {
     set{
       value select{from table where equals{ %{element.value value} 0}}
       = false
     }
     set{value isPrime = true}
   }}
 }

 for {index; =; to; do}
 {
   add{to parameters column step = 1}
   for parameters
 }

 for {index; =; to; step; do}
 {
   rename{in parameters from = to equals}
   for parameters
 }

 for {index; equals; to; step; do}
 {
   apply {value {index equals} to do}

   if {true ={equals to} then {
     ++ {equals}
     goto for parameters
   }}
 }
 ]

I'm treating 'goto' as an explicit tail-call... equivalent to call/cc if the function being called accepts a function and the caller provides it. 'parameters' is bound to the tuple of parameters; this can be shortened to some arbitrary symbol (you could make ExecutableLineNoise? if you wanted to... please don't make my language your bitch :p).
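For comparison, roughly the same sieve in Python, with the table modelled as a list of rows and 'unknown' as None. One assumption is made loudly here: the rows are visited in ascending order of value. The algorithm depends on that, so applying the per-tuple method 'in unspecified order' would not be enough.

```python
def sieve_of_eratosthenes(for_size):
    # Build the table: one row per candidate, primality not yet known.
    table = [{"value": v, "isPrime": None} for v in range(2, for_size + 1)]
    # Assumption: rows are processed in ascending order of value.
    for row in table:
        if row["isPrime"] is None:
            # Mark every other multiple of this value as composite...
            for other in table:
                if other["value"] != row["value"] and other["value"] % row["value"] == 0:
                    other["isPrime"] = False
            # ...and the value itself as prime.
            row["isPrime"] = True
    return [row["value"] for row in table if row["isPrime"]]

primes = sieve_of_eratosthenes(20)
```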

<...incorrect code snippet removed>

There's a trade-off of implicit function application vs explicit via alternate construction vs explicit via additional syntax (i.e. a 'funcall' operation). I think an implicit use of the function name as a keyword is workable, but at the same time strikes me as a rather odd 'feature' (read 'hack'). I like Smalltalk's method (although I wouldn't really call it a distinct method, more like a variant), although it requires the use of separate variable names in order to distinguish a one-arg method from a no-arg method.

Of course, such a 'no-arg method' is actually a one-arg method when one takes into account the object reference. Perhaps there is no possible use for a method which takes literally no parameters. If it's doing anything, it'll be doing it to some context; at some point somebody may (and therefore I should assume 'will') want to change that. <some time later...> And looking at the new syntax, I've changed my mind (for now at least).

-- WilliamUnderwood

Interesting note: you can optimize constraints; namely, you can optimize when they're evaluated. From the text of a constraint, it is easily determined (to varying degrees of conservativeness) when certain actions can have no possible bearing on that constraint. Vaguely in the sense of bound vs free variables in a function.

And this is where I find things get interesting. Consider a unit test. It's a constraint which has no variables at all; the interface is simply 'runTest()', all required state is set up by the setup code, and is ideally sandboxed. This means that the only variable in a test is the implementation of the methods which it calls.

Therefore, you can define constraints which also have no reliance on anything except particular methods. Because they don't reference existing tables, normal operations (including using the methods they test) will have no bearing on them, and so they should be elided even by a conservative optimizer. The only time they need to be checked is when the methods themselves are changed; that is, when the tables which represent the methods are edited.
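A Python sketch of that scheduling rule, with invented names throughout (methods, tests, add_test, define): a test declares which methods it depends on, and it is re-run only when one of those method entries is edited. Rejecting the edit when the test fails is an extra assumption, borrowed from how a database constraint would behave.

```python
methods = {}   # the 'method table': name -> implementation
tests = []     # (names of methods depended on, test function)
runs = []      # log of which tests were re-checked

def add_test(depends_on, test):
    tests.append((frozenset(depends_on), test))

def define(name, implementation):
    # Editing the method table is the only event that triggers the tests.
    previous = methods.get(name)
    methods[name] = implementation
    for depends_on, test in tests:
        if name in depends_on:
            runs.append(test.__name__)
            if not test():
                # Constraint violated: roll the edit back and reject it.
                if previous is None:
                    del methods[name]
                else:
                    methods[name] = previous
                raise ValueError(f"{test.__name__} rejects new definition of {name}")

def test_double():
    return methods["double"](4) == 8

add_test({"double"}, test_double)
define("double", lambda x: x * 2)   # the test runs here
data = [methods["double"](n) for n in range(1000)]   # ordinary use: no tests run
define("double", lambda x: x + x)   # the test runs again
```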

Which kinda makes sense when you think about it: unit tests are constraints on the implementation of a method. --WilliamUnderwood