Homoiconic Definition Take Five

Homoiconic definition made precise:

A language is homoiconic when the data types defining its AbstractSyntaxTree are part and parcel of the language (standard: built in or part of the standard library, implicitly available at runtime/compile time for every client developer to use, just like strings, numbers and other stock data types), and when the language supports AST literals.

For example, in Lisp and Scheme the fundamental AST structure is the EssExpression, and both support literals for EssExpressions:

 (define myExpression '(if x (lambda (y) (+ y 1)) (lambda (z) (- z 1))))

The variable myExpression is initialized with a constant denoted by the literal '(...).
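To make concrete what the quoted literal gives you, here is a minimal sketch in Python (chosen only so the example runs standalone). It imitates what a Lisp reader does with the text inside '(...): the result is ordinary nested lists that the program can index and inspect. The helper names tokenize and read_sexpr are hypothetical, and numbers are left as strings for simplicity. Note that Python has to start from a string here, which is exactly the step a Lisp quote literal makes unnecessary.

```python
# Hypothetical helpers imitating a Lisp reader: text in, nested lists out.

def tokenize(text):
    """Split S-expression text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read_sexpr(tokens):
    """Consume tokens and build nested Python lists (atoms stay strings)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read_sexpr(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected )")
    return token


source = "(if x (lambda (y) (+ y 1)) (lambda (z) (- z 1)))"
my_expression = read_sexpr(tokenize(source))

# The "code" is now plain data: the head is the symbol 'if', followed by
# the condition and the two branches.
print(my_expression[0])     # if
print(my_expression[2][0])  # lambda
print(my_expression[2][2])  # ['+', 'y', '1']
```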

Java is not homoiconic, even though you can manipulate code (using BCEL and other open-source libraries) to accomplish virtually anything, because Java has no literal that denotes a for loop or any other code construct.

TCL is not homoiconic: although TCL code is held in strings, those strings are not ASTs. Once you put the code into a string, you lose the structure of the AST.


Under this definition Scheme and Lisp are homoiconic, while Java, C/C++, Smalltalk and Python are not, and neither are string-based languages, even if they have eval.
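As a rough illustration of where Python falls, assuming Python 3.9+ (for ast.unparse): the standard library's ast module provides the AST data types, which satisfies the first half of the definition, but there is no literal that evaluates to an AST node. You either parse a string or build the nodes by hand, as in this sketch.

```python
import ast

# The AST data types are standard: parse a loop and inspect the node.
tree = ast.parse("for i in range(3):\n    total += i")
loop = tree.body[0]
print(type(loop).__name__)      # For
print(ast.unparse(loop.iter))   # range(3)

# There is no literal that evaluates to an AST node, so the alternative
# to parsing a string is building the nodes by hand: here, `i = 10`.
assign = ast.Assign(
    targets=[ast.Name(id="i", ctx=ast.Store())],
    value=ast.Constant(value=10),
)
module = ast.fix_missing_locations(ast.Module(body=[assign], type_ignores=[]))
print(ast.unparse(module))      # i = 10

# The hand-built tree can be compiled and run like parsed code.
namespace = {}
exec(compile(module, "<ast>", "exec"), namespace)
print(namespace["i"])           # 10
```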

In support of this revised definition:

It's a simple boolean test (no fuzziness here), and it's quite unambiguous. Other definitions muddle things. For example, "code has the same representation as data" is completely ambiguous: which code (source code, byte code, machine code), and which data representation (constants/literals, internal representation in memory, external representation, etc.)?
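Because the test is a plain conjunction, it can be written down directly. The sketch below is hypothetical (the Language record and is_homoiconic are names made up for this illustration), and the trait values simply encode the claims made on this page, not an independent survey of the languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    name: str
    standard_ast_types: bool  # AST data types are standard (built in or stdlib)
    ast_literals: bool        # the language has literal syntax for AST values


def is_homoiconic(lang: Language) -> bool:
    """The two-part test: both conditions must hold, no sliding scale."""
    return lang.standard_ast_types and lang.ast_literals


# Trait values reflect the claims made on this page, not an independent survey.
LANGUAGES = [
    Language("Scheme", standard_ast_types=True, ast_literals=True),
    Language("Python", standard_ast_types=True, ast_literals=False),   # ast module, no AST literals
    Language("C# + CodeDom", standard_ast_types=True, ast_literals=False),
    Language("Java", standard_ast_types=False, ast_literals=False),    # BCEL etc. are add-on libraries
    Language("Tcl", standard_ast_types=False, ast_literals=False),     # code lives in plain strings
]

for lang in LANGUAGES:
    print(f"{lang.name}: {is_homoiconic(lang)}")
```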

Most of the questions in HomoiconicFaq become easy for anybody to answer, rather than subjects of endless dispute. HomoiconicClassification becomes clearer. There's a direct relation between the substance of the definition and both the advantages and the disadvantages of "homoiconicity".

It keeps Lisp and Scheme as the traditional homoiconic languages while excluding TCL, FoxPro and others (bash, anyone?) that can merely eval strings. Manipulating code as strings is no fun, and it is not what homoiconicity is all about; otherwise every language would qualify, since every language can manipulate strings and use a well-packaged library to generate code. Manipulating strings is no big deal; the big deal is the AST. This preserves the intent of the original definition in the face of language developments that have blurred it (byte codes, byte code manipulation, string manipulation, just-in-time compiling, the widespread availability of eval, etc.).
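A small Python sketch of why string manipulation is no fun: renaming the variable y by text substitution also rewrites the unrelated identifier yy and the text inside a string literal, while a rename over the AST (using the standard ast.NodeTransformer, with Python 3.9+ for ast.unparse) touches only the intended Name nodes. The RenameVariable class is a hypothetical helper written for this example.

```python
import ast

source = "y = 1\nyy = y + 2\nprint('y is', y)"

# String-level rewrite: blind to structure, so it also mangles `yy`
# and the text inside the string literal.
naive = source.replace("y", "n")


class RenameVariable(ast.NodeTransformer):
    """Rename one variable by visiting Name nodes only."""

    def __init__(self, old, new):
        super().__init__()
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


# AST-level rewrite: only identifiers named exactly `y` change.
tree = RenameVariable("y", "n").visit(ast.parse(source))
structural = ast.unparse(tree)

print(naive)
print(structural)
```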

Which is why I said it was "reasonable", in those senses. However, this redefines the word in a way that is incompatible with the definition provably in use by others in the field, so I don't see how it can be made to fly. Idiosyncratic definitions hinder rather than aid communication.

But the old definition is provably bad, as it makes every modern language homoiconic by the trivial addition of a library. A definition that does not differentiate is useless anyway. And in the end, who cares? The term is really quite unimportant; the old definition is bad, and this definition has all the desirable qualities except compatibility with some old pronouncements. And by the way, other than Raphael Finkel mentioning in passing that TCL is homoiconic, and the TCL community (who clearly have a bragging-rights conflict of interest) picking up on it, nothing else stands in the way of progress. Clear definitions help communication and fuzzy definitions hinder it, so the choice is clear. I rest my case, and this subject is closed as far as I am concerned. Take it with a grain of salt, and all that. TakeFive.

Don't be like that. Consider that the term was invented by the authors of TRAC, a very TCL-like language; it was not originally invented to refer to Lisp. Further consider that I have always refuted the notion that it "makes every modern language homoiconic by a trivial addition of a library", and have frequently and expressly said that this misunderstands the idea. So if you're done with the topic, it is a very hollow victory indeed. -- Doug

You may have refuted it to your own satisfaction, but everybody else remained unconvinced, endless discussions ensued, and people came to treat homoiconicity as something on a scale from 0 to 1, or from strong to weak. Under this definition, though, it is crystal clear why neither Java, nor Smalltalk, nor other such languages can be shoehorned into the "homoiconic languages". As for TRAC, does TRAC have a notation for AST literals? I don't know, but from a brief look at the TRAC paper, its "strings" are implicitly more structured than a plain array of characters, so TRAC might fit this definition just fine. Or it may not. It's an old, forgotten language that never really took off. Today the icon of homoiconicity is LISP. The very specific difference between LISP/Scheme on the one hand and Java/Smalltalk/Ruby on the other is that the first group has a notation for EssExpression literals.

In the end, a "debate" over a definition cannot be won by arguments, only by usage and acceptance. Since the old "definition" has already confused enough people, and is itself ambiguous, it follows that it has already lost the battle. This definition may lose the battle as well, which could be taken to mean that "homoiconicity" is not an important enough feature to be worth fighting for a good definition. So it was never my intention to claim that this is the definitive definition of homoiconicity, but if anybody wants to think about homoiconicity in unambiguous terms, this definition can guide him perfectly well.

If you can get your proposed definition accepted by the world at large, that would be fine by me. Until then, however, as you say, "usage and acceptance" hasn't happened yet.

Also, although, yes, there has been a lot of confusion, you overlooked a proposal I made to EricHodges just yesterday that might cut through all the confusion while still being backward compatible. [I think I missed it too. What was the proposal you made to EricHodges just yesterday? What page is it on? -- jtg]


Interesting.

#1: It bothers me that this definition excludes TRAC, the language for which the term was coined. It doesn't seem possible to build meaningful AbstractSyntaxTrees for TRAC, as parsing is so intimately intertwined with execution. To build a useful AbstractSyntaxTree, the structure of the code has to be parsable before execution, but in TRAC it's apparently possible to change the structure during execution, based on the data.

#2: I think redefining the homoiconic term to be based on AbstractSyntaxTrees requires that you argue that that's what AlanKay really meant when he said "[...] both are 'homoiconic' in that their internal and external representations are essentially the same." And that AlanKay was wrong when he acknowledged that TRAC was homoiconic. And that Mooers and Deutsch were mistaken when they coined the term.


I wonder if it would help to look for and list things that TRAC and LISP have (and maybe TCL, FoxPro, etc.) that Java, C#, C, C++, FORTRAN, COBOL, etc. don't have: things seemingly related to "internal and external representations" being "essentially the same".

Like...

One possible advantage of pursuing several independent lines of difference, like the above, is that several of them together might define "homoiconic." If "homoiconic" required attributes "X", "Y" and "Z", and some language had "X" and "Z" but not "Y", then it wouldn't be "homoiconic."


OK, I'll throw another wrench in the works:

With the AbstractSyntaxTree definition of homoiconic, the .Net languages are homoiconic. That is, C#, JScript, and Visual Basic (VB.Net/VbDotNet) are homoiconic.

At http://msdn.microsoft.com/library/default.asp?url=/library/en-us/netstart/html/cpframeworkref_start.asp we find "This section contains reference documentation of the public classes that constitute the .NET Framework, as well as lexicons for other languages employed in the .NET Framework." including the "CodeDOM Quick Reference"

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconcodedomquickreference.asp

Microsoft's CodeDom is standard for the .Net environment and for the C#, JScript, and Visual Basic (VB.Net/VbDotNet) languages. It offers AbstractSyntaxTree access to code, along with translation from that tree (the "DOM") to and from source code as a string of characters, and to (platform-dependent) machine code.

For example, System.CodeDom.CodeConditionStatement is an "if" statement, and System.CodeDom.CodeIterationStatement is a "for" loop.
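CodeDom itself is .NET-only, so as a loose analogy here is the same idea with Python's standard ast module (Python 3.9+): build an "if" statement from node objects, much as one would with CodeConditionStatement, then turn it into source text, execute it, and parse the text back into nodes. This is a sketch of the concept, not a translation of the CodeDom API; note also that Python's ast.For is a for-each loop, unlike CodeIterationStatement, which models a C-style loop with init, test and increment parts. The assign helper is hypothetical.

```python
import ast


def assign(name, value):
    """Hypothetical helper: build the statement `name = value`."""
    return ast.Assign(targets=[ast.Name(id=name, ctx=ast.Store())],
                      value=ast.Constant(value=value))


# Build `if x > 0: sign = 1 else: sign = -1` from node objects.
condition = ast.Compare(
    left=ast.Name(id="x", ctx=ast.Load()),
    ops=[ast.Gt()],
    comparators=[ast.Constant(value=0)],
)
if_stmt = ast.If(test=condition, body=[assign("sign", 1)], orelse=[assign("sign", -1)])
module = ast.fix_missing_locations(ast.Module(body=[if_stmt], type_ignores=[]))

# Tree -> source text
print(ast.unparse(module))

# Tree -> executable code
namespace = {"x": -5}
exec(compile(module, "<codedom-like>", "exec"), namespace)
print(namespace["sign"])  # -1

# Source text -> tree (the reverse direction)
reparsed = ast.parse(ast.unparse(module))
print(type(reparsed.body[0]).__name__)  # If
```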

They still lack literals; in other words, you cannot write something like:

 CodeExpression myAssignment = '(i = 10);  // hypothetical AST-literal syntax, not valid C#
But other than that, it's very cool. I hope Sun will soon do something similar with Java.


OK, enough about homoiconicity: too much noise over nothing really important.


[CD containing the song "Take Five" follows.]

The biggest bug of all: the Dave Brubeck version is vastly superior: ASIN B000002AGN (since that ASIN link doesn't work: http://www.amazon.com/exec/obidos/tg/detail/-/B000002AGN/)

"Boasting the first jazz instrumental to sell a million copies, the Paul Desmond-penned "Take Five," Time Out captures the celebrated jazz quartet at the height of both its popularity and its powers. Recorded in 1959, the album combines superb performances by pianist Brubeck, alto saxophonist Desmond, drummer Joe Morrello and bassist Gene Wright. Along with "Take Five," the album features [...]"

And if you're going to buy two albums, ignore Amazon's suggested pairing and get Two of a Mind with Paul Desmond and Gerry Mulligan (http://www.amazon.com/exec/obidos/ASIN/B00008VGMU/)


So somebody claims that TCL strings are abstract syntax trees. That may be true, since the earlier claim was made by somebody with only superficial knowledge of TCL. To clear things up, please show the TCL function calls (or commands) that take a string and traverse its AST.

That somebody was me. How does the ability to take a string of Tcl code and traverse its AST (to the limited extent to which such a thing even exists for Tcl) prove anything about the Tcl language? I could do that in any Turing-complete language, for input written in any language with a well-defined notion of an AST.

Tcl strings contain all the same information as any other AST for the code would. And they have the side benefit that they can be executed as code (without even using eval). A string is a very awkward way to represent an AST (from the point of view of traversing/updating it), but I fail to see how a string containing code is not a valid representation of a Tcl AST. What am I missing?

The essence of HomoiconicDefinitionTakeFive. That was the point: the language has AST elements as data (either built in or part of the standard library), and the language has literals for this type of data. Strings in TCL are string literals, not AST literals like, for example, LISP lists.
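The exchange above can be illustrated with Python, which this page counts as non-homoiconic: given a parser, traversing the AST of code held in a string takes only a few lines. The sketch below uses the standard ast.parse and ast.walk on Python source (it does not parse Tcl). It shows that string-to-AST traversal alone does not separate the languages, which is consistent with resting the definition on AST literals instead.

```python
import ast

source = "total = sum(x * 2 for x in items if x > 0)"

# Parse the string, then visit every node of the resulting tree.
tree = ast.parse(source)

# Collect every variable/function name mentioned, and every direct call.
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})
calls = [node.func.id for node in ast.walk(tree)
         if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)]

print(names)  # ['items', 'sum', 'total', 'x']
print(calls)  # ['sum']
```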


Last edited October 18, 2007.