[Moved from HomoiconicExampleInJava; hopefully this will eventually move back to HomoiconicLanguages, and hopefully it will short-circuit some of the confusion in all these pages, including HomoiconicFaq. P.S. Thanks to whoever it was that pointed out a week or two ago that FirstClass was important but getting overlooked.]
For a primer see DefinitionOfHomoiconic
The Tcl/Tk site says "The Tcl programming language is an homoiconic-form language. Program and data are both presented as strings. A Tcl procedure's arguments list and body are not an exception to this rule, but the procedure itself is handled as a name bound to a particular couple of arguments list and body. This name lives in a separated namespace and does not collide with variables names."
http://www.tcl.tk/cgi-bin/tct/tip/187.html
Another usage:
"Lisp and Scheme are homoiconic: self-representing", p600 of
"Programming Language Pragmatics" by Michael L. Scott c. 2000
ISBN 1558604421
Compare with RaphaelFinkel in AdvancedProgrammingLanguageDesign:
- "Lisp is homoiconic: Programs and data have the same representation. This property, rarely found in programming languages, allows a Lisp program to create or modify other Lisp functions. As you will see, it also allows the semantics of Lisp to be defined in a particularly simple and concise manner. (Tcl, discussed in Chapter 9, is also homoiconic and enjoys the same benefits.)" (p123)
- "Not only is Eval expressible in Lisp; it is also provided as a predeclared function in every Lisp implementation. Programmers can take advantage of the homoiconic nature of Lisp to construct programs at runtime and then pass them as parameters to Eval." (p138)
- "Much of Lisp's success is due to its homoiconic nature: A program can construct a data structure that it then executes. The semantics of the core of Lisp can be described in just a few pages of a metacircular interpreter." (p145)
- "Prolog provides a similar ability [to write a metacircular interpreter]. As with Lisp, the trick is to make the language homoiconic, that is, to be able to treat programs as data. Programs are just sets of rules (a fact is a rule with a body of true)...So a program is a set of predicates, and Prolog provides a way to inspect, introduce, and delete the predicates that currently make up the program, that is, to treat the program as data." (pp296-297)
- "SNOBOL is homoiconic, after a fashion. A program is a string, and it is possible at runtime to compile a string and to branch to a label in it. However, this facility is much less attractive than Lisp's equal treatment of program and data structure." (p319)
- "1.8 Homoiconic Use of Strings: Tcl. Several syntax rules in Tcl interact to make it homoiconic. Lists are represented as strings; the individual elements are delimited by white space. Every string names a variable. The R-value of a variable is denoted by $ before the string that represents the variable...Strings need not be delimited by quotes unless they have embedded spaces. There are quotes ({ and }) that prevent any evaluation within a string, quotes (") that allow evaluation, and quotes ([ ]) that force the string to be evaluated. Evaluating a string means treating it as a series of commands delimited by end-of-line characters or semicolons. Each command is the name of a procedure...followed by parameters. The whole program is a string to be evaluated...To see how Tcl is homoiconic, consider [the following code]" (pp328-329)
set a 4                      -- a := 4                    1
set rhs {expr $a +}          -- rhs := "expr $a +"        2
set rhs [append rhs 5]       -- rhs := "expr $a + 5"      3
set b [eval $rhs]            -- b := 9                    4
set cond {$b > 0}            -- cond := "$b > 0"          5
set body {                                                6
    puts "b is now $b"                                    7
    set b [expr $b - 2]                                   8
}                                                         9
while $cond $body                                         10
- "The condition and the body of the while loop in line 10 are the result of previous computations. Even commands can be computed, as in [code below]" (p329)
set a ile                                -- a := "ile"    1
wh$a {$b > 0} {set b [expr $b - 2]}                       2
- "Line 2 is actually a while command, because the first word evaluates to while." (p329)
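For readers without Tcl at hand, the steps in Finkel's two listings can be loosely imitated in Python. This is only a sketch under stated assumptions: Python is not being claimed as homoiconic, `eval` and `exec` here are library calls applied to strings rather than the language's ordinary way of running code, and the names `env`, `trace` and `keyword` are invented for the illustration (the others mirror the Tcl variables).

```python
# Loose Python imitation of the Tcl listings; not a claim that Python is homoiconic.
env = {"a": 4}                       # hypothetical variable store

rhs = "a +"                          # a fragment of an expression, held as a string
rhs = rhs + " 5"                     # rhs is now "a + 5"
env["b"] = eval(rhs, {}, env)        # evaluate the string; b becomes 9

cond = "b > 0"                       # the loop condition is a computed string
body = "trace.append(b)\nb = b - 2"  # the loop body is a computed string too

env["trace"] = []                    # records each value of b the body sees
while eval(cond, {}, env):
    exec(body, {}, env)

# Second listing: even the command word is computed ("wh" + "ile").
a = "ile"
keyword = "wh" + a
env["b"] = 4
exec(f"{keyword} b > 0:\n    b = b - 2", {}, env)

print(env["trace"], env["b"])
```

The difference from Tcl is visible in the sketch itself: the Python `while` statement driving the first loop is ordinary syntax, not a string, whereas in Tcl the condition and body handed to `while` are always strings.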
Lisp specifies that code and data are both S-expressions. Code S-expressions can be passed, returned, assigned to variables, and have their contents mutated, so code is FirstClass.
- It is very important to note that what makes code in Lisp first class is not that it can be passed/returned/assigned, nor that both data and code are S-expressions (technically that is untrue, and there are lots of caveats: code can be compiled, and data can be something other than an S-expression, although in the prevalent case both data and code are S-expressions). If that were so, then code would be first class in Java as well: code can be passed around, returned from a function, assigned to variables and even evaluated. For people who don't remember, VisualAgeForJava performed exactly the eval trick on strings provided by the user.
- The big difference between Java and LISP is that LISP provides the means to structure the code (i.e. the means of composition and decomposition, or in technical jargon constructors and destructors, which are respectively CONS, and CAR and CDR). Using only the CONS, CAR, CDR operators and the primitive values of Lisp (symbols, chars, strings, numbers), programmers can generate both all the code there is and all the data there is (in the latter case modulo the class of equivalence with other more efficient data structures). For the Java VM, by contrast, code is really an opaque byte array that the classloader provides to the runtime, and which has to conform to an external specification (the class file specification). Byte arrays representing code can be manipulated just like other values, but programmers are not offered structured access to the code represented in those byte arrays, not by default. There are quite a number of industrial and open source Java software packages that do perform transformations on these byte arrays (including generating new code dynamically -- without access to JavaC, modifying existing code, etc).
- Objection. Java's bytecodes are analogous to the machine code Lisp is compiled to (or its interpreter causes to be executed), not Lisp source code. The Java language is composed of expressions, not bytecodes.
- Your observation is irrelevant -- not to mention incorrect in the general case. Java's syntax is not uniformly composed of expressions, as for example ML's is. Java has class declarations, method declarations, variable declarations, statements, and only after that expressions, but those are not strictly the counterpart of S-Expressions in LISP's runtime. As far as Java's runtime is concerned, bytecode is the only way the code is supplied. As far as LISP's runtime is concerned, S-Expressions are the way that code is supplied. S-Expressions are not source code; the source code of LISP represents a notation for S-Expressions.
- You're correct, the Java language is composed of more than expressions, but it isn't composed of bytecodes. Bytecodes are the instruction set of the Java Virtual Machine, but the Java Language can be (and is) compiled to other instruction sets for other machines. You're mixing levels. If we're going to compare the Lisp language to the Java language, let's ignore the virtual or real machine on which they execute or the runtime that interprets them. Let's compare the source code of Lisp and the source code of Java, OK?
- Not OK. We're going to compare LISP platforms to Java platforms. LISP's essential feature that you are missing in Java, namely eval, operates on S-Expressions, and those are instances of an ADT living and breathing in the runtime. The whole purpose of this discussion is to compare "code", where Doug so finely failed to give a precise meaning to how this word acts in his definition (because there's source code, parsed syntax trees, compiled byte-code, just in time compiled machine code, statically compiled machine code, etc, etc). Java source code is irrelevant since it has no life in the JVM. Whereas the advantage of LISP is that LISP source code is a notation for S-Expressions and S-Expressions are what the LISP interpreter operates with, both as code and as data. In contrast, Java's source code is most definitely not a notation for bytecode.
- I thought we were comparing languages, not platforms. I don't understand why translations/compilations/interpretations of the source code matter. It's possible to build a VM for Java that executes Java source as an AST of strings without the use of bytecodes. It's also possible to compile Java to something other than bytecodes. Why can't we frame the definition of "homoiconic language" in terms of the language alone? Why do we need to introduce a specific platform or platforms?
- For languages like Java and Lisp, language definitions are intertwined with constraints (minimum requirements) specified for any platform implementing them. In contrast, we have languages like C, Ada, ML that are specified in the abstract with almost no constraints (no assumptions) made upon the platform of execution (it can be anything: compiled code, interpreter, bytecode VM, etc). This is a trade-off: you lose some flexibility in terms of implementation, but you gain some features. For example, in the case of Java you gain security embedded into the language. You also gain WriteOnceRunAnywhereTM :) In the case of LISP, for example, the mandate for all LISP platforms to provide a runtime "eval" function may be considered expensive in some cases (a fully compliant LISP implementation may not scale down to embedded platforms with limited resources as nicely as C does), but on the other hand it provides a very powerful programming technique quite distinctive for the LISP family of languages. Therefore, because we talk about LISP and Java, we cannot separate language from runtime as easily as if we were talking about, say, SIMULA and C, or ML and Ada. For example, when folks use the syntax and semantics of Java as per the JLS, they do expect to be able to derive from java.lang.ClassLoader.
- I understand how tightly coupled Java and its VM are in the language spec, but I thought folks had found ways to fully implement Java without a VM (i.e. the Jove compiler supports custom class loaders by compiling bytecodes in addition to Java source). I thought we could approach the "homoiconic" distinction from a more abstract scope and express the definition in terms of the language alone. Are you saying that if we just consider the language definitions for Lisp and Java, ignoring the platforms and implementation details, we can't categorize one as a homoiconic language and the other as a non-homoiconic language?
- The language definition for Java already settles the classloader and the binary compatibility of byte codes. The language definition for LISP mandates, among other things, that eval must be present. The language definition for Java already mandates threads. Therefore there's no useful consideration of the Java language in isolation from these platform details. So the bottom line is that Java is currently non-homoiconic, because code as data is not structured. Java could be made to have exactly the same benefits as homoiconic languages by adding a bunch of libraries to the mix. Java would still not meet the structural definition of homoiconicity ("the code is represented the same as data").
- Is that a yes or a no?
- It is neither. It's up to Doug to pick up responding to that question (if he bothers at all). But I don't see separating JLS from what happens inside the JVM, or separating Lisp from its interpreter(s) as a very interesting topic.
- How can it be neither? Either we have to consider the platforms and implementation details or we don't, right? What other alternative is there?
- The alternative is that two alternatives can be pursued, either in parallel or in different contexts. Java as a language is already "intertwingled" with platform details. So if you're discussing with me, no, I don't like to talk in the abstract about the Java language. There's no such thing. There's the Java platform defined by Sun and implemented by Sun, IBM, BEA, HP and a few others.
- So there are at least 2 definitions of "homoiconic language", one that considers platform and implementation and one that doesn't?
- Not at all. There's only one definition. Because the definition is very informal, it applies differently to different contexts. Because Java Language cannot (imho,etc) be clearly separated from Java Platform, we choose the appropriate context.
- [0][Remember, Homoiconicity is a property of a language itself, as a standalone entity. It is not a property of a program, or an implementation of a language. Removing the distinction of "in language" and "in implementation" makes homoiconicity as useless as the difference between red and green to someone that is colorblind.]
- [1] If that were so, it would make the property less useful, because many languages cannot be easily separated out as a "standalone entity". Not to mention the many languages defined more or less by their implementation. In realizing homoiconicity there's a role played by the language definition (for example, in LISP it will relate to the quote/unquote mechanism, but not only that), and there's a role played by the runtime/libraries (eval). Image based languages (including most Lisp implementations, Smalltalk, etc) are by default commingled.
- [Whoever said the property of homoiconicity itself was terribly useful? It's very interesting, and it certainly has benefits, but most languages get by without this property.]
- Whoever said reflection was terribly useful? Whoever said object orientation was useful? Whoever said first class closures were useful? Whoever said ... hell, why don't we just get rid of all advanced language features and stick to the least common denominator? Now, would that be Pascal or Assembly?
- [You're misunderstanding that statement. I am saying, "Homoiconicity is neat and fun and languages that are homoiconic have many amazing applications, but no one ever said it had a huge amount of utility." I am not arguing that it is useless.]
- [reply-to 0] I don't understand. If "[h]omoiconicity is a property of a language itself" then why would "[r]emoving the distinction of 'in language' and 'in implementation'" make the property useless? If it's a property of a language, it is independent of implementation.
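The CONS/CAR/CDR point made earlier in this thread (code built and taken apart with the same three operators used for data, then handed to eval) can be sketched in a few lines. This is a toy written in Python purely for illustration, not any real Lisp: cons cells are modelled as 2-tuples, symbols as strings, and the helper names `slist`, `evaluate` and `OPS` are invented here.

```python
# Hypothetical mini-Lisp: cons cells are Python 2-tuples, NIL is None.
NIL = None

def cons(head, tail):
    return (head, tail)

def car(cell):
    return cell[0]

def cdr(cell):
    return cell[1]

def slist(*items):
    """Build a proper list out of cons cells, e.g. slist(1, 2) is (1 . (2 . NIL))."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

# Only two binary operators are supported in this sketch.
OPS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}

def evaluate(expr):
    """Evaluate numbers and (op arg1 arg2) forms; (quote x) returns x unevaluated."""
    if not isinstance(expr, tuple):
        return expr                       # a number evaluates to itself
    op = car(expr)
    if op == "quote":
        return car(cdr(expr))             # hand back the code as plain data
    left = evaluate(car(cdr(expr)))
    right = evaluate(car(cdr(cdr(expr))))
    return OPS[op](left, right)

# The program (+ 1 (* 2 3)), built as ordinary data.
program = slist("+", 1, slist("*", 2, 3))

# Decompose it with cdr and rebuild it with cons: (* 1 (* 2 3)).
rewritten = cons("*", cdr(program))

print(evaluate(program), evaluate(rewritten))
```

Nothing in the sketch distinguishes a "code" list from a "data" list until `evaluate` is called on it, which is the structural property being argued about; the Java byte array, by contrast, offers no counterpart to `car` and `cdr` without an extra library.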
The Tcl language specifies that code and data are both represented as strings, which can be freely used as values regardless of whether the string is code or data.
In Smalltalk, code is a second-class value -- it can be passed as a parameter, for instance, but the contents are not generally mutable at runtime, despite some reflective capabilities, so it's not quite homoiconic.
- Requiring "contents" to be mutable at runtime is a mistake in this discussion, as we would rule out pure functional languages of the future (currently there doesn't seem to be any pure functional language that can have a claim to homoiconicity). And by the way, "contents" (very unclear what it means above: content of the code bound to a variable? content of the code bound to top level symbols?) can be mutated at runtime. That's what all Smalltalk IDEs do. And since the IDE is a Smalltalk program, voila.
In Java, code is third-class -- it's not a value at all, by the language definition, regardless of what an implementation does with it.
- Code in Java is a value, it's not currently a structured value. Code is represented by a byte array that the classloader provides to the Java VM runtime. Nothing prevents Java from providing code as a structured value.
- Incorrect. Java code meets none of the definitions of FirstClass value. (??)
Essentially all languages represent source code as some sort of value, such as a sequence of bits or a sequence of characters, but that matters not at all if the language definition doesn't specify that that code is FirstClass. Most languages are thus not homoiconic.
[Very useful definitions moved from here to the top of the page.]
Doug,
To justify "homoiconic" as a useful definition, you should avoid defining the notion through its mechanics, as the definition you have provided so far is fuzzy enough that I have big doubts it is useful. You can start by defining the functionality: what is the benefit to the programmer of having a homoiconic language? If the benefit is just having an eval function that, given a structured representation of an expression, evaluates that expression, then you only need to observe that eval can be provided as a third party library in Java without modifying the JLS one bit.
I never claimed that Java is homoiconic as it is today. But given what I understand as homoiconic, and what it is good for, I must: accept that Scheme and Lisp are homoiconic, reject that TCL is homoiconic on grounds that its representation of programs as data is unstructured, accept that Java as it is today is not homoiconic. But then I shall observe that the difference between Lisp and Java that if added to Java would make it offer the same end-user benefits as Lisp, well that difference is not a matter of language design, because one could easily leave Java language specification in place, with its syntax and semantics unaltered, and write a library that will make up for that difference. QED.
So please come up with a better functional definition / motivation for homoiconicity if you disagree. The fact that you have not mastered what it is that you want to define was proven when you provided a test that actually tested for the ability to lazily evaluate an expression. If you think this is a very important property of programming languages, then it has to be subject to meaningful tests. I.e. normal users of a programming language do not give a damn whether it's a MetaCircularEvaluator or not, whether it's structured as cons versus structured as objects, etc. A meaningful test can be for example: I can do X easily in LISP, versus I can't do it in Java, nor in Haskell nor in ML. -- CostinCozianu
I am trying to capture what the experts in the rest of the industry mean by the term, and I have a certain amount of confidence that I succeeded this time, despite the fact that, yes, the underpinnings of FirstClass and of 'same representation' are, alas, themselves not defined as precisely as we would like, no doubt leading to further arguments.
It doesn't matter that you disagree that Tcl is homoiconic, because recognized experts in the field say that it is, and I understand and agree with their arguments.
- My observation still stands that code as data should be structured, because code is naturally structured, and if we lose the structure when manipulating code as data, then we can no longer consider it first class. I don't know if TCL supports structured data at all.
[personal chaff deleted]
Your suggestion for writing about motivations for homoiconicity is not without merit, but you are completely off track in suggesting that such is more to the point than the simple definition I present.
The term "homoiconic" is not at all widely used, but it goes back 30 years, has been used by a number of well-known people, and when it is used, its definition is industry standard: I've never seen any disagreement about its meaning outside of these recent arguments on c2. This is not something I made up! The difficulty is just that it seems to have been used with an implicit rather than explicit definition. Apparently this is the first time in history that its use was challenged endlessly rather than accepted without comment. -- dm
- The only real disagreement you had was with Eric. Homoiconic was not even important enough to stir any controversy, because most people don't use it. I was trying to help you tighten up the definition and make the term meaningful.
Some motivations for why someone, somewhere, might care about homoiconicity (not limited to just the end user). These are not definitions.
Homoiconic languages make it very easy, sometimes even trivial, to:
- write a MetaCircularEvaluator (don't dismiss this as uninteresting; in fact, it is considered quite important e.g. in the Lisp world, and not just by the vendors of industrial strength Lisp implementations, who typically do not take advantage of this approach)
- implement reflection (even when not otherwise provided by the language)
- implement lazy evaluation (even when not otherwise provided by the language)
- implement dynamic code generation (even when not otherwise provided by the language)
Once again, the reverse is not true...just because some or all of the above are offered by a language does not mean it is homoiconic. That's not the point. I'm just addressing a little of Costin's issue with motivation/functionality. -- dm
Having read all this discussion on homoiconicity, I'm a lot more confused than I was when I first read it. I think part of the problem is that it's being referred to as an advantage or a feature, when it's really just a property - it's incidental to what you can do with a language.
- Untrue, as has been addressed multiple times in this set of pages.
Providing runtime access to the language's compiler allows you to do everything homoiconicity grants you without necessarily being homoiconic.
- Completely misses the point, as has been addressed multiple times in this set of pages. (Search, and ye shall find.)
There's a real feeling of "nothing that doesn't match my ideal of a language is allowed" coming from some of the arguments.
- From some of them, perhaps, but nonetheless homoiconicity exists, and not all languages have it, despite the highly imperfect nature of the collective comments on this set of pages.
Would you consider Python to be homoiconic? The standard library provides structured access to code, including code which is currently running.
- So? Does it meet e.g. Alan Kay's criteria, that internal and external representation are essentially the same? Questions similar to this have been addressed multiple times in this set of pages.
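For concreteness, the kind of "structured access to code" the Python question refers to looks roughly like the sketch below, which uses the standard `ast` module. The transformer class `SwapSubForAdd` and the sample expression are invented for the example. Note that the tree is a separate library data structure that the parser produces from source text, so the sketch shows the facility exists without settling whether it meets the "same representation" criterion raised in the reply.

```python
import ast

# Parse source text into a tree of ast node objects.
source = "b - 2"
tree = ast.parse(source, mode="eval")

class SwapSubForAdd(ast.NodeTransformer):
    """Hypothetical rewrite: turn every subtraction into an addition."""
    def visit_BinOp(self, node):
        self.generic_visit(node)          # rewrite nested operations first
        if isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node

# Rewrite the tree, then compile and run the modified code.
new_tree = ast.fix_missing_locations(SwapSubForAdd().visit(tree))
code = compile(new_tree, filename="<ast>", mode="eval")
result = eval(code, {"b": 9})             # evaluates b + 2 with b = 9

print(result)
```

What the programmer writes (`b - 2`) and what the program manipulates (`ast.BinOp` objects) are visibly different things here, unlike a Lisp list or a Tcl string.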
In reviewing the above, it seems that FoxPro (ver 2.x anyway) fits the definition.
For years I wrote code that wrote code and executed it on the fly. Moreover, I could deal with an exception by creating arbitrary code (manually or automatically) which handled the condition and resumed execution. I had programs in which there were routines that deliberately wrote the appropriate code fragment for handling a specific value or situation and always executed the code thus dynamically created.
I was able to store code fragments in tables, use these fragments to generate modules at run time and then execute them. Code could be stored in variables, which variables could be executed either by &varname or evaluate(varname).
This allowed for extremely flexible code. It allowed a running program to redefine its tables and its treatment of those tables. The TCL example above can (with concessions to syntax) be rendered in FoxPro.
It's possible that my understanding is flawed, but the impression I get is that, intentionally or not, FoxPro is (or was) such a language. -- GarryHamilton
(This was a feature in 8-bit dBASE-II, upon which FoxPro was based: The "&variable" syntax was like a macro -- it put the string contents of the variable in that place /before/ the interpreter parsed and executed the commands of the line. -- JeffGrigg)
That certainly is powerful, and for many purposes, a highly desirable facility. It also is not "homoiconic", at least, not as you described it so far. What you describe could potentially be programmed in any language, it seems (using a code generator, which might be built into FoxPro or might not, you didn't say, but it doesn't matter for the purpose at hand, although it certainly makes such things as you describe much easier).
By contrast, a language is either homoiconic right from the get go, or it is not; it doesn't have anything to do with which programs/libraries/facilities are written in/for that language. A homoiconic language remains homoiconic when stripped of all external executables, all libraries, all include files, all everything. Further, a language that is not homoiconic cannot become so by adding executables, libraries, include files, etc.
Can you write code to achieve roughly the same end effect, in absolutely any language at all, as long as it is a TuringEquivalent language? Well, sure, that's what TuringEquivalent means.
The problem, when reading this set of pages, is listening to (mostly anonymous, and therefore hard to distinguish) arguments from all the people who didn't know the word "homoiconic" until they started arguing on these pages. (Why do people argue so passionately about topics they don't know? It's very weird.)
If you look more carefully, the history of the word is documented hereabouts from its first origins, and the original definition is clear, it's just that it turned into a religious war, where certain parties wanted to claim that all languages are homoiconic, or something, which most certainly and absolutely is not the case.
(There was also argument about whether it is a strict category or a fuzzy definition, but that ended up being merely an irrelevant rhetorical device, since the same issues arise with absolutely all categories and definitions, yet the modern epistemological tools for that topic were never used whatsoever.) -- DougMerritt
It also is not "homoiconic", at least, not as you described it so far.
- Hmmm. Perhaps my articulation is wanting. In the TCL example "cond" is assigned a value of {$b > 0}, and "b" is [append rhs 5], etc. such that evaluating "$cond" expands the several nested definitions into a complete expression. Rendered in FoxPro, this would be &cond, &b, &rhs, and so on. I would have to experiment a little to give you the exact rendition, but the point is that evaluation and execution of commands and expressions from stored strings in variables is expressly supported.
- This makes it possible to achieve the coding I described above. I have also written a "FoxPro interpreter in FoxPro" to allow for some tricks not normally supported in the language (e.g. multiple statements per line), but the language itself allows string representations of commands/expressions to be executed directly.
- As I said, I may still be a little wobbly on my grasp here, but it certainly seems to lean in that direction.
-- GarryHamilton
Yes, well...leaning, ok, all the way there, it doesn't sound like it. The latter ranges from simple BootStrapping all the way to MetaCircularInterpreters, depending on details, but none of that spectrum is the same thing as Homoiconic, although yes, there is a relationship (that has been discussed hereabouts).
TCL is in fact considered a HomoiconicLanguage, not just because it supports eval, but because code and data have the same essential representation: strings. (Most languages support strings, leading to unwarranted claims that therefore all languages are homoiconic, which badly misses the point, perhaps by badly misunderstanding the nature of strings in TCL vs C/Java/etc.)
So does FoxPro use the same representation for code and data? A yet-to-be-evaluated expression in TCL IS a string, it's not a trick or an option. A yet-to-be-evaluated expression in Lisp (the original Lisp 1.5, say, to avoid side issues) IS an S-Expression, not as a trick or an option, but because that's what it truly is. -- DougMerritt
So does FoxPro use the same representation for code and data?
- Although FoxPro supports compiling its code, the code that you give to the engine is strings. Internally, there is tokenization to optimize execution speed. However, the physical storage of all data (including numbers stored in tables) is done with strings. Obviously, at run time, before it ever hits the CPU registers, numbers and commands are shredded into processor-compatible bits, but commands and expressions are strings.
- If I store an expression in a string (foo), and then present the interpreter with "&foo" it will expand and interpret the expression just as though I had written it as literals. If I say [cmd = "seek &bar"], assign the string "MOMMY" to bar, and then present the interpreter with a line that reads simply "&cmd" the parser will see "seek MOMMY" and will perform that command.
- You can, in fact, store the program as strings in a database table, stuff them into variables at run time and have code that looks like this:
store table.cmd to cmdvar
store table.arg to argvar
store table.cond to condvar
if &condvar
&cmdvar &argvar
endif
And, of course, more exotic contortions are available. -- GarryHamilton
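The "&variable" mechanism described above (text substituted into the line before the interpreter parses it) can be modelled in a few lines. The sketch below is Python, not FoxPro, and makes no claim about how the real FoxPro or dBASE parser works; the function `expand_macros`, its `max_passes` guard, and the repeated-pass strategy are all assumptions made for the illustration.

```python
import re

def expand_macros(line, variables, max_passes=10):
    """Textually replace each &name with the string stored under that name,
    repeating until the line stops changing (hypothetical model only)."""
    pattern = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)")
    for _ in range(max_passes):
        expanded = pattern.sub(lambda m: str(variables[m.group(1)]), line)
        if expanded == line:
            return expanded               # nothing left to substitute
        line = expanded
    raise ValueError("macro expansion did not settle")

# The example from the discussion: cmd holds "seek &bar", bar holds "MOMMY".
variables = {"bar": "MOMMY", "cmd": "seek &bar"}
print(expand_macros("&cmd", variables))
```

The expanded text would then be handed to the ordinary command parser, which is why the facility behaves like a macro step in front of the language rather than a change to how the language normally represents its code.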
Interesting. Ok, sounds pretty similar to TCL so far. One last hurdle: is this the normal way that the language handles conditionals? It is, in TCL. If it's merely an optional facility, then it would be going too far to call the language homoiconic, but perhaps it would be accurate to say it has an optional homoiconic facility. I.e. what is the scope of what "homoiconic" refers to here, the whole language, or a subset facility of the language?
Similarly, although Lisp is homoiconic, one could point to some implementations of Lisp compilation as an added facility which is not homoiconic (unless it takes pains to make that compilation transparent to the usual eval/quote/etc), although the rest of the language is homoiconic.
(I do not, however, see any reason to start trying to rate "degree of homoiconicity", despite some such suggestions on these pages; any particular facility either is or is not, any particular entire language either is or is not, etc. It's a qualitative statement, not a quantitative statement, and I don't see any associated metric space.) -- DougMerritt
One last hurdle: is this the normal way that the language handles conditionals?
- Normally one would write conditionals using variables & literals (if xi > 15 or if ab < qx). I would have to concur that FoxPro has an optional homoiconic facility as opposed to being homoiconic as a language.
- Now I have to go learn TCL. -- GarryHamilton
I would say that the use of the macro ("&variable") notation is common usage in dBASE and FoxPro, largely because some of the commands of the language have such a fixed definition that you need it to work around the limitations of the language. For example, the "USE" and "SELECT" commands, used to open and read from a file/table, take a table name -- not an expression that evaluates to a table name. So if you want to use a file/table specified by a variable value, you must use the "&variable" syntax to "modify the code" at run time. -- JeffGrigg
- Yes, the dBASE implementation had that flavor. The FoxPro implementation, however, carried it farther. This facility became popular to the point of being a mainstay of coding technique among ThreeStarProgrammer types. This was further formalized by the addition of EVAL(), which accomplished (almost) the same thing. The eventual feature went significantly beyond the original use. -- GarryHamilton
RE: "Java code meets none of the definitions of FirstClass value."
[from near the top of this page]
From the FirstClass page:
"FirstClass features can be...
- stored in variables,
- passed as arguments to functions,
- created within functions and returned from functions.
- In dynamically typed languages, a FirstClass feature can also have its type examined at run-time."
If one accepts that code is represented in Java as a byte array ('byte[]' type), then all of the above are true of "code in Java." ...even the "dynamically typed languages" point. ;->
(If "code in Java" were a String type, all the FirstClass points would also all be true.)
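The byte-array argument can be made concrete with an analogue. The sketch below uses Python rather than Java (an assumption made only to keep the example short and runnable): Python's built-in `compile` returns a code object whose `co_code` attribute is a `bytes` value, playing roughly the role of the Java `byte[]`. The helper names `make_code` and `run` are invented.

```python
def make_code(expression):
    """Create code inside a function and return it from the function."""
    return compile(expression, "<generated>", "eval")

def run(code_object):
    """Receive code as an argument and execute it."""
    return eval(code_object)

stored = make_code("6 * 7")   # stored in a variable
raw = stored.co_code          # the underlying bytecode, an opaque bytes value

# The type can be examined at run time as well.
print(type(stored).__name__, type(raw).__name__, run(stored))
```

All four FirstClass bullets are satisfied, yet `raw` is just a run of bytes with no structure exposed to the programmer, which is the distinction the earlier "structured value" objection turns on.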
RE: "Java VM code is really an opaque byte array that the classloader provides to the runtime, and which has to conform to an external specification (class file specification). Byte arrays representing code can be manipulated just like the other values, but programmers are not offered structured access to the code represented in those byte arrays, not by default. There are quite a number of industrial and open source Java software that do perform transformations on these byte arrays (including generating new code dynamically -- without access to JavaC, modifying existing code, etc)."
- and -
"[...] So the bottom line is that Java is currently non-homoiconic, because code as data is not structured. Java could be made to have exactly the same benefits as homoiconic languages by adding a bunch of libraries to the mix. Java would still not meet the structural definition of homoiconicity ("the code is represented the same as data")."
So...
If the BCEL library were declared a Java standard library, then Java would be a homoiconic language? ...as the BCEL library provides structured access to read and change the Java native byte array representation of code.
- No. Read the above more carefully: "Java could be made to have exactly the same benefits by", not "made to be homoiconic by", and quite explicitly, "Java would still not meet the structural definition of homoiconicity ("the code is represented the same as data")."
If so, then we need to talk about the .Net environment: Microsoft provides a standard API for translating both ways between source code strings and byte code representations for C# and VB.Net. And it's intended that all other .Net languages should support these interfaces. So therefore, all .Net languages could be considered homoiconic.
- No. Not even close. Try looking over the FAQ.
On the other hand...
Perhaps "common usage" or "ease of usage" should (?) be part of the homoiconic definition?
- What has that to do with "same representation of code and data"???
- I'm assuming that Java's representation of code, at run time, is a well-defined sequence of byte array ('byte[]') values; that is, "Java byte code."
- Yes, and for the sake of the argument, let's even say that's defined by the language standard. It doesn't matter. Any implementation of any language will have to store its representation of the program in some kind of data structure, trivially. The fact that it does so does not mean that "code and data have the same representation" -- as has been discussed extensively on this set of pages.
- Yes; compiled C/C++ programs are stored in files that can be read by C/C++ programs into char arrays. But the ANSI standards for C and C++ don't say anything about those bytes. Anything done with those bytes is "undefined," in terms of the ANSI C/C++ language definition. That's why http://www.ioccc.org/1984/mullender.c on the HomoiconicFaq page is not an example of homoiconic C; its behavior is undefined in the language standard; it's platform (CPU) dependent.
- Java, however, has a well-defined byte code. Its interpretation is well-defined. A Java implementation that couldn't load and run Java standard byte code wouldn't really be Java.
- True, I get that, but nonetheless, just adding the definition of the data structure to the language definition does not make the language homoiconic. AlanKay's quoted usage at the top of the page was "internal and external representations are essentially the same"; he didn't say "the internal representation is a data structure defined by the language standard", and it misses the whole flavor of the idea to go off in that direction. The flavor of the idea is conveyed well by conditionals in TCL, check them out. -- Doug
[That was my argument lo these many months ago. Homoiconicity isn't boolean, it's scalar. It's also subjective, since "ease of use" varies from person to person. But several folks here are strongly opposed to such a definition. -- EricHodges]
- The counterargument was that the community that invented the term (MIT/Lisp/Alan Kay/etc) doesn't agree with that definition. None of that community have said the definition involves "ease of use", btw. I certainly haven't, regardless of whether I'm part of that community. -- Doug
- [I have yet to see a test for membership in the discrete set of homoiconic languages. I may have missed it amongst all the discussion. If so, please direct me to it. -- EH]
- I still feel that the topmost part of this page (the part preceding discussion), is precisely on target, including as it does reasonably thorough historical research and references to modern authors. The problem seems to be that, nonetheless, the definition and description there still confuses multiple people. I don't think it's wrong, but I'm still not sure how to improve it so as to confuse fewer people. And again note that we here on c2 appear, until someone says otherwise, to be the first to try to nail it down really really formally. Generally authors have just described the rough idea and moved on.
- At the moment I think part of it is regarding the ambiguity in English in "same representation": are we talking about one/some/any/all representations, or what? (With the answer being, not just "some representation" -- by default it's intended to be "all representations", but then due to complications such as homoiconic extensions to non-homoiconic languages, and non-homoiconic extensions to homoiconic languages, we end up with complicated and confusing discussions.) -- Doug
- [I think "representation" is the problem. It's clear that Kay and others have some idea of what they mean by that, but I can't figure out what it is. To me, lots of languages have the same representation of code and data. The languages singled out as homoiconic just seem to make treating code as data easier.]
- [If English is really the problem, ignore English and use logic. Can't we find a statement that is only true for homoiconic languages? -- EH]
- Perhaps, but if I fail, keep in mind that that is merely my own failure, and not a reflection on the topic. I didn't coin the term. Try this: it occurs to me that the languages that are considered the best examples are always single-typed languages, to a first approximation. Sure, Lisp has arrays and hashes etc, but at its heart, it is untyped/dynamically typed/singly-typed (S-expressions). Similarly with TCL (the single type is "string"), Snobol (string again, and possibly not a perfect example either), TRAC (I think was string; I could've said more definitely last year but I forgot again), etc.
- In a singly-typed language like those, the English isn't so ambiguous and confusing. There's only one type to start with, so to say code and data have "the same representation" can only mean one thing. There are singly-typed languages that are not homoiconic, of course; BCPL and B have the single type of machine words, but no data representation of code at all.
- In a multiply-typed language -- well, I'm not so sure that the term "homoiconic" has ever been used for any of them. If one wanted to extend the definition beyond previous usage, one would immediately run into, well, all the confusion that this topic has suffered over the last year! :-) -- Doug
- [Huh? If a language has 2 data types, one of which has the same representation as code and the other doesn't, how does that push it out of the set of homoiconic languages? -- EH]
- It doesn't, necessarily, it's just that I notice that industry usage of the term seems to always be in reference to "untyped languages" (a term that seems biased to me, but in wide use for singly-typed languages), and that our discussions on c2 in trying to figure out which richly typed languages are homoiconic have been extremely controversial. But perhaps noting this distinction will help settle some of the past controversy.
Yes, I've been thinking...
"Is Java homoiconic?"
"Yes, but not very much."
- No, not even a little bit, as was covered on these pages last year.
"Is C# (.Net) homoiconic?"
"Yes. Slightly more than Java. But still not much."
-- Jeff Grigg
I don't agree that you can exclude the standard Java libraries from "The JavaLanguage". The class 'java.lang.Object' is typically stored in a separate library, not part of the JVM. If you exclude that library (by deleting it, for instance), no Java programs will run. All Java classes are a direct or indirect child of 'java.lang.Object', so no class can be loaded into memory without it. And Java without the other 'java.lang.*' classes wouldn't be much of a language. It wouldn't have Strings or exception handling, for instance. Would one argue that the 'java.util.*' classes are not part of Java?
In my opinion, the JavaLanguage, as defined by Sun, includes the standard libraries.
That is, "Java" includes the published interfaces for everything in the Java Platform, Standard Edition (J2SE).
But it only includes the runtime, the Java Runtime Environment (JRE). The "JavaLanguage" does not include the libraries unique to the Java Software Development Kit (SDK). Nor third party libraries, even if commonly used. However, if I can write it in standard portable Java, then I can do it in Java.
-- JeffGrigg
Your comments seem far removed (on the page) from previous discussions about that issue, but ok. You've got a very good point. Some previous discussion has been about it being irrelevant to add a new library to some language to make it homoiconic. You are correct that standard libraries are, these days, typically considered to be part of "the language definition", and somewhere around here there are probably comments that imply otherwise, but this is all beside the point, so...so? -- Doug
True, if adding the BCEL library to Java was necessary for Java to be homoiconic, then Java must not be homoiconic now. However, if Java with BCEL is homoiconic, and BCEL is just a convenience, embodying a set of techniques that I could apply myself in my own Java source code, then Java without BCEL would still be homoiconic. BCEL would just be a convenient tool that could be used to demonstrate that Java byte code can be interpreted and modified by standard Java programs.
-- JeffGrigg
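A rough sketch of what "standard Java programs" can do with byte code without BCEL, using only java.lang.ClassLoader and reflection. The names ByteCodeDemo, Greeter and BytesLoader are hypothetical, and the sketch assumes Java 9 or later (for readAllBytes) and that the compiled class file is reachable on the class path. It reads a class's own compiled bytes as a plain 'byte[]', checks the class-file magic number, then defines and runs a second copy of the class from those bytes. It does not attempt any structured editing of the bytes, which is the part BCEL would add, and it does not by itself settle the homoiconicity question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ByteCodeDemo {
    /** The class whose compiled form is read back as plain data. */
    public static class Greeter {
        public String greet() {
            return "hello";
        }
    }

    /** Exposes the protected ClassLoader.defineClass so a byte[] can become a class. */
    static class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] code) {
            return defineClass(name, code, 0, code.length);
        }
    }

    /** Reads Greeter's class file from the class path as an ordinary byte[]. */
    static byte[] readClassBytes() throws IOException {
        String path = Greeter.class.getName().replace('.', '/') + ".class";
        try (InputStream in = Greeter.class.getClassLoader().getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("class file not found: " + path);
            }
            return in.readAllBytes(); // Java 9+
        }
    }

    /** True if the bytes start with the class-file magic number 0xCAFEBABE. */
    static boolean hasMagic(byte[] code) {
        return code.length >= 4 && ByteBuffer.wrap(code).getInt() == 0xCAFEBABE;
    }

    /** Defines a second copy of Greeter from the bytes and calls greet() reflectively. */
    static String runFromBytes(byte[] code) throws ReflectiveOperationException {
        Class<?> copy = new BytesLoader().define(Greeter.class.getName(), code);
        Object instance = copy.getDeclaredConstructor().newInstance();
        return (String) copy.getMethod("greet").invoke(instance);
    }

    public static void main(String[] args) throws Exception {
        byte[] code = readClassBytes();
        System.out.println(hasMagic(code));
        System.out.println(runFromBytes(code));
    }
}
```

Note that the class defined from the bytes is a distinct class from the original Greeter (different class loader), which is why the sketch calls it through reflection rather than casting.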
Well, but above I already answered that, I said (to paraphrase) that it is an untrue claim that "Java with BCEL is homoiconic". So the rest doesn't follow. -- Doug
How about MyNaiveAttemptAtUnderstandingHomoiconicity?
MetaDiscussion moved to HomoiconicMetaDiscussion
JuneZeroFive