Homoiconic Example In Java

[Sprung forth from the brow of HomoiconicLanguages]

Related concepts: reflection, metacircular, bootstrapping, lazy evaluation...


Note to googlers: I see this page is popular with Google; be warned that this page currently has a low signal-to-noise ratio, contains mostly unfinished arguments between multiple people (including some who do not use homoiconic languages), and, despite the title, does not mean that Java is a homoiconic language (it's not).


[now-deleted example in Java is] Not homoiconic for quite a few reasons, but the easiest one is that it doesn't work on my system, nor on millions of other systems that support Java, because "javac" doesn't exist.

The Lisp example always works.

That just means it isn't portable. It still does the same things the Lisp example did.

[No it doesn't; the Lisp example doesn't invoke an outside program. This page is silly. How many times must you be told you can't invoke a compiler to do this? And writing a file and compiling it is the same ignorance I've seen in C when people try to say they can generate code at runtime; it completely misses the point, namely, native language support for this capability. Understand: NATIVE. It must be defined in the language spec; the language must provide this ability, or it isn't homoiconic. If you write your own eval, it's still not homoiconic, because your eval isn't defined in the language spec. OK, do you get it yet? If you have to hack up eval, it isn't homoiconic!]

The point isn't so much that it isn't portable, the point is that you are putting all of the important part of the computation into this string that contains "javac"; the semantics of that string are unknown and unknowable to the implementation of Java. On my system, "javac" might start up tetris, or do "rm -rf /", or fire up a lisp interpreter that runs the Lisp example, or absolutely any other computation.

The point is, you can't write the Java program you are attempting in such a way that it runs on my system; you're doing things that are not part of Java.

Precisely the same critique would apply if your Java program did nothing except fire up a Lisp interpreter on a Lisp program that your Java program synthesized. It demonstrates nothing whatsoever about the properties of Java to call an exterior executable.

I agree that my "eval" method is not part of Java, but for any system on which Sun's JDK is installed an "eval" method can be written. Lisp provides eval (and other conveniences), but eval is possible in Java.

Remember, I'm not claiming that Java or C are homoiconic languages. I'm suggesting that the set of homoiconic languages is not a discrete set. I think it's a fuzzy set where membership is determined by the ease with which homoiconic programs can be written. This would let us do things like place machine code less in the set than Lisp, for instance.

And if meta-circularity were equivalent to homoiconicity, then you'd have yourself a good example. But it isn't, so you don't. Eval does not a HomoiconicLanguage make. Meta-circularity is possibly a fuzzy set. Here's an interesting difference. In the Lisp example, it's impossible to make an improperly formed program. The program may not be logical, it may not reference valid symbols, and thus it may not run, but it is always well formed and parses correctly. When your code and your primary data type are the same, you can't even ENTER an invalidly formed program into the system. Because your code and data types are the same, you can't make a program which does.

Sure I can. I can (and did a few years ago) create an AST package for Java that only allowed the creation of properly formed Java. I wouldn't be using a "primary" or "fundamental" data type, but that wouldn't stop me.

Which misses the point. You added that functionality to the language. It wasn't there to start with.

I'm a pragmatist. It doesn't really matter to me what functionality is part of the language and what functionality is provided by other code. If I can do it, I can do it. And I can do it.

This is a notable difference from, say, using a regex library to alter your code, where you CAN make invalid code that will not parse. This is because you're representing code in a datatype which has no realization that it's code. Sure, it's code, but that's a decision that context makes. A bug in your program means it's no longer valid code (oops, erased a semicolon!). You cannot do this in Lisp. If you enter an invalid thing (like an unclosed list, or an invalid symbol (they do exist)) the program can't even read it. It's not a valid data structure.
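The contrast drawn here, between editing code as raw text and building it from a structure that only admits valid forms, can be sketched in Java. The classes below (`JavaExpr`, `IntLiteral`, `AssignStmt`) are hypothetical and cover a single statement shape; they are not the AST package mentioned earlier, nor any real library. The one real API used is `javax.lang.model.SourceVersion.isName`.

```java
import java.util.Objects;
import javax.lang.model.SourceVersion;

// Hypothetical node types, for illustration only.
// Any AssignStmt that can be constructed renders as a well-formed Java assignment statement.
interface JavaExpr {
    String toSource();
}

final class IntLiteral implements JavaExpr {
    private final int value;

    IntLiteral(int value) { this.value = value; }

    @Override
    public String toSource() { return Integer.toString(value); }
}

final class AssignStmt {
    private final String target;
    private final JavaExpr value;

    AssignStmt(String target, JavaExpr value) {
        // Refuse anything that is not a syntactically valid, possibly qualified, Java name.
        if (!SourceVersion.isName(target)) {
            throw new IllegalArgumentException("not a Java name: " + target);
        }
        this.target = target;
        this.value = Objects.requireNonNull(value);
    }

    String toSource() { return target + " = " + value.toSource() + ";"; }
}
```

Deleting the `;` from the string `"b = 15;"` is a one-call edit that yields text `javac` rejects; with these types no operation produces that text. The guarantee comes from these classes, though, not from the Java language itself.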

And I can do the same in Java.

You can make Java mimic it, but you can't make Java, as a language itself, homoiconic. Please show me this AST type in your Java language spec.

Creating Eval in Java is cool. But it's not a homoiconic language, even if you succeed. And you didn't succeed. Even if you wrote a parser and made new datatypes so that you could mimic this property, it wouldn't be a homoiconic language because... it is not a core part of the language. Sure it's implementable (albeit awkwardly), but it's not part of the language.

Lots of things aren't part of the language, but we use them all the same. I don't understand the significance of that.

It is true. However, that's the crux of this argument. The language itself, as a standalone spec, does not include this data type. Adding a library to Java to support these interactions does not mean that JAVA ITSELF has these features. Further, compare and contrast the code you'd use. The Java code is blocky, clumsy, laborious. Its meta-manipulation involves a lot of cumbersome work. The Lisp version has support in the core of the language, way down at the most basic commands of the language (car, cdr and cons are essential to Lisp. Without them, Lisp does not exist; they aren't part of a library. They're like the equals operator in C, or the new operator in C++. Immutable. You can hook into them, of course, but you can't change what they fundamentally do).

Here's another example of Java code, using ASTs to build the expression instead of strings:

 // SimpleAssignmentOperator, Expression and eval() presumably come from the author's own AST package; they are not part of the JDK.
 b = 3;
 System.out.println("b=" + b);
 SimpleAssignmentOperator assignment =
     new SimpleAssignmentOperator("com.ebh.fuzzyjava.FuzzyJava.b", "15");
 System.out.println("a=" + assignment);
 eval(new Expression(assignment));
 System.out.println("b=" + b);
 assignment.setValue("37");
 System.out.println("a=" + assignment);
 eval(new Expression(assignment));
 System.out.println("b=" + b);

This is more "blocky" than the Lisp example, but that's going to be true no matter what the code does. I agree that this doesn't make Java more homoiconic, but it shows that the behavior of the Lisp example can be added to Java. If Java (or any other language) can do what a homoiconic language does, why should I care if it uses a library to do it?
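The AST classes in that snippet are not shown. As a self-contained sketch of the same behavior, the hypothetical `Assignment` node below is interpreted directly with reflection (`java.lang.reflect.Field`) instead of being compiled; this is a swapped-in technique, not the author's `eval`. `Holder` stands in for `FuzzyJava`, and the node takes a `Class` object rather than a fully qualified string to keep the example free of class-path assumptions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for SimpleAssignmentOperator: a static int field plus the text of its new value.
final class Assignment {
    final Class<?> owner;
    final String field;
    private String value;

    Assignment(Class<?> owner, String field, String value) {
        this.owner = owner;
        this.field = field;
        this.value = value;
    }

    void setValue(String value) { this.value = value; }

    String getValue() { return value; }

    @Override
    public String toString() { return owner.getSimpleName() + "." + field + "=" + value + ";"; }
}

final class TinyEval {
    // Interprets the node via reflection; nothing is written to disk and nothing is compiled.
    static void eval(Assignment a) throws ReflectiveOperationException {
        Field f = a.owner.getDeclaredField(a.field);
        if (!Modifier.isStatic(f.getModifiers())) {
            throw new IllegalArgumentException(a.field + " is not static");
        }
        f.setAccessible(true); // allow non-public fields
        f.setInt(null, Integer.parseInt(a.getValue()));
    }
}

// Hypothetical stand-in for FuzzyJava and its static field b.
class Holder {
    static int b = 3;
}
```

Calling `TinyEval.eval(a)`, then `a.setValue("37")` and `TinyEval.eval(a)` again, mirrors the two `eval(new Expression(assignment))` calls above.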

You can do in Java everything you can do in Lisp, and you can do in Lisp everything you can do in Java. They are both TuringComplete. But Lisp clearly fits the definition of homoiconic, and Java clearly doesn't. Homoiconicity isn't fuzzy. It's just a few people's understanding of it.

Then let's make the distinction clear to everyone. What part of the definition excludes Java? What statement can be made about homoiconic languages that can't be made about Java?

There are no parts. AFAIK the definition of 'language foo is homoiconic' is that foo represents code and data with exactly the same data types. That's yes or no, black or white, and it does exclude Java, because Java doesn't fit the definition. And this very definition is, trivially, a statement that can be made about every homoiconic language, and at the same time not a correct statement when applied to Java. If you have a Java program on your screen, you can interpret the source as a parse tree, imagine what would happen if you swapped two branches of the tree, modified a numerical value in one of the leaves, and so on. But the JavaLanguage itself does not allow you to actually do that, because the fundamental data types of Java are boolean, char, byte, short, int, long, float, double, and Object, and a Java program is none of them. The fundamental data types of LISP are atom and list, and every LISP program _is_ a list. A Java program is just ... a Java program.
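Swapping branches or changing a leaf is something a Java program can only do to a structure it builds explicitly. A minimal sketch, assuming a hypothetical Lisp-like encoding: nested mutable `List<Object>` values, `String` operator names and `Integer` leaves, with only `+`, `-` and `*` supported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A Lisp-like form encoded as Java data: list("-", 10, list("*", 2, 3)) stands for (- 10 (* 2 3)).
final class ListCode {
    static List<Object> list(Object... items) {
        return new ArrayList<>(Arrays.asList(items)); // mutable, so leaves and branches can be edited
    }

    // Evaluates Integer leaves and "+", "-", "*" forms with one or more arguments.
    static int eval(Object form) {
        if (form instanceof Integer) {
            return (Integer) form;
        }
        List<?> f = (List<?>) form;
        String op = (String) f.get(0);
        if (!op.equals("+") && !op.equals("-") && !op.equals("*")) {
            throw new IllegalArgumentException("unknown operator: " + op);
        }
        int acc = eval(f.get(1));
        for (int i = 2; i < f.size(); i++) {
            int v = eval(f.get(i));
            if (op.equals("+")) acc += v;
            else if (op.equals("-")) acc -= v;
            else acc *= v;
        }
        return acc;
    }
}
```

Setting the leaf `2` to `5` turns `(- 10 (* 2 3))` into `(- 10 (* 5 3))`, and swapping the two arguments of `-` reverses the subtraction. In Lisp the program text already is such a structure; here the program had to construct it, and the source of `ListCode` itself is not available in this form.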

But a Java program is a string of Unicode characters. Java has a String data type that is a string of Unicode characters. Therefore it represents code and data with exactly the same data type and is homoiconic. That's my problem with the definition. It makes Java just as homoiconic as Lisp, and no one wants that to be the case.

No, a Java program is not a java.lang.String. If it was and if Java was homoiconic (the two conditions are orthogonal), then a Java program could access the very java.lang.String object in which it is contained and, say, call the length() method. Pseudo-Java example:

 public class Example {
     public static void main(String[] arg) {
         System.out.println("length: " + Example.length());
     }
 }

However, this is not possible. (Maybe someone could give an example in LISP that prints the number of atoms the program itself contains?) Even if you implemented the length() method for the type Example and made it check the length of the file Example.java, this wouldn't have anything to do with proving homoiconicity. At the time this method was executed, it would (a) be expressed as Java bytecode, which is an entirely different language - even if Java bytecode was homoiconic, that wouldn't make JavaLanguage homoiconic - and (b) have no way of knowing which file it was compiled from. As I said, Java's datatypes are the so-called primitive types and java.lang.Object, and a Java program is neither of them.

Again: nope! I addressed this at the bottom of the page already (where you didn't comment): [we informally call language text in a file, such as C code, a "string", and yet that is using the word "string" in a somewhat different sense than the string literals that may also appear in that same file.]

Note that well: "different sense". I then tried to clarify a little: [I'm just commenting on the very loose informal practice of speaking of any sequence of bytes or characters as a "string", because that practice confuses things. For our current purposes, I don't want to ever use "string" to refer to a sequence of bytes in a file, only to refer to a true language data type, strictly.]

So no, I contradict you: a Java program is neither a string nor a String of Unicode characters, not in any sense that is useful in this context, only in a loose sense which can only confuse things. On this page it is essential to be more careful with terminology: a Java program is a sequence of characters, and this is different from the language datatype "string".

I also refreshed my memory about some issues in Tcl strings, and commented on that at bottom of page.

I don't see how anything at the bottom of the page leads us to believe that a Java program is anything other than a string of Unicode characters. The Java compiler reads it as a String from the source file.

So? What was in the file to start with was a "string" only in loose parlance.

A file of Java source contains a sequence of characters, and a Java string contains a sequence of characters, so the contents of the file can be read into a Java string, but that doesn't make them the same thing by any means. Consider that the file of Java source can itself contain Java string literals, and that there is something different about that string compared with the sequence of characters in which it is embedded. If you call them both "strings" then you lose that important distinction. So to speak more carefully, "character sequence" should be distinguished from "instance of Java string type".
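A small illustration of that distinction, using only the standard `String` class: the characters between the quotes in a source file are not the same as the value the literal denotes, because an escape sequence such as `\"` belongs to the notation, not to the value. The class and method names are made up for the example.

```java
// One declaration as it would appear in a .java file, held here as data.
final class LiteralVsSource {
    // The file would contain the 18 characters:  String s = "a\"b";
    static final String SOURCE_TEXT = "String s = \"a\\\"b\";";

    // The String value that the literal in that declaration denotes: a"b (3 characters).
    static final String LITERAL_VALUE = "a\"b";

    // Returns the characters sitting between the literal's two delimiting quotes in the source text.
    static String charactersBetweenQuotes(String source) {
        return source.substring(source.indexOf('"') + 1, source.lastIndexOf('"'));
    }
}
```

`charactersBetweenQuotes(SOURCE_TEXT)` yields the four characters `a\"b`, while `LITERAL_VALUE` has three; calling both "strings" hides exactly this difference.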

Huh? There's no difference between a Java String literal and the Java String the compiler compiles. They are both strings (in the general definition) and Strings (in the Java language definition).

That is 100% unrelated to what I said. We are now talking about something that is not at all exotic, but rather commonplace, so I'm afraid you're going to have to just put some thought into getting the point expressed above, rather than allowing yourself to get a bit confused based on a fast reading.

You said "what was in the file to start with was a 'string' only in loose parlance." In fact, what is in the file is a String in Java's specific definition of that term from the compiler's perspective. There is no "loose parlance" there.

You also said there is something different between the String the compiler reads and the Strings embedded in it. That's true, but no different from the lists Lisp interprets and the lists embedded in them. I think you should put some thought into how to better express your point, or if you have a point at all.

Lisp programs are lists. When you parse Java, C, or C++ you need to not only figure out the tokens that are incoming, but also figure out (from the grammar of the language) the tree structure. There is enough information in the file, but the actual tree structure (before any kind of optimization) bears almost no resemblance to the source code. Lisp, on the other hand, does not need to infer any tree structure; we have explicitly told it. There is still a lexing stage, of course (remember, a file that contains data need not be a Lisp program, so we need to identify it and provide a rudimentary sanity check).

Forget about string literals or other notationally different datatypes for a moment as you think of this. They're immaterial. It's the structure we're talking about. Lisp has two data types, a list and a symbol. A symbol is anything besides the list begin and end characters. Basically, Lisp skips the complex parsing step, instead just doing what amounts to a matching brace check.
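The "matching brace check" can be sketched in Java: a reader that only splits on parentheses and whitespace and nests lists recovers the whole tree, with no grammar beyond that. This is a simplified, hypothetical reader (no string literals, quote syntax or comments; atoms stay as Java `String`s), not any real Lisp's reader.

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately tiny s-expression reader: atoms are Strings, lists become nested java.util.List values.
final class SexprReader {
    // Lexing: parentheses are tokens of their own, everything else is split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Reads exactly one form; the only structural check is that parentheses match.
    static Object read(String text) {
        List<String> tokens = tokenize(text);
        int[] pos = {0};
        Object form = readForm(tokens, pos);
        if (pos[0] != tokens.size()) {
            throw new IllegalArgumentException("unexpected text after the first form");
        }
        return form;
    }

    private static Object readForm(List<String> tokens, int[] pos) {
        if (pos[0] >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String tok = tokens.get(pos[0]++);
        if (tok.equals(")")) {
            throw new IllegalArgumentException("unexpected ')'");
        }
        if (!tok.equals("(")) {
            return tok; // an atom
        }
        List<Object> list = new ArrayList<>();
        while (pos[0] < tokens.size() && !tokens.get(pos[0]).equals(")")) {
            list.add(readForm(tokens, pos));
        }
        if (pos[0] >= tokens.size()) {
            throw new IllegalArgumentException("unclosed list");
        }
        pos[0]++; // consume ')'
        return list;
    }
}
```

`SexprReader.read("(setq b (+ 1 2))")` produces the nested list `[setq, b, [+, 1, 2]]`, while an unclosed form such as `(a (b)` is rejected before anything could try to run it.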

But this is overly technical. Java has no native AST type. Therefore, it could never be homoiconic. Java uses different formats for code and data. It cannot be homoiconic. Java conceptually treats data and code as totally different things. It cannot be homoiconic. You might be able to force java to perform operations we typically associate with a homoiconic language, but that means nothing. Homoiconicity is a property of a language, not a property of any specific action.

You're making this harder than it needs to be. The Arisians would be very disappointed at your muddy, unclear thoughts.


There's no need to exec javac. Something like the following, from ApacheAnt (simplified), would work fine:

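 // Simplified from Ant; logstr (an OutputStream) and cmd (the javac command line) come from Ant's surrounding code.
 // sun.tools.javac.Main is the old JDK compiler class and is not present in current JDKs.
 import java.io.OutputStream;
 import java.lang.reflect.Constructor;
 import java.lang.reflect.Method;
 Class c = Class.forName("sun.tools.javac.Main");
 Constructor cons = c.getConstructor(new Class[] { OutputStream.class, String.class });
 Object compiler = cons.newInstance(new Object[] { logstr, "javac" });
 Method compile = c.getMethod("compile", new Class[] { String[].class });
 compile.invoke(compiler, new Object[] { cmd.getArguments() });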

If the dependency on sun.tools.javac.Main bothers you, supply a different one.

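No need for all of that reflection. The newer compiler class, com.sun.tools.javac.Main, has a static compile method you can call directly (the compile method of sun.tools.javac.Main is an instance method):

 com.sun.tools.javac.Main.compile(new String[] { fileName });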
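Another way to avoid depending on a `sun` or `com.sun` class: since Java SE 6 the compiler is reachable through the standard `javax.tools` API. A minimal sketch, with the wrapper name `StandardCompile` made up for the example; like the original `eval`, it still compiles files on disk.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class StandardCompile {
    // Runs the JDK's compiler in-process through javax.tools.
    // Arguments are ordinary javac command-line arguments; returns javac's exit code (0 = success).
    static int compile(String... javacArgs) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler (running on a JRE rather than a JDK?)");
        }
        // null streams mean System.in, System.out and System.err respectively.
        return javac.run(null, null, null, javacArgs);
    }
}
```

In the FuzzyJava example, `StandardCompile.compile(className + ".java")` could take the place of `Main.compile(new String[] { fileName })`.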


Does the above invoke an outside compiler? If it does, I'm quite willing to delete this page on the basis that it aggravates me to even watch Eric get the same answer a thousand different times and keep dismissing it.


Here's another example which doesn't invoke the javac or java executables and doesn't use files to share the value of b. It creates a new Java class, compiles it and loads it into the running VM. The new class modifies a static variable in the calling class.

 package com.ebh.fuzzyjava;

 import java.io.FileWriter;
 import java.io.Writer;
 import java.lang.reflect.Method;

 // In-process compiler: needs tools.jar on the classpath (JDK 5-8) or the jdk.compiler module (JDK 9+).
 import com.sun.tools.javac.Main;

 public class FuzzyJava {
     private static final String AFTER_VALUE = ";";
     private static final String BEFORE_VALUE = "FuzzyJava.b=";
     private static final String AFTER_CLASS_NAME = " {";
     private static final String BEFORE_CLASS_NAME = "public class ";

     public static int b;

     public static void main(String[] args) throws Exception {
         b = 3;
         System.out.println("b=" + b);
         String a = createVariableSetterCode("test1", 15);
         System.out.println("a=" + a);
         eval("test1", a);
         System.out.println("b=" + b);
         System.out.println("current variable setter value=" + getVariableSetterValue(a));
         a = changeVariableSetterValue("test2", a, 37);
         System.out.println("a=" + a);
         eval("test2", a);
         System.out.println("b=" + b);
     }

     // Compiles a .java file in-process and prints javac's exit status (0 means success).
     private static void compile(String fileName) {
         System.out.println(Main.compile(new String[] { fileName }));
     }

     // Renames the generated class and splices a new value into "FuzzyJava.b=<value>;".
     private static String changeVariableSetterValue(String className, String a, int i) {
         a = changeClassName(className, a);
         int startPos = a.indexOf(BEFORE_VALUE) + BEFORE_VALUE.length();
         int endPos = a.indexOf(AFTER_VALUE, startPos);
         return a.substring(0, startPos) + i + a.substring(endPos);
     }

     private static String changeClassName(String className, String a) {
         int startPos = a.indexOf(BEFORE_CLASS_NAME) + BEFORE_CLASS_NAME.length();
         int endPos = a.indexOf(AFTER_CLASS_NAME, startPos);
         return a.substring(0, startPos) + className + a.substring(endPos);
     }

     private static int getVariableSetterValue(String a) {
         int startPos = a.indexOf(BEFORE_VALUE) + BEFORE_VALUE.length();
         int endPos = a.indexOf(AFTER_VALUE, startPos);
         return Integer.parseInt(a.substring(startPos, endPos));
     }

     // Writes the source to <className>.java in the working directory, compiles it, then runs it.
     private static void eval(String className, String a) throws Exception {
         try (Writer writer = new FileWriter(className + ".java")) {
             writer.write(a);
         }
         compile(className + ".java");
         execute(className);
     }

     // Loads the freshly compiled class (assumes the working directory is on the classpath) and calls setB().
     private static void execute(String className) throws Exception {
         Class<?> c = Class.forName(className);
         Object newVariableSetter = c.getDeclaredConstructor().newInstance();
         Method setter = c.getMethod("setB");
         setter.invoke(newVariableSetter);
     }

     private static String createVariableSetterCode(String className, int i) {
         return "public class " + className + " {"
             + "public void setB() {"
             + "com.ebh.fuzzyjava.FuzzyJava.b=" + i + ";"
             + "}}";
     }
 }

Are you purposely being dense, or do you really not see how your sample proves Java isn't homoiconic?

I'm not arguing that Java is homoiconic, and I'm probably accidentally being dense.

Then why is the page named homoiconic Java example when Java clearly isn't homoiconic?

Because Doug asked me to translate the Lisp example into Java. I'll rename the page to avoid confusion.

He asked you that so that you'd realize while doing it, that Java wasn't homoiconic and the example can't be translated without violating the rule that the language itself must do it. Did you not learn that lesson? Do you not see that your example isn't homoiconic, you stepped outside the language to accomplish it? You wrote to the filesystem, then called a compiler, so you didn't translate the Lisp example, you butchered it. This page title is still misleading, Java isn't homoiconic.

Java isn't homoiconic. This example is. So, I'd say the page title fits perfectly well. -- JonathanTang

No, it certainly is not homoiconic. -- DaveFayram

[FuzzyJava.eval("int x = 10;"); doesn't work, does it? How can the FuzzyJava example work with the local variable in the method I call eval() from?]

Hmm, I disagree, since homoiconicity is a language property, not a property of any particular piece of code that's faking it. Just my opinion though, I could be wrong.

Where did I step "outside" the language? I agree that I wrote to the filesystem, but that isn't "outside" the language. I called the compiler inside the language and it executed inside the language. My program performs the same behaviors as the Lisp example. If I wanted I could build an AST of my code and modify that to more closely match the way Lisp does it, but I thought my example demonstrated the principles well enough without requiring the curious to install something like ANTLR. All of this looks like it's inside the Java language from where I sit. I'm sorry if the page title is misleading. Rename it if you have a better one. Doug asked me to translate the homoiconic example from Lisp to Java, so "homoiconic example in Java" seemed descriptive to me. Once again, I'm not claiming Java is a homoiconic language.

-- EricHodges

OK, maybe I'm off base here, if so I apologize, but show me where the "compiler" is defined in the Java language specification. If it's not, then you went outside the language twice, once for the filesystem, and once for the compiler. Using a library to do your work for you isn't doing it inside the language.

Every Java program that does anything other than change its own memory and exit goes outside the language by that definition. My program went outside the language just to print the value of "b". All Java programs that stay inside the language are no-ops of varying lengths.

[Sorry, but the filesystem is certainly outside the language from my point of view. From the point of view of an OS designer. What would happen if the filesystem you were writing to turned out to be over a network and someone forgot to keep track of the endianness? Your code wouldn't compile despite there being nothing wrong with your Java compiler, despite the problem being completely outside Java's power to prevent or correct. And that's the key factor that determines whether something is in the language or outside of it; if the facility doesn't work, does it mean that the language is broken?]

And if the OS forgot to keep track of the endianness of RAM the Lisp example wouldn't work either. -- EH

It is true that needing to use a filesystem in this example is an inelegant workaround, at best - I presume you would actually agree, and that your point here is that this is an aspect of the "fuzziness" you've been mentioning - that the need to do a filesystem workaround distances this from an ideal example.

On the other hand, using methods from com.sun.tools.javac.Main makes this actually an interesting example, much more so than the previous one that gave the "javac" executable file to execute.

It does, alas, suffer from the weakness that it is unsupported and installation dependent prior to J2SE 1.5, but I suppose that's a minor issue, since it's standard in 1.5. This is obviously a pretty handy thing to have in the language.

So it would seem that your point is that you can translate the algorithm of the Lisp example, but would agree that it is further from an ideal example in that:

I'm not going to argue the difference between properties and algorithms, nor the "fuzziness" issue, this instant; I'm just summarizing. -- dm

Absolutely. I wouldn't do something like this in Java. (But I do it in JavaScript all the time.) My point is that the language doesn't make it easy, but neither does it prevent me from doing it. I used to write object oriented code in C even though the language didn't make that easy. It was more difficult than C++ but still possible. Therefore I'd assign C less membership in the OO language set than C++.

If I find a way to stream the code to the compiler without writing a file then I might start thinking about actually doing this in production code. I'd write a package to make it easier, though.
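Streaming the source to the compiler without writing a file is possible with the `javax.tools` API (Java SE 6 and later) by supplying in-memory file objects. A sketch under stated assumptions: `InMemoryCompiler`, `StringSource` and `ByteSink` are hypothetical names; the compiled class is defined by a throwaway class loader; and any classes the generated code refers to (such as `FuzzyJava`) are assumed to be visible on the class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class InMemoryCompiler {
    // Source code held in a String instead of a .java file.
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    // Receives class-file bytes in memory instead of writing a .class file.
    private static final class ByteSink extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        ByteSink(String className) {
            super(URI.create("bytes:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }

        @Override
        public OutputStream openOutputStream() { return bytes; }
    }

    // Compiles one class from source text and loads it; neither source nor class file touches the disk.
    static Class<?> compileAndLoad(String className, String code) throws ClassNotFoundException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler (running on a JRE?)");
        }
        Map<String, ByteSink> outputs = new HashMap<>();
        JavaFileManager files = new ForwardingJavaFileManager<JavaFileManager>(
                javac.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                    JavaFileObject.Kind kind, FileObject sibling) {
                ByteSink sink = new ByteSink(name);
                outputs.put(name, sink);
                return sink;
            }
        };
        boolean ok = javac.getTask(null, files, null, null, null,
                List.of(new StringSource(className, code))).call();
        if (!ok) {
            throw new IllegalArgumentException("compilation failed; see diagnostics on stderr");
        }
        // Defines compiled classes from the captured bytes; everything else is delegated to the parent loader.
        ClassLoader loader = new ClassLoader(InMemoryCompiler.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteSink sink = outputs.get(name);
                if (sink == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = sink.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }
}
```

With this, `eval` could pass the generated string straight to `compileAndLoad` and then invoke `setB()` on the returned class, with no `FileWriter` involved.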

And I don't really think that using com.sun.tools.javac.Main made the example more interesting. The previous example demonstrated the idea. It doesn't really matter if a step involves the operating system or some platform specific configuration. If I wrote it in bash I'd probably involve the OS at every step.

-- EH

Your central point here is what I thought you meant. As a side issue, introducing extra facilities to help do a translation does matter. Convenience in writing code matters, ease of making changes matters, a speed difference of conceivably a million fold can often matter. Maybe you mean it's not the central issue.

I mean what I said. Convenience is not binary. The set of homoiconic languages seems to be fuzzy and membership values are possibly subjective. -- EH

Bash is an interesting example; I've been thinking of bringing it up for a while. The fact that it involves the OS at almost every step is just part of what it is, though; once you start writing in bash, that's practically a given, so it's not quite the same thing (although it certainly has motivated people to translate bash scripts to other languages).

How much a language provides and how much it gets from the OS doesn't matter when we look at the definition of something like "homoiconic". If Lisp invoked a separate executable for each keyword it wouldn't make it less homoiconic. -- EH


On a different issue: Your example is using strings to represent the manipulatable code that you're then compiling and executing; you said "I could build an AST of my code and modify that to more closely match the way Lisp does it". Could you clarify what you mean? At first glance I don't see how going down that road would lead to a closer match.

-- dm

ANTLR comes with a Java grammar that can translate between Java source and ASTs. If I used it I could build an AST representing the source. The value to be assigned to "b" would be a node on an AST. I wouldn't search for BEFORE_VALUE and AFTER_VALUE; I'd just walk the tree as you did in the Lisp example. -- EH
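A rough version of this is possible without ANTLR: the JDK ships its own parser behind the `com.sun.source` tree API (module `jdk.compiler`). The hypothetical helper below parses the generated class, finds the first assignment whose right-hand side is a literal, and splices in a new value by source position instead of searching for `BEFORE_VALUE` and `AFTER_VALUE`. This swaps in a different parser from the one proposed, and it assumes a JDK (not a JRE) at runtime.

```java
import com.sun.source.tree.AssignmentTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LiteralTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class AstRewrite {
    // Parses `code` (no compilation) and replaces the literal on the right-hand side of the
    // first literal assignment with newValue, using the parser's source positions.
    static String replaceAssignedLiteral(String className, String code, int newValue) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler");
        }
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(source));
        CompilationUnitTree unit = task.parse().iterator().next();
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        long[] span = {-1, -1};
        new TreeScanner<Void, Void>() {
            @Override
            public Void visitAssignment(AssignmentTree node, Void unused) {
                if (span[0] < 0 && node.getExpression() instanceof LiteralTree) {
                    span[0] = positions.getStartPosition(unit, node.getExpression());
                    span[1] = positions.getEndPosition(unit, node.getExpression());
                }
                return super.visitAssignment(node, unused);
            }
        }.scan(unit, null);
        if (span[0] < 0) {
            throw new IllegalArgumentException("no literal assignment found");
        }
        return code.substring(0, (int) span[0]) + newValue + code.substring((int) span[1]);
    }
}
```

Because only parsing is performed, the generated class's reference to `com.ebh.fuzzyjava.FuzzyJava` does not need to resolve at this stage.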

Oh, I see. Ok. I'll say something about that later. -- dm

[The whole point of the discussion, Eric, was to define the term homoiconic, which you've been battling. The term was meant to point out the difference between languages that have that feature and languages that don't. The hoops you're jumping through to accomplish this task in Java should make you stop and think about what makes it so much work compared to the Lisp example, because what makes it so much work in Java is the lack of what's being defined as homoiconic. You don't seem willing to separate the term homoiconic from Turing complete, because your examples only prove Java is Turing complete, not homoiconic. So rather than repeating endless arguments and examples, why don't you just clearly state what your problem is with the term or its definition?]

My problem is that I don't think languages "have" or "don't have" this feature. I think it's more complex than that and languages lie along an axis of homoiconicity. I thought I'd clearly stated this several times. That's why I created DefiningDiscreteSetsOfLanguages. -- EH

Ahhh, okay I understand now. You're a LanguageAbuser. You're the type of person who says that C is OO because it is remotely conceivable to write OO code in C, given that it's Turing complete. You're an extreme relativist out to destroy any succinct communication by assaulting the meaning behind words, smearing them into meaninglessness. Nice. -- ??

No, I don't say C is OO. I say OO programs can be written in C with some difficulty. (I also say OO programs can be written in C++ with some difficulty, albeit less difficulty than in C.) I say C has an infinitesimal membership value in the set of OO languages for that reason. By treating the set of OO languages as a fuzzy set and acknowledging membership as subjective we can avoid any debate about which language is OO and which isn't, which is "more" OO, etc. I'm hardly an extreme relativist (I created EverythingIsRelativeStrangeLoop to show why extreme relativism is self-contradictory). Sometimes succinct communication sacrifices validity for brevity. -- EH

[If we can't give meaning to words and give examples of "here it is" and "here it isn't", then it's damn hard to communicate about anything. C is not OO, and no matter how you hack up OO in C, it still isn't OO. Java and C aren't homoiconic, and no matter how you hack up homoiconicity, they still won't be homoiconic. How are we to discuss these differences if we can't define words to describe them? You are attempting to make any such definition meaningless; how does that further discussion about those differences? How does anything you've done here contribute to the discussion rather than try to avoid it? There is a difference between these languages; that difference is that one is homoiconic and one isn't. It's not a continuum.]

In addition to having meaning, I think our words should be as accurate as possible. It's fine to give examples of where it is and isn't, but we should also acknowledge where parts of it are. The current definition of HomoiconicLanguages is:

"Languages in which program code is represented as the language's fundamental data type are called 'homoiconic'. Such languages allow code and data to be DeeplyIntertwingled, so that new code can be generated and manipulated by the program itself at runtime."

My Java example represents Java code in one of Java's fundamental data types. Java doesn't provide ASTs as a "fundamental" data type, but it does provide strings. It "intertwingles" data and code (although not as "deeply" (again, a subjective qualifier) as Lisp). New code is generated and manipulated by the program itself at runtime. That doesn't make it as homoiconic as the Lisp example, but it shows that it has some partial membership in that set of languages. We can compare the ease of doing this in different languages and establish some ordering between them. It gives us a mathematical framework for determining if Smalltalk is more homoiconic than machine language, if Lisp is more homoiconic than JavaScript, or if Java is more homoiconic than C. I don't want to abuse language. I want to prevent its abuse.

-- EricHodges

Sigh... it's pointless discussing this with you then, because the word has no meaning if you continue to claim that every language has it. Java has no membership in that set of languages just as C has no membership in the set of OO languages. If you put C or Java into a string, it's no longer C or Java, it's a string. You don't have to put Lisp into a list, because Lisp is already a list. Do you not understand that? All Lisp code is already a list; no Java or C code is already a string, you have to put it into a string. Never mind, I'm sick of talking to you, you're as pig-headed and obtuse as Top.

Java code is strings, not ASTs or lists.

Oh really... then compile this in Java.... "int aNumber = 1;", oh that's right, it won't compile, because you have to remove the quotes for it to be valid Java. Java may be seen as a string by the compiler... but Java isn't a string at the language level... Lisp is a list at the language level, that is a significant difference, regardless of what you want to label it. I think you guys don't really grok what we're saying when we say at the language level. Maybe we're not explaining it well enough, I don't know, but it's so blindingly obvious to me that I don't know how to say it any simpler.

The compiler converts strings to ASTs and then to bytecodes, but those ASTs aren't Java. The word "homoiconic" doesn't define a discrete set of languages, but that doesn't strip it of meaning. The word "tree" doesn't define a discrete set of plants (one plant might be x% bush and y% tree, and its membership value in those sets may change over the course of its life), but that doesn't strip the word "tree" of meaning.


Let's focus the discussion a bit. Almost all languages (from BASIC to Java) can generate source code (because source code can be held in strings), invoke a compiler, write a compiler from scratch, represent parsed ASTs or recover them from the compiler, and manipulate ASTs; all in all, they can do whatever Lisp can. After all, they are Turing complete. The problem is that they "can" only in theory. In the same sense, you can in theory write business applications in assembly language, walking indexes, B-trees and other file structures to gather data, using video interrupts to put things on the screen, and so on.

The fact that a language "can" in theory do something is irrelevant when discussing language design issues. Languages like Java and assembly can do everything in theory. The question is how easy it is to do this or that. And here we have both a tentative formal theoretical framework for comparing languages for expressiveness and empirical observations. Out of the armies of Java programmers, how many manipulate Java ASTs or Java bytecodes? Very few. Even frameworks like gnu.bytecode or Apache's BCEL are exceedingly difficult to use and error-prone. One cannot implement a meta-programming library (say, aspects) with them with the same ease that one can in LISP. By the way, such meta-programming is cited as an example in LongFunctionsInLisp. Writing a mini-CLOS in Java is essentially difficult, orders of magnitude more difficult than writing the full CLOS in Lisp. And mind you, CLOS is not LISP-specific; it can be viewed as an abstract specification of an object system that could be ported to programming languages other than LISP.

So the question is: how easy is it to do meta-programming in Java? And the answer is that it is essentially difficult, orders of magnitude more difficult than in Lisp or Scheme, and this comes from a language design decision. The designer of Java decided that the familiarity of C-style syntax, and consequently the potential for commercial success, were fundamentally more important than supporting LISP-style meta-programming by making language elements uniform and simple. This disqualifies Java from any claim to "homoiconicity" (what a strange word), or more simply, it disqualifies Java as is from meta-programming. Aspect-oriented frameworks do recover a bit of the meta-programming power that homoiconic languages have.

But "ease" is relative, expressed as a point along an axis, not a boolean value. It's easier (for me, at least) to do meta-programming in Java than C. It's easier to do meta-programming in JavaScript than Java. It's easier to do meta-programming in Lisp than JavaScript. These form a continuum of languages, not discrete sets. I've never argued that it's easy to write homoiconic code in PDP-11 assembly, just that it's possible.

You're wrong. It is a discrete set, either the language represents its source code as its fundamental data type, or it doesn't, period. Pay attention to what everyone's been telling you and quit repeating the same wrong argument over and over.

Those terms were invented to try to make Eric understand what homoiconic means; he never did, and I'm tired of discussing it. So believe whatever you like, Eric; you obviously care little for the truth.

We've been down this road. PDP-11 assembly source is represented in a fundamental data type of PDP-11 assembly language. If PDP-11 is therefore a member of the discrete set of homoiconic languages, then "ease" is clearly not a determining factor, as was argued immediately above my response.


Can someone explain why there's so much anger in some of the arguments in favor of a discrete set of homoiconic languages? Why does it matter so much?

There's no anger, you're just misinterpreting me, that's frustration, not anger. Since you have yet to show an understanding of what homoiconic means, I don't think you're in a position to argue about what it should mean and what languages are or aren't. You've got to demonstrate that you "get it" before you can attempt to redefine it.

How would I demonstrate that I "got it" (other than agreeing that there exists a discrete set of homoiconic languages)? Doug told me if I translated the Lisp example into Java that would help me "get it", but if anything, it's made me more aware of how fuzzy this set is.

[I viewed it as an important step, to make things more concrete. I do not at all think that discussion of your translated example is complete, by any means; I'm just not in the mood for fast and furious edits. Several people have made rather good comments that might be worth summarizing. -- dm]


From above, regarding the fact that the terminology definitions I attempted on HomoiconicLanguages have defects:

OK. I suggest we define the ideal properties and features of a homoiconic language using LISP 1.0 as an example. We then categorize languages by which of these properties and features they provide and assign them relative ease of use values.

Ok. I think we should also look at what most practical TuringEquivalent languages can do: typically we have

We don't want "homoiconic" to refer merely to that, because then "homoiconic" wouldn't have any useful ability to distinguish unusual languages such as Lisp 1.0, yet that was the intent of coining the term in the first place.

So we need to be careful to maintain, at minimum, the differentiated polar opposites represented by Lisp 1.0, and by "most languages".

We also need to remember the literal definition: homo-iconic: same representation of code and data.

Exactly what this means in practice is a headache, because we informally call language text in a file, such as C code, a "string", and yet that is using the word "string" in a somewhat different sense than the string literals that may also appear in that same file.

I also think we need to look at why Tcl is widely called "homoiconic", since it is claimed to represent everything as a string. (I'm mildly familiar with Tcl, and have a book on it, but I'm not a Tcl expert as such.) Whatever Tcl is doing with strings should be distinguished somehow from what C does with strings.

I agree. I find this part of the definition deeply unsatisfying. If someone can elaborate on why interpreting strings makes TCL homoiconic but interpreting or compiling strings doesn't make other languages homoiconic, please do.

Looking at RK's comment about Smalltalk above, we should also look at the relationship to reflection. And of course, metacircular interpreters. These three terms have interesting connections, but nonetheless are not synonyms.


Tcl is single typed. Everything is a string. Everything. Sometimes they are quoted in one of several ways (including but not limited to double quotes), and sometimes they don't need to be, but they are always treated as strings and stored as strings.

It does have e.g. numeric operators that interpret the strings as a numeral representing a number - it would have to, to be really usable - but they still accept and produce strings.

When a variable $a is encountered during interpreter evaluation, it is replaced macro-style by its string value.

Furthermore, all strings are parsed by the Tcl language implementation. Quoting mechanisms can turn off evaluation, which is a second stage after parsing.

Tcl uses this to implement e.g. conditionals; it evaluates the IF string or the ELSE string depending on the outcome of the test condition.

All of this is extremely different from the way that e.g. C or Java treats strings.
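Here is a toy Java sketch of the idea. The condition and both branches arrive as strings, and only the chosen branch is ever handed to the evaluator. The evaluator understands a single hypothetical command, puts <word>. It treats "0" as false and any other string as true. Real Tcl evaluates the condition with expr and has far richer quoting and substitution rules.

```java
import java.util.ArrayList;
import java.util.List;

public class TclStyleIf {
    // Everything printed by the toy "puts" command.
    public final List<String> output = new ArrayList<>();

    // Toy evaluator: everything it receives is a String, and it understands
    // only the hypothetical command "puts <word>".
    public void eval(String script) {
        String[] words = script.trim().split("\\s+");
        if (words.length == 2 && words[0].equals("puts")) {
            output.add(words[1]);
        } else {
            throw new IllegalArgumentException("unknown command: " + script);
        }
    }

    // Tcl-like conditional: condition and both branches are plain strings.
    // Only the chosen branch is passed to eval; the other is never parsed.
    // Simplification: "0" is false and any other string is true.
    public void tclIf(String condition, String thenBody, String elseBody) {
        boolean truth = !condition.trim().equals("0");
        eval(truth ? thenBody : elseBody);
    }

    public static void main(String[] args) {
        TclStyleIf interp = new TclStyleIf();
        interp.tclIf("1", "puts yes", "this is not a command");
        System.out.println(interp.output);  // [yes]
    }
}
```

The unused branch may contain text that is not a valid command at all. Nothing fails, because that text is only ever data.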


It appears to be significant that, in both Lisp and Tcl, the primary language definition of conditionals depends on quoting the IF and ELSE blocks of code, and evaluating just one of them.

Phrasing this in a bullet-proof way is hard, because a similar phrasing might apply to the C language...but here's a clarification: it is furthermore possible in both Lisp and Tcl to write a function MY_IF that takes 3 parameters: a boolean, an IF clause, and an ELSE clause, that behaves exactly the same way as the native conditional.

That is not possible in e.g. C or Java, where MY_IF(condition, if_clause, else_clause) inherently forces evaluation of all three clauses before the MY_IF function is even called.

This is possible in Smalltalk, however Smalltalk does not allow manipulation of the internals of the code blocks.
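The difference is easy to demonstrate in Java. In the sketch below (names hypothetical), an ordinary method evaluates both clauses before it is even called. Java 8 lambdas, released in 2014, allow roughly the Smalltalk-style alternative. The caller wraps each clause in a Supplier, and only the chosen one is invoked. The call site no longer looks like the native if, and the lambda bodies are opaque. This therefore matches the Smalltalk case rather than the Lisp or Tcl one.

```java
import java.util.function.Supplier;

public class MyIfDemo {
    // Counts how many clauses were actually evaluated.
    public static int evaluations = 0;

    public static String clause(String text) {
        evaluations++;
        return text;
    }

    // Ordinary method: Java evaluates every argument before the call,
    // so both clauses have already run by the time this body executes.
    public static String myIfEager(boolean condition, String ifClause, String elseClause) {
        return condition ? ifClause : elseClause;
    }

    // Block-style version: each clause is wrapped in a Supplier by the caller,
    // and only the chosen one is invoked. The lambdas cannot be inspected.
    public static String myIfLazy(boolean condition, Supplier<String> ifClause, Supplier<String> elseClause) {
        return condition ? ifClause.get() : elseClause.get();
    }

    public static void main(String[] args) {
        evaluations = 0;
        myIfEager(true, clause("yes"), clause("no"));
        System.out.println(evaluations);  // 2

        evaluations = 0;
        myIfLazy(true, () -> clause("yes"), () -> clause("no"));
        System.out.println(evaluations);  // 1
    }
}
```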

This seems to be related to the core notions of homoiconicity, without encompassing the whole notion.

So if the above were suitably paraphrased, I think it would become a portion of the new improved definition of homoiconicity (not the whole definition). -- dm

Here's an alternate proposal: write a function that, given a program (pre-condition: it contains no macros and is not self-modifying), adds instrumentation code to all if statements, and then reports path coverage when the program is run. Note that you can do this trivially in Lisp, but not nearly as easily in Java, Smalltalk, C, etc.
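To make the proposal concrete, here is a sketch of the instrumentation pass in Java. Java offers no standard runtime tree for its own statements. The sketch therefore assumes a tiny hypothetical AST (Print, Seq, IfEq, Mark; none of these are JDK classes) and an interpreter for it, with the tree built by hand rather than parsed. instrument() wraps both branches of every IfEq in a Mark node. run() records the sequence of marks hit, which is the execution path. In Lisp the program would already be such a tree, so only the rewriting function would need to be written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class IfCoverage {
    // Hypothetical, hand-rolled AST node types; these are not JDK classes.
    public interface Stmt {}
    public record Print(String text) implements Stmt {}
    public record Seq(List<Stmt> body) implements Stmt {}
    public record IfEq(String name, int value, Stmt then, Stmt otherwise) implements Stmt {}
    public record Mark(String label, Stmt inner) implements Stmt {}  // added by instrument()

    private int nextIf = 0;

    // Returns a rewritten copy of the tree in which both branches of every
    // IfEq are wrapped in a Mark node with a unique label.
    public Stmt instrument(Stmt s) {
        if (s instanceof Seq seq) {
            List<Stmt> body = new ArrayList<>();
            for (Stmt child : seq.body()) {
                body.add(instrument(child));
            }
            return new Seq(body);
        }
        if (s instanceof IfEq f) {
            int id = nextIf++;
            return new IfEq(f.name(), f.value(),
                    new Mark("if" + id + ".then", instrument(f.then())),
                    new Mark("if" + id + ".else", instrument(f.otherwise())));
        }
        return s;  // Print and Mark nodes are left unchanged
    }

    // Interprets a tree: printed text goes to out, and each Mark reached is
    // appended to path, so path records which branches this run took.
    public static void run(Stmt s, Map<String, Integer> env, List<String> out, List<String> path) {
        if (s instanceof Print p) {
            out.add(p.text());
        } else if (s instanceof Seq seq) {
            for (Stmt child : seq.body()) {
                run(child, env, out, path);
            }
        } else if (s instanceof IfEq f) {
            boolean taken = env.getOrDefault(f.name(), 0) == f.value();
            run(taken ? f.then() : f.otherwise(), env, out, path);
        } else if (s instanceof Mark m) {
            path.add(m.label());
            run(m.inner(), env, out, path);
        }
    }

    public static void main(String[] args) {
        // Roughly: if (y == 0) print "zero" else print "nonzero"; print "done"
        Stmt program = new Seq(List.of(
                new IfEq("y", 0, new Print("zero"), new Print("nonzero")),
                new Print("done")));
        Stmt instrumented = new IfCoverage().instrument(program);
        for (int y : new int[] {0, 5}) {
            List<String> out = new ArrayList<>();
            List<String> path = new ArrayList<>();
            run(instrumented, Map.of("y", y), out, path);
            System.out.println(out + " via " + path);
        }
    }
}
```

Collecting the path lists from several runs into a set gives the distinct paths exercised, which is the path coverage the proposal asks for.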

[You can do it (with varying degrees of difficulty) in any TuringEquivalent language. I want something other than "easier" or "harder" to distinguish homoiconic, otherwise it's a fuzzy set.]

Ha, ha. But your requirements conflict with Turing completeness anyway. Any test you can ever come up with will be solvable even in assembly, in the worst case by implementing a Lisp interpreter :) So to disambiguate, use the following criterion: given only the standard libraries of the language (one can always imagine a suitable library to do almost anything), if the task takes one hour in one language and more than a day in another, then that's your difference. As I explained somewhere above, every meaningful difference between programming language X and language Y involves something that can be done orders of magnitude more easily in one than in the other. Beyond that, most of them are simply Turing complete.

This is interesting, and I don't disagree, but don't forget that "homoiconic" is a language property, not a language algorithm. -- dm

Well, but to make it a meaningful property, we should provide a suitable test that a non-homoiconic language will fail. Implementing a function (it doesn't need to be IF, for that matter) with lazy-evaluation semantics will show just that: whether the language supports lazy evaluation, or even has semantics for controlling evaluation. In the end, Eric might have been onto something. Bravo, Eric.

It just dawned on me while writing the above paragraph that it is not a property of the language; it can only be a property of the standard libraries and the runtime. Java doesn't have the convenience of LISP's cons? Screw the LISP cons. I already have it in my little library, and it is not even that important. How many student examples have we seen in LISP-like languages where a linked list is improperly used just for the sake of convenience, whereas a programmer in an Algol-derived language would have used a better-suited data structure like an array or a hashtable? After all, LISP has arrays too, and tons of other structures that are better suited to other domains. CONS is the machine code of data structures. Who really cares if the whole structure of a LISP program can be reduced to a huge tree of conses? The programs of a language with a fancier syntax can also be reduced to a tree of structures that fits a nice class hierarchy.
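A cons cell really is small enough for a "little library". The sketch below (names hypothetical; null stands for the empty list) builds arithmetic forms such as (* 2 (+ 1 2)) out of conses. It evaluates them with a hand-written eval and rewrites one by swapping its operator. Whether a user-written eval like this counts toward homoiconicity is exactly what is disputed at the top of this page.

```java
public class ConsDemo {
    // A minimal cons cell. For brevity, only proper lists are supported and
    // null stands for the empty list (an assumption of this sketch).
    public record Cons(Object car, Cons cdr) {}

    // Builds a proper list from its arguments, e.g. list("+", 1, 2) is (+ 1 2).
    public static Cons list(Object... items) {
        Cons result = null;
        for (int i = items.length - 1; i >= 0; i--) {
            result = new Cons(items[i], result);
        }
        return result;
    }

    // Evaluates an Integer, or a form (+ ...) / (* ...) whose arguments are
    // themselves Integers or forms.
    public static int eval(Object form) {
        if (form instanceof Integer n) {
            return n;
        }
        Cons cell = (Cons) form;
        String op = (String) cell.car();
        int acc;
        if (op.equals("+")) {
            acc = 0;
        } else if (op.equals("*")) {
            acc = 1;
        } else {
            throw new IllegalArgumentException("unknown operator: " + op);
        }
        for (Cons args = cell.cdr(); args != null; args = args.cdr()) {
            int v = eval(args.car());
            acc = op.equals("+") ? acc + v : acc * v;
        }
        return acc;
    }

    public static void main(String[] args) {
        Cons form = list("*", 2, list("+", 1, 2));   // (* 2 (+ 1 2))
        System.out.println(eval(form));              // 6
        // Code as data: build (+ 2 (+ 1 2)) by reusing the argument list.
        Cons rewritten = new Cons("+", form.cdr());
        System.out.println(eval(rewritten));         // 5
    }
}
```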

In the end, all it takes for Java to be homoiconic is for Sun to make public some classes that are used by the compiler. Imagine java.lang.reflect had public classes like Statement, Expression, Assignment, CompoundStatement, ForStatement, WhileStatement, Sequence, PlusExpression, ChoiceOperator, etc. Then, voilà, Java could do everything Lisp can do with regard to homoiconicity. Would Java's hierarchy of language elements be more complex? Well, of course, but only because LISP's complexity is shifted to other places. The basic element of LISP is the cons cell, but then there are a lot of syntax elements, standard macros and standard functions. Would adding such reflective classes to Java change Java's fundamental design as a language? I do not think so. It would be one more powerful package added to the runtime.
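For comparison, here is roughly what java.lang.reflect offers today. It reports a method's name, parameter types and return type, and it can invoke the method. It has no objects for the statements in the method's body. The JDK's Compiler Tree API in com.sun.source.tree does expose javac's syntax trees, but only for source that is being compiled, not for classes already loaded into a running program. The class and method names below are illustrative.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class ReflectDemo {
    public static int twice(int x) {
        if (x == 0) {
            return 0;
        }
        return x * 2;
    }

    // Collects what java.lang.reflect reports about twice(): its signature,
    // plus the result of calling it. No java.lang.reflect method returns the
    // if-statement or the multiplication in its body.
    public static String describe() {
        try {
            Method m = ReflectDemo.class.getMethod("twice", int.class);
            return m.getName() + Arrays.toString(m.getParameterTypes())
                    + " -> " + m.getReturnType() + " = " + m.invoke(null, 21);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());  // twice[int] -> int = 42
    }
}
```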

By the way, did I mention that "homoiconic" has a clumsy, if not altogether ugly, sound? If it were something sexier, maybe it would be worth the fight. In the end it comes down to this: either the language exposes the elements of its syntax in the standard library or it doesn't. Java doesn't, but the decision whether or not to do so is as trivial as the decision whether to include javax.net.ssl in the distribution; it is not really a language design problem. --CostinCozianu

Costin, you have somehow gotten off track. Don't forget the central point: "homo-iconic" means same representation of code and data.

No, I don't think I've gotten off track at all. The central data structure in Java is the object. If you are familiar with the Java compiler, you know it represents code as objects. The only difference is that those objects are hidden and not made available at runtime. You are also very unclear when you say the language should represent code as data. What code? Source code at compile time? Source code at interpreter run time? Parse trees at runtime or compile time? Compiled bytecode? Just-in-time compiled code? Native compiled code? I'm afraid you'll keep shoehorning your definition into "only Lisp/Scheme running under an interpreter".

The only way you can make the definition relevant is by specifying what functionality it is good for and how it benefits the programmer. I gave you such an example, one that can be done an order of magnitude more easily in Lisp than in Java, taking advantage of what you call homoiconicity. Except that the same thing could be done in Java without modifying the language, just by making public some classes that already exist in the compiler. And this is your proof that homoiconicity fundamentally does not belong to the language design space. Algol-family languages can be just as homoiconic as cons-based languages. There's nothing mystical about the cons that makes it so much better than the classical Algol structure of blocks, statements, procedures, etc. You can put those in a tree just the way you put conses in a tree. So the difference is irrelevant.


If Java represented code as objects, then you should be able to do something like this:

 Statement x = if (y == 0) { System.out.println("yes"); };
 x.execute(5);

But you can't, right?

Your definition of Java being homoiconic is like saying Java is a lazy-evaluation language because you can create an object with a callback in it. Or maybe I could also say Java is a language that allows no side effects at all, because every time I compile my code I run a program that checks all the bytecode of the resulting program and rejects it if it contains any mutating statements. Or I could say assembly is a visual programming language, since I write all my code in a visual programming tool that has me only point and click (no coding needed), and when I push 'generate' it generates assembly code.

Homoiconic is a property of the language. It's like saying birds are animals that can fly.

Humans can fly now, using aeroplanes. Since humans are considered Turing complete, we can do anything. But that doesn't make humans birds, does it? Otherwise we could carry a goldfish onto the aeroplane with us and call the fish a bird too.

See ExcerptionNotAbstraction for more explanation along these lines.


New attempt at definition moved to NewAttemptedHomoiconicDefinition


Primarily to illustrate the limitations of the definition on HomoiconicExampleInManyProgrammingLanguages, here's a simple Java program that does this:

and does not do this:
 import junit.framework.TestCase;
 // JUnit 3 test: a "code block" is modeled as an ordinary object whose
 // behavior is driven by a field that can be read and changed at runtime.
 public class HomoiconicTest extends TestCase {
   public void testHomoiconicJava() {
     // Stand-in for a program variable.
     class Variable {
       int value;
     }
     // Stand-in for a piece of code: "assign assignmentValue to a variable".
     class CodeBlock {
       int assignmentValue = 15;
       void assign(final Variable variable) {
         variable.value = this.assignmentValue;
       }
     }
     final Variable variable = new Variable();
     variable.value = 1;
     assertEquals(1, variable.value);
     final CodeBlock codeBlock = new CodeBlock();
     // Run the "code".
     codeBlock.assign(variable);
     assertEquals(15, variable.value);
     System.out.println("The variable's contents is " + variable.value);
     // Reset the variable, then modify the "code" by changing its data.
     variable.value = 1;
     assertEquals(1, variable.value);
     codeBlock.assignmentValue = 37;
     codeBlock.assign(variable);
     assertEquals(37, variable.value);
     System.out.println("The variable's contents is " + variable.value);
     // Inspect the "code" as data.
     System.out.println("The code block assigns the value " + codeBlock.assignmentValue + " to the variable value.");
   }
 }

It's a hack and a cheat. But it does seem to pass the test. ;->

Ahem:

It's an interesting hack, though, and such things can be quite valuable.

(Oh, and I wouldn't claim that this code is homoiconic at all. Nor is it polymorphic.)


Some people seem to be missing the point here: we are talking about two completely different languages that do not necessarily have anything to do with each other: the Java programming language and the Java Virtual Machine. Aside from the fact that they have the same author (of the spec and reference implementations), carry the same brand in their name, and were designed to complement one another, there is nothing that ties the two together. The JVM isn't just a "Java implementation" the way gcc is a C implementation; it's a language (and a platform, i.e. a reference implementation) in itself. The JVM's language is not Homoiconic because there is no way to access your code at run time. Even if you parsed your class file, there's no way to actually know where the class file is, because it could come from anywhere (the file system, memory, or the web). Java isn't homoiconic either, since it doesn't even have a native way to represent code. How can you even check the basic premise of homoiconicity if you can't even beg the question? -- WouterLievens

"The JVM's language is not Homoiconic because there is no way to access your code at run time."

There isn't? If the JVM's "language" consists of the JVM bytecode spec, then there does seem to be a way for the JVM to access its code at runtime. It doesn't matter where it came from (any more than it matters where Lisp code comes from), just that it is available. -- EricHodges
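Eric's point can be illustrated with a short sketch. A class can often read its own .class file as a resource and check the class-file magic number 0xCAFEBABE. What it gets is raw bytes, not parsed structure. Also, getResourceAsStream returns null when the class loader has no backing resource for the class, which is close to Wouter's objection. The class name OwnBytecode is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class OwnBytecode {
    // Reads the first four bytes of this class's own .class file through the
    // class loader. Returns -1 if the loader has no resource for it.
    public static int readMagic() {
        try (InputStream in = OwnBytecode.class.getResourceAsStream("OwnBytecode.class")) {
            if (in == null) {
                return -1;
            }
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints 0xCAFEBABE when the class file was found.
        System.out.printf("0x%08X%n", readMagic());
    }
}
```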

"Java isn't homoiconic either, since it doesn't even have a native way to represent code."

I can beg all sorts of questions. What do you mean by "native"? Java code is represented by Unicode strings, and Java has a "native" class for those. Those strings are translated to JVM bytecodes (in the reference implementation), but you've argued for the separation of Java and its VM, so we can't consider JVM bytecodes as the "native" representation of Java code. -- EricHodges


Last edited November 13, 2014