Homoiconic Example In Java

[Sprung forth from the brow of HomoiconicLanguages]

Related concepts: reflection, metacircular, bootstrapping, lazy evaluation...

Note to googlers: I see this page is popular with google; be warned that this page is currently low signal to noise ratio, contains mostly unfinished arguments between multiple people (including some who do not use homoiconic languages), and despite the title, does not mean that Java is a homoiconic language (it's not).

[now-deleted example in Java is] Not homoiconic for quite a few reasons, but the easiest one is that it doesn't work on my system, nor on millions of other systems that support Java, because "javac" doesn't exist.

The Lisp example always works.

That just means it isn't portable. It still does the same things the Lisp example did.

[No it doesn't, the Lisp example doesn't invoke an outside program. This page is silly, how many times must you be told you can't invoke a compiler to do this. And writing a file and compiling it is the same ignorance I've seen in C when trying to say they can generate code at runtime, it completely misses the point, namely, native language support for this capability. Understand... NATIVE, it must be defined in the language spec, the language must provide this ability, or it isn't homoiconic. If you write your own eval, it's still not homoiconic, because your eval isn't defined in the language spec. OK, do you get it yet, if you have to hack up eval, it isn't homoiconic!]

The point isn't so much that it isn't portable, the point is that you are putting all of the important part of the computation into this string that contains "javac"; the semantics of that string are unknown and unknowable to the implementation of Java. On my system, "javac" might start up tetris, or do "rm -rf /", or fire up a lisp interpreter that runs the Lisp example, or absolutely any other computation.

The point is, you can't write the Java program you are attempting in such a way that it runs on my system; you're doing things that are not part of Java.

Precisely the same critique would apply if your Java program did nothing except fire up a Lisp interpreter on a Lisp program that your Java program synthesized. It demonstrates nothing whatsoever about the properties of Java to call an exterior executable.

I agree that my "eval" method is not part of Java, but for any system on which Sun's JDK is installed an "eval" method can be written. Lisp provides eval (and other conveniences), but eval is possible in Java.

Remember, I'm not claiming that Java or C are homoiconic languages. I'm suggesting that the set of homoiconic languages is not a discrete set. I think it's a fuzzy set where membership is determined by the ease with which homoiconic programs can be written. This would let us do things like place machine code less in the set than Lisp, for instance.

And if meta-circularity were equivalent to homoiconicity, then you'd have yourself a good example. But it isn't, so you don't. Eval does not a HomoiconicLanguage make. Meta-circularity is possibly a fuzzy set. Here's an interesting difference. In the Lisp example, it's impossible to make an improperly formed program. The program may not be logical, it may not reference valid symbols, and thus it may not run, but it is always well formed and parses correctly. When your code and your primary data type are the same, you can't even ENTER an invalidly formed program into the system: any value of that data type is, by construction, a program that parses.

Sure I can. I can (and did a few years ago) create an AST package for Java that only allowed the creation of properly formed Java. I wouldn't be using a "primary" or "fundamental" data type, but that wouldn't stop me.

Which misses the point. You added that functionality to the language. It wasn't there to start with.

I'm a pragmatist. It doesn't really matter to me what functionality is part of the language and what functionality is provided by other code. If I can do it, I can do it. And I can do it.

Boss: "Ok, group, we need to do something homoiconic, which language should we use?" You: "I'm a pragmatist, so it doesn't matter which language we use; I can always write thousands of lines of code to make up for the fact that the language isn't homoiconic." Boss: "That's not pragmatic. Pragmatic would be picking a homoiconic language so that you could avoid that extra work."
No, I'd say use a language that makes it easier. Haven't I made that clear by now? I'd draw axes representing the different features we plan to use, weight them, plot all the possible languages' position along them and pick the one that best suits our needs. I wouldn't exclude a language that was slightly less in one set than another because the features we needed weren't provided with the language. I would have a more accurate model of the capabilities of all of the languages under consideration than if I created discrete sets that weren't based on ease of use.
- You're dodging the issue. You (and forgive me if I am wrong, this page is a ThreadMess) are contending that the Homoiconic set is fuzzy. It is not. We've given crystal-clear criteria for homoiconicity. You seem to be trying to redefine it. Java is not homoiconic. Your example, as given, doesn't even implement homoiconicity, just a weak and hacky meta-circularity. It's about the definition of homoiconic. It's like arguing that a flat tire isn't really flat because there is still a bit of air in there, when for all intents and purposes the tire is utterly useless.
- Only if you introduce the arbitrary criteria of "fundamental" data types. Even with that, there's still room to debate the membership of Turing's imaginary typewriter language, machine code, BrainfuckLanguage, etc. Why is it so important that the set be discrete?
- The idea of a fundamental data type is not arbitrary. If there is room to debate these other languages, let's do it. Either they're in or they're out (and I'm pretty sure most of those are out). A homoiconic language is a meta-circular language (a language defined in terms of itself) with the idea that the code and data are represented in the same data type. The set isn't fuzzy, your understanding of the set's boundaries is. You keep confusing the issue here, but it's really quite clear.
  - I'm going to try explaining this from a different tack. Most languages keep the code on one side and the data on another, conceptually. Like a Turing Machine. A Turing Machine can be made to operate on another Turing Machine, or take instructions (obviously), but a Turing Machine is not homoiconic. Most languages follow in this mold. C, for example, keeps its current code in one section and that code operates on data. Although on some implementations of C it was possible to write (very tricky) self-modifying code (and in fact, this was once common practice), that was still not "part of C", in the sense that it was implementation and architecture specific. Java, following in C's footsteps, also has this distinction. Java has a standard library (which is not part of the language) which allows some weak runtime code modification (loading of new classes, introspection, no modifications though), but is nowhere near homoiconic. The code is still in a no-touch-special-place between the conceptual legs of the system. Lisp, on the other hand, doesn't have this concept.
  - Why aren't Turing machines homoiconic? They can create, manipulate and write and execute their own instructions. That was the core of Turing's insight. By allowing it to operate on its own instructions as it does on data he saw that it could emulate any other machine.
  - I'm pretty sure the author above meant Harvard Architecture machines. Machines which keep instructions and data separate, like the Intel CPU does.
  - My bad. I said one thing, meant another. Woo, indeed.

This is a notable difference from say, using a regex library to alter your code, where you CAN make invalid code that will not parse. This is because you're representing code in a datatype which has no realization that it's code. Sure, it's code, but that's a decision that context makes. A bug in your program means it's no longer valid code (oops, erased a semicolon!). You cannot do this in Lisp. If you enter in an invalid thing (like an unclosed list, or an invalid symbol (they do exist)) the program can't even read it. It's not a valid data-structure.
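A rough Java sketch of that contrast, for readers who want something concrete. The names `StringVsTree` and `Assign` are made up for illustration and are not from any real library. The text edit can silently drop the semicolon; the structured edit cannot, because the node type has no way to express "assignment without a terminator".

```java
import java.util.Objects;

/** Hypothetical illustration: code held as text versus code held as a structure. */
final class StringVsTree {

    /** A tiny, assumed AST node for "target = value;". Not part of any real library. */
    static final class Assign {
        final String target;
        final int value;

        Assign(String target, int value) {
            this.target = Objects.requireNonNull(target);
            this.value = value;
        }

        /** Returns a new node with a different value; the shape of the node cannot be broken. */
        Assign withValue(int newValue) {
            return new Assign(target, newValue);
        }

        /** Rendering always emits the terminating semicolon. */
        String render() {
            return target + " = " + value + ";";
        }
    }

    /** A careless text edit: the pattern also swallows the semicolon. */
    static String sloppyTextEdit(String source, int newValue) {
        return source.replaceAll("= \\d+;", "= " + newValue);
    }

    public static void main(String[] args) {
        // The string version loses its ';' and would no longer parse as a statement.
        System.out.println(sloppyTextEdit("b = 15;", 37));
        // The structured version can only ever render a complete statement.
        System.out.println(new Assign("b", 15).withValue(37).render());
    }
}
```

Note that `Assign` is a user-defined type, so the guarantee here comes from a library rather than from the language itself, which is the very distinction this page is arguing over.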

And I can do the same in Java. You can make Java mimic it, but you can't make Java, as a language itself, homoiconic. Please show me this AST type in your Java language spec.

Creating Eval in Java is cool. But it's not a homoiconic language, even if you succeed. And you didn't succeed. Even if you wrote a parser and made new datatypes so that you could mimic this property, it wouldn't be a homoiconic language because... it is not a core part of the language. Sure it's implementable (albeit awkwardly), but it's not part of the language.

Lots of things aren't part of the language, but we use them all the same. I don't understand the significance of that. It is true. However, that's the crux of this argument. The language, itself, as a standalone spec, does not include this data. Adding a library to Java to support these interactions does not mean that JAVA ITSELF has these features. Further, compare and contrast the code you'd use. The Java code is blocky, clumsy, laborious. Its meta-manipulation involves a lot of cumbersome work. The lisp version has support in the core of the language, way down at the most basic commands of the language (car, cdr, cons are essential to lisp. Without them, lisp does not exist, they aren't part of a library. They're like the equals operator in C, or the new operator in C++. Immutable. You can hook into them, of course, but you can't change what they fundamentally do).

Here's another example of Java code, using ASTs to build the expression instead of strings:

b = 3;
System.out.println("b=" + b);
SimpleAssignmentOperator assignment = 
new SimpleAssignmentOperator("com.ebh.fuzzyjava.FuzzyJava.b", "15");
System.out.println("a=" + assignment);
eval(new Expression(assignment));
System.out.println("b=" + b);
assignment.setValue("37");
System.out.println("a=" + assignment);
eval(new Expression(assignment));
System.out.println("b=" + b);

This is more "blocky" than the Lisp example, but that's going to be true no matter what the code does. I agree that this doesn't make Java more homoiconic, but it shows that the behavior of the Lisp example can be added to Java. If Java (or any other language) can do what a homoiconic language does, why should I care if it uses a library to do it?
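For the curious, here is one way such an AST-based eval might be put together without any compiler at all. This is only a sketch under assumptions: `AstEvalSketch` and `SimpleAssignment` are hypothetical stand-ins for the classes used in the fragment above (whose real source isn't shown on this page), and the node is interpreted with plain reflection.

```java
import java.lang.reflect.Field;

/** Hypothetical sketch: an assignment AST evaluated by reflection, with no compiler involved. */
final class AstEvalSketch {
    public static int b;

    /** Assumed node type standing in for the SimpleAssignmentOperator used above. */
    static final class SimpleAssignment {
        private final String className;
        private final String fieldName;
        private int value;

        SimpleAssignment(String className, String fieldName, int value) {
            this.className = className;
            this.fieldName = fieldName;
            this.value = value;
        }

        void setValue(int value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return className + "." + fieldName + " = " + value + ";";
        }
    }

    /** Interprets the node: looks up the named static int field and assigns to it. */
    static void eval(SimpleAssignment node) {
        try {
            Field field = Class.forName(node.className).getDeclaredField(node.fieldName);
            field.setInt(null, node.value); // null receiver because the field is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot evaluate " + node, e);
        }
    }

    public static void main(String[] args) {
        b = 3;
        System.out.println("b=" + b);
        SimpleAssignment assignment =
                new SimpleAssignment(AstEvalSketch.class.getName(), "b", 15);
        System.out.println("a=" + assignment);
        eval(assignment);
        System.out.println("b=" + b);
        assignment.setValue(37);
        System.out.println("a=" + assignment);
        eval(assignment);
        System.out.println("b=" + b);
    }
}
```

The tree here is interpreted rather than compiled, so it only handles the one statement form it was written for; anything more general needs more node types and more interpreter.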

You can do in Java everything you can do in Lisp, and you can do in Lisp everything you can do in Java. They are both TuringComplete. But Lisp clearly fits the definition of homoiconic, and Java clearly doesn't. Homoiconicity isn't fuzzy. It's just a few people's understanding of it.

Then let's make the distinction clear to everyone. What part of the definition excludes Java? What statement can be made about homoiconic languages that can't be made about Java?

There are no parts. AFAIK the definition of 'language foo is homoiconic' is that foo represents code and data with exactly the same data types. That's yes or no, black or white, and it does exclude Java, because Java doesn't fit the definition. And this very definition is, trivially, a statement that can be made about every homoiconic language, and at the same time not a correct statement when applied to Java. If you have a Java program on your screen, you can interpret the source as a parse tree, imagine what would happen if you swapped two branches of the tree, modified a numerical value in one of the leaves, and so on. But the JavaLanguage itself does not allow you to actually do that, because the fundamental data types of Java are boolean, char, byte, short, int, long, float, double, and Object, and a Java program is none of them. The fundamental data types of LISP are atom and list, and every LISP program _is_ a list. A Java program is just ... a Java program.

But a Java program is a string of Unicode characters. Java has a String data type that is a string of Unicode characters. Therefore it represents code and data with exactly the same data type and is homoiconic. That's my problem with the definition. It makes Java just as homoiconic as Lisp, and no one wants that to be the case.

No, a Java program is not a java.lang.String. If it was and if Java was homoiconic (the two conditions are orthogonal), then a Java program could access the very java.lang.String object in which it is contained and, say, call the length() method. Pseudo-Java example:

 public class Example {
     public static void main(String[] arg) {
         System.out.println("length: " + Example.length());
     }
 }

However, this is not possible. (Maybe someone could give an example in LISP that prints the number of atoms the program itself contains?) Even if you implemented the length() method for the type Example and make it check the length of the file Example.java, this wouldn't have anything to do with proving homoiconicity. At the time this method was executed, it would (a) be expressed as Java bytecode, which is an entirely different language - even if Java bytecode was homoiconic, that wouldn't make JavaLanguage homoiconic - and (b) have no way of knowing which file it was compiled from. As I said, Java's datatypes are the so-called primitive types and java.lang.Object, and a Java program is neither of them.

Again: nope! I addressed this at bottom of page already (where you didn't comment): [we informally call language text in a file, such as C code, a "string", and yet that is using the word "string" in a somewhat different sense than the string literals that may also appear in that same file.]

Note that well: "different sense". I then tried to clarify a little: [I'm just commenting on the very loose informal practice of speaking of any sequence of bytes or characters as a "string", because that practice confuses things. For our current purposes, I don't want to ever use "string" to refer to a sequence of bytes in a file, only to refer to a true language data type, strictly.]

So no, I contradict you: a Java program is not a string nor a String of Unicode characters, not in any sense that is useful in this context, only in a loose sense which can only confuse things. On this page it is essential to be more careful with terminology: a Java program is a sequence of characters, and this is different than the language datatype "string".

I also refreshed my memory about some issues in Tcl strings, and commented on that at bottom of page.

I don't see how anything at the bottom of the page leads us to believe that a Java program is anything other than a string of Unicode characters. The Java compiler reads it as a String from the source file.

So? What was in the file to start with was a "string" only in loose parlance.

A file of Java source contains a sequence of characters, and a Java string contains a sequence of characters, so the contents of the file can be read into a Java string, but that doesn't make them the same thing by any means. Consider that the file of Java source can itself contain Java string literals, and that there is something different about that string compared with the sequence of characters in which it is embedded. If you call them both "strings" then you lose that important distinction. So to speak more carefully, "character sequence" should be distinguished from "instance of Java string type".

Huh? There's no difference between a Java String literal and the Java String the compiler compiles. They are both strings (in the general definition) and Strings (in the Java language definition).

That is 100% unrelated to what I said. We are now talking about something that is not at all exotic, but rather commonplace, so I'm afraid you're going to have to just put some thought into getting the point expressed above, rather than allowing yourself to get a bit confused based on a fast reading.

You said "what was in the file to start with was a 'string' only in loose parlance." In fact, what is in the file is a String in Java's specific definition of that term from the compiler's perspective. There is no "loose parlance" there.

False on all counts. You need to think harder about what's already been said. You can equally well say that the file is a sequence of bits. A Java integer literal is only a sequence of bits. Therefore the file is an integer. No.
No one is listening to me. I give up.
- [I was listening...listening and disagreeing. -- dm]
- You're wrong. We are listening. The fact that DougMerritt has written such huge, detailed and thorough responses to you is a testament to his patience. The only person refusing to listen here seems to be you, since you keep bringing up tiny and only tangentially related issues. My only recommendation for you at this point is to learn a homoiconic language if you don't know one already. You'll really see what people are talking about here.
- Thanks. You know, credit is due Eric here. I think it is no coincidence that he gave up in despair at the same time that you accused him of being non-Arisian, so clearly being accused of being an Eddorian caused him real anguish; a true Eddorian wouldn't do that, they'd just destroy your whole galaxy in response. So clearly he is no Eddorian.
- Addendum by another AnonymousDonor who wrote a bit of the responses here: Please, don't give up. Even if you think that whoever said you should think harder was just becoming defiant and angry (attacking you instead of coming up with any logical arguments), you shouldn't take it that way. It has often been said that it might be very hard to tell a homoiconic language from a heteroiconic one if you've never really experienced working with a homoiconic one, and that a programmer will benefit from learning the alternate way of thinking of, for example, LISP, even if he never needs the language in practice. It has become obvious that we have no good way of explaining the concept of homoiconicity clearly and succinctly, but we might be able to come up with one. This is what makes this discussion interesting.
  - Agreed. Nor do I even think, a week later, that what I said was out of line. I already covered the material Eric was arguing (not once, not twice, at least three times, probably more), so I think it is perfectly reasonable to say so rather than repeating myself. But yes, coming up with a more clear and succinct definition of homoiconicity is the goal.
- What's wrong with 'reflection of source code'? It's clear and succinct, and a Smalltalker wouldn't need to have it explained to them or even scratch their head even if they've never, ever encountered a homoiconic feature in their life. You'd be leveraging the much broader understanding of reflection.
  - The problem is that "reflection of source code" is not synonymous with homoiconic, although it certainly is interestingly related.
- Is a Smalltalk program aware of its own source code? If a block of code refers to object A, then to A again, then to B, the program knows that the code first deals with one and the same object twice, and then with another object - but does it also know that in some other world they have the names 'A' and 'B'?

You also said there is something different between the String the compiler reads and the Strings embedded in it. That's true, but no different from the lists Lisp interprets and the lists embedded in them. I think you should put some thought into how to better express your point, or if you have a point at all.

Lisp programs are lists. When you parse Java, C, or C++ you need to not only figure out the tokens that are incoming, but also figure out (from the grammar of the language) the tree structure. There is enough information in the file, but the actual tree structure (before any kind of optimization) bears almost no resemblance to the source code. Lisp, on the other hand, does not need to infer any tree structure. We have explicitly told it. There is still a lexing stage, of course (remember, a file that contains data need not be a lisp program, we need to identify and of course provide a rudimentary sanity check).

Forget about string literals or other notationally different datatypes for a moment as you think of this. They're immaterial. It's the structure we're talking about. Lisp has two data types, a list and a symbol. A symbol is anything besides the list begin and end characters. Basically, Lisp skips the complex parsing step, instead just doing what amounts to a matching brace check.
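To make the "matching brace check" concrete, here is a minimal sketch of a reader for a Lisp-like notation, written in Java. `TinyReader` is a hypothetical name, and this handles only bare atoms and parentheses (no strings, quotes or comments), so treat it as an illustration rather than a Lisp reader.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a reader for a Lisp-like notation: atoms and nested lists only. */
final class TinyReader {

    /** Reads one expression; lists become List<Object>, everything else a String atom. */
    static Object read(String source) {
        String spaced = source.replace("(", " ( ").replace(")", " ) ").trim();
        if (spaced.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        Deque<List<Object>> open = new ArrayDeque<>(); // lists that are not closed yet
        Object result = null;
        for (String token : spaced.split("\\s+")) {
            if (token.equals("(")) {
                open.push(new ArrayList<>());
                continue;
            }
            Object value = token; // an atom, unless this token closes a list
            if (token.equals(")")) {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("unbalanced ')'");
                }
                value = open.pop();
            }
            if (open.isEmpty()) {
                if (result != null) {
                    throw new IllegalArgumentException("more than one expression");
                }
                result = value;
            } else {
                open.peek().add(value);
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("unclosed list");
        }
        return result;
    }

    public static void main(String[] args) {
        // The tree structure is given directly by the parentheses; nothing is inferred.
        System.out.println(read("(setq b (+ 1 (* 2 3)))"));
    }
}
```

There is no grammar here beyond balancing parentheses: an unclosed list is rejected before anything resembling a program exists. Whether the resulting `List<Object>` counts as Java's "own" representation of code is, of course, the question under debate.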

But this is overly technical. Java has no native AST type. Therefore, it could never be homoiconic. Java uses different formats for code and data. It cannot be homoiconic. Java conceptually treats data and code as totally different things. It cannot be homoiconic. You might be able to force java to perform operations we typically associate with a homoiconic language, but that means nothing. Homoiconicity is a property of a language, not a property of any specific action.

You're making this harder than it needs to be. The Arisians would be very disappointed at your muddy, unclear thoughts.

"I knew the Lensman. The Lensman was a friend of mine. You, sir, are no Lensman."
- Good thing too. Who wants to be a Lensman? Not me. You've got to have a lot of... well... intestinal fortitude for that job.
It might be helpful to remind people what a compiler does to a collection of bytes that may (or may not!) be a program. It takes them, uses a lexer to resolve distinct tokens. These tokens are then "parsed", meaning they are checked and if okayed, placed into a syntax tree. These two components (the lex'd tokens and the tree itself) are what we are talking about, because these are the actual program.
In that light, it becomes clear that Java converts a string into an AST to do work. Further, Java has no built-in notation or type for an AST. Each user must derive their own depending on what they want to do. Meaning that it doesn't really have the actual program when it has a string, nor is the capability to manipulate a program really embedded into the language.
If someone were to argue, "But why are user-defined types excluded from this? The distinction seems arbitrary!" Consider this: If you say that a language has a feature set that is equal to all its builtin features and the sum of any feature possibly implemented by the language, then you can claim any language has any feature, and those features become meaningless. We already have a term for the totality of computability as we know it, it's called "Turing Complete". Features that are "natively supported" do matter, though. Perl wouldn't be half the language it is without its regular expression syntax. Ruby would be much less useful without blocks. Clearly, we need to make the distinction to allow ourselves to meaningfully compare languages.
That's as crystal-clear as I think it's possible to say, "Java is not, and probably will never be, homoiconic."

There's no need to exec javac. Something like the following, from ApacheAnt (simplified), would work fine:

 Class c = Class.forName("sun.tools.javac.Main");
 Constructor cons = c.getConstructor(new Class[] { OutputStream.class, String.class });
 Object compiler = cons.newInstance(new Object[] { logstr, "javac" });
 Method compile = c.getMethod("compile", new Class [] { String[].class }); 
 compile.invoke(compiler, new Object[] {cmd.getArguments()});

If the dependency on sun.tools.javac.Main bothers, supply a different one.

No need for all of that reflection. You can call the compiler directly like this:

 sun.tools.javac.Main.compile(new String[]{fileName})
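For comparison, later Java releases (Java 6 onward) ship a supported compiler API in the standard `javax.tools` package, so the Sun-internal class isn't needed there. A minimal sketch, assuming the program runs on a JDK rather than a bare JRE; `CompileFile` and `compileSource` are hypothetical names, and the source still goes through a temporary file:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical sketch: compiling a source file through the standard javax.tools API. */
final class CompileFile {

    /** Writes the code to a temporary .java file and compiles it; true means success. */
    static boolean compileSource(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // getSystemJavaCompiler() returns null when no compiler is bundled (e.g. a JRE).
            throw new IllegalStateException("no system compiler available; a JDK is required");
        }
        try {
            Path dir = Files.createTempDirectory("fuzzyjava");
            Path source = dir.resolve(className + ".java");
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));
            // Passing null for the three streams uses System.in, System.out and System.err.
            return compiler.run(null, null, null, source.toString()) == 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String code = "public class Test1 { public int answer() { return 15; } }";
        System.out.println("compiled: " + compileSource("Test1", code));
    }
}
```

This still runs a compiler over a file, so it doesn't change the argument either way; it only removes the dependency on an unsupported class.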

Does the above invoke an outside compiler? If it does, I'm quite willing to delete this page on the basis that it aggravates me to even watch Eric get the same answer a thousand different times and keep dismissing it.

Here's another example which doesn't invoke javac or java executables and doesn't use files to share the value of b. It creates a new Java class, compiles it and loads it into the running VM. The new class modifies a static variable in the calling class.

 package com.ebh.fuzzyjava;

 import java.io.FileWriter;
 import java.io.IOException;
 import java.lang.reflect.InvocationTargetException;
 import java.lang.reflect.Method;

 import com.sun.tools.javac.Main;

 public class FuzzyJava {
     private static final String AFTER_VALUE = ";";
     private static final String BEFORE_VALUE = "FuzzyJava.b=";
     private static final String AFTER_CLASS_NAME = " {";
     private static final String BEFORE_CLASS_NAME = "public class ";
     public static int b;

     public static void main(String[] args)
         throws
             SecurityException,
             IllegalArgumentException,
             IOException,
             InterruptedException,
             NoSuchMethodException,
             InstantiationException,
             IllegalAccessException,
             InvocationTargetException,
             ClassNotFoundException {
         b = 3;
         System.out.println("b=" + b);
         String a = createVariableSetterCode("test1",15);
         System.out.println("a=" + a);
         eval("test1", a);
         System.out.println("b=" + b);
         System.out.println("current variable setter value=" + getVariableSetterValue(a));
         a = changeVariableSetterValue("test2", a, 37);
         System.out.println("a=" + a);
         eval("test2", a);
         System.out.println("b=" + b);
     }

     private static void compile(String fileName)
         throws
             SecurityException,
             NoSuchMethodException,
             IllegalArgumentException,
             InstantiationException,
             IllegalAccessException,
             InvocationTargetException,
             ClassNotFoundException {
         System.out.println(Main.compile(new String[] { fileName }));
     }

     private static String changeVariableSetterValue(String className, String a, int i) {
         a = changeClassName(className, a);
         int startPos = a.indexOf(BEFORE_VALUE) + BEFORE_VALUE.length();
         int endPos = a.indexOf(AFTER_VALUE, startPos);
         return a.substring(0, startPos) + i + a.substring(endPos);
     }

     private static String changeClassName(String className, String a) {
         int startPos = a.indexOf(BEFORE_CLASS_NAME) + BEFORE_CLASS_NAME.length();
         int endPos = a.indexOf(AFTER_CLASS_NAME, startPos);
         return a.substring(0, startPos) + className + a.substring(endPos);
     }

     private static int getVariableSetterValue(String a) {
         int startPos = a.indexOf(BEFORE_VALUE) + BEFORE_VALUE.length();
         int endPos = a.indexOf(AFTER_VALUE, startPos);
         String valueString = a.substring(startPos, endPos);
         return Integer.parseInt(valueString);
     }

     private static void eval(String className, String a)
         throws
             IOException,
             InterruptedException,
             SecurityException,
             IllegalArgumentException,
             NoSuchMethodException,
             InstantiationException,
             IllegalAccessException,
             InvocationTargetException,
             ClassNotFoundException {
         FileWriter writer = new FileWriter(className+".java");
         writer.write(a);
         writer.close();
         compile(className+".java");
         execute(className);
     }

     private static void execute(String className)
         throws
             IllegalArgumentException,
             InvocationTargetException,
             SecurityException,
             NoSuchMethodException,
             InstantiationException,
             IllegalAccessException,
             ClassNotFoundException {
         Class c = Class.forName(className);
         Object newVariableSetter = c.newInstance();
         Method setter = c.getMethod("setB", null);
         setter.invoke(newVariableSetter, null);
     }

     private static String createVariableSetterCode(String className, int i) {
         return "import java.io.DataOutputStream;"
             + "import java.io.FileOutputStream;"
             + "import java.io.IOException;"
             + "public class "+className+" {"
             + "public void setB() throws IOException "
             + "{"
             + "com.ebh.fuzzyjava.FuzzyJava.b="+i+";"
             + "}}";
     }
 }

Are you purposely being dense, or do you really not see how your sample proves Java isn't homoiconic?

I'm not arguing that Java is homoiconic, and I'm probably accidentally being dense.

Then why is the page named homoiconic Java example when Java clearly isn't homoiconic?

Because Doug asked me to translate the Lisp example into Java. I'll rename the page to avoid confusion.

He asked you that so that you'd realize while doing it, that Java wasn't homoiconic and the example can't be translated without violating the rule that the language itself must do it. Did you not learn that lesson? Do you not see that your example isn't homoiconic, you stepped outside the language to accomplish it? You wrote to the filesystem, then called a compiler, so you didn't translate the Lisp example, you butchered it. This page title is still misleading, Java isn't homoiconic.

Java isn't homoiconic. This example is. So, I'd say the page title fits perfectly well. -- JonathanTang No, it certainly is not homoiconic. -- DaveFayram

[FuzzyJava.eval("int x = 10;"); doesn't work, does it? How can the FuzzyJava example work with the local variable in the method I call eval() from?]

Hmm, I disagree, since homoiconicity is a language property, not a property of any particular piece of code that's faking it. Just my opinion though, I could be wrong.

Where did I step "outside" the language? I agree that I wrote to the filesystem, but that isn't "outside" the language. I called the compiler inside the language and it executed inside the language. My program performs the same behaviors as the Lisp example. If I wanted I could build an AST of my code and modify that to more closely match the way Lisp does it, but I thought my example demonstrated the principles well enough without requiring the curious to install something like ANTLR. All of this looks like it's inside the Java language from where I sit. I'm sorry if the page title is misleading. Rename it if you have a better one. Doug asked me to translate the homoiconic example from Lisp to Java, so "homoiconic example in Java" seemed descriptive to me. Once again, I'm not claiming Java is a homoiconic language.

-- EricHodges

OK, maybe I'm off base here, if so I apologize, but show me where the "compiler" is defined in the Java language specification. If it's not, then you went outside the language twice, once for the filesystem, and once for the compiler. Using a library to do your work for you isn't doing it inside the language.

Every Java program that does anything other than change its own memory and exit goes outside the language by that definition. My program went outside the language just to print the value of "b". All Java programs that stay inside the language are no-ops of varying lengths.

[Sorry, but the filesystem is certainly outside the language from my point of view. From the point of view of an OS designer. What would happen if the filesystem you were writing to turned out to be over a network and someone forgot to keep track of the endianness? Your code wouldn't compile despite there being nothing wrong with your Java compiler, despite the problem being completely outside Java's power to prevent or correct. And that's the key factor that determines whether something is in the language or outside of it; if the facility doesn't work, does it mean that the language is broken?]

And if the OS forgot to keep track of the endianness of RAM the Lisp example wouldn't work either. -- EH

It is true that needing to use a filesystem in this example is an inelegant workaround, at best - I presume you would actually agree, and that your point here is that this is an aspect of the "fuzziness" you've been mentioning - that the need to do a filesystem workaround distances this from an ideal example.

On the other hand, using methods from com.sun.tools.javac.Main makes this actually an interesting example, much more so than the previous one that gave the "javac" executable file to execute.

It does, alas, suffer from the weakness that it is unsupported and installation dependent prior to J2SE 1.5, but I suppose that's a minor issue, since it's standard in 1.5. This is obviously a pretty handy thing to have in the language.

So it would seem that your point is that you can translate the algorithm of the Lisp example, but would agree that it is further from an ideal example in that:

it is currently nonstandard (although that will change)
it is in the Java tools extension rather than the Java core that is in all implementations/platforms
it uses a file in a filesystem, which is a workaround that is more indirect than the Lisp example. (Also I suppose the Java core doesn't include file operations, since Java can run in some highly minimal environments.)
it took significantly more lines of code and effort to accomplish the algorithm in question.

I'm not going to argue the difference between properties and algorithms, nor the "fuzziness" issue, this instant; I'm just summarizing. -- dm

Absolutely. I wouldn't do something like this in Java. (But I do it in JavaScript all the time.) My point is that the language doesn't make it easy, but neither does it prevent me from doing it. I used to write object oriented code in C even though the language didn't make that easy. It was more difficult than C++ but still possible. Therefore I'd assign C less membership in the OO language set than C++.

If I find a way to stream the code to the compiler without writing a file then I might start thinking about actually doing this in production code. I'd write a package to make it easier, though.
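[For later readers: the javax.tools API, added in Java 6 after this exchange, allows roughly what is wished for here. What follows is a minimal sketch, not anyone's production code. The class names InMemoryCompile, StringSource, ByteCode and Generated are made up for illustration. It assumes a full JDK is running, since ToolProvider.getSystemJavaCompiler() returns null on a bare JRE.]

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

final class InMemoryCompile {
    /** Source text held in a String instead of a file on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Receives the compiler's class-file output in a byte array. */
    static final class ByteCode extends SimpleJavaFileObject {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteCode(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
        byte[] toByteArray() { return bytes.toByteArray(); }
    }

    /** Compile one class from a String and load it, without writing any file. */
    static Class<?> compile(String className, String source) throws ClassNotFoundException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("no compiler available (JRE without JDK tools?)");
        final Map<String, ByteCode> output = new HashMap<>();
        StandardJavaFileManager standard = compiler.getStandardFileManager(null, null, null);
        // Redirect every class file the compiler wants to write into the map above.
        JavaFileManager manager = new ForwardingJavaFileManager<StandardJavaFileManager>(standard) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                    JavaFileObject.Kind kind, FileObject sibling) {
                ByteCode byteCode = new ByteCode(name);
                output.put(name, byteCode);
                return byteCode;
            }
        };
        Boolean ok = compiler.getTask(null, manager, null, null, null,
                Collections.singletonList(new StringSource(className, source))).call();
        if (!ok) throw new IllegalArgumentException("compilation failed");
        ClassLoader loader = new ClassLoader(InMemoryCompile.class.getClassLoader()) {
            @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteCode byteCode = output.get(name);
                if (byteCode == null) throw new ClassNotFoundException(name);
                byte[] b = byteCode.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }

    /** Compile `source`, then call the named static no-argument method reflectively. */
    static Object callStatic(String className, String source, String method) {
        try {
            return compile(className, source).getMethod(method).invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "public class Generated { public static int b() { return 4; } }";
        System.out.println(callStatic("Generated", source, "b"));
    }
}
```

[Note that the code being compiled is still an ordinary String. This removes the filesystem step and nothing else, so it does not bear on the "same representation" question either way.]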

And I don't really think that using com.sun.tools.javac.Main made the example more interesting. The previous example demonstrated the idea. It doesn't really matter if a step involves the operating system or some platform specific configuration. If I wrote it in bash I'd probably involve the OS at every step.

-- EH

Your central point here is what I thought you meant. As a side issue, introducing extra facilities to help do a translation does matter. Convenience in writing code matters, ease of making changes matters, a speed difference of conceivably a million fold can often matter. Maybe you mean it's not the central issue.

I mean what I said. Convenience is not binary. The set of homoiconic languages seems to be fuzzy and membership values are possibly subjective. -- EH

Bash is an interesting example, I've been thinking of bringing it up for a while. The fact that it involves the OS at most every step is just part of what it is, though; once you start writing in bash, that's practically a given, so it's not quite the same thing (although it certainly has motivated people to translate bash scripts to other languages).

How much a language provides and how much it gets from the OS doesn't matter when we look at the definition of something like "homoiconic". If Lisp invoked a separate executable for each keyword it wouldn't make it less homoiconic. -- EH

On a different issue: Your example is using strings to represent the manipulatable code that you're then compiling and executing; you said "I could build an AST of my code and modify that to more closely match the way Lisp does it". Could you clarify what you mean? At first glance I don't see how going down that road would lead to a closer match.

-- dm

ANTLR comes with a Java grammar that can translate between Java source and ASTs. If I used it I could build an AST representing the source. The value to be assigned to "b" would be a node on an AST. I wouldn't search for BEFORE_VALUE and AFTER_VALUE; I'd just walk the tree as you did in the Lisp example. -- EH
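[A rough sketch of the contrast being described, for readers who want something concrete. It does not use ANTLR. Node is a hand-rolled, hypothetical tree type, and the marker strings are assumed values, since the original example's markers are not shown on this page.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EditValue {
    // Assumed marker text; the deleted example's actual markers are not shown on this page.
    static final String BEFORE_VALUE = "int b = ";
    static final String AFTER_VALUE = ";";

    /** String approach: locate the value between two markers and splice in a new one. */
    static String replaceInString(String source, String newValue) {
        int marker = source.indexOf(BEFORE_VALUE);
        if (marker < 0) throw new IllegalArgumentException("BEFORE_VALUE not found");
        int start = marker + BEFORE_VALUE.length();
        int end = source.indexOf(AFTER_VALUE, start);
        if (end < 0) throw new IllegalArgumentException("AFTER_VALUE not found");
        return source.substring(0, start) + newValue + source.substring(end);
    }

    /** Toy tree node (hypothetical; not ANTLR's AST type). */
    static final class Node {
        final String text;
        final List<Node> children;
        Node(String text, Node... children) {
            this.text = text;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    /** Tree approach: wherever `name` is assigned, swap the value node. Returns the change count. */
    static int replaceInTree(Node node, String name, String newValue) {
        int changed = 0;
        if (node.text.equals("=") && node.children.size() == 2
                && node.children.get(0).text.equals(name)) {
            node.children.set(1, new Node(newValue));
            changed++;
        }
        for (Node child : node.children) {
            changed += replaceInTree(child, name, newValue);
        }
        return changed;
    }

    /** Render the toy tree back to source text. */
    static String toSource(Node node) {
        if (node.text.equals("=")) {
            return "int " + toSource(node.children.get(0)) + " = " + toSource(node.children.get(1)) + ";";
        }
        if (node.text.equals("block")) {
            StringBuilder sb = new StringBuilder();
            for (Node child : node.children) sb.append(toSource(child)).append('\n');
            return sb.toString();
        }
        return node.text; // identifier or literal
    }

    public static void main(String[] args) {
        System.out.println(replaceInString("int a = 1; int b = 3;", "4"));
        Node program = new Node("block",
                new Node("=", new Node("a"), new Node("1")),
                new Node("=", new Node("b"), new Node("3")));
        replaceInTree(program, "b", "4");
        System.out.print(toSource(program));
    }
}
```

[Either way the program still has to go from tree or string back to source text and then through a compiler before it runs. The tree only changes how the edit is located.]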

Oh, I see. Ok. I'll say something about that later. -- dm

[The whole point of the discussion, Eric, was to define the term homoiconic, which you've been battling. The term was meant to point out the difference between languages that have that feature, and languages that don't. The hoops you're jumping through to accomplish this task in Java should make you stop and think what makes it so much work to do in comparison to the Lisp example, because that which makes it so much work in Java is the lack of what's being defined as homoiconic. You don't seem willing to separate the term homoiconic from Turing complete, because your examples only prove Java is Turing complete, not homoiconic. So rather than repeating endless arguments and examples, why don't you just clearly state what your problem is with the term or its definition?]

My problem is that I don't think languages "have" or "don't have" this feature. I think it's more complex than that and languages lie along an axis of homoiconicity. I thought I'd clearly stated this several times. That's why I created DefiningDiscreteSetsOfLanguages. -- EH

Ahhh, okay I understand now. You're a LanguageAbuser. You're the type of person who says that C is OO because it is remotely conceivable to write OO code in C, given that it's Turing complete. You're an extreme relativist out to destroy any succinct communication by assaulting the meaning behind words, smearing them into meaninglessness. Nice. -- ??

No, I don't say C is OO. I say OO programs can be written in C with some difficulty. (I also say OO programs can be written in C++ with some difficulty, albeit less difficulty than in C.) I say C has an infinitesimal membership value in the set of OO languages for that reason. By treating the set of OO languages as a fuzzy set and acknowledging membership as subjective we can avoid any debate about which language is OO and which isn't, which is "more" OO, etc. I'm hardly an extreme relativist (I created EverythingIsRelativeStrangeLoop to show why extreme relativism is self-contradictory). Sometimes succinct communication sacrifices validity for brevity. -- EH

[If we can't define meaning to words and give examples of "here it is" and "here it isn't", then it's damn hard to communicate about anything. C is not OO, and no matter how you hack up OO in C, it still isn't OO. Java and C aren't homoiconic, and no matter how you hack up homoiconicity, they still won't be homoiconic. How are we to discuss these differences if we can't define words to describe them? You are attempting to make any such definition meaningless; how does that further discussion about those differences? How does anything you've done here contribute to the discussion rather than try to avoid it? There is a difference between these languages, that difference is, one is homoiconic, and one isn't, it's not a continuum.]

In addition to having meaning, I think our words should be as accurate as possible. It's fine to give examples of where it is and isn't, but we should also acknowledge where parts of it are. The current definition of HomoiconicLanguages is:

"Languages in which program code is represented as the language's fundamental data type are called 'homoiconic'. Such languages allow code and data to be DeeplyIntertwingled, so that new code can be generated and manipulated by the program itself at runtime."

My Java example represents Java code in one of Java's fundamental data types. Java doesn't provide ASTs as a "fundamental" data type, but it does provide strings. It "intertwingles" data and code (although not as "deeply" (again, a subjective qualifier) as Lisp). New code is generated and manipulated by the program itself at runtime. That doesn't make it as homoiconic as the Lisp example, but it shows that it has some partial membership in that set of languages. We can compare the ease of doing this in different languages and establish some ordering between them. It gives us a mathematical framework for determining if Smalltalk is more homoiconic than machine language, if Lisp is more homoiconic than JavaScript, or if Java is more homoiconic than C. I don't want to abuse language. I want to prevent its abuse.

-- EricHodges

Sigh... it's pointless discussing this with you then because the word has no meaning if you continue to claim that every language has it. Java has no membership in that set of languages just as C has no membership in the set of OO languages. If you put C or Java into a string, it's no longer C or Java, it's a string. You don't have to put Lisp into a list, because Lisp is already a list. Do you not understand that? All Lisp code is already a list; all Java and C code is not already a string, you have to put it into a string. Never mind, I'm sick of talking to you, you're as pig headed and obtuse as Top.

Java code is strings, not ASTs or lists.

Oh really... then compile this in Java.... "int aNumber = 1;", oh that's right, it won't compile, because you have to remove the quotes for it to be valid Java. Java may be seen as a string by the compiler... but Java isn't a string at the language level... Lisp is a list at the language level, that is a significant difference, regardless of what you want to label it. I think you guys don't really grok what we're saying when we say at the language level. Maybe we're not explaining it well enough, I don't know, but it's so blindingly obvious to me that I don't know how to say it any simpler.

The value in exercises like this is not in whether we browbeat dissenters to come around to our point of view, the value is in discovering how to clarify our thinking and our phrasings more than we would have reason to do otherwise.

The compiler converts strings to ASTs of bytecodes, but those ASTs aren't Java. The word "homoiconic" doesn't define a discrete set of languages, but that doesn't strip it of meaning. The word "tree" doesn't define a discrete set of plants (one plant might be x% bush and y% tree, and its membership value in those sets may change over the course of its life), but that doesn't strip the word "tree" of meaning.

Let's focus the discussion a bit. Almost all languages (from BASIC to Java) can generate source code (because source code can be held in strings), invoke a compiler, write a compiler from scratch, represent the parsed ASTs or recover them from the compiler, and manipulate ASTs; all in all, they can do whatever Lisp can. After all, they are Turing complete. The problem is that they "can" only in theory, just as in theory you can write business applications in assembly language, walking indexes, B-trees and other file structures to gather data, using video interrupts to put things on the screen, and so on and so forth.

The fact that a language "can" in theory do something is irrelevant for discussing language design issues. Languages like Java and assembly can do everything in theory. The question is how easy it is to do this or that. And here we have both a tentative formal theoretical framework to compare languages for expressiveness and empirical observations. Out of the armies of Java programmers, how many manipulate Java ASTs or Java bytecodes? Very few. Even frameworks like gnu.bytecode or Apache's BCEL are exceedingly difficult to use and error prone. One cannot implement a meta-programming library (say, aspects) in them with the same ease of use that one can in LISP. By the way, such meta-programming is cited as an example in LongFunctionsInLisp. Writing a mini-CLOS in Java is of an essential technical difficulty, orders of magnitude more difficult than writing the full CLOS in Lisp. And mind you, CLOS is not LISP specific; it can be viewed as an abstract specification of an object system that could be ported to programming languages other than LISP.

So the question is: how easy is it to do meta-programming in Java? And the answer is that it is essentially difficult, orders of magnitude more difficult than in Lisp or Scheme, and this comes from a language design decision. The designer of Java decided that the familiarity of C-style syntax, and consequently the potential for commercial success, were fundamentally more important than the ability to support LISP style meta-programming by making language elements uniform and simple. This disqualifies Java from any claim to "homoiconicity" (what a strange word), or more simply it disqualifies Java as is from meta-programming. Aspect Oriented frameworks do recover a bit of the meta-programming power that homoiconic languages have.

But "ease" is relative, expressed as a point along an axis, not a boolean value. It's easier (for me, at least) to do meta-programming in Java than C. It's easier to do meta-programming in JavaScript than Java. It's easier to do meta-programming in Lisp than JavaScript. These form a continuum of languages, not discrete sets. I've never argued that it's easy to write homoiconic code in PDP-11 assembly, just that it's possible.

What part of "orders of magnitude" or "essential difficulty" do you not understand? These are fairly strong criteria. It is the same kind of criterion that allows us to say that x86 machine code is not object oriented. Can you write object oriented systems in machine code? Of course you can. It's just that it's orders of magnitude more difficult. That's the criterion, and it's sufficiently discriminatory. How long does it take an experienced Java programmer to write a better object system in Java? Months, if not years. They would probably have to invent a modified Java along the way, just as happened in the case of AspectJ. How long would it take for a similar effort to be done in Lisp? To boot up, within 8 hours you're ready. You can finesse it for one week and that's it, and you haven't come up with a different Lisp in the process. It's not just easier in the sense that writing system programs in C is easier than in Pascal, it's essentially easier in the same way that writing OO software in Java is essentially easier than writing it in assembly. Plus, isn't it ironic to come up with 5 languages and call that a "continuum"? What kind of math have you studied?
We agree. I can write OO code in assembly, it's just more difficult than in Smalltalk. The difference in difficulty can't be expressed by a boolean, it's a real number. I assert that there's a continuum because there exist an infinite number of imaginary languages that could occupy the spaces between the real ones.
No way! First off your mathematics is wrong. An infinite set does not in itself constitute a continuum. Also, although the number of languages is unbounded in principle, it obviously will always be a rather small number in practice.
More importantly, you have just put your finger on one of the critical issues, but you said it backwards. You can write OO in assembly, but assembly is not an OO language! The OO stuff is not supported by the language. That's all it takes for a language to either have or not have the property of being OO.
The difference most certainly can be expressed as a boolean. Assembly is not OO, and no argument can be made otherwise. Are there border cases where certain languages are kinda OO? Sure, but this isn't one of them. There's always a little grey around the edges, but for the most part, it's a black and white issue. In the same vein, Java is not homoiconic; homoiconicity is a discrete set, for the most part. Your inability to distinguish this is astounding. Writing OO code in assembly does not make assembly an OO language, and writing homoiconic code in any language does not make that language homoiconic; this isn't Turing equivalence. A language is homoiconic when and only when it provides the NATIVE ability to manipulate code, get that... NATIVE, if you have to hack it up yourself, it's not homoiconic. Homoiconicity is a property of the LANGUAGE, not of the code. It either is, or it isn't in most cases. In the case of Java and C, it quite obviously isn't. In the case of Smalltalk, well, that could be a grey area. In the case of Lisp, it's a shining example of what homoiconicity is.
If there's "a little grey around the edges" then the set is not discrete.
Look, don't be an ass, nothing is absolute, but that doesn't mean it's not discrete in a practical sense. For the most part... male and female are discrete sets, and it's common to discuss them as if they are. Are there exceptions? Sure, there are exceptions to just about everything, but for practical matters they are a discrete set; same goes for homoiconic languages. How do we have any productive discussion about anything if you can't accept common meaning to words so we can communicate?
I am using the common meaning for the words "discrete set" and "fuzzy set". Smalltalk has partial membership in the set of homoiconic languages. You said so yourself. Therefore the set is fuzzy and not discrete. Why can't we just call it what it is? What's the source of your resistance?
The source of the resistance is that you're using "discrete" and "fuzzy" to obliterate the meaning of the word homoiconic by trying to include every language on a continuum, and it doesn't work that way. Of all the languages being discussed, there's only Smalltalk that I'd consider "fuzzy"; the rest are either homoiconic, or they aren't. Your unwillingness to recognize this simple fact makes you unreasonable in my eyes.
And the fact that you agree Smalltalk has partial membership but refuse to call the set fuzzy makes you unreasonable in mine.
Fuzzy is a word you use when you can't explain what you actually mean.
See http://www.cs.cf.ac.uk/Dave/AI2/node99.html for what fuzzy actually means.
- The definition given there is reasonable, but FYI there are some other definitions too, that apply to domains that the above definition does not apply to.
You misunderstand. So far we have 4 categories: non-homoiconic, weakly homoiconic, strongly homoiconic, and completely homoiconic. That is 4 discrete sets, that is not a fuzzy spectrum. You have never made an argument for turning those 4 discrete sets into a spectrum. -- dm
They don't sound like 4 discrete sets in their definitions.
Oh?
No. Which set contains Smalltalk? They don't address the ease of use issue, so machine code is exactly as homoiconic as Lisp. They use subjective phrases like "sort of tacked onto the side".
Ease of use issues may argue for fuzziness in categorizing something as a "homoiconic algorithm", but viewing this as a question of algorithms has been your agenda, explicitly not my agenda. I long ago said that "homoiconic" was a property, not an algorithm. The Lisp example is an algorithm, true, and your Java example is an algorithm, true, but they are algorithms that are intended to illustrate issues pertaining to the homoiconic property.
- I'm talking about the ease of use of a language. Machine code and Lisp may both be homoiconic, but it's easier to use the homoiconic features of Lisp.
RK and I discussed Smalltalk briefly and tentatively put it in the weakly homoiconic set.
- It isn't apparent from the definition that Smalltalk is weakly homoiconic.
- Smalltalk isn't so easily parsable as Lisp, nobody's gotten into the habit of using parse trees, and blocks don't store source code but compile it down. All Smalltalk's got going for it is evaluate: and storeOn: (the opposite of eval) but some fairly basic language features are implemented in terms of these things. I'm not talking about metacircularity here, which ST can do also, but much more basic things like reading in source code from files and initializing shared variables. Intertwingling of code and data, at a rudimentary level, is not only a supported core language feature but it's depended upon in the implementations of some other core language features.
- [yes; and this touches on reflection issues]
- The definition of weakly homoiconic was given as "a language (not a program) that has homoiconic features sort of tacked onto the side, but the core language itself is not homoiconic". Why isn't the "core" Smalltalk language homoiconic? In what way are these features "tacked onto" Smalltalk?
- None of these features is something a programmer would encounter on a routine basis. As far as they're concerned, the language isn't homoiconic. A systems programmer would encounter these features.
I have not denied that machine code is homoiconic, I merely expressed reservations about it, largely for reasons I haven't expressed very much because they continue to plague me.
DF just pointed out a new point of view about what is wrong with the attempted definition of weakly homoiconic.
My personal ability (or lack thereof) to correctly categorize languages, or to come up with ideal definitions, is of course irrelevant. Most of what you seem to think is an argument for fuzzy categories seems to me to be merely pointing out defects in the way that I defined "weakly homoiconic". Mea culpa, mea maxima culpa. I'm not perfect. Let's fix it.
- OK. I suggest we define the ideal properties and features of a homoiconic language using LISP 1.0 as an example. We then categorize languages by which of these properties and features they provide and assign them relative ease of use values.
[Suggestion: I've been thinking for several months that it would be appropriate to create a page contrasting modern thinking about definitions via prototypes versus classical categorical definitions - i.e. a meta page about what approach should be used for definitions in general and how to decide whether individual elements meet those definitions. Perhaps it's time to address those meta issues, on ApproachesToDefinitions. -- dm]

You're wrong. It is a discrete set, either the language represents its source code as its fundamental data type, or it doesn't, period. Pay attention to what everyone's been telling you and quit repeating the same wrong argument over and over.

[Unfortunately, I have to side with EH against you since HomoiconicLanguages defines pure, strong and weak categories of homoiconicity. And you're supporting yourself on the strong, or even pure, category of homoiconicity. Smalltalk for example is only weakly homoiconic but the language makes heavy use of even the little homoiconicity it does have. Moreover, it would take only minor changes to Smalltalk to make it much more strongly homoiconic. Java is not at all homoiconic and introducing an eval function will only make it barely register on the scale.]

Those terms were invented to try and make Eric understand what homoiconic means, he never did, I'm tired of discussing it. So believe whatever you like Eric, you obviously care little for the truth.

We've been down this road. PDP-11 assembly source is represented in a fundamental data type of PDP-11 assembly language. If PDP-11 is therefore a member of the discrete set of homoiconic languages, then "ease" is clearly not a determining factor, as was argued immediately above my response.

Can someone explain why there's so much anger in some of the arguments in favor of a discrete set of homoiconic languages? Why does it matter so much?

There's no anger, you're just misinterpreting me, that's frustration, not anger. Since you have yet to show an understanding of what homoiconic means, I don't think you're in a position to argue about what it should mean and what languages are or aren't. You've got to demonstrate that you "get it" before you can attempt to redefine it.

How would I demonstrate that I "got it" (other than agreeing that there exists a discrete set of homoiconic languages)? Doug told me if I translated the Lisp example into Java that would help me "get it", but if anything, it's made me more aware of how fuzzy this set is.

[I viewed it as an important step, to make things more concrete. I do not at all think that discussion of your translated example is complete, by any means, I'm just not feeling into fast and furious edits. Several people have made rather good comments that might be worth summarizing. -- dm]

From above, regarding the fact that the terminology definitions I attempted on HomoiconicLanguages have defects:

OK. I suggest we define the ideal properties and features of a homoiconic language using LISP 1.0 as an example. We then categorize languages by which of these properties and features they provide and assign them relative ease of use values.

Ok. I think we should also look at what most practical TuringEquivalent languages can do: typically we have

Natively-available strings which can contain anything we like, including constructs that happen coincidentally to be constructs in any language we like, including but not limited to the host language
Often not natively available, but nonetheless the ability to write an evaluator for any language we like, including but not limited to the host language, which can potentially accept as input the aforementioned strings.

We don't want "homoiconic" to refer merely to that, because then "homoiconic" wouldn't have any useful ability to distinguish unusual languages such as Lisp 1.0, yet that was the intent of coining the term in the first place.

So we need to be careful to maintain, at minimum, the differentiated polar opposites represented by Lisp 1.0, and by "most languages".

We also need to remember the literal definition: homo-iconic: same representation of code and data.

Exactly what this means in practice is a headache, because we informally call language text in a file, such as C code, a "string", and yet that is using the word "string" in a somewhat different sense than the string literals that may also appear in that same file.

I also think we need to look at why Tcl is widely called "homoiconic", since it is claimed to represent everything as a string. (I'm mildly familiar with Tcl, and have a book on it, but I'm not a Tcl expert as such.) Whatever Tcl is doing with strings should be distinguished somehow from what C does with strings.

Not in Smalltalk. In Smalltalk, a file is a file, and a string is what you get from executing '/home/user/myParcel.pcl' asFilename contentsOfEntireFile. And strings are what the Smalltalk compiler works on, not files. Can someone explain how C reads in code?
- Yes, and that's similar to what is true in other languages, too. I'm just commenting on the very loose informal practice of speaking of any sequence of bytes or characters as a "string", because that practice confuses things. For our current purposes, I don't want to ever use "string" to refer to a sequence of bytes in a file, only to refer to a true language data type, strictly.
- Oh wait...lemme grab my ST book...Ok, I'm confused again. We have a string issue in Smalltalk, too. It'll probably have to wait on the Tcl issue.
- I hope you're not going by ST-80. I certainly am not. And can you describe the "string issue"? Btw, what about homoiconic as reflection of source code?
- You caught me, I grabbed a ST-80 book. But what you said hasn't changed, it said what you said about Smalltalk source and strings, so does it matter? Which Smalltalk book should I have looked at?
- The "string issue" is the above comment about Tcl that needs resolution. Homoiconic as reflection of source code: if you've got an idea, let's hear it.
- Reflection of source code. That is the idea. :)
- It seems to me that a language could reflect all compiled code in such a way that it's extremely easy to manipulate. Would it be homoiconic then? I don't think so. Meanwhile, the manipulability of source code is the critical difference between Lisp and Smalltalk. And that can be stated as that Lisp reflects source code in a much deeper and more powerful manner.
- I'm not too sure about the "one fundamental data type" idea. Smalltalk doesn't have a SourceCode class. But if it did then everything's an object and source code is an object too. Would 'object' be the one fundamental data type? Or would it fail because not everything is represented by SourceCode objects?
- (I don't use any books. I doubt any of them talk about namespaces, for example, despite namespaces being pretty standard in Smalltalks).

I agree. I find this part of the definition deeply unsatisfying. If someone can elaborate on why interpreting strings makes Tcl homoiconic but interpreting or compiling strings doesn't make other languages homoiconic, please do.

Looking at RK's comment about Smalltalk above, we should also look at the relationship to reflection. And of course, metacircular interpreters. These three terms have interesting connections, but nonetheless are not synonyms.

Tcl is single typed. Everything is a string. Everything. Sometimes they are quoted in one of several ways (including but not limited to double quotes), and sometimes they don't need to be, but they are always treated as strings and stored as strings.

It does have e.g. numeric operators that interpret the strings as a numeral representing a number - it would have to, to be really usable - but they still accept and produce strings.

When a variable $a is encountered during interpreter evaluation, it is replaced macro-style by its string value.

Furthermore, all strings are parsed by the Tcl language implementation. Quoting mechanisms can turn off evaluation, which is a second stage after parsing.

Tcl uses this to implement e.g. conditionals; it evaluates the IF string or the ELSE string depending on the outcome of the test condition.

All of this is extremely different from the way that e.g. C or Java treats strings.

It appears to be significant that, in both Lisp and Tcl, the primary language definition of conditionals depends on quoting the IF and ELSE blocks of code, and evaluating just one of them.

Phrasing this in a bullet-proof way is hard, because a similar phrasing might apply to the C language...but here's a clarification: it is furthermore possible in both Lisp and Tcl to write a function MY_IF that takes 3 parameters: a boolean, an IF clause, and an ELSE clause, that behaves exactly the same way as the native conditional.

That is not possible in e.g. C or Java, where MY_IF(condition, if_clause, else_clause) inherently forces evaluation of all three clauses before the MY_IF function is even called.

[Not if if_clause is expressed as a string.]
- Nay nay; string is not the native representation of code in C or Java, so that's not a homoiconic ("same representation") form. In Tcl, code is natively represented as a string. -- dm

This is possible in Smalltalk, however Smalltalk does not allow manipulation of the internals of the code blocks.

This seems to be related to the core notions of homoiconicity, without encompassing the whole notion.

So if the above were suitably paraphrased, I think it would become a portion of the new improved definition of homoiconicity (not the whole definition). -- dm

But on the other hand writing the same function in a non-homoiconic language but a language with lazy evaluation, is trivial. So it is a bad test.
- That is an excellent point! Thanks for that interesting observation. (Note however that I was only suggesting it might be necessary but not sufficient, which isn't necessarily disproven yet.) -- dm
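
[To make the MY_IF point concrete in Java terms: an ordinary method evaluates both clause arguments before it is called, while passing thunks defers them. The sketch below uses java.util.function.Supplier and lambdas, which arrived in Java 8, well after this discussion. MyIf, eagerIf, lazyIf, branch and log are illustrative names, not anything from the original example.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class MyIf {
    /** Records which clauses actually ran, in order. */
    static final List<String> log = new ArrayList<>();

    /** Stand-in for a clause with a visible side effect. */
    static String branch(String name) {
        log.add(name);
        return name;
    }

    /** Ordinary method: both clause arguments are evaluated before the call happens. */
    static <T> T eagerIf(boolean condition, T thenValue, T elseValue) {
        return condition ? thenValue : elseValue;
    }

    /** Thunk-based version: only the chosen Supplier is ever run. */
    static <T> T lazyIf(boolean condition, Supplier<T> thenClause, Supplier<T> elseClause) {
        return condition ? thenClause.get() : elseClause.get();
    }

    public static void main(String[] args) {
        eagerIf(true, branch("then"), branch("else"));
        System.out.println(log); // [then, else]

        log.clear();
        lazyIf(true, () -> branch("then"), () -> branch("else"));
        System.out.println(log); // [then]
    }
}
```

[This supports the objection above: deferring evaluation is enough to write MY_IF, and it needs no access to the code's representation. The Supplier is opaque. lazyIf can run it or not, but cannot inspect or rewrite what is inside it, much like the Smalltalk block case mentioned earlier.]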

Here's an alternate proposal: write a function that, when given a program (pre-condition: it doesn't contain macros and it is not self-modifying), adds instrumentation code to all if statements and then provides path coverage upon running the program. Now that you can do trivially in Lisp, but you can't do it as easily in Java, Smalltalk, C, etc.
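
[A sketch of what that test looks like once a program is available as a tree. Everything here is hypothetical: Stmt, Seq, If and Probe are hand-written stand-ins, because Java's core library exposes no such classes, and the conditions are opaque BooleanSupplier lambdas rather than inspectable code. It records which branch of each if was taken, which is simpler than true path coverage.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

final class IfCoverage {
    /** Hypothetical statement node. */
    interface Stmt { void run(); }

    static final class Seq implements Stmt {
        final List<Stmt> body;
        Seq(Stmt... body) { this.body = Arrays.asList(body); }
        public void run() { for (Stmt s : body) s.run(); }
    }

    static final class If implements Stmt {
        final BooleanSupplier test;
        final Stmt thenStmt;
        final Stmt elseStmt;
        If(BooleanSupplier test, Stmt thenStmt, Stmt elseStmt) {
            this.test = test;
            this.thenStmt = thenStmt;
            this.elseStmt = elseStmt;
        }
        public void run() { if (test.getAsBoolean()) thenStmt.run(); else elseStmt.run(); }
    }

    /** Instrumentation node: records that execution reached it. */
    static final class Probe implements Stmt {
        final String label;
        final Set<String> hits;
        Probe(String label, Set<String> hits) { this.label = label; this.hits = hits; }
        public void run() { hits.add(label); }
    }

    /** Returns a copy of the tree in which every If branch first records that it was taken. */
    static Stmt instrument(Stmt s, Set<String> hits, int[] counter) {
        if (s instanceof Seq) {
            List<Stmt> out = new ArrayList<>();
            for (Stmt child : ((Seq) s).body) out.add(instrument(child, hits, counter));
            return new Seq(out.toArray(new Stmt[0]));
        }
        if (s instanceof If) {
            If original = (If) s;
            int id = counter[0]++; // number the ifs in traversal order
            return new If(original.test,
                    new Seq(new Probe("if#" + id + ":then", hits), instrument(original.thenStmt, hits, counter)),
                    new Seq(new Probe("if#" + id + ":else", hits), instrument(original.elseStmt, hits, counter)));
        }
        return s; // leaf statements are left untouched
    }

    public static void main(String[] args) {
        Set<String> hits = new LinkedHashSet<>();
        Stmt program = new Seq(
                new If(() -> true, () -> {}, () -> {}),
                new If(() -> false, () -> {}, () -> {}));
        instrument(program, hits, new int[] {0}).run();
        System.out.println(hits); // [if#0:then, if#1:else]
    }
}
```

[The transformation itself is short. The hard part in Java is getting from real source text to such a tree and back, which is the step Lisp gets for free.]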

[You can do it (with varying degrees of difficulty) in any TuringEquivalent language. I want something other than "easier" or "harder" to distinguish homoiconic, otherwise it's a fuzzy set.]

Ha, ha. But your requirements conflict with Turing completeness anyway. Any test you can ever come up with will be solvable even in assembly -- in the worst case by implementing a Lisp interpreter :) So to disambiguate, you have the following criterion: given only the standard libraries of the language (one can always imagine a suitable library to do almost anything), if it takes one hour in one language and more than a day in another, then that's your difference. As I explained somewhere above, all differentiation between programming language X and Y involves something that can be done orders of magnitude more easily in one than in the other. Otherwise, most of them are Turing complete.

This is interesting, and I don't disagree, but don't forget that "homoiconic" is a language property, not a language algorithm. -- dm

Well, but to make it a meaningful property, we should provide a suitable test that a non-homoiconic language will fail. Implementing a function (it doesn't need to be IF for that matter) with the semantics of lazy evaluation will show just that: the language has support for lazy evaluation, or even semantics for evaluation. In the end, Eric might have been onto something. Bravo, Eric.

It just dawned on me while writing the above paragraph that it is not a property of the language; it can only be a property of the standard libraries and the runtime. Java does not have the convenience of LISP's cons? Screw the LISP cons. I have it already in my little library, and it is not even that important. How many student examples have we seen in LISP-like languages where a linked list is improperly used just for the sake of convenience, whereas a programmer in an Algol-derived language would have used a better-suited data structure like an array or a hashtable? After all, LISP does have arrays, and LISP has tons of other structures that are better for other domains. CONS is the machine code of data structures. Who really cares if all the structure of a LISP program can be reduced to a huge tree of conses? The programs of a language with a fancier syntax can also be reduced to a tree of structures that can fit a nice hierarchy.

In the end all it takes for Java to be homoiconic is for Sun to make public some classes that are used by the compiler. Imagine java.lang.reflect had publicized classes like Statement, Expression, Assignment, CompoundStatement, ForStatement, WhileStatement, Sequence, PlusExpression, ChoiceOperator ... etc. Then, voila, Java can do all LISP can do with regard to its homoiconicity. Would Java's hierarchy of language elements be more complex? Well, of course, but this is just because LISP's complexity is shifted to other places. The basic element of LISP is the cons cell, but then there are a lot of things in syntax elements, standard macros and standard functions. Would the addition of such reflective classes to Java change Java's fundamental design as a language? I do not think so. It would be one more powerful package added to the runtime.
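
A minimal sketch of what such classes might look like if they existed. To be clear, none of these types are in java.lang.reflect or anywhere else in the standard library. Statement, Expression, Assignment and PlusExpression borrow the names from the paragraph above, while Literal, eval, execute and toSource are assumptions added for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SyntaxTreeSketch {
    // Hypothetical reflective types; each node can be evaluated and printed.
    interface Expression {
        int eval(Map<String, Integer> env);
        String toSource();
    }

    interface Statement {
        void execute(Map<String, Integer> env);
        String toSource();
    }

    static final class Literal implements Expression {
        int value; // deliberately mutable so the "code" can be edited as data
        Literal(int value) { this.value = value; }
        public int eval(Map<String, Integer> env) { return value; }
        public String toSource() { return Integer.toString(value); }
    }

    static final class PlusExpression implements Expression {
        Expression left;
        Expression right;
        PlusExpression(Expression left, Expression right) {
            this.left = left;
            this.right = right;
        }
        public int eval(Map<String, Integer> env) {
            return left.eval(env) + right.eval(env);
        }
        public String toSource() {
            return "(" + left.toSource() + " + " + right.toSource() + ")";
        }
    }

    static final class Assignment implements Statement {
        final String target;
        Expression value;
        Assignment(String target, Expression value) {
            this.target = target;
            this.value = value;
        }
        public void execute(Map<String, Integer> env) {
            env.put(target, value.eval(env));
        }
        public String toSource() {
            return target + " = " + value.toSource() + ";";
        }
    }

    // Builds "x = (15 + 1);", runs it, edits the tree to "x = (37 + 1);",
    // and runs it again.
    static String demo() {
        Map<String, Integer> env = new HashMap<>();
        Literal operand = new Literal(15);
        Assignment assign = new Assignment("x", new PlusExpression(operand, new Literal(1)));
        assign.execute(env);
        int first = env.get("x");
        operand.value = 37; // modify the code through its data representation
        assign.execute(env);
        return assign.toSource() + " -> " + first + ", " + env.get("x");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whether a library like this makes the language itself homoiconic is exactly what the rest of the thread disputes. The sketch only shows that the tree is easy to build once someone supplies the classes.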

By the way, did I express the opinion that "homoiconic" has a clumsy if not altogether ugly sound? If it were something sexier, maybe it would be worth the fight. In the end it goes like this: either the language exposes the elements of its syntax in the standard library or it doesn't. Java doesn't do it, but the decision whether or not to do it is as trivial as the decision on whether or not to include javax.net.ssl as part of the distribution; it is not really a language design problem. --CostinCozianu

I find it's much sexier when pronounced "ho-MOY-ko-nik". -- EH

Costin, you have somehow gotten off track. Don't forget the central point: "homo-iconic" means same representation of code and data.

No, I don't think I've gotten off track at all. The central data structure in Java is the object. If you are familiar with the Java compiler, it represents code as objects. The only difference is that they chose to hide that, and also not to make it available at runtime. You are also very unclear when you say the language should represent code as data. What code? Source code at compile time? Source code at interpreter run-time? The parse tree at runtime or at compile time? Code compiled to byte-code? Just-in-time compiled code? Compiled code? I'm afraid you'll continuously shoehorn your definition into "only Lisp/Scheme running under an interpreter".

Yes. Completely off track, without question. Look at your last claim, that I'm heading for "only Lisp/Scheme is homoiconic". Complete BS. I have said quite the opposite.

The only way you can make the definition relevant is by specifying what functionality it is good for, how it benefits the programmer. I gave you such an example, one that can be done in Lisp an order of magnitude more easily than in Java by taking advantage of what you call homoiconicity. Except that the same thing can be done in Java without modifying the language, just by making public some classes that already exist in the compiler. And this is your proof that homoiconicity fundamentally does not belong to the language design space. Algol languages can be just as homoiconic as CONS-based languages. There's nothing mystical about that cons that makes it so much better than the classical ALGOL structure: blocks, statements, procedures, etc. You can put them in a tree just the same way you put conses in a tree. So the difference is irrelevant.

If Java represents code as objects, then you should be able to do something like this:

 Statement x = if (y == 0 ) { System.out.println("yes") };

 y.execute(5);

But you can't, right?

Your definition of Java being homoiconic is like saying Java is a lazy-evaluation language because you can create an object with a callback in it. Or maybe I could say Java is a language that allows no side effects at all, because every time I compile my code I run a program that checks all the byte code of the resulting program and rejects it if it contains any mutating statements. Or I could say assembly is a visual programming language, since I write all my code in a visual programming tool that has me only point and click (no coding needed), and when I push 'generate' it generates assembly code.

Homoiconicity is a property of the language. It's like saying birds are animals that can fly.

Humans can fly now, using aeroplanes. Since humans are considered Turing complete, we can do anything. But that doesn't make a human a bird, does it? Otherwise we could carry a goldfish onto the aeroplane with us and call that fish a bird too.

See ExcerptionNotAbstraction for more explanation along these lines.

New attempt at definition moved to NewAttemptedHomoiconicDefinition

Primarily to illustrate the limitations of the definition on HomoiconicExampleInManyProgrammingLanguages, here's a simple Java program that does this:

Create a "code" data structure (block) for assigning 15 to a variable
Evaluate it and view the variable's contents (15)
Modify that data structure to assign 37 to the variable
Evaluate it and view the variable's contents (37)
Optional: view the code blocks, if the environment allows it.

and does not do this:

Escaping to the OperatingSystem to save source files to disk or perform compilation
Accessing language internals that are not portably defined in the language definition

 import junit.framework.TestCase;
 public class HomoiconicTest extends TestCase {
   public void testHomoiconicJava() {
     class Variable {
       int value;
     }
     class CodeBlock {
       int assignmentValue = 15;
       void assign(final Variable variable) {
         variable.value = this.assignmentValue;
       }
     }
     final Variable variable = new Variable();
     variable.value = 1;
     assertEquals(1, variable.value);
     final CodeBlock codeBlock = new CodeBlock();
     codeBlock.assign(variable);
     assertEquals(15, variable.value);
     System.out.println("The variable's contents is " + variable.value);
     // Reset the variable, then modify the "code" data structure and re-evaluate it.
     variable.value = 1;
     assertEquals(1, variable.value);
     codeBlock.assignmentValue = 37;
     codeBlock.assign(variable);
     assertEquals(37, variable.value);
     System.out.println("The variable's contents is " + variable.value);
     // Optional step: view the code block's contents.
     System.out.println("The code block assigns the value " + codeBlock.assignmentValue + " to the variable value.");
   }
 }

It's a hack and a cheat. But it does seem to pass the test. ;->

Ahem:

It wasn't a "test", it was an example I offered to try to illustrate the idea.
It was disputed that it was a good example, and I agreed with the critique.
Even disregarding that, as discussed on a related page today, such things might be homoiconic facilities, but as add-ons, don't make the language without the add-on homoiconic.

It's an interesting hack, though, and such things can be quite valuable.

(Oh, and I wouldn't claim that this code is homoiconic at all. Nor is it polymorphic.)

Some people seem to be missing the point here: we are talking about two completely different languages that do not necessarily have anything to do with each other: the Java programming language and the Java Virtual Machine. Aside from the fact that they have the same author (of the spec and reference implementations), carry the same brand in their name, and were designed to complement one another, there is nothing that ties these two together. The JVM isn't just a "Java implementation" like gcc is a C implementation, it's a language (and a platform, i.e. a reference implementation) in itself. The JVM's language is not homoiconic because there is no way to access your code at run time. Even if you were to parse your class file, there's no way to actually know where the class file is, because it could come from anywhere (file system, memory, or the web). Java isn't homoiconic either, since it doesn't even have a native way to represent code. How can you even check the basic premise of homoiconicity if you can't even beg the question? -- WouterLievens

"The JVM's language is not Homoiconic because there is no way to access your code at run time."

There isn't? If the JVM's "language" consists of the JVM bytecode spec, then there does seem to be a way for the JVM to access its code at runtime. It doesn't matter where it came from (any more than it matters where Lisp code comes from), just that it is available. -- EricHodges

"Java isn't homoiconic either, since it doesn't even have a native way to represent code."

I can beg all sorts of questions. What do you mean by "native"? Java code is represented by Unicode strings, and Java has a "native" class for those. Those strings are translated to JVM bytecodes (in the reference implementation), but you've argued for the separation of Java and its VM, so we can't consider JVM bytecodes as the "native" representation of Java code. -- EricHodges