Homoiconic Faq

This page comes from refactoring HomoiconicLanguages, which is way too big. Despite skepticism from after-the-fact commentators, this FAQ absolutely did in fact derive from back and forth conversation, with (for the most part) different people answering questions than those who posed the questions...i.e. no, it's not a fake FAQ, it's a real FAQ, regardless of opinions about its quality.

FAQ

Q: So it's the same thing as TuringEquivalent, and basically all programming languages are homoiconic?

A: Not at all. The term exists to differentiate programs where data and code are interchangeable. Lisp, Tcl, and Snobol are examples of homoiconic languages. Fortran, Cobol, C, C++, Java, Ada, Basic are examples of "heteroiconic" languages (this term is reasonable but not in wide use). Also, although in common usage Turing Equivalence is usually assumed, but that's not strictly necessary. A regular expression language that allowed regex search and replace on other regexes which were then executed would be homoiconic (regular expressions are not Turing Equivalent; see ChomskyHierarchy).

Q: Why not? I can write C programs to manipulate C programs.

A: Because the manipulation isn't supported natively in the language; the manipulation of C programs done by e.g. a C compiler written in C is done only by that particular program, and is not possible for other programs written in C.

(What about C# and the whole System.Reflection namespace? Using the same analogy I've seen with regular expressions, does it mean that C# is not homoiconic the same way it does not support regular expressions, even if it has a System.Text.Regex full of classes/methods to support them?)

Q: Was AlanTuring's imaginary typewriter (or its imaginary language) homoiconic?

A: Yes, because data and code were interchangeable. The characters it read and wrote could be instructions, data or both.

Q: If a TuringMachine is homoiconic, and a TuringEquivalent language is equivalent to a TuringMachine, how can a TuringEquivalent language not be homoiconic?

A: Because homoiconicity is a property, not an algorithm. By the Church-Turing hypothesis, a TuringMachine can compute any possible algorithm - but that does not mean that TuringMachines have every possible property, nor does anything that is TuringEquivalent. It would be a RussellParadox if it were otherwise (trivially; anything with all possible properties would obviously have the opposite of all of those properties, too).

Properties are, in general, not transferable between entities. Quick sort and bubble sort are equivalent algorithms, but they have some significantly different properties, such as their different computational complexities. A Lisp interpreter and a C++ compiler both implement TuringEquivalent languages, yet C++ has the property of statically typed while Lisp does not, etc. A Pentium and a Sparc are both TuringEquivalent CPUs, but they have different properties, such as the number of registers.

Q: Can the term "homoiconic" be applied to a program rather than a language?

A: No; it means "same representation of program and data"; a program may implement a language in which that is true, but the statement is about representation, and so it is about the language implemented by the program, not about the program itself. Lisp is homoiconic, a Lisp interpreter is not. People may however be sloppy about this, and say that e.g. a Lisp interpreter is homoiconic, when strictly speaking, they're referring to Lisp the language.

Q: Smalltalk, Self, Slate? They have implementations written in themselves.

A: No, implementing a language in itself is insufficient, that's merely bootstrapping (see also MetaCircularEvaluator). In those languages you can force evaluation of a previously-unevaluated chunk of code, but you can't e.g. write code that creates arbitrary code which is then executed, producing still more code which can then be dissected as data - all as supported natively by the language. Counter-claims should actually demonstrate any such code.

Counter: Class variables have an initString in their definition. The initString is Smalltalk source code that's stored and executed when the LiteralBindingReference? for that variable receives #initialize. Additionally, #readFrom: creates an object of the receiver's class from the string argument. Parcels' preload and postload operations are just more Smalltalk code. This wouldn't be significant in a file-oriented language but any files of Smalltalk source code have to be automatically generated by Smalltalk, using, you guessed it, #storeOn:. Meanwhile, any distribution framework in Smalltalk passes not only objects but classes and methods, which are all, at a minimum, decompilable (since you lose only the comments and temporary variable names, I rarely think of it as not being actual source code). Contexts, variables and namespaces are all available from within Smalltalk, all easily accessible from source code (thisContext, Smalltalk at: #Foo, #{GlobalVariable}). And the RefactoringBrowser ought to prove how easy it is to manipulate source code in Smalltalk. The fact that ST has incremental compilation and an interpreted environment are suggestive clues. The only difference between ST and Lisp in this regard is that Lisp stores code as lists and trees. Coupled with InfixNotation, that makes it just slightly annoying to handle source code strings.

Counter Counter: Smalltalk ability to make library features appear as if they were built-in language features makes many things very easy, including handling code. That does not make Smalltalk homoiconic, not even weakly, though.

Q: What's this about Tcl?

A: Tcl is homoiconic because evaluation of data is part of the language: force evaluation of data, and it becomes program, and that's part of the language definition - that's how while works, for instance, in Tcl: it forces evaluation of its first string argument, and if the result is true, then it forces evaluation of its second string argument. Essentially the same is true of Lisp S-expressions as program or data (and is not true of Lisp hash tables nor arrays). The same is true of a subset of Snobol. It is not true of C, Java, C++, even though they're TuringEquivalent.

Q: So a language is homoiconic if it provides access to its own interpreter and/or compiler?''

A: That is a necessary precondition, but not a sufficient precondition.

Q: So what else make a language homoiconic?''

A: As above: same representation of code and data. But a translation of the representation to a new representation typically won't still have that property. Assuming otherwise confuses the issue.

Q: Given Slate's literal blocks (the equivalent of ('blah blah), you can't say it's not homoiconic without throwing Lisp out of the club.

A: Not so. Correct me if I'm wrong, but all you can do is quote code blocks created at edit time, you can't construct code blocks at run time.

Q: Well, the C language doesn't natively support constructs to manipulate C code, true, but what if, when compiled, it...

A: Nope. Doesn't matter. If a language isn't homoiconic at the source level, then no example of what can be done once it's compiled will change that. Why not? Because compilation means translation to a different language (such as machine language). Anything you can say about the compiled program is a statement about a language other than C.

Q: Well, but I could write a C program that, when run, would...

A: Nope. Doesn't matter. Same issue. You could write a C program that implements a Lisp interpreter. That doesn't make C into Lisp.

Q: So machine language is homoiconic?

A: Maybe, in some cases, if self-modifying code is possible. But this is a stretch; it's not as if self-modifying code is actually supported as such, it's just sometimes possible. Nor self-modification usual practice; it is actively strongly frowned upon nearly universally (we can nitpick exceptions on another page). Nor have I ever heard of machine code or assembly code called "homoiconic" in practice - you'll get funny looks if you do that, since it's not self-modifying in practice.

Remark: If machine language is not homoiconic then nothing else is. It does not matter whether self-modifying code is common, forbidden, whatever. Code and data are *exactly* the same in machine language. You can execute data without 'eval'.

I've written an 8086 assembly search routine for 3K editor that inspected and modified its code to switch between searching forward and searching backward. This practice is uncommon these days due to code caches and pipelining on workstation processors and due to code in ROM on embedded processors. This is more common in simple virtual machines, such as for CoreWars.

However, even if you stretch the point, note that there are problems like Harvard architecture, which means treating program and data separately and differently; different RAM, different buses to the processor, etc. And it is not always the case that machine words are the same thing as data words (sometimes they're different bit lengths, sometimes one has tags and the other doesn't, etc). And then, even if the CPU seems to implement data and code interchangeably, the OS itself may (and often does) forbid read access to code and execute access to data. So this is all problematic and full of caveats and special cases.

[How does Lisp work on a Harvard architecture machine?]

[I didn't say anything about compilation. What about interpreted Lisp on a Harvard architecture?] Lastly, treating code as data is, as it says above, often possible, but never supported: if you want to replace the opcode of a machine instruction, you have to write special software to parse the machine instructions to find the opcode part of the machine word in the first place. This extra requirement means that, in practice, machine code is not actually homoiconic by itself.

Sure it's supported. A machine language programmer who wishes to program in homoiconic style is not forbidden to do so. You assemble the instructions you want to insert elsewhere and treat them as data to be copied where you need them. All you need for this is a von Neumann architecture, relative addressing, and labels. This used to be an optimization trick; reprogram the logic in an inner loop instead of embedding conditional jumps. Nowadays nobody does this due to high level languages, fast processors, plenty of memory, and processor caches.

Of course, you can't say machine language in general is homoiconic. You can only make that determination per system.

Some obscure instances of assembly code may have had direct support for this kind of thing, so conceivably there is some homoiconic assembly language that has existed in history (although I doubt it). Same thing with some kind of exotic homoiconic machine language - anyone want to make an argument for/against the LispMachine machine code as homoiconic?

Q: Can't all program code be represented as a basic data type (byte)? Can't all languages generate and manipulate new code at run time (i.e. a C++ compiler written in C++)? C++ programs are arrays of bytes, which are fundamental data types in that language.

A: False. There's a sharp difference between byte arrays that are manipulated by the language, such as quoted strings in C/C++, versus byte arrays residing in the source file that are not natively manipulatable by language constructs; the latter are manipulatable by the compiler, but that's different. Joe and Sue are both humans, but they're not the same person. Byte arrays arise in two contexts here, but they're not the same kind of construct. C/C++ can manipulate "char*" types, but it can't execute the result.

In Lisp, programs are lists, the fundamental data type in that language. In Tcl, programs are strings. In C++, the program that the compiler sees is not represented as any data type within the language; it could be regarded as a syntax tree, which is not part of C++, or as a stream of tokens, which is not part of C++. And no matter how you represent generated C++ (e.g. as a string), you cannot make the compiler aware of the content of that representation. C++ is a perfect example of a heteroiconic language; there's no way to claim that it is homoiconic without destroying all meaning of the term.

Q: What about the very good and funny example from the Obfuscated C contest, see http://www.ioccc.org/1984/mullender.c

A: Whatever unusual properties that program has, they apply only to that program, not to the language itself. It also is not constructing code at runtime. It also depends on translation to another language (machine language) to achieve its funny properties. There's nothing about that that makes the C language itself homoiconic.

A: http://www.ioccc.org/1984/mullender.hint gives the answer: "If your machine is not a Vax-11 or pdp-11, this program will not execute correctly." In other words, this is not ANSI Standard C. It's syntactically valid, but the behavior of this program, when run, is formally "undefined" in ANSI Standard C. In a formal sense, this is not an ANSI Standard C program. It's not a K&R C program either. And it's not portable. It would fail on Intel-based PCs, for instance.

Q: If the C++ language supported an eval function that accepted a string containing C++ and executed the contents, would it would be homoiconic?

A: It would go a long way, since that would allow treating data as code, but what about treating code as data? It's not sufficient.

Q: What about the approach taken by the GooLanguage REPL? Goo is definitely homoiconic: program code is represented as an AST of Goo objects, manipulable by generic functions. And the Goo compiler (there's no interpreter) is written in Goo, besides a small C runtime. But the Goo REPL reads in the form, converts it to Goo's AST objects, compiles the AST to C, runs gcc on the emitted C code, and then dynamically links the generated object code into the running image. If you strip away the Goo veneer, does this mean that C is homoiconic, since a C program is generating C expressions, compiling them, and executing them?

A: Goo is homoiconic, since the interchangeability of code and data is defined by the language itself. If you strip away the Goo veneer, revealing C, you have revealed a language that does not natively define interchangeability of code and data, so of course it is not homoiconic.

Q: What about Java programs that include a Java compiler, then have a dynamic classloader to pull the generated code into the running image? I've personally seen programs that do this, and it's a very powerful technique for rapid application development. Does this make Java a homoiconic language? What if instead of textual Java code, you hacked the compiler to take a parse tree of Java objects?

A: Yeah, fancy mechanisms like that are very handy, but what they are doing is not directly natively supported by the Java language. They are in effect extending the language - call this new language Java++. Then (for the sake of argument) say that Java++ is homoiconic. That still does not make Java itself homoiconic. Other people cannot use those features without using that fancy infrastructure, yet that fancy infrastructure is not part of the Java language. If it were added to the Java language definition, then we could take a closer look at the features that it adds to see if in fact it has made code and data interchangeable. Meanwhile, no, Java is not homoiconic.

Q: I still don't get it.

A: Homoiconicity is not something that is going to be blindingly obvious to all programmers. My experience is that Lisp programmers understand the term the first time they hear it, because of the properties of Lisp that they are familiar with. This is no different than any other concept. Quantum mechanics, for instance, is immediately understandable to many math grad students because of the kinds of math they've studied, whereas it's notoriously counter-intuitive to students in general.

If thinking about the stuff on this page makes your brain hurt, then it would probably be a good idea to go off and learn some Lisp and/or Tcl, do some programming in those languages that specifically involves treating data as code, and code as data, and then come back and read the page again. It will then make much more sense.

Q: I am not convinced by your explanations; they contradict your definition, and besides, your definition is too dogmatic. I want to change the definition to be synonymous with TuringEquivalent.

A:

Q: Is there an objective test to tell if a language is Homoiconic?

A:

Q: Is Homoiconic all or nothing, or can a language be so in degrees?

A:

The above two questions seem (given extensive arguments on c2 in 2004) to have rather controversial answers, at least hereabouts, FYI. I believe that the most common answers in the Lisp community would likely be "yes" and "all or nothing", but if so, those may or may not be the best possible answers; there is room for discussion.


Ironically, the person who criticized the lack of community involvement at top of page deleted some community involvement in the process of moving material from HomoiconicLanguages to this page. I recovered the missing material from the history before it expired:

weakly homoiconic (term invented for this page): a language (not a program) that has homoiconic features sort of tacked onto the side, but the core language itself is not homoiconic (like TickC below). Such languages can be called "homoiconic", but they are not the best archetypal examples of the category. So far every example I've seen of usage on this, in this page, seems to be an example of Meta-circularity, not Homo-iconicity. But, this page is nearly TooBigToEdit. Did I miss an example?

... I didn't want to add too much detail until I got the gist of it. From other examples and disputes, it seems we need another prerequisite: block inspection and mutability. And perhaps a stricter definition of "first-class" is needed. Perhpas one could give a HomoiconicityClassification of languages?


Q: Is Homoiconic much ado about nothing ? A: Yes, after much wasted bandwidth on c2 this seems to be the only logical conclusion. In particular "homoiconicity" should, in principle, facilitate meta-programming techniques, on its own it is of very little value. Languages without homoiconicity have managed to accomplish a lot in this area using lighter techniques. See for example AspectWerkz?, RubyOnRails, etc. In the same time some homoiconic languages like Common Lisp fall far short from being fully reflective environments, and this subtracts further from the value of being homoiconic. In the end, what the client programmer should ask for is results. Whether or not a language is fully, 50% or 0% homoiconic matters very little.


CategoryQuestionsAnswers CategoryFaq


EditText of this page (last edited November 23, 2014) or FindPage with title or text search