Homoiconic Faq

This page was refactored out of HomoiconicLanguages, which had grown far too big. Despite skepticism from after-the-fact commentators, this FAQ really did derive from back-and-forth conversation, with (for the most part) the questions answered by different people than those who posed them. In other words, no, it's not a fake FAQ; it's a real one, whatever one thinks of its quality.

FAQ

Q: So it's the same thing as TuringEquivalent, and basically all programming languages are homoiconic?

A: Not at all. The term exists to single out languages in which code and data share a representation and are interchangeable. Lisp, Tcl, and Snobol are examples of homoiconic languages. Fortran, Cobol, C, C++, Java, Ada, and Basic are examples of "heteroiconic" languages (a reasonable term, but not in wide use). Also, although Turing equivalence is usually assumed in common usage, it isn't strictly necessary. A regular expression language that allowed regex search-and-replace on other regexes, which were then executed, would be homoiconic (regular expressions are not Turing equivalent; see ChomskyHierarchy).
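Here is a minimal Python sketch of the regex thought experiment. Python only acts as the host; no real "regex-only" language is assumed. The rewrite rule `MAKE_LAZY` is itself a regular expression. Applied to another pattern, it turns greedy `*` and `+` into their lazy forms, and the rewritten pattern is then executed. The rule is deliberately naive and would also rewrite an escaped `\*`.

```python
import re

# A rewrite rule that is itself a regex: find a greedy * or + that is not
# already followed by ? so it can be made lazy. Deliberately naive: it does
# not skip escaped characters such as \*.
MAKE_LAZY = re.compile(r"([*+])(?!\?)")

def make_lazy(pattern: str) -> str:
    """Apply the regex rewrite rule to another regex, returning a new regex."""
    return MAKE_LAZY.sub(r"\1?", pattern)

greedy = r"<.*>"
lazy = make_lazy(greedy)               # the rewritten "program": <.*?>

text = "<a><b>"
print(re.match(greedy, text).group())  # <a><b>
print(re.match(lazy, text).group())    # <a>
```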

Q: Why not? I can write C programs to manipulate C programs.

A: Because the manipulation isn't supported natively by the language. When, e.g., a C compiler written in C manipulates C programs, that work is done by the compiler's own code, not by anything the language provides, so it is not available to other programs written in C.

(What about C# and the whole System.Reflection namespace? By the same analogy I've seen used with regular expressions, does that mean C# is not homoiconic in the same way that it does not support regular expressions, even though it has a System.Text.RegularExpressions namespace full of classes and methods to support them?)

Q: Was AlanTuring's imaginary typewriter (or its imaginary language) homoiconic?

A: Yes, because data and code were interchangeable. The characters it read and wrote could be instructions, data or both.

Q: If a TuringMachine is homoiconic, and a TuringEquivalent language is equivalent to a TuringMachine, how can a TuringEquivalent language not be homoiconic?

A: Because homoiconicity is a property, not an algorithm. By the Church-Turing thesis, a TuringMachine can compute anything that any algorithm can compute. That does not mean that TuringMachines have every possible property, nor does anything that is TuringEquivalent. It would be a RussellParadox if it were otherwise (trivially: anything with all possible properties would obviously have the opposite of all of those properties, too).

Properties are, in general, not transferable between entities. Quicksort and bubble sort compute exactly the same function, but they have significantly different properties, such as their computational complexities. A Lisp interpreter and a C++ compiler both implement TuringEquivalent languages, yet C++ is statically typed while Lisp is not, etc. A Pentium and a SPARC are both TuringEquivalent CPUs, but they have different properties, such as the number of registers.
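Here is a small Python sketch of the quicksort/bubble sort point. Both functions below return the same sorted list for any input, yet they differ in a property: the number of comparisons they make. The counters are added purely for illustration. The bubble sort is the unoptimized variant, which always makes n(n-1)/2 comparisons, and the quicksort simply uses the first element as its pivot.

```python
import random

def bubble_sort(xs):
    """Unoptimized bubble sort; returns (sorted list, comparison count)."""
    xs, comparisons = list(xs), 0
    n = len(xs)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs, comparisons

def quick_sort(xs):
    """First-element-pivot quicksort; returns (sorted list, comparison count)."""
    if len(xs) <= 1:
        return list(xs), 0
    pivot, rest = xs[0], xs[1:]
    left, right = [], []
    for x in rest:  # exactly one comparison per remaining element
        (left if x < pivot else right).append(x)
    sorted_left, c_left = quick_sort(left)
    sorted_right, c_right = quick_sort(right)
    return sorted_left + [pivot] + sorted_right, len(rest) + c_left + c_right

data = list(range(100))
random.Random(0).shuffle(data)
b_sorted, b_count = bubble_sort(data)
q_sorted, q_count = quick_sort(data)
print(b_sorted == q_sorted == sorted(data))  # True: same function
print(b_count)                               # 4950 = 100 * 99 / 2
print(q_count)                               # far fewer on shuffled input
```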

Q: Can the term "homoiconic" be applied to a program rather than a language?

A: No; it means "same representation of program and data"; a program may implement a language in which that is true, but the statement is about representation, and so it is about the language implemented by the program, not about the program itself. Lisp is homoiconic, a Lisp interpreter is not. People may however be sloppy about this, and say that e.g. a Lisp interpreter is homoiconic, when strictly speaking, they're referring to Lisp the language.

Q: Smalltalk, Self, Slate? They have implementations written in themselves.

A: No, implementing a language in itself is insufficient; that's merely bootstrapping (see also MetaCircularEvaluator). In those languages you can force evaluation of a previously unevaluated chunk of code. But you can't, e.g., write code that creates arbitrary code which is then executed, producing still more code which can then be dissected as data, all supported natively by the language. Counter-claims should demonstrate such code.

Counter: Class variables have an initString in their definition. The initString is Smalltalk source code that's stored and executed when the LiteralBindingReference for that variable receives #initialize. Additionally, #readFrom: creates an object of the receiver's class from the string argument. Parcels' preload and postload operations are just more Smalltalk code. This wouldn't be significant in a file-oriented language, but any files of Smalltalk source code have to be generated automatically by Smalltalk using, you guessed it, #storeOn:. Meanwhile, any distribution framework in Smalltalk passes not only objects but classes and methods, which are all, at a minimum, decompilable (since you lose only the comments and temporary variable names, I rarely think of it as not being actual source code). Contexts, variables and namespaces are all available from within Smalltalk and easily accessible from source code (thisContext, Smalltalk at: #Foo, #{GlobalVariable}). And the RefactoringBrowser ought to prove how easy it is to manipulate source code in Smalltalk. The fact that ST has incremental compilation and an interpreted environment is a suggestive clue. The only difference between ST and Lisp in this regard is that Lisp stores code as lists and trees. Smalltalk code is written in InfixNotation and handled as source strings, which is just slightly more annoying.

At first glance, that does look like an argument that Smalltalk is weakly homoiconic.
What would it take to make it strongly homoiconic? And what's block mutability below? Never mind, I think you're right about ST being only weakly homoiconic since I can't seem to change a block's source code even interactively from within an inspector. That still leaves Self and Slate to consider.

Counter Counter: Smalltalk's ability to make library features appear as if they were built-in language features makes many things very easy, including handling code. That does not make Smalltalk homoiconic, not even weakly, though.

Q: What's this about Tcl?

A: Tcl is homoiconic because evaluation of data is part of the language. Force evaluation of data and it becomes program, and that's part of the language definition. That's how while works in Tcl, for instance: it evaluates its first string argument as an expression and, as long as the result is true, evaluates its second string argument as a script. Essentially the same is true of Lisp S-expressions as program or data (and is not true of Lisp hash tables or arrays). The same is true of a subset of Snobol. It is not true of C, Java, or C++, even though they're TuringEquivalent.
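Here is a toy Python sketch of that mechanism. It assumes a drastically simplified, Tcl-flavoured syntax:

- commands are separated by `;`;
- words are separated by spaces;
- braces quote a word without substitution;
- `$name` substitutes a variable.

It is not real Tcl. In Tcl proper, `while` hands its condition to `expr`. Here the condition is just another script, and `lt` is a hypothetical command standing in for `expr {$i < 3}`. What the sketch shows is that `while` receives plain strings and forces them to run as code, so a loop body can just as well come out of a variable.

```python
import re

def split_commands(script):
    """Split a script on ';' or newlines that are not inside braces."""
    commands, depth, current = [], 0, ""
    for ch in script:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch in ";\n" and depth == 0:
            commands.append(current)
            current = ""
        else:
            current += ch
    commands.append(current)
    return [c for c in commands if c.strip()]

def split_words(command):
    """Split a command into (text, braced) words; braces suppress substitution."""
    words, i = [], 0
    while i < len(command):
        if command[i].isspace():
            i += 1
        elif command[i] == "{":
            depth, j = 1, i + 1
            while depth:
                if command[j] == "{":
                    depth += 1
                elif command[j] == "}":
                    depth -= 1
                j += 1
            words.append((command[i + 1:j - 1], True))
            i = j
        else:
            j = i
            while j < len(command) and not command[j].isspace():
                j += 1
            words.append((command[i:j], False))
            i = j
    return words

class ToyTcl:
    def __init__(self):
        self.vars = {}

    def substitute(self, word):
        return re.sub(r"\$(\w+)", lambda m: self.vars[m.group(1)], word)

    def eval(self, script):
        """Evaluate a string as a program; the result is again a string."""
        result = ""
        for command in split_commands(script):
            words = [w if braced else self.substitute(w)
                     for w, braced in split_words(command)]
            result = self.call(words[0], words[1:])
        return result

    def call(self, name, args):
        if name == "set":
            self.vars[args[0]] = args[1]
            return args[1]
        if name == "incr":
            self.vars[args[0]] = str(int(self.vars[args[0]]) + 1)
            return self.vars[args[0]]
        if name == "lt":  # hypothetical stand-in for Tcl's expr {$a < $b}
            return "1" if int(args[0]) < int(args[1]) else "0"
        if name == "while":
            condition, body = args              # both are ordinary strings (data)
            while self.eval(condition) == "1":  # force the data to run as code
                self.eval(body)
            return ""
        raise NameError(f"unknown command {name!r}")

tcl = ToyTcl()
tcl.eval("set i 0; set s {}; while {lt $i 3} {set s $s$i; incr i}")
print(tcl.vars["s"])  # 012

# The loop body is stored as data in a variable, then run as code by while.
tcl.eval("set body {incr i}; while {lt $i 5} $body")
print(tcl.vars["i"])  # 5
```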

Q: So a language is homoiconic if it provides access to its own interpreter and/or compiler?

A: That is a necessary condition, but not a sufficient one.

Q: So what else makes a language homoiconic?

A: As above: the same representation of code and data. But translating that representation into a different one (compiling it to machine code, say) typically loses the property. Assuming otherwise confuses the issue.

Q: Given Slate's literal blocks (the equivalent of Lisp's '(blah blah)), you can't say it's not homoiconic without throwing Lisp out of the club.

A: Not so. Correct me if I'm wrong, but all you can do is quote code blocks created at edit time; you can't construct code blocks at run time.

Q: Well, the C language doesn't natively support constructs to manipulate C code, true, but what if, when compiled, it...

A: Nope. Doesn't matter. If a language isn't homoiconic at the source level, then no example of what can be done once it's compiled will change that. Why not? Because compilation means translation to a different language (such as machine language). Anything you can say about the compiled program is a statement about a language other than C.

Q: Well, but I could write a C program that, when run, would...

A: Nope. Doesn't matter. Same issue. You could write a C program that implements a Lisp interpreter. That doesn't make C into Lisp.

Q: So machine language is homoiconic?

A: Maybe, in some cases, if self-modifying code is possible. But this is a stretch; it's not as if self-modifying code is actually supported as such, it's just sometimes possible. Nor is self-modification usual practice; it is strongly frowned upon nearly universally (we can nitpick exceptions on another page). Nor have I ever heard machine code or assembly code called "homoiconic" in practice. You'll get funny looks if you do that, since it's not self-modifying in practice.

Remark: If machine language is not homoiconic then nothing else is. It does not matter whether self-modifying code is common, forbidden, whatever. Code and data are *exactly* the same in machine language. You can execute data without 'eval'.

I've written an 8086 assembly search routine for a 3K editor that inspected and modified its own code to switch between searching forward and searching backward. This practice is uncommon these days, because of code caches and pipelining on workstation processors and because of code in ROM on embedded processors. It is more common in simple virtual machines, such as the one used for CoreWars.

Yep. However, I don't see how this comment supports or contradicts anything that was already in the FAQ.
Since you made other edits without responding to this, maybe I wasn't clear. More directly: why did you add this comment about self modifying 8086 assembly? I don't see the point at all.
- Just giving an example of machine language blurring the distinction between code and data. If not apropos, please delete. However, the whole point of CoreWars is self-propagating code, so I think it does qualify as homoiconic.
- I'd have to think about the question of CoreWars in particular, that might be a special case - but as to the rest, the question of self modifying machine code was supposedly already addressed in detail in this FAQ section. Thus my question. (You may also have overlooked my similar comment below in response to your Forth comment.)

However, even if you stretch the point, note that there are complications. The Harvard architecture, for instance, treats program and data separately and differently: different RAM, different buses to the processor, etc. And instruction words are not always the same as data words (sometimes they have different bit lengths, sometimes one has tags and the other doesn't, etc.). And even if the CPU seems to treat data and code interchangeably, the OS may (and often does) forbid write access to code and execute access to data. So this is all problematic and full of caveats and special cases.

[How does Lisp work on a Harvard architecture machine?]

Same as on non-Harvard machines: it won't be homoiconic, because the pertinent issue is whether the Lisp is compiled, not what architecture it is compiled to. A Lisp program compiled to machine code cannot manipulate Lisp constructs that have been compiled to machine code. This isn't theoretical hairsplitting: if you try it, it won't work, and it can't be made to work. Be very careful with the experiment, though. The language implementation may well keep both the interpreted and the compiled form of the code around, which could easily confuse the results.

[I didn't say anything about compilation. What about interpreted Lisp on a Harvard architecture?]

Interpreted Lisp does not use self-modifying machine code, which is the only kind of code that behaves differently on a Harvard architecture, so it behaves the same way as always.

Lastly, treating code as data is, as it says above, often possible, but never supported: if you want to replace the opcode of a machine instruction, you have to write special software to parse the machine instructions to find the opcode part of the machine word in the first place. This extra requirement means that, in practice, machine code is not actually homoiconic by itself.

Sure it's supported. A machine language programmer who wishes to program in homoiconic style is not forbidden to do so. You assemble the instructions you want to insert elsewhere and treat them as data to be copied where you need them. All you need for this is a von Neumann architecture, relative addressing, and labels. This used to be an optimization trick; reprogram the logic in an inner loop instead of embedding conditional jumps. Nowadays nobody does this due to high level languages, fast processors, plenty of memory, and processor caches.

Yes, I know, I know, been there, done that, got a stained T-shirt. But you're not understanding the issue with "supported". It means more than just "possible". C strings allow me to put regular expressions in them. But that certainly doesn't mean that C supports regular expressions. Similarly with machine code. Self-modification is possible sometimes, but not supported.
- Sorry, I just don't get your distinction between possible and supported. Feel free to clean my comments out of your Q&A-mode text, and I'll think about it some more [DeleteMe].
- I, in turn, don't see why the C string/regular expression comparison didn't make it clear, so see if you can figure out what's wrong with that explanation, so that I can try to improve it.
- OK: when we're talking about machine language, it's the processor itself (or VM) that implements "eval", and the fundamental data type is the code/data address. The processor supports reading and writing executable code. How does this differ from LISP's eval and lists besides convenience in code manipulation? If convenience (as opposed to straight out impossibility) is the whole point, then we are in agreement: Lisp makes it very easy to manipulate code, since the structure is preserved and regular. (BTW: I didn't get the regex comparison.)
  - (What's not to get? C allows regexes - in strings. C doesn't support regexes - obviously; it has no idea what you put into strings. You already know all of that is true.)
  - Note that above it says:
    - Q: So machine language is homoiconic?
    - A: Maybe, in some cases, if self-modifying code is possible. But this is a stretch [...]
  - It doesn't say machine language absolutely cannot be considered to be homoiconic, it just says "maybe" and also "but this is a stretch". It's not the best possible example.
  - If you are sufficiently clear on why Lisp is definitely a homoiconic language and why C definitely is not, then that's good enough, and trying to settle the fuzziest examples is probably pointless. Conversely, if you don't see why that's the case for Lisp and C respectively, looking at fuzzy cases won't make it any clearer.
  - I do get the distinction between Lisp and C. I also get that it is cheating to invoke compilers or environment variables from C. I think we're pretty much in agreement now. - IAO
Also I accuse you of speed-reading in spots rather than truly looking to see if your points are already addressed here. It's a big page because there were so many arguments originally, before they were boiled down; I don't want to unnecessarily go through that again. (Corewars, on the other hand, is a very interesting point.)
- I'm not getting a clear sense of the boundaries of homoiconicity. Web references mostly just refer back to Lisp rather than point out abstract properties. I was surprised not to find the term mentioned much on LambdaTheUltimate. (I've been following this page with interest (and puzzlement) from the start and it's getting too big to grok; that's why I am trying to define the boundaries via succinct examples further down.)
  - Well, there's one brief (not necessarily very helpful) LambdaTheUltimate reference: http://lambda-the-ultimate.org/classic/message10980.html
  - But as I just said above, try to understand the archetypes of the polar extremes, Lisp and C; don't worry about the boundaries, since they're difficult - and ultimately, why would anyone actually care if machine code should be called homoiconic or not? Whereas with Lisp, it makes a difference in day to day practice!!! Try translating the Lisp example I wrote today at top of page into C. Ta-da - there's a practical difference. Trivial algorithm in Lisp, impossible in C.
- [I don't get the distinction, either, but I was unable to identify a set of statements that were true for homoiconic languages and false for heteroiconic languages. Until someone can provide them I'm operating under the belief that the set is fuzzy and somewhat subjective. -- EricHodges]
  - I can provide one for you: "A homoiconic language can trivially (no new complex data types and using only language primitives) implement a code walker, code generator, parser and evaluator of your language without using any feature not explicitly in the 'core' language. Standard libraries are not fair game, although the generated, parsed or evaluated code may in fact refer to them." You just can't do it in, say, C++. If you can, please show me.
  - This is a bad definition, because it cannot be applied to a language in which everything is in the library. In Smalltalk, for example, strings, booleans, numbers, and if statements are all in the library. So, you can't do anything without using a standard library. That includes manipulating programs. But it is fairly easy to write code walker and code generator. Writing a parser and evaluator is more work.
    - The fact that the definition makes Smalltalk not homoiconic does not mean that the definition is bad. Smalltalk puts a lot of things into the library. That's good. It also puts handling of code in the library. That makes it not homoiconic. Period.
  - The reason Lisp gets away with this is because the very foundation of Lisp is built on (eval). It may not seem very fair to say, "Lisp is homoiconic because it defines itself to be homoiconic," but that's the very foundation of this MetaCircularInterpreter.
  - MetaCircularity is a good property, and it is easily achieved when you have a HomoiconicLanguage. However, you do not need to be homoiconic to be metacircular, which is why metacircular languages are more common. Homoiconicity is a relatively rare and obscure property.
- If you are right, it wouldn't be the only thing in the technical world that is a bit fuzzy. How about "good design" versus "bad design"? Are there any fuzzy cases? Ever disagree with someone on that? Once again, if you at least understand the Lisp and C cases, that should be sufficient. Lisp being homoiconic matters. Machine code being homoiconic only matters one way or the other to someone who is going to do a great deal of self-modifying code - actually, they wouldn't care either, they've got too many other problems on their mind.
- [But when I proposed an alternate definition that said it was fuzzy you deleted that and said it was wrong. I like fuzzy logic, but think there are discrete sets of fuzzy things and discrete things. In which of those sets does "homoiconic" belong? -- EH]
- I still haven't said it is fuzzy! I said "if you are right [about it being fuzzy]...". It's not, and it's not subjective, so that shouldn't be in the definition. The point is that there is no harm in you, personally, perceiving it as being fuzzy around the edges, as long as you understand why Lisp is definitely homoiconic and C is definitely not homoiconic. The core of the definition should be seen very clearly to be non-fuzzy, regardless of corner cases. But so long as you say that you don't get the C and Lisp examples, then the more difficult cases are going to be impossible to deal with.
- [I know you haven't said it was fuzzy, but I thought you said before that it was discrete. I don't presume to know anything about this term. Is the set of homoiconic languages discrete or fuzzy?]
  - Given the discussion so far, I still believe that it's not fuzzy. The example that I think is least clear, and that therefore might be perceived after this discussion as a fuzzy issue (whether it actually is or not) is self-modifying machine code. But do you or don't you get that it's clear in the case of Lisp and C?
  - [I don't. It's easier to call "eval" in Lisp than it is to call "cc" and execute the result in C. "Easier" and "harder" are indicators of fuzzy sets. If it was impossible for a C program to operate on itself then I agree that the sets are discrete. It isn't impossible, just harder. We've been down this road before and I don't want to be accused of trolling again. I understand that you see a sharp distinction between the two. Please understand that I do not and your explanations have not led me to that understanding so far. I have no dog in this race. If the set of homoiconic languages is discrete it won't affect my life in the least. I'm just trying to understand the term.]
  - You may call cc, if it is available, and spawn another program. (In general, portable C programs cannot rely on cc being available as a system command, whereas portable Lisp programs can rely on read/eval being there. And don't be surprised if spawning it isn't guaranteed to work on Windows.) But you can't modify the currently running program; that's not fuzzy. Nor do you have access to the compiled structure of cc's output; again, not fuzzy.
  - Thank you for saying that you are speaking in good faith.
  - There's a critical issue that you raise here that perhaps I was taking too much for granted. The "cc" command is not part of the C standard! So any attempt to go down that road will be using constructs that are not part of the C language per se, and will fail on some standards-complying implementation of C. (If "cc" were part of the standard, there would still be more to say, btw, but this observation cuts off the entire train of thought at its root.)
  - That's not nitpicking; in addition to "cc" not being part of the language, consider that if you suggest calling "cc", someone else could suggest having a C program execute a Lisp interpreter. But that doesn't mean that Lisp is part of C.
  - The new Lisp example at top of page translates trivially on a line-to-line basis into any other homoiconic language. If you have to write even just a parser to translate, that already means it's not homoiconic.
  - I recommend that you seriously attempt to translate the new Lisp example into standards-compliant portable C. I know that you're not a SmugLispWeenie, but the example has lots of comments that should make the code relatively clear (in conjunction with some web page intro to Lisp, if need be). At this point, I doubt that anything else at all other than attempting the translation has any bearing on helping you understand.
  - [I haven't kept up with C standards, but I can translate the Lisp example into the C of my youth. I would use an environment variable for 'b', pipe the assignment code to the C compiler, execute the result, modify the operand and repeat. It isn't as easy as in Lisp, but I can do the same things.]
  - I addressed that. No, you cannot. The C language does not include pipes and C compilers. Those are both things provided by Unix. The question is not whether you can implement some algorithm in C in conjunction with the operating system. When I say "standard C", I'm not talking about Language Lawyer hairsplitting details, I just mean you can't do that stuff in terms of the LANGUAGE. You are perfectly aware that the Unix host OS is not part of the language. Translate the Lisp example into C that uses only C features, and that will run on Windows, Unix, VMS, VxWorks - that will run anywhere that you've got a C compiler. You can't. The Lisp example, on the other hand, will.
  - [The current C standard may not include pipes and compilers, but every C I've used provided access to them. Yes, they are provided by the operating system, but then so is memory, I/O, scheduling, etc. The OSs you mentioned all provide Posix pipes, so it wouldn't be hard to write one C program that did this on all of them. Even if they didn't, it would be possible to write one C program that ran on all of them. Not as easy as Lisp, but still possible. Hence my belief that this set is not discrete.]
  - I programmed C on MSDOS. No pipes. C is common in embedded machines with essentially no OS. -- RalphJohnson
  - Third party opinion here, feel free to take it with a grain of salt. Eric, it seems to me that you're so wrapped up in trying to fit this new term into your existing knowledge base, that you're overlooking the distinctions being made, or trying to rationalize them away. May I suggest going Zen for a second, you got to empty your cup before you can learn something new. What you think you know is getting in the way of what you're trying to learn here. Learn the term, accept the differences, then try to fit it into what you already know, but wait until after you "get it" to try and integrate it. Lisp and C are perfect examples, Lisp is clearly Homoiconic, C is clearly not, it's not debatable, it just is.
  - [It's the "try to fit it into what I already know" step that breaks down. I can pretend I get it, but when I test what I've got, it fails.]
  - OK, then forget what you know for a bit. Lisp source code "is" also a Lisp list; it is structured data. C source code is just a string; it's completely unstructured. In Lisp, you can take any piece of code and pass it to a procedure as a list, where it can be modified before being evaluated. In C, the best you could do is pass a bunch of bytes. But then it's no longer source code, it's just a bunch of bytes, and there's no way for the C language to execute a sequence of bytes as C source code. We're talking about the C language, not about any specific implementation of it. C can't do what Lisp can do: natively represent its own source code as data or code, and flip back and forth between data and code at will. In C, the macro preprocessor can transform code, but the macros aren't themselves using the C language to do that work; it's simple text replacement. A Lisp macro is much more than that: the Lisp macro is Lisp. When the macro is invoked, the macro itself runs Lisp, and you have full access to the entire Lisp environment, including the actual data passed to the macro. The C macro expands at compile time; the Lisp macro expands when it's actually invoked, so the Lisp macro has access to the actual variables contained within the code being passed in. Lisp calls this read time; read time is between compile time and runtime. The Lisp macro could expand differently each time it was invoked, depending on the value of lexical variables at expansion time. The C macro doesn't exist once compiled, so it cannot take advantage of runtime data. The Lisp macro can actually work with and on runtime data and code, and transform them at will, because of its homoiconic abilities. C can't, because it lacks those same abilities.
    - A few factual corrections that can hopefully be worked into your explanation once you've read them:
    - Lisp macros expand at macroexpand time. This is between read time and compile time (in compiled implementations; between read time and runtime in interpreted implementations). Read time is different: it's when the textual source is parsed and converted to Lisp lists. The order is (usually) ReadTime -> MacroexpandTime -> CompileTime -> RunTime.
    - There are also "reader macros" that do operate at read time. These allow you to hack the Lisp reader so you can eg. use InfixNotation, substitute braces for parens, create "super parens" that close all open parens, and so on. These are not what a typical Lisp programmer means when they speak of macros.
    - All the various "times" are available all the time. (read) is available at runtime as a function that takes an input stream and returns a list. (macroexpand) is a function that takes a list and returns a list. (compile) is a function that takes a symbol and returns a function. You can invoke the result of (compile) as you would any other function. And you can use all of these during macroexpand time or read time. ViaWeb actually did this: it read new forms from web users using a custom reader, macroexpanded them into code, and then called compile on the fly to generate the store page. This is perhaps the best definition of homoiconicity that I can think of, and I think it's clearer and less ambiguous than anything else on this page.
    - AFAIK (and I'm shaky on the standard in this area), macros cannot access lexical variables (unless expanded manually via macrolet or macroexpand). The function invocation does not exist as a concept at macroexpand time. Macroexpanders can access their own lexical variables (which are bound to program fragments, not runtime values), but not those of the enclosing lexical scope.
    - Macros can only work on runtime data if invoked manually via macroexpand. Normally, macros don't even exist at runtime, because they've already been expanded. Runtime generation of code (and subsequent evals or compiles) is a tool of last resort for Lisp programmers, used when other languages would otherwise embed the whole language's interpreter. Most people use macros as a super-powerful preprocessor, so they can follow OnceAndOnlyOnce on a grand scale.
      - Sweet, that's excellent information, I'll adjust my interpretation appropriately, thanks, filled in a few missing details for me.
  - [But C can do what Lisp can do ("natively represent it's own source code as data or code, and flip back and forth between data and code at will"), it just doesn't make it as easy on the programmer.]
  - No it can't, and if you think it can, then translate the above lisp examples to C, without calling another program, until you figure out that it can't do it. Quit trying to defend C as if it's being attacked, and either accept the difference, or translate the code until you can see it yourself. How many times does it need to be explained before you either get it, or go prove it to yourself?
  - [I'm not "defending" C. It isn't being attacked. As I said above, I don't have a dog in this race. I haven't written C code in 12 years, so I won't bother translating the Lisp code. I can see how I would make a C program that does what the Lisp program does. It would have to use the C compiler to do it, as Lisp uses "eval" to interpret Lisp code. Perhaps I don't understand why that matters.]
    - It matters because cc is not part of the C language, whereas Lisp's eval and Smalltalk's readFrom: ARE. It's not a native ability and hence doesn't count, any more than if you reimplemented Lisp in C. Homoiconicity is not meant to be Turing equivalence. This has already been explained to you. It has already been explained multiple times. It has been explained clearly multiple times. At this point I really think you just ought to shut up and stop participating in this discussion.
      - Since when and in which dialect is readFrom part of the Smalltalk language? It is not. Smalltalk is not homoiconic.
    - [I agree. I seem to have difficulty DefiningDiscreteSetsOfLanguages.]
      - It is well known that doing exercises in textbooks is important for learning material, which is why college courses require doing so, rather than just having students read. If you care, do the exercise, in any high level language you please.
  - Probably more times than you can stand. I really wonder at someone who claims C has something which Smalltalk lacks. Well, which Smalltalk only has in small amounts. This is obviously an example of the BlubParadox and no amount of explaining will ever ever be sufficient. Especially when talking to a lazy ass son of a bitch HostileStudent.
  - Moving on to more worthwhile pursuits, I think it's high time that this page were refactored to make it intelligible. Perhaps moving all of this knee-jerk defense of C to its own page, like TrashHeap or NoWhere.
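To make the read / macroexpand / eval description above concrete, here is a toy Python simulation. It is only a sketch under loud assumptions:

- S-expressions are modelled as nested Python lists;
- symbols are modelled as plain strings;
- `unless` is a hypothetical macro written as an ordinary list-to-list function;
- `substitute` is a hypothetical code walker.

As the FAQ says about C, implementing a Lisp in a host language does not make the host homoiconic. The sketch only mimics what Lisp provides natively: the reader produces ordinary data, a code walker and a macro rewrite that data, and `eval` runs data as code. Those are the parser, code walker, code generator, and evaluator from the test proposed earlier on this page.

```python
import operator

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read_from(tokens):
    """Read time: turn tokens into nested lists (code as plain data)."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read_from(tokens))
        tokens.pop(0)  # drop the closing paren
        return form
    try:
        return int(token)
    except ValueError:
        return token  # symbols are modelled as strings

def read(text):
    return read_from(tokenize(text))

# Hypothetical macro: (unless c body) => (if c 0 body), an ordinary list -> list function.
MACROS = {"unless": lambda cond, body: ["if", cond, 0, body]}

def macroexpand(form):
    """Macroexpand time: rewrite list structure before evaluation."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in MACROS:
        return macroexpand(MACROS[head](*form[1:]))
    if head == "quote":
        return form
    return [macroexpand(sub) for sub in form]

GLOBAL_ENV = {"+": operator.add, "*": operator.mul, "<": lambda a, b: int(a < b)}

def evaluate(form, env=GLOBAL_ENV):
    """Run time: evaluate list structure as code."""
    if isinstance(form, str):
        return env[form]
    if not isinstance(form, list):
        return form
    head = form[0]
    if head == "quote":
        return form[1]
    if head == "if":
        _, cond, then, alt = form
        return evaluate(then if evaluate(cond, env) else alt, env)
    if head == "eval":  # data computed at run time becomes code
        return evaluate(macroexpand(evaluate(form[1], env)), env)
    fn = evaluate(head, env)
    return fn(*[evaluate(arg, env) for arg in form[1:]])

def substitute(form, old, new):
    """Code walker/generator: copy form, replacing symbol old with new."""
    if isinstance(form, list):
        return [substitute(sub, old, new) for sub in form]
    return new if form == old else form

code = read("(unless (< 5 3) (* 6 7))")
print(code)                             # ['unless', ['<', 5, 3], ['*', 6, 7]]
print(evaluate(macroexpand(code)))      # 42
new_code = substitute(code, "*", "+")   # generate new code from old code
print(evaluate(macroexpand(new_code)))  # 13
print(evaluate(read("(eval (quote (+ 1 2)))")))  # 3
```

Note that `substitute` and `macroexpand` never see text, only lists. That shared representation is the property the FAQ keeps pointing at.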

Of course, you can't say machine language in general is homoiconic. You can only make that determination per system.

Some obscure instances of assembly code may have had direct support for this kind of thing, so conceivably there is some homoiconic assembly language that has existed in history (although I doubt it). Same thing with some kind of exotic homoiconic machine language - anyone want to make an argument for/against the LispMachine machine code as homoiconic?

Q: Can't all program code be represented as a basic data type (byte)? Can't all languages generate and manipulate new code at run time (e.g. a C++ compiler written in C++)? C++ programs are arrays of bytes, which are fundamental data types in that language.

A: False. There's a sharp difference between byte arrays that are manipulated by the language, such as quoted strings in C/C++, and byte arrays residing in the source file that are not natively manipulable by language constructs; the latter are manipulable by the compiler, but that's different. Joe and Sue are both humans, but they're not the same person. Byte arrays arise in two contexts here, but they're not the same kind of construct. C/C++ can manipulate "char*" types, but it can't execute the result.

In Lisp, programs are lists, the fundamental data type in that language. In Tcl, programs are strings. In C++, the program that the compiler sees is not represented as any data type within the language; it could be regarded as a syntax tree, which is not part of C++, or as a stream of tokens, which is not part of C++. And no matter how you represent generated C++ (e.g. as a string), you cannot make the compiler aware of the content of that representation. C++ is a perfect example of a heteroiconic language; there's no way to claim that it is homoiconic without destroying all meaning of the term.
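The Lisp point above can be made concrete with a toy model hosted in Python. This is a sketch under stated assumptions: the tiny language, `evaluate`, and `replace_symbol` are hypothetical and invented for illustration. No claim is made here about Python's own homoiconicity. Inside the toy language a program is just a nested list. The ordinary list processing that handles data can therefore rewrite a program, and the evaluator can then run the rewritten result.

```python
import operator

# Hypothetical toy list-based language hosted in Python (illustration only, not a real Lisp).
# A program is a nested Python list; symbols are strings; numbers evaluate to themselves.
GLOBAL_ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, env=GLOBAL_ENV):
    """Run a program that is represented as ordinary list data."""
    if isinstance(expr, str):        # a symbol: look up its value
        return env[expr]
    if not isinstance(expr, list):   # a number: self-evaluating
        return expr
    head, *args = expr
    if head == "quote":              # ["quote", x] hands back x unevaluated, i.e. code as data
        return args[0]
    fn = evaluate(head, env)
    return fn(*(evaluate(arg, env) for arg in args))


def replace_symbol(expr, old, new):
    """Rewrite a program with plain list processing; no parser or compiler involved."""
    if isinstance(expr, list):
        return [replace_symbol(e, old, new) for e in expr]
    return new if expr == old else expr


program = ["+", 1, ["*", 2, 3]]
print(evaluate(program))             # 7

rewritten = replace_symbol(program, "+", "-")
print(rewritten)                     # ['-', 1, ['*', 2, 3]]
print(evaluate(rewritten))           # -5

quoted = evaluate(["quote", ["+", 1, 2]])
print(quoted)                        # ['+', 1, 2]
print(evaluate(quoted))              # 3
```

Note that `replace_symbol` knows nothing about the evaluator. It treats the program exactly like any other list, which is the property the paragraph attributes to Lisp.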

Q: What about the very good and funny example from the Obfuscated C contest (see http://www.ioccc.org/1984/mullender.c)?

A: Whatever unusual properties that program has, they apply only to that program, not to the language itself. It also is not constructing code at runtime. It also depends on translation to another language (machine language) to achieve its funny properties. There's nothing about that that makes the C language itself homoiconic.

A: http://www.ioccc.org/1984/mullender.hint gives the answer: "If your machine is not a Vax-11 or pdp-11, this program will not execute correctly." In other words, this is not ANSI Standard C. It's syntactically valid, but the behavior of this program, when run, is formally "undefined" in ANSI Standard C. In a formal sense, this is not an ANSI Standard C program. It's not a K&R C program either. And it's not portable. It would fail on Intel-based PCs, for instance.

Q: If the C++ language supported an eval function that accepted a string containing C++ and executed the contents, would it be homoiconic?

A: It would go a long way, since that would allow treating data as code, but what about treating code as data? It's not sufficient.

Q: What about the approach taken by the GooLanguage REPL? Goo is definitely homoiconic: program code is represented as an AST of Goo objects, manipulable by generic functions. And the Goo compiler (there's no interpreter) is written in Goo, besides a small C runtime. But the Goo REPL reads in the form, converts it to Goo's AST objects, compiles the AST to C, runs gcc on the emitted C code, and then dynamically links the generated object code into the running image. If you strip away the Goo veneer, does this mean that C is homoiconic, since a C program is generating C expressions, compiling them, and executing them?

A: Goo is homoiconic, since the interchangeability of code and data is defined by the language itself. If you strip away the Goo veneer, revealing C, you have revealed a language that does not natively define interchangeability of code and data, so of course it is not homoiconic.
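The pipeline described in the question can be sketched as follows, with Python standing in for the Goo front end. This is an illustrative assumption and not how Goo is actually implemented. It assumes a POSIX-like system with gcc on PATH, and `compile_and_load` and `generated_add` are hypothetical names. The sketch shows which steps turn generated C text into running code. Both of them, the external compiler and the OS dynamic loader, are facilities the C language itself does not define.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile

# Hypothetical generated C code; in the Goo REPL this would come from compiling Goo's AST to C.
C_SOURCE = "int generated_add(int a, int b) { return a + b; }\n"


def compile_and_load(source):
    """Write C source to a temp directory, build a shared library, and load it."""
    workdir = tempfile.mkdtemp()     # files are left here; this is only a sketch
    c_path = os.path.join(workdir, "gen.c")
    lib_path = os.path.join(workdir, "libgen.so")
    with open(c_path, "w") as f:
        f.write(source)
    # Step 1: an external compiler program, not a feature of the C language.
    subprocess.run(["gcc", "-shared", "-fPIC", "-o", lib_path, c_path], check=True)
    # Step 2: the OS dynamic loader (reached here through ctypes), also outside the C language.
    return ctypes.CDLL(lib_path)


if shutil.which("gcc"):              # assumption: gcc is installed and on PATH
    lib = compile_and_load(C_SOURCE)
    lib.generated_add.argtypes = [ctypes.c_int, ctypes.c_int]
    lib.generated_add.restype = ctypes.c_int
    print(lib.generated_add(2, 3))   # 5
```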

Q: What about Java programs that include a Java compiler, then have a dynamic classloader to pull the generated code into the running image? I've personally seen programs that do this, and it's a very powerful technique for rapid application development. Does this make Java a homoiconic language? What if instead of textual Java code, you hacked the compiler to take a parse tree of Java objects?

A: Yeah, fancy mechanisms like that are very handy, but what they are doing is not directly natively supported by the Java language. They are in effect extending the language - call this new language Java++. Then (for the sake of argument) say that Java++ is homoiconic. That still does not make Java itself homoiconic. Other people cannot use those features without using that fancy infrastructure, yet that fancy infrastructure is not part of the Java language. If it were added to the Java language definition, then we could take a closer look at the features that it adds to see if in fact it has made code and data interchangeable. Meanwhile, no, Java is not homoiconic.

Q: I still don't get it.

A: Homoiconicity is not something that is going to be blindingly obvious to all programmers. My experience is that Lisp programmers understand the term the first time they hear it, because of the properties of Lisp that they are familiar with. This is no different than any other concept. Quantum mechanics, for instance, is immediately understandable to many math grad students because of the kinds of math they've studied, whereas it's notoriously counter-intuitive to students in general.

If thinking about the stuff on this page makes your brain hurt, then it would probably be a good idea to go off and learn some Lisp and/or Tcl, do some programming in those languages that specifically involves treating data as code, and code as data, and then come back and read the page again. It will then make much more sense.

Q: I am not convinced by your explanations; they contradict your definition, and besides, your definition is too dogmatic. I want to change the definition to be synonymous with TuringEquivalent.

"When I use a word," Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean, neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master - that's all."

Q: Is there an objective test to tell if a language is Homoiconic?

Q: Is Homoiconic all or nothing, or can a language be so in degrees?

The above two questions seem (given extensive arguments on c2 in 2004) to have rather controversial answers, at least hereabouts, FYI. I believe that the most common answers in the Lisp community would likely be "yes" and "all or nothing", but if so, those may or may not be the best possible answers; there is room for discussion.

Ironically, the person who criticized the lack of community involvement at top of page deleted some community involvement in the process of moving material from HomoiconicLanguages to this page. I recovered the missing material from the history before it expired:

weakly homoiconic (term invented for this page): a language (not a program) that has homoiconic features sort of tacked onto the side, but the core language itself is not homoiconic (like TickC below). Such languages can be called "homoiconic", but they are not the best archetypal examples of the category. So far, every example of this usage I've seen on this page seems to be an example of meta-circularity, not homoiconicity. But this page is nearly TooBigToEdit. Did I miss an example?

The Lisp example above is not metacircular by any stretch of the imagination, but it is homoiconic. BTW a few weeks ago I made a foundational correction to the previously incorrect statements on MetaCircularEvaluator, so if you didn't see that change before, you will probably want to take a look.
- I was unclear, so I think you misunderstood. It seems like the arguments for the existence of "Weakly Homoiconic" are just talking about code primitives for dealing with data and an eval function. Basically, letting you write the language in terms of itself, meta-circularity (am I misunderstanding this?). I think that much of our subsequent confusion on HomoiconicExampleInJava has been caused by this classification.
  - Ah. Hmm, maybe you're right. Suggestions?

...

Interesting. Why does Lisp 1.0, which had only S-expressions, not qualify?
Now, I have never used a Lisp 1.0. My only direct knowledge of its practical limitations comes from talking with folks who worked with it (I know only a few). But my line of thought is that Lisp 1.0 wasn't implemented in Lisp. It was implemented in assembly. Those core routines couldn't be redefined, and you couldn't yank out their code as data. This was just a reality of implementation. This may be superfluous, and maybe I'm tracing lines that everyone else finds frivolous. But I stand by my statement about the 'Weakly Homoiconic' classification. I think it creates more confusion than it clears up.
Yes, but we can clear up this Lisp 1.0 issue: It was, in fact, implemented in itself, although the bootstrapping step was a hand-translation to machine code. Search for "ho, ho, you're confusing theory with practice" on http://www8.informatik.uni-erlangen.de/html/lisp/histlit1.html
But also, I thought you meant the ability to write a metacircular interpreter, not that such a thing had to be realized.

I didn't want to add too much detail until I got the gist of it. From other examples and disputes, it seems we need another prerequisite: block inspection and mutability. And perhaps a stricter definition of "first-class" is needed.

A good definition would be nice, but I know from past experience that it is difficult, probably even harder than nailing down "homoiconic", since homoiconic is actually one of many examples of first-classiness.
- Then again, the FirstClass page exists...

Perhaps one could give a HomoiconicityClassification of languages?

Q: Is homoiconicity much ado about nothing?

A: Yes, after much wasted bandwidth on c2 this seems to be the only logical conclusion. In particular, homoiconicity should in principle facilitate metaprogramming techniques, but on its own it is of very little value. Languages without homoiconicity have managed to accomplish a lot in this area using lighter techniques; see for example AspectWerkz, RubyOnRails, etc. At the same time, some homoiconic languages like Common Lisp fall far short of being fully reflective environments, and this subtracts further from the value of being homoiconic. In the end, what the client programmer should ask for is results. Whether a language is fully, 50% or 0% homoiconic matters very little.

CategoryQuestionsAnswers CategoryFaq