[Note to those arriving here via Google: most of the following represents ideas and informal discussions between various wiki users, and is not represented to necessarily be authoritative in any particulars; it may or may not be. Caveat emptor.]
[Note to those arriving here from anywhere: the caveat above should be assumed for every page in this wiki.]
[Certainly. I only added this note because the flavor of the below seemed to me to come across with a definitive kind of tone, and last year I noticed that a similar homoiconic page was very popular with google -- which I noticed only after seeing it cited elsewhere on the web, which worried me a bit. We're still arguing the topic, I don't want someone to immortalize in some textbook their mistaken understanding of our arguments! :-]
A subjective continuum from strong to weak.
Strong
- Lisp
- TCL
- most machine languages (but the practice is frowned upon)
- dBASE & FoxPro
- Smalltalk
- .Net languages (including C# and VB.Net)
- Assembly
- Java
- C, C++, COBOL, FORTRAN, Pascal, etc.
Weak
Perhaps the following bit taken from HomoiconicLanguages is worth factoring in:
- LispFamily:
- data/code element type: atom (or defined atom?)
- block type: list/tree of atoms
- evaluator: eval (apply?)
- mutator: list operations
- TracLanguage
- GooLanguage:
- data/code element type: Goo object
- block type: AST of Goo objects
- evaluator: REPL (form -> AST -> C -> gcc -> dynamic link)
- mutator: generic functions
- IoLanguage
- JayLanguage
- data/code type: formally, the atomic representation '5!:0' (nested boxes of strings), but more commonly the linear representation '5!:5' (simple strings containing J code). You can hack it down to the bit level using the internal representation '3!:1'.
- block type: an array of the data/code type.
- evaluator: Many ways. Most commonly, J code strings are executed with '".' (do). Also used are: '~' (evoke) and '128!:2' (apply). Atomic reps are usually "given life" with 5!:1, but can also be evoked wit '@.' (agenda), '^:' (power), and ';.' (cut). Hmm, almost anything involving gerunds. In a sense, the copulae '=:' and '=.' too. An interal rep can be changed back using 3!:2.
- mutators: any normal data mutator in the language. '{' (select), '}' (amend), etc probably most useful. If manipulating as strings, using ';:' (word formation, the primitive that tokenizes J code (i.e. the builtin function that implements J's lexer)) along with the common 'subs' and 'rplc' verbs work well.
- extras: For the masochistic, the 15!: family can be used to access J's memory space and do anything you like. Of course, this has nothing to do with homoiconicity.
- FactorLanguage, JoyLanguage:
- data/code type: word
- block type: bracketed list of words
- evaluator: eval, higher order functions (IF, loops, linrec etc.)
- mutator: stack manipulators along with the block quote and unquote operators
- PostScript
- data/code type: operator
- block type: executable array of operators
- evaluator: exec, higher order operators (conditionals, loops and more)
- mutator: array operations
- PrologLanguage
- RebolLanguage:
- data/code element type: value
- block type: block!
- evaluator: do
- mutator: series operations
- SmalltalkLanguage
- data/code type: string
- evaluator: evaluate: (readFrom: just does evaluate: plus type checking)
- SnobolLanguage
- TclLanguage:
- data/code element type: string token
- block type: string ("everything is a string")
- evaluator: eval
- mutator: string functions
- TexLanguage
- A macro can be taken apart into its tokens and constructed from its tokens. Even the command name can be (de)composed.
- XsltLanguage
- data/code element type: node (XML Infoset)
- block type: nested nodes and node-sets
- evaluator: none (?)
- mutator: templates
Here is the above Lisp example, in REBOL, as a console session:
>> b: 3
== 3
>> b
== 3
>> a: [b: 15]
== [b: 15]
>> a
== [b: 15]
>> do a
== 15
>> b
== 15
>> last a
== 15
>> change back tail a 37
== []
>> a
== [b: 37]
>> do a
== 37
>> b
== 37
- MachineLanguage
- Everything is numbers in the computer's memory.
- Code can be modified, constructed and executed at runtime, using the same numeric representation for opcodes that is used to represent data.
- Note that AssemblyLanguage does not qualify, because it uses a symbolic representation of opcodes and operands that differs fundamentally from the representation of data. (Remember that it's not about what the language translates code to under the covers, but how the language is represented.)
Disputed
- MalbolgeLanguage:
- Show me a Malbolge interpreter written in itself (or prove that it is possible) and I might believe you. :) this comment doesn't mean that any language interpreter for language X that is also written in X shows that language X is homoiconic; read the rest of the page (the latter is directed to general readers of this page, not the original author of the "show me" comment)
- data/code type: trit (trinary digit)
- block type: tritword
- evaluator: interpreter
- mutator: every instruction modifies the code space
- ForthLanguage (maybe):
- Nope. Try taking a Forth word, treating it as data, picking out e.g. the loop in the middle of it, delete the loop (leaving just the loop body, not the conditional), and then execute that changed word. Forth doesn't allow that.
- Not the best counter-example: one can manipulate Lisp programs into being malformed as well. Given a Forth word equivalent to (setq b 45), another word could access its code field, find the XT for LITERAL, and change the following cell from 45 to something else, as in your Lisp example. Portable Forth doesn't allow that; the code field structure is unspecified (it could be machine code or ROM!), and many immediate words compile into XTs and inline data outside the scope of the spec (such as BRANCH, ?BRANCH, their target addresses, and literal numbers and strings). One can manipulate token threaded code fields, but it would be a hack. Forth is fine with dynamic code generation but balks at code mutation. Many Forth's also provide SEE for introspection, but the output is only for the user and SEE's implementation is not portable.
- I think you misunderstood the example. It's not about being well-formed. It's about the language naturally supporting manipulations of code because code and data have the same essential type. Forth lacks this trait. The example seems valid.
- The fundamental data type in Forth is the cell (or array of cells). The code field is an array of cells, each containing an XT or inline data, and as such can be manipulated just like any other cell array. (Similar arguments apply to machine language.) Note: I'm not arguing for Forth, I'm just feeling out the boundaries of the term "homoiconic".
- But as I said in the FAQ, none of this is supported by the language in either case; it is merely sometimes possible if you're clever, and will break in a different implementation. So it's not really an example of something about the language..
- That's interesting, I didn't know about SEE. But I wasn't talking about malformed Forth, I just meant an example of "modify the program and then run it".
- Also note that there is discussion about self-modifying machine code in the above FAQ that would apply to some aspects of this.
- Inner interpreter (threaded):
- data/code element type: execution token (XT)
- block type: dictionary entry (optional header + array of XT's)
- evaluator: EXECUTE
- constructor: defining words like ':' and IMMEDIATE words like IF but no portable code mutator
- Outer interpreter (which may be stripped out of a turnkey Forth application):
- How does the this differ from TCL?
- data/code element type: string token
- block type: string
- evaluator: EVALUATE
- mutator: string manipulation words
- There is no fundamental data type in Forth; it's untyped. There are two fundamental containers: cells and characters. Forth code is stored in characters, in the form of source; Forth is fully homoiconic when it's used to manipulate its source. Forth's way of doing this is via IMMEDIATE words. You cannot manipulate an already-compiled word in portable Forth (but this is very rare in other homoiconic languages). Forth source maps directly onto the structure of Forth compiled code, because both are represented as linear strings of references to other words. Lisp code is manipulated as a tree; Forth code is manipulated as a string.
- I sympathize with your argument, it is reasonable given a quick scan through this and related pages. However, your points have in fact been previously addressed, although it can be difficult to see that fact. If interested, I'd recommend re-reading this page again, carefully, and then the HomoiconicFaq page, carefully, and especially FirstClass and MetaCircularEvaluator (since they are much shorter than these other pages, and although they shed only indirect light on the topic, it is potentially bright light).
- There are many points of view presented on these pages, but there certainly are some things to consider, for instance, any language is capable of manipulating its own source code as strings, if only by reading its source file and fiddling with that. Therefore, that ability shouldn't be called "homoiconicity", since essentially all languages can do that, yet the people who coined the term (see history, top of page) meant to distinguish some languages from others.
- It's also important to be cautious of nuances about source vs interpreted vs compiled. Everyone agrees Lisp is homoiconic. Traditionally, Lisp is compiled as an option, but defaults to interpreted. Its homoiconicity is clear when confined to interpreted S-Expressions, but is less clear for implementations which compile to machine code. Similar concerns apply to interpreted versus compiled Forth (and more so, for reasons concerning Forth previously discussed)
- ObjectsAreDictionaries and many dynamic OOP languages generally treat objects and maps as the same thing. "x.y.z" is simply a reference to a nested map. However, it tends to only apply to object/class structure and not so much to implementation syntax, although Smalltalkers may argue otherwise.
- PerlLanguage
- data/code type: scalar variable holding a string
- evaluator: eval STRING (also eval BLOCK, different thing)
- note: playing around with strings is just the kind of bussiness Perl enjoys :-) (plus: you can do the same at the AST and bytecode level)
See the note below about
primary language evaluator being an implementation concept. The fact that multiple approaches (string or block, AST, bytecode) are cited makes me wonder, what is the "fundamental datatype" in Perl?