Re: "The 'internal, implementation' structure is, of course, opaque in any language."
Well, maybe this is the problem. Other languages may not have as a good syntax-to-structure design, but by not trying they make it easier to use structures themselves, i.e. datastructures not bound to syntax structure. There seems to be a cost lisp pays to keep the syntax simple enough to be close to the internal data structure, which is mainly that everything must be built as nested lists. Lisp's attempt at easy swappability between syntax and structure requires crippling the data structure power to some extent. I would rather have powerful structures than simple syntax-to-structure swappability. In designing a language, I would rather start with datastructures first and work backward toward syntax compatibility. CollectionOrientedProgramming is too grand to sacrifice. If an easy one-to-one translation cannot be provided, then give up the easy swappability goal. (BTW, I don't "hate lisp". I just think it is oversold.) Text is one-dimensional, and to represent anything more complex (or richer) than trees, your text gets ugly simply because a 1D structure has inborn representational limits. -- top
pure homoiconic (term invented for this page): a language where homoiconicity applies to all language constructs. This certainly applies to core Lisp 1.0 as defined by JohnMcCarthy in his landmark 1960 paper, which had only S-Expressions, but may not apply to most later dialects that supported data types other than EssExpressions, because e.g. an array can't be directly evaluated as code (although this is possible indirectly).
-- J. Jakes-Schauer
The old YouJustDontGetIt trick, usually by those who have difficulty arguing that their personal preference should be made a forced standard. --top
From HomoiconicExampleInManyProgrammingLanguages discussion
As to how homiconicity and halting problem are related, they aren't in general, except when it comes to program transformation. A function "instrumenter: s-expression -> s-expression" will not be able to decide which part of the input tree is code and which part of the tree is data. The same goes about a TCL program, which you claim to be homoiconic, but instrumenting TCL programs is just as unfeasible. In general, even if you modify the interpreter (which was not part of your example), you won't be able to quantify the dead code in the program because there's no algorithm to separate what is code and what is data. And I'm not talking here about self-modifying programs, I'm talking here about purely functional programs.
All true. And the same applies to any system that generates code dynamically, which is a larger class than what "self-modifying" usually refers to. Pragmatically, however, this would be solved by modifying the code generator, or the eval, or add tags to the quoted code-as-data, or something, to force the instrumentation to happen in the right places, since obviously it would be hopeless to attempt to solve the general HaltingProblem in the course of implementing instrumentation.
This is not a flaw of homoiconic languages nor of code generation systems, however, since after all the only way around it is to remove the ability to manipulate code dynamically. It is an example of the PrincipleOfLeastPower, and why it is sometimes better to use e.g. XML than e.g. Lisp; the HaltingProblem doesn't arise in XML.
Well, ok, maybe there's a case to be made for it being a flaw. Hmm... No, it's like I said, the point is merely that one would need to modify eval to do the instrumentation. The confusion here would seem to arise from the implied thought that "you can't statically change source code in a homoiconic language in order to add instrumentation", but that problem arises in non-homoiconic languages that do anything with generated code, too. So instrumentation of dynamic code must be done dynamically, by eval, by a code generator, whatever. That's all. -dm
I do see what you are getting at. If instrumenting conditionals, how do you decide whether quoted lists starting with 'cond' or 'if' are simply data (a list of keywords, say) or code blocks that might be evaluated later? When CodeIsData, there are some corner-cases you need to account for. This kind of thing doesn't happen when instrumenting C code (reminiscent of the difference between VonNeumannArchitecture and HarvardArchitecture). -- IanOsgood
Re: CommonLisp ...given a function name there's no portable way to recover the S-Expression defining that function.
I also acknowledge that CommonLisp's lack of a guaranteed means of getting a function's defining sexp is an impediment to some tasks. -- DanM
The supposed flaw in an instrumentation function's ability to handle eval is not a problem. Here's the original code:
((if (halting-question?) eval length) '(if #t (cons (proc1) (proc1)) (cons (proc1) (proc2))))If this is the body of a function, and the task is to time each expression that makes up the body of the function, then it's a single expression and no further examination of subexpressions is necessary. This may seem like a cop-out, but it's actually the most natural interpretation of the problem.
If the task is extended to require timing each subexpression lexically present in the function body, then there are nine additional subexpressions to be instrumented (assuming Scheme, where the first argument of an expression is treated as a subexpression):
(if (halting-question?) eval length) '(if #t (cons (proc1) (proc1)) (cons (proc1) (proc2))) if quote [implied by '] (quote (if #t (cons (proc1) (proc1)) (cons (proc1) (proc2))) (halting-question?) eval length halting-question?The arguments to QUOTE are of course not evaluated, so none of them constitute subexpressions that need to be timed. Again, no problem.
The notion that you have to worry about what's handed to eval would only arise if the requirement were to also instrument every expression in all called functions, which seems unreasonable. In the more general case, the data handed to eval may not even be a part of the function being evaluated, it might be data passed in as an argument or taken from a symbol value.
Even if eval were treated as a special case in the requirements, just to force the point that someone is trying to make, well, even this can be handled fairly trivially. The instrumented body of the function could take this basic form sort (assuming my recollection of Scheme is correct):
(let ((system-eval eval)) (let ((eval (lambda (code) (system-eval (instrument code))))) ... instrumented code ...))Feel free to correct the above code fragment if it needs it. :) This would probably need to be embellished to omit the instrumenter's execution overhead from timing of the intercepted calls to EVAL.
The main point seems to be that an instrumenter "...will not be able to decide which part of the input tree is code and which part of the tree is data". I disagree. From the point of view of a particular program fragment, it seems that it's always decidable based on lexical information. If a Lisp-family instrumenter can examine all the code in the function being instrumented, it can always determine which is the case from the point of view of the function being instrumented. An instrumenter, like most code-walkers, may of course have to include special handling for special forms (e.g. QUOTE -- but note that EVAL is not even a special form). The shift of viewpoint from data to code occurs inside EVAL; the situation within the lexical scope of an eval-calling function is unambiguous.
I'd love to see a true counter-example.
You could force the issue further by requiring that you instrument every s-expression lexically present in the target function that will eventually be treated as code, but not s-expression which are not lexically contained in the target function. That would seem to be the interpretation of the requirements that makes the example relevant. But this is an impossible problem no matter how you slice it, and is not related to homoiconicity. Even if the instrumenter examined all called functions, decisions on whether a literal fragment is eventually treated as data or code could depend on user input or states outside of the call tree being examined.
-- DanMuller
You could do well to read the challenge again: "Write a function that when given a program (precondition: it doesn't contain macros and it is not self modifying), adds instrumentation code to all if statements, and then provide path coverage upon running the program." and then the claim, that it can be done in Lisp, trivially so. If you want to still defend that claim, please provide the trivial instrumenter. See for example http://www.linuxjournal.com/article/6758 for an introduction to how gcc does it. In your trivial instrumenter, even if you go to replace the system eval -- for which you have not provided a full account, you have to decide which part of the S-Expression can be code under any possible run of the program, not just under the current one, in order to calculate coverage ratio.
Also, you need to clarify the requirements. Add instrumentation to all if statements of what? A single function? All code in a given source file? Any data in such that might eventually be interpreted as code? Or ...? "Program" is ambiguous; it can refer to a single function or expression in CommonLisp. In your example, the list (if ...) is clearly data -- and thus not part of the program's code, except as literal data. So you really seem to want to instrument if statements that appear in data, which is a different problem, very difficult unless calls to EVAL (and perhaps related functions, such as COMPILE) can be intercepted.
I made an error in the above, in that the rebinding of eval is lexical, not global. In CommonLisp, one might have to operate on all relevant code and replace references to EVAL by references to a replacement function's symbol. This, of course, can fail to work with particularly tricky types of code, unless the interpreter has special knowledge of certain built-in functions. Perhaps this can be more easily solved in Scheme?
I don't understand your comment "under any possible run". Is that referring to this error in scope? -- DanM
To make a long story short here's a C program:
int f(int x) { if (x) return 0; return 1; } int g(int x) { if (x) return 1; return 0; } int (*table[2])(int)= { &f,&g}; int main(int argc, char** argv){ return table[0](0); }The path coverage is 25%. There can be no doubt about that.
$ gcov -b test.c 66.67% of 3 lines executed in file test.c 25.00% of 4 branches executed in file test.c 25.00% of 4 branches taken at least once in file test.c ...Here's an equivalent Scheme program:
(define table `( ,(cons 0 '(lambda(x) (if x 0 1))) ,(cons 1 '(lambda(x) (if x 1 0))))) ;main ((eval (cdr (assoc 0 table))) false)You tell me how your instrumenter will get that 25% code coverage.
I'm fairly certain the above doesn't qualify as a 'drawback' of Homoiconic languages, any more than it is a 'drawback' that gcc and gcov can't handle calculating which 'if' conditionals will run when the core is written as an evaluator over complex data.
A comparable Scheme program would be:
(define f (lambda (x) (if x 0 1))) (define g (lambda (x) (if x 1 0))) (define table (list (0 . f) (1 . g)) ((eval (cdr (assoc 0 table)) false)That'd be easy to instrument.
Can C handle the opposite case:
typedef char* variable_name; typedef intopcode; typedef struct _expression_ { typecode t; union { variable_name x; intn; struct _op_ { opcodeo; _expression_* lhs; _expression_* rhs; }op; lambda*fn; apn* app; }; } expression; typedef struct{ expression* eCond; expression* eThen; expression* eElse; } if_struct; typedef struct _lamdba_ { variable_name x; expression*body; } lambda; typedef struct _apn_ { expression* eIn; } void eval(expression* eOut, expression* eIn); <... I'd go on to describe 'table', but that's a royal pain in C. >... If I asked 'gcov' to handle the sorts of 'if' structures described above, I'd be aiming for disappointment. The above is data, not code.
Is it right to say that this is somehow a real Drawback of HomoiconicLanguages? I'd argue that you can't call something a drawback until you have a strong position from which to draw back. C can represent data, but makes doing so far more difficult than Scheme, to the point that casually describing 'table' as was done in the argument above is a headache in the making. C, however, is turing complete; providing an equivalent program to the original Scheme representation and eval is certainly possible. Asking that the Scheme instrumenter be able to perform a more complicated task than the C instrumenter is reasonable if it is easier to do... but calling a failure to do so a drawback of any sort is quite dishonest. If Scheme's instrumenter can handle the same cases as Cs, and can be written with greater ease, that's an advantage, not a drawback. --DB
See also HomoiconicLanguages, WhyWeHateLisp, MaspBrainstorming