Flags Are Self Modifying Code

In this context, flags are variables that control loops or conditionals, and are set or modified for the purpose of controlling or changing the behavior of said loops or conditionals.

Flags are often used to simulate GoTo, break, continue, or return statements in languages such as PascalLanguage or in environments where programmers are encouraged to follow a "One Entry, One Exit" discipline for all code.

In Dijkstra's GotoConsideredHarmful letter, he argued that in order to minimize the difficulty of reasoning about programs, we should minimize the distance between the static code on the page and the dynamic process in the computer. Unfortunately, flags increase that distance. When a reader encounters a conditional or a loop that is controlled by a flag, the reader must figure out the previous dynamic behavior of the program in order to understand what conditions will lead to what behavior.

Because flags allow code to change the behavior of code (maybe unrelated code, maybe even the conditional controlling the modifying code!) they are a case of SelfModifyingCode.

It's a bit of WaterbedTheory at work.

Is the following PerlLanguage program self-modifying code? No.

 my $first = 1;
 while(<>) {
   chomp;
   if(/X/) {
     print "\n";
     $first = 1;
   } else {
     if($first == 0) {
       print ", ";
     } else {
       $first = 0;
     }
     print $_;
   }
 }
This is a program that takes a sequence of inputs, terminated by an "X", and prints them out on one comma separated line. For example, the input:
 1
 2
 3
 X
 4
 5
 6
 X
 7
 8
 9
 X
Becomes:
 1, 2, 3
 4, 5, 6
 7, 8, 9
How about this?
 my $action;
 sub print_comma { print ", "; }
 sub reassign { $action = \&print_comma; }
 $action = \&reassign;

while(<>) { chomp; if(/X/) { print "\n"; $action = \&reassign; } else { $action->(); print $_; } }
Or this?
 sub print_comma { print ", "; }
 sub reassign { *action = \&print_comma; }
 sub action { reassign(); }

while(<>) { chomp; if(/X/) { print "\n"; *action = \&reassign; } else { action(); print $_; } }
Or this?
 sub print_comma { print ", "; }
 sub reassign { eval "sub action { print_comma(); }"; }
 sub action { reassign(); }

while(<>) { chomp; if(/X/) { print "\n"; eval "sub action { reassign(); }"; } else { action(); print $_; } }

None of those programs are SelfModifyingCode. Use of scripts, FirstClassFunctions, etc. doesn't qualify. Consider keeping an array of characters and using the evaluator to evaluate code that accesses and manipulates that array of characters then loops back such that it continues execution without appealing to an external 'eval' function repeatedly, and you'd have a program that executes archetypal SelfModifyingCode. Alternatively, make your program open the file containing the program and manipulate it, and use it with a runtime that opens code files as shared memory, and you could do it. SelfModifyingCode is difficult to do once you get past machine-code, though.

Okay, how about this EmacsLisp code?

 (defun foo () (setcdr (cdr (symbol-function 'foo)) (list 2)) 1)
 (foo)
   => 1
 (foo)
   => 2
 (foo)
   => 2
I know that my view is more inclusive than average, but if you don't consider that to be SelfModifyingCode, then yours may be more exclusive than average. I think most people consider using eval() to redefine parts of the environment to be SelfModifyingCode, and I think most people would consider the above EmacsLisp to be SelfModifyingCode.

It was stated quite clearly in the page on SelfModifyingCode that most people do not, in fact, consider "the use of MetaCircularInterpreters (i.e. the "eval" function in languages like LispLanguage or JavaScript) to execute code fragments -- including code fragments that may have been computed at runtime." to be SelfModifyingCode. Indeed, it is also stated that "Usually, the term refers to code which causes the compiled instruction stream for the underlying processor to be mutated". My conservative use of the term seems to be in accordance with other people's views. I suspect your view on this matter is far more inclusive than average on each of two points: in your interpretation of the word 'self-modifying' and your interpretation of the word 'code'.

The above Lisp code might qualify as SelfModifyingCode (since you're tweaking the underlying representation for the function being called, and the 'overall' program includes instructions to call it more than once), though it would be much more impressive if foo was recursive such that you could say "the future instruction-stream in the execution of foo is being modified for the processor of foo".

The most common use of dynamically SelfModifyingCode code we have today is byte-code tooled internally for JustInTime compilation, and even that is difficult to justify as "SelfModifyingCode" because (a) if done correctly, the code does exactly the same thing it did before the JIT compilation, and (b) the code written by the programmer itself isn't instructing the use of JIT and removal of that instruction (i.e. you could point at the bytecode as self-modifying, but not the artifact of the programmer). Between these, the 'properties' of SelfModifyingCode (the code being run having changed from what was written by the programmer) aren't really applying. Nobody ever has a problem with a compiler choosing "goto", and there also aren't many problems with a compiler choosing to tool bytecode JIT.

Can we agree, unambiguously and completely, that the code on this page is, indeed, self modifying? http://public.carnet.hr/~jbrecak/

Well, the page certainly provides a framework for self-modifying code, and 'main' modifies its own code after running it once. It's obviously a toy example, though. I do note that the author is just as picky as I am on use of the word, e.g. saying in one comment: "also note, that that would not literally be self modifying code as we would be injecting over a code that has not yet run. now *thats* pointless (unless you are intentionally obscuring things).".

For what it's worth, I agree with you that JustInTime compilation is not SelfModifyingCode, and this is for basically the same reason why I agree with the bit you quoted. It seems to me that the same section of code needs to be able to behave differently from one execution to the next in order to count.

How's this for a more conservative claim that doesn't depend too heavily on having a broad definition of SelfModifyingCode?

Understanding FlaggyCode? is difficult for the same reason that it is difficult to understand SelfModifyingCode. The behavior of the flag controlled or self modified code cannot be predicted without understanding the dynamic behavior of the program prior to the execution of the flag controlled or self modified code.

I still don't agree with the claim, but I'm much happier with the wording. There is a great deal more about flaggy code that can be predicted, such as one can easily trace the code and mark the conditional operations and determine the potential code paths and the critical flags selecting these paths by examining the code. It is also usually no problem to see where these flags are modified within the loop. SelfModifyingCode, however, has a far greater potential to make code difficult to understand... to the point that dynamic behavior of code has nothing to do with what is written by the programmer.


In some weak philosophical sense, I suppose "self modifying" can be regarded as a fuzzy continuum with pure functional code on one side of the spectrum (i.e., no explicit "modification" at all, including change of state), changes to flow of control based on state (e.g., "flaggy code") and dynamic evaluation/execution mechanisms in the middle, and obvious self-modification -- such as scripts that re-write themselves to the point that the original source vanishes -- on the other side.

However, it seems to me that a relatively clear distinction can be made between (a) explicit changes only to flow of control, based on state, in static source (e.g., "flaggy" code) and, (b) explicit modification of the source and/or generated object code.

Traditionally, (a) is not considered "self modifying" because neither the source code, nor the code derived from it, is changed in any way that can be perceived by the developer. Item (b) certainly is considered self-modifying, because the source code and/or the code derived from it is explicitly changed in ways that can be perceived by the developer.


See also: DataAndCodeAreTheSameThing


MayZeroNine


CategoryCoding


EditText of this page (last edited November 26, 2014) or FindPage with title or text search