Run Time Engine Schema

This is a rough draft of parts of a "run-time engine" done in a relational style for a simplified, roughly Java-like toy interpreter. I am not promoting Java, only studying run-time engines assuming features that most are familiar with. It may be used for discussions on types and the nature of run-time engines. (Note that this version does not preserve the original control structures such as loops and conditionals, assuming translation of them to GOTO-based hopping instead, and thus is not useful for exploring internal representations of higher-level control structures.)

  table: module    // could also be a class
  -----------------
  modName  // p.k.
  modDescript
  filePath
  modType   // for inheritance

table: moduleInstance // mostly for OOP --------------- modInstID // p.k. (autogen) modInstName // object name if an object modRef

table: routines -------------------- routineID // p.k. (autogen) modRef routName isMain // (can name alone be sufficient?) routDescript returnType (constraint: modRef + routName must be unique)

table: parameters ------------------ routineRef paramName paramType direction (in, out, both)

table: routineInstance ---------------- routInstID routineRef modInstRef instLevel // level on the "stack"

table: varInstance // actual variable values (in the stack) ------------------- varName // pk 1 routInstRef // pk 2 varValue

table: variables // declared variables ----------------- varName // pk 1 routineRef // pk 2 varType

table: statement ----------------- routineRef stmtID // auto-gen p.k. lineNo // source line reference (for debugging and tracing only) assignVarName // optional rightPartText // (excludes parameters) rightType // F=function/method, V=variable, C=constant

table: callParams -------------------- stmtRef routineRef // f.k. to parameters table varName // remember, all params must be vars or numbers at this stage

This assumes that every module has at least one instance. It can have multiple instances if a module is used as an object. Each module is assumed to have one "main" routine (instance). This is where any "module-level" variables go. Module-level variables are assumed visible to other routines in given module.

This simplified language assumes:

All block operations are translated into goto's. A dummy goto function (such as "sys.goto") is created that references the target stmtID. Dummy compare operations are used conditional branch goto's such as gotoLT, gotoEQ, and gotoGT , etc. (less than, equal, greater than). Dummy variables, such as "~T345" are created for nested items. "~" is not allowed in normal code text var names, but are valid internally. They are treated just like any other variable. There are also a set of system variables used for internal constants. Example pre-translated code segment:

 if (m.a > m.b) {
   x = m.foo + m.bar(m.a);
   blah.glob(7);
   y = 7;
 }

The pseudo-code for the translation resembles:

 ~T005 = m.a;
 ~T006 = m.b;
 ~T007 = sys.compare(~sysGT, ~T005, ~T006);
 sys.gotoLTE(~T007, 13);   // [a gotoTrue and gotoFalse may be more generic]
 ~T008 = m.foo;
 ~T009 = m.a
 ~T010 = m.bar(~T009);
 x = sys.math.add(~T008, ~T010);
 ....
 13: // goto label

All nested statements or function calls are converted into dummy variable references first. Basically the parser reduces everything to a statement with one of the following formats:

  a = functionOrMethod(var1, var2, var3...)
  a = variable
  a = constant

The assignment part (left side) is optional. Numeric constants are also allowed as parameters. Once it is convertible to this format (perhaps conceptually), it is readily convertible into the above schema. (If you find any stumpers, please let me know).

Compound statements such as:

  a = b + c + d + e;

are converted to dummy variables:

  ~t1 = sys.math.add(b, c);  // pseudo-code
  ~t2 = sys.math.add(~t1, d);
  a = sys.math.add(~t2, e);

The order in which this is done is up to the parser. I am not defining how the parser works here. The only parsing done by the run-time engine is to interpret constants and values (usually by the built-in classes).

Note that recursion does not generate new dummy variable names because the dummy variables are only created once per definition, not once per call.

Types

All variables and routines must have a type or return type. All types are classes. Since this is a single inheritance system, all types can be traced back to the root, which will be one of the base built-in types (classes): int, float, string, etc. There is no way around this since every class either is defined by the program, is one of the root types, or is invalid. The built-in types are essentially reserved class names.

Types are checked before execution starts by searching up the inheritance tree for a match. The same is done for matching method calls. Example:

  // Pascal-style declarations
  class Point {
    x: float; 
    y: float;
    method setx(tempx: float) {x := tempx;)
    method getx: int {return(x);}
    ....
  }
  class Line {
    point1: point;
    point2: point;
    color: colorClass;
    method constructor(point1: point, point2: point) {...}
    ....
  }
  class ThickLine: Line {   // subclass of "Line"
    ....
  }

.... p1 = new Point(3.8, 4.4); p2 = new Point(2.1, 7.2); myLine = new Line(p1, p2); myLine2 = new ThickLine(p2, p1)

When the constructor for "myLine" is checked, p1 is known to be type "Point". The modType (in table "module") for "ThickLine" would be "Line".

The run-time steps are generally as follows:

1. Parser (if any) reads text code and fills up the above tables.

2. Execution engine checks structure for consistency and makes sure types are valid.

3. Information in tables is "executed". In other words, the very first "line" of the program is finally executed.

NOTES

-- top


And now since you created something vaguely of the power of C (you're missing arrays). I can tell you your system does not support polymorphism since you cannot write a sort routine that will sort any array of elements for which there is a compare function, now since all types are classes, you cannot have protocol as types, parameterized types, union types, function types. By the way the routine table may need to have a reference to the statement that is its entry point. So what was the point that you were trying to prove?

This is not meant to be a "rich" language by any stretch. It is supposed to have just enough features to illustrate Java-like typing issues and to serve as a starting point for later enhancements. It has "types" in the usual sense, that is all that is important right now. Does your pet definition require all that stuff to be considered "typed"?

As far as the entry point, it just executes the lowest statement ID for the given routine. I suppose having a reference in the routine side would speed things up, but speed is not the issue here.


Here is my line of reasoning. Please indicate where it breaks down according to your point of view.


Okay, let's put the battle over what "syntax" means to the side for now. We both agree that it is possible to have the code be represented as a structure of some kind, perhaps as an AbstractSyntaxTree or something like above. We don't need the text to have programs. How does the Reynolds definition apply to such a thing? Obviously such things can have "types" in the sense that they mirror what is called "types" in languages such as Java.

You're working off a different definition of 'syntax' than mainstream TypeTheorysts. Syntax means "text" for you, while syntax has the somewhat broader definition of "representation" for them. An AbstractSyntaxTree is still a representation of code. So are the database tables you have above. And TypeTheory is a discipline for determining whether such a representation is well-formed or not.

Your RunTime never rejects a statement as invalid. It may give an error when it tries to execute it, but it never says "This is not a valid program." That's why it doesn't have (static) types. -- JonathanTang

Actually his schema above encodes a very primitive syntax:

 <statement> ::= <assignment> | <routine call>
 <assignment> ::= <variableName> <routine call>
 <routine call> ::= <routine ID> <call parameters>
 ...
Well, when he polishes his table it will even be right. Where routine ID can be either something stored as in the schema or primitive routines like "goto". So, I wouldn't see in principle why he couldn't add even a type system to that. AbstractSyntaxTrees can be represented even in relational tables. So when he adds the type system that assigns types and performs typing judgements, that types system will operate off the above syntax. Just like Reynolds predicted. If he doesn't do any validation and just tries to execute statements at run-time until the program finishes or a error is detected (wrong number of parameters, variable not bound, wrong parameter to the system call) then he doesn't have a type system at all. He just has some trivial runtime checks. QED.

As described above it would check all the "type" columns before actual "execution". It searches up the class tree structure until one of the built-in (root) types are found. If you can find holes in the "type system", please do. -- top

Please note that in your system (as well as in other dynamic system) it is superfluous to check "types" for anything other than system primitive routines (goto, +, *, etc). If you want to call those checks a "type system", be my guest. I refer you to HumptyDumpty.

No. User-defined classes are "types". And, the type-checking here is NOT dynamic, although could perhaps be made so with only a few tweaks. It is fairly clear you did not read the description very thoroughly.


See also: AbstractSyntaxTree, WhyTypeSyntax


CategoryCompilers


EditText of this page (last edited September 27, 2007) or FindPage with title or text search