Tops Tag Model Two

Pseudo-Code Kit and Examples

A kit of samples and conventions to build language-specific models of type-related aspects of "typical" dynamic or scripting programming languages that follow an Algol-like influence (which includes languages resembling C, Pascal, VB, and Python syntax, for example). This can aid in predicting and remembering type-related results by allowing one to "x-ray" the parts and their actions. They can be viewed as mental exercises to help build and strengthen mental models of type-related processes in a target language.

This kit may not reflect the actual construction of production interpreters, and the "parts" used in the model are model-specific and not intended to be "canonical" or "official" in a general sense[5], but this does not prevent it from being useful for prediction or from serving as a memory aid. Actual interpreters have many layers and artifacts that, although they may improve machine performance, are generally difficult to remember and mentally apply, and are thus neither necessary nor helpful for our training-related goal.

The modelling kit can create descriptive imperative pseudo-code models that use XML representations of variables (and variable-like entities) to model the "parts" of a variable during the processing of a given operator in a given dynamic "Algol-style" language, in order to better predict the language's behavior with regard to type-related issues. Sample implementations are given, and the model user can select the sample implementation that best matches the behavior of the language. Test snippets are built to isolate the target operators in a way that allows for testing and experimentation. TypeTagDifferenceDiscussion gives examples of typical tests that can be performed. TypeHandlingGrids can be used to track experiments in a systematic way.

If the pseudo-code is not detailed enough to explain the behavior, then in theory StepwiseRefinement can be used to eventually create actual implementations of the target operators. We only turn the refinement dial as far as we need to in order to understand the "reason" for the result; executable code is the far end of that dial.

We are essentially creating a model (or models) of an abstract interpreter for the operator in question. We don't want to model an entire language if we don't have to, so we only create models of subsets of the language. In order to simplify the model description, certain deconstructions are done manually, such as expression reduction. Thus, it's up to the model user to apply the correct order of operations, which can be determined by reading the documentation and/or by experimentation. In an expression such as "x=a+b+c", it can sometimes make a difference (per language) whether the right "+" is evaluated before the left one where mixed types are involved.
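
As a rough illustration of why the grouping can matter, here is a minimal Python sketch. The (tag, value) tuples and the tag-driven plus() are assumptions made up for this illustration (loosely JavaScript-like); they are not a claim about how any particular interpreter works.

```python
# Hypothetical tag-driven "+": add if both tags are Number, else concatenate.
def plus(left, right):
    (tag_l, val_l), (tag_r, val_r) = left, right
    if tag_l == "Number" and tag_r == "Number":
        return ("Number", str(int(val_l) + int(val_r)))
    return ("String", val_l + val_r)

a = ("Number", "1")
b = ("Number", "2")
c = ("String", "3")

left_first = plus(plus(a, b), c)    # (a + b) + c
right_first = plus(a, plus(b, c))   # a + (b + c)

print(left_first)   # ('String', '33')
print(right_first)  # ('String', '123')
```

Both results carry a String tag, but the values differ, which is why the model user has to settle the evaluation order before reducing an expression.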

An example of the internal structure generated for a variable [2]:

 foo = 123;

 <var name="foo" type_tag="number" value="123">

Here is the model we'd use for a non-tag language:

 <var name="foo" value="123">

Experiments are probably needed to determine whether the language uses type tags. Generally, a language can be treated as non-tagged if the type tag makes no difference: that is, if no experiment can be found showing that a variable carries information other than the "value" described in the structures above.
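
For readers who prefer to poke at the structures in real code, the two variants can be sketched with Python's standard xml.etree.ElementTree module. The helper name make_var is hypothetical and exists only for this sketch.

```python
import xml.etree.ElementTree as ET

def make_var(name, value, type_tag=None):
    """Build a <var> element; leave out type_tag to model a non-tag language."""
    attribs = {"name": name}
    if type_tag is not None:
        attribs["type_tag"] = type_tag
    attribs["value"] = value
    return ET.Element("var", attribs)

tagged = make_var("foo", "123", type_tag="number")
untagged = make_var("foo", "123")

# Serialize both variants so the "parts" can be inspected side by side.
print(ET.tostring(tagged, encoding="unicode"))
print(ET.tostring(untagged, encoding="unicode"))
```

In the non-tag variant the only observable part is the value, which is the situation the experiments are meant to confirm or rule out.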

APIs

(Pseudo-code syntax roughly based on AlternativesToCeeSyntax, roughly a C/Pascal hybrid.)

 func updateVar(vname: string, type_tag: string optional, value: string optional) {
   // Update the XML structure for the given variable; create new variable if not exists
   if Not varExistsInThisScope(vname) {
     if LANG_ALLOWS_DYNAMIC_VAR_CREATION {
        createVariable(vname, LANG_DEFAULT_TYPE, LANG_DEFAULT_VALUE);
     } else {
        raiseError("Variable not defined: " & vname);
     }
   }   
   if parameterGiven("type_tag") {
      updateTypeTag(vname, type_tag);           
   }
   if parameterGiven("value") {
      updateValueAttrib(vname, value);
   }
 }
 .
 func updateTypeTag(vname: string, typeName: string) {
   // code to update the "type_tag=" attribute of the 
   // corresponding XML structure for given variable 
 }
 func updateValueAttrib(vname: string, value: string) {
   // code to update the "value=" attribute of the 
   // corresponding XML structure for given variable 
 }
 func getTypeTag(vname: string):string {
   // Retrieves the value (X) of the "type_tag=X" attribute of a given 
   // variable's XML representation.
 }
 func getValueAttrib(vname: string):string {
   // Retrieves the value (X) of the "value=X" attribute of a given 
   // variable's XML representation.
 }
 .
 func passParam(oldVarName:string, newVarName:string, typeName:string optional) {
   // To emulate parameter passing. For languages that allow explicit
   // parameter type declarations, type coercion is available.
   var useTag: string, useValue: string;
   .
   useTag = getTypeTag(oldVarName);
   useValue = getValueAttrib(oldVarName);
   .
   if LANG_ADJUSTS_PARAM_TYPE And parameterGiven("typeName") {  //[footnote future]
     if isParsableAsType(useValue, typeName) {
       useTag = typeName;
     } else {
       raiseError("Parameter cannot be converted to given type.");
     }
   }
   // Save the results (creates the new variable if needed)
   updateVar(newVarName, useTag, useValue);
 }
 .
 CONST_LIST_OF_VALID_QUOTES = {ascii(34), ascii(39)} // varies per language
 .
 func assignLiteral(vname: string, value:string, delimiter: string optional) {
   // Assign a literal to a variable. (Use copyVar() for non-literals.)
   var useTypeTag: string;
   // Is the literal quoted? (Is the delimiter one of the valid quote characters?)
   if delimiter in CONST_LIST_OF_VALID_QUOTES {
     useTypeTag = "String";
   } else {
     useTypeTag = "Number";
   }
   // (Insert any other "type markers" a given language may use above in an ELSE IF)
   updateVar(vname, useTypeTag, value);
 } 
 func copyVar(vname: string, copiedVarName: string) {
    // To copy one variable into another: "a=b;" would be copyVar("a", "b");
    var useValue: string, useTag: string;
    useValue = getValueAttrib(copiedVarName);  
    useTag = getTypeTag(copiedVarName);
    // Save the results
    updateVar(vname, useTag, useValue);
 }

(Dots used as work-around for a wiki spacing bug on some browsers.)
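
For those who would rather experiment with running code, here is one possible Python rendering of the core accessors above. It is a sketch under assumptions: a plain dictionary stands in for the XML structures, passParam is left out, and the LANG_ settings and VALID_QUOTES values are placeholders to be tuned per modelled language.

```python
# Variable store: name -> {"type_tag": ..., "value": ...}; both parts are strings.
VARS = {}

# Assumed language settings (hypothetical; adjust for the language being modelled).
LANG_ALLOWS_DYNAMIC_VAR_CREATION = True
LANG_DEFAULT_TYPE = "String"
LANG_DEFAULT_VALUE = ""
VALID_QUOTES = {'"', "'"}

def update_var(vname, type_tag=None, value=None):
    """Update a variable's parts; create it first if the language allows."""
    if vname not in VARS:
        if not LANG_ALLOWS_DYNAMIC_VAR_CREATION:
            raise NameError("Variable not defined: " + vname)
        VARS[vname] = {"type_tag": LANG_DEFAULT_TYPE, "value": LANG_DEFAULT_VALUE}
    if type_tag is not None:
        VARS[vname]["type_tag"] = type_tag
    if value is not None:
        VARS[vname]["value"] = value

def get_type_tag(vname):
    return VARS[vname]["type_tag"]

def get_value_attrib(vname):
    return VARS[vname]["value"]

def assign_literal(vname, value, delimiter=None):
    """Quoted literals get a String tag; anything else is treated as Number."""
    use_tag = "String" if delimiter in VALID_QUOTES else "Number"
    update_var(vname, use_tag, value)

def copy_var(vname, copied_var_name):
    """Model "a = b;" as copy_var("a", "b")."""
    update_var(vname, get_type_tag(copied_var_name), get_value_attrib(copied_var_name))

assign_literal("a", "123")        # a = 123;
assign_literal("b", "123", '"')   # b = "123";
copy_var("c", "b")                # c = b;
print(VARS)
```

After the three calls, "a" holds a Number tag while "b" and "c" hold String tags, all with the same value text "123".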

Example "Reduction"

The modelling kit is not intended to be an entire virtual interpreter, for that would over-complicate it. Instead, the human does most of the parsing and reducing, but only at a level fine enough to leave type-related behavior to the pseudo-code models. If possible, every expression is reduced to functions[3] with no more than two parameters.

Original Test Snippet:

  var a = 123;
  var b = "123";
  writeLine(a + b + 7);

Reduction level 1:

  var a = 123;
  var b = "123";
  var temp01 = a + b;
  var temp02 = 7;  // the order of evaluation may vary in some langs
  var temp03 = temp01 + temp02;
  writeLine(temp03);
  // This rewrite allows us to examine each part using multiple techniques and
  // helps us write further reductions. Embedding sub-expressions reduces that
  // ability, so we de-embed our adjusted representation of the snippet. [4]

Reduction level 2:

  assignLiteral("a", "123");
  assignLiteral("b", "123", CONST_DOUBLE_QUOTE); // if a literal has quotes, we indicate such
  plus("temp01", "a", "b");
  assignLiteral("temp02", "7");
  plus("temp03", "temp01", "temp02");
  writeLine(temp03);

Plus-Sample-1 -- Example Implementation of "Plus" (from above)

  func plus(targetVarName, vnameLeft, vnameRight) {
    var valLeft, valRight, tagLeft, tagRight, useTag, useVal;
    // Extract parts of operands
    valLeft  = getValueAttrib(vnameLeft);
    valRight = getValueAttrib(vnameRight);
    tagLeft  = getTypeTag(vnameLeft);
    tagRight = getTypeTag(vnameRight);
    // Process the parts
    if isParsableAsType(valLeft, "Number") And isParsableAsType(valRight, "Number") { // 4837
      useVal = performMathAddition(valLeft, valRight);
      useTag = "Number";
    } else {
      useVal = performStringConcat(valLeft, valRight);
      useTag = "String";
    }
    // Save the results
    updateVar(targetVarName, useTag, useVal);
    // some langs will handle other types, like dates, not shown here.
  }
  // Different languages will use different logic. This is generally
  // the most flexible approach in terms of dynamic conversion.
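
To see Plus-Sample-1 applied to the reduced test snippet, here is a small runnable Python sketch. The tuple-based store, the regular expression standing in for isParsableAsType, and the format_number helper are all assumptions of this sketch rather than parts of the kit.

```python
import re

VARS = {}  # name -> (type_tag, value); both parts kept as strings

def is_parsable_as_number(text):
    """Assumed stand-in for isParsableAsType(value, "Number")."""
    return re.fullmatch(r"[+-]?\d+(\.\d+)?", text) is not None

def format_number(num):
    """Render whole numbers without a trailing ".0" (a presentation assumption)."""
    return str(int(num)) if num.is_integer() else str(num)

def plus(target, left, right):
    """Value-driven plus: the type tags of the operands are ignored."""
    val_l, val_r = VARS[left][1], VARS[right][1]
    if is_parsable_as_number(val_l) and is_parsable_as_number(val_r):
        VARS[target] = ("Number", format_number(float(val_l) + float(val_r)))
    else:
        VARS[target] = ("String", val_l + val_r)

VARS["a"] = ("Number", "123")        # var a = 123;
VARS["b"] = ("String", "123")        # var b = "123";
plus("temp01", "a", "b")
VARS["temp02"] = ("Number", "7")
plus("temp03", "temp01", "temp02")
print(VARS["temp03"])  # ('Number', '253')
```

Because only the values are inspected, the quoted "123" is added as a number, so this logic predicts 253 for the test snippet.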

Plus-Sample-2 -- Example Implementation of "Plus", ALTERNATIVE 2, to mirror JavaScript:

  // Sample partial experiment in JS for reference
  alert(2 + 3);           // 5
  alert(typeof(2 + 3));   // number
  alert("2" + 3);         // 23
  alert(typeof("2" + 3)); // string
  alert(2 + "3");         // 23
  alert(typeof(2 + "3")); // string

  func plus(targetVarName, vnameLeft, vnameRight) {
    var valLeft, valRight, tagLeft, tagRight, useVal, useTag;
    // Extract parts of operands
    valLeft  = getValueAttrib(vnameLeft);
    valRight = getValueAttrib(vnameRight);
    tagLeft  = getTypeTag(vnameLeft);
    tagRight = getTypeTag(vnameRight);
    // Process the parts
    if tagLeft == "Number" And tagRight == "Number" {    // 94838
      useVal = performMathAddition(valLeft, valRight);
      useTag = "Number";
    } else {
      useVal = performStringConcat(valLeft, valRight);
      useTag = "String";
    }
    // Save the results
    updateVar(targetVarName, useTag, useVal);
  }
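
The tag-driven logic of Plus-Sample-2 can be tried against the reduced test snippet with a small Python sketch. The tuple store and the assign helper are assumptions of the sketch, and whole-number results are printed without ".0" purely for readability.

```python
VARS = {}  # name -> (type_tag, value); both parts kept as strings

def assign(vname, tag, value):
    """Hypothetical shortcut for assigning a literal with a known tag."""
    VARS[vname] = (tag, value)

def plus(target, left, right):
    """Tag-driven plus: add only when both operands carry a Number tag."""
    tag_l, val_l = VARS[left]
    tag_r, val_r = VARS[right]
    if tag_l == "Number" and tag_r == "Number":
        total = float(val_l) + float(val_r)
        use_val = str(int(total)) if total.is_integer() else str(total)
        VARS[target] = ("Number", use_val)
    else:
        VARS[target] = ("String", val_l + val_r)

assign("a", "Number", "123")         # var a = 123;
assign("b", "String", "123")         # var b = "123";
plus("temp01", "a", "b")
assign("temp02", "Number", "7")
plus("temp03", "temp01", "temp02")
print(VARS["temp03"])  # ('String', '1231237')
```

For comparison, JavaScript itself evaluates 123 + "123" + 7 to the string "1231237", so this variant of the model agrees with the language on this snippet.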

Templates

Template for typical one-operand operator:

  func myOp1(targetVarName, paramVname) {
    var paramVal, paramTag, useVal, useTag;
    // Extract parts of operand (parameter)
    paramVal = getValueAttrib(paramVname);  
    paramTag = getTypeTag(paramVname);
    // Do stuff with the parts we just unpacked, including conditionals 
    // if necessary, and update useTag and useVal.
    // ...
    // Save the results
    updateVar(targetVarName, useTag, useVal);
  }

Template for typical two-operand operator:

  func myOp2(targetVarName, vnameLeft, vnameRight) {
    var valLeft, valRight, tagLeft, tagRight, useTag, useVal;
    // Get parts of operands
    valLeft  = getValueAttrib(vnameLeft);
    valRight = getValueAttrib(vnameRight);
    tagLeft  = getTypeTag(vnameLeft);
    tagRight = getTypeTag(vnameRight);
    // Do stuff with the parts we just unpacked, including conditionals 
    // if necessary, and update useTag and useVal.
    // ...
    // Save the results
    updateVar(targetVarName, useTag, useVal);
  }
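
As an example of filling in the one-operand template, here is a Python sketch of a hypothetical "type name" operator, similar in spirit to a typeof-style operator. The operator, its lower-casing of the tag, and the tuple store are all assumptions chosen for illustration.

```python
VARS = {"a": ("Number", "123"), "b": ("String", "123")}  # name -> (type_tag, value)

def type_name_op(target, param):
    """One-operand template filled in: the result's value is the operand's tag."""
    param_tag, param_val = VARS[param]   # extract parts of the operand
    use_val = param_tag.lower()          # the tag itself becomes the value
    use_tag = "String"                   # the result is always tagged String
    VARS[target] = (use_tag, use_val)    # save the results

type_name_op("temp01", "a")
type_name_op("temp02", "b")
print(VARS["temp01"])  # ('String', 'number')
print(VARS["temp02"])  # ('String', 'string')
```

Note that the operand's value is unpacked but never consulted; an operator like this can only be modelled if the variable carries a tag.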

Notes and Footnotes

[1] The "value" portion will usually be modeled as strings to keep the model simple and dissect-able. When certain language-specific operations are performed, the model may need to convert values into other types such as Double, and then convert back to string for the general processing portions. In some cases this may lead to subtle rounding or truncation differences from the actual interpreter. But this model is primarily to study type-related issues, not floating-point processing issues/effects. A more elaborate model (tuned to a given language) may be needed to model such precisely, perhaps on a byte-by-byte basis on "compressed" values. Such models don't necessarily conflict with this "tag" model/kit, but will likely be heavily complicated by such additions/explicitness. However, if those kinds of issues are of importance to a given project, then a static language is probably the better tool for the job because dynamic conversion is more likely to result in unexpected conversion surprises and/or loss of precision.
[2] This model of "variable" does not necessarily match actual interpreter implementation. It may also not match some people's notions of what a "value" is and is not. The model is factored for the particular purpose of predicting type-related results (output) in dynamic languages and may differ from other models or preferred notions and/or vocabulary usage. (I dispute that a canonical standard for terms like "value" exists, but other WikiZens disagree.)
[3] Usually we are modelling only built-in operations or functions. It's fairly rare that we need to model user-defined functions directly because we can already see the source code for those. One generally isolates type-specific issues or questions down to operations that we cannot see the source code for, which usually leaves just the built-in operations or functions. One possible exception is parameter handling in languages that allow explicit parameter type declarations or validation. The passParam API can be used to emulate such a transformation for input parameters. A return result emulator is left as a reader exercise. [More info on emulating function calls will be supplied later.]
[4] The deconstructed rewrite of the expression does not have to be done in actual code; it's generally a "mental step" once one gets the hang of the model. However, model beginners should perhaps try a few to get the hang of it, and/or to verify that the results are the same for both the original expression and the deconstructed version. Further, one should generally isolate the "type-related issue" to very short expressions or simple assignments if possible. It's similar to sending/logging a bug report with a language vendor: the vendor does not want your entire Accounts Receivable system, but rather the smallest snippet possible to illustrate the issue.
[5] Whether there are canonical or "official" parts of interpreters or models of interpreters is subject to heated debate. The description here tries to match the actual "parts" of typical dynamic languages in terms of objective syntactically-identifiable elements, such as "variables", "expressions", etc. (as defined by the languages' grammar/parsing rules). But the "proper" names or classification of parts of the "inner workings" can be subject to debate.
Note how the type system of the language (pseudo-code) is NOT used for the model itself (at least not the "type" parts). The value (representation) and the type tag are all stored and moved around as strings for consistency and easy examine-ability (except where actual math is performed). An actual interpreter may "compress" certain values into binary for efficiency etc., but we are not modelling that in order to keep the model simple. A possible side-effect of that is subtle rounding or truncation differences for floating point results, and perhaps "overflow" handling. But our main concern is about "types", not numeric computation, so we live with those differences to keep the model simpler and focused.
I realize it's possible that breaking expressions down into or with intermediate "hidden" (temp) variables could in theory make them produce different results from the original in some language designs, but I've yet to see it happen in practice for dynamic languages. (It definitely happens in static languages because each var must use a given type.) If and when I discover an actual problem with that practice in a given dynamic language, then I'll re-evaluate the model, or at least form a sub-model to handle such languages. It would be a curious specimen.
PageAnchor internal-variable-naming -- The "name" is the unique identifier in the XML models of variables. No two XML statements shall have the same name. This is a simplification, for in practice there would also be a scope identifier of some kind such that the primary key would be the scope identifier plus the name (of the variable), per relational rules. But without this, one has to take care to avoid unintended name overlaps in test samples. Also, the internal intermediate variables would ideally use naming conventions that are not allowed in production (external) code to prevent overlaps. For example, internal names could allow a caret symbol ("^") but not production variables. Thus, the processor would generate names such as "^temp00034". The parsing step would reject the caret in production code, but the internal names are generated at a step later than this parsing, and that's how they get around the caret filter. This way the internal variables are immutable and inaccessible to "regular" application programs. (I've worked with debuggers that used this technique to allow the debugger-using programmer to analyze and change intermediate "steps".)
If you use actual code instead of pseudo-code, you don't have to write it in the language you are modelling. It doesn't even have to be a scripting language. We are modeling the dynamicness of types directly rather than relying on the type system of the model interpreter. This better allows us to examine the "parts" of variables.
I didn't invent the "internal variable" modeling approach in this model; I copied the idea from another system I once saw on a VAX minicomputer (the details of which escape my memory). I found it a clever and parsimonious metaphor/technique for modeling the guts of an interpreter, which reused the same concept and mechanisms for internal and external "variables". I realize some are disturbed by the technique, but I find their case weak. I agree it's less efficient than stack-based approaches, but machine efficiency is not the goal of this model. -t
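
The caret-based naming convention described under internal-variable-naming above can be sketched in Python as follows. The identifier rule for production code (letters, digits, underscore) and both helper names are assumptions made for this sketch.

```python
import itertools
import re

_temp_counter = itertools.count(1)
# Assumed identifier rule for production code: letters, digits, underscore.
_EXTERNAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def new_temp_name():
    """Generate an internal name such as "^temp00001"."""
    return "^temp{:05d}".format(next(_temp_counter))

def is_valid_external_name(name):
    """The parsing step accepts only names matching the production rule."""
    return _EXTERNAL_NAME.fullmatch(name) is not None

first = new_temp_name()
second = new_temp_name()
print(first, second)                    # ^temp00001 ^temp00002
print(is_valid_external_name("foo"))    # True
print(is_valid_external_name(first))    # False
```

Since generated names never pass the external-name check, test snippets cannot collide with the intermediate variables by accident.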

Why create a new page instead of using TopsTagModel?

I wish to have the reference material at the top for easy re-opening and finding. Perhaps TopsTagModel can be refactored to have the discussion part in TopsTagModelDiscussion??

Sounds good. Feel free to move the TopsTagModel discussion somewhere else.

Okay, but I have to think about the org a bit more, such as whether reference material should be a diff topic than examples. The resulting sizes will likely dictate that.

{I have been thinking for some time that all of this discussion should be in a category. Would anyone care to suggest one?}

In the spirit of CategoryOopDiscomfort, how about CategoryTypeDiscomfort?

{Good.}

We already have CategoryTypingDebate, I now remember.

Discussion continued from ValueExistenceProof:

I don't see what purpose there is in giving variables a "type" property, nor do I see the purpose in dealing with quotes. (I'm happy to continue this threadlet on the appropriate page -- feel free to move it there.)

In the first "plus" sample above, the type tag is indeed not used. But if we modified it to better match say JavaScript, then it would, per Plus-Sample-2.

From TypeDefinitionsSmellBadly:

[Your] func updateTypeTag(vname: string, typeName: string) apparently changes the type of a variable independently of its value representation. Why?

That's just a low-level accessor. In practice, it probably would not be used in isolation, although the details of that depend on the specific language being modeled. In other words, both the type tag and the value are usually changed together in the same mid-level operation/function. An example of where the type tag may be changed in isolation is a "cast" (conversion) operation that converts a number to a string. However, that's not necessarily the only way to implement such.

How do we know what are "low-level accessors" (presumably inaccessible outside the API) vs high-level functions exposed by the API?

That depends on each specific language being modeled. Examples for common/typical languages would be given, but the details about how the parts are put together is up to the kit user. If a kit user wants to add extra GateKeeper levels, etc., that's up to them.
Why would it depend on the specific language being modelled? What language consistently meaningfully changes the type of a value but not its value representation? For example, how could you cast an integer 23981029 to a float 23981029.0 without changing its value representation?
Moved rest of discussion to TopsTagModelTwoDiscussion, under "Regarding updateTypeType()"

Why would you mutate a value in-situ as part of a typecast? I can see occasions where an optimiser might choose to do so, but it seems peculiar to model it that way. A typecast is better modelled as a function that accepts a value of type 'x' and returns a value of type 'y'.

Why is it "peculiar"? Another way is to simply pass a reference to the target variable (internal or external) and let the typecast operation do whatever it wants to with the referenced variable (which may simply be to change the type tag anyhow, depending on the target language to model and/or kit user personal preference).

It's peculiar because you're mutating a value, which means conflating values and variables. For something intended to simplify understanding of values, variables and types, it doesn't help to treat values as variables. I don't know what you mean by the text following "Another way is to ..."

The discussion chain starting with ValueExistenceProof discusses this issue. I don't want to reinvent those debates here and clutter it up.

CategoryTypingDebate, CategoryLanguageTyping