Php Type System Discussion

Continued from TypeSystemCategoriesInImperativeLanguagesTwo

Are you sure the problem is in understanding TypeSystem categories defined by your model [tag model], as opposed to simply developing a better understanding of the language-specific peculiarities of (I presume) PHP, ColdFusion and JavaScript? In other words, does your model do a better job of explaining PHP peculiarities than the PHP manual?

Yes!

Can you give an example of where your tag model explains something that the PHP manual does not?

For example, here's the PHP online manual's description of is_numeric:

  is_numeric — Finds whether a variable is a number or a numeric string 
It does not explain what a "numeric string" is, which is a seemingly contradictory term. Ideally they'd hyperlink that phrase to a decent explanation and a way to empirically test for its existence.

A "numeric string" is a common term for a string containing a sequence of characters that conforms to the language's definition of "numeric". It's tested by using is_numeric(). is_numeric() returns true if its argument is of a numeric type, and also true if it's a string containing a sequence of characters that conforms to the language's definition of "numeric". How does your model explain it better?

That is incorrect. "is_numeric" can return True even if it's not a string. Notice the "or" in the definition/description. It implies there are two "kinds" of variables that can return True for this function.

I think you must have misread what I wrote. I did write that "is_numeric returns true if its argument is of numeric type ...", i.e., when it's not a string type but is a numeric type.

And where is the definition of "numeric" in the manual? There should be hyperlinks on the term(s) if they wanted to "do it right". Thus, I already have evidence that the manual is crappy.

The online PHP manual is notoriously crappy. For example, in various places it describes function arguments as being "variables", even though they are values that can be the results of expressions. However, "numeric" is universally understood. You are correct that a good and comprehensive manual should define it, but isn't this simply an argument for a better PHP manual rather than a justification for an abstract model?

No, "numeric" is NOT universally understood, at least not in a concise way. "gettype()" can return "string" for an otherwise "number-looking" variable (judging by its output). Again, this gets back to colloquial notion-y concepts of "types" versus something more clearly modeled, i.e. "explained".

"Numeric" is certainly universally understood as "represents a number". What this means in terms of a given language is dependent on the language definition, though I doubt any languages would not recognise integers as numeric. This has nothing to do with "colloquial notion-y concepts of 'types'" and everything to do with individual language characteristics. Entirely independent of whether 547894597854 is a valid integer in one language and invalid (perhaps because it's too long) in another, or whether 10E12 is a valid double or not, or whether 0x67 is recognised as an integer or not, and so on, is the consistent notion of "type" -- a set of values and associated operations.

What gettype() returns is dependent on how gettype() is defined. It has nothing to do with what "type" means. In this case, it appears gettype() returns the name of the type of its operand, even if it's a string that contains a sequence of nothing but numeric characters.

Yes, it "appears", based on samples etc.

I based "it appears" on your description, and the description in the PHP manual.

So you agree that "'numeric' is NOT universally understood, at least not in a concise way."?

"Numeric" is universally understood, and concisely, as "being a number". However, that is obviously not a rigorous definition; the rigorous definition of "numeric" varies somewhat from language to language.

And potentially per operator. Thanks for proving my point.

And what is your point?

It's not "universally understood in a concise manner". If it varies per language, then that strongly implies that there is no "universal".

The details vary from language to language, but the essential characteristic is universal: "It's a number", which is very concise, too. Details about whether "it's a number" includes or excludes exponential notation, the maximum number of digits to the left/right of the decimal point, whether NAN is recognised as a valid value or not, etc., all depend on the specific numeric types within a specific language but the overall concept is universally understood.

Note that "numeric", as used in is_numeric(), is clearly defined in the PHP manual. See http://php.net/manual/en/function.is-numeric.php in the "Description" section, to wit: "Finds whether the given variable is numeric. Numeric strings consist of optional sign, any number of digits, optional decimal part and optional exponential part. Thus +0123.45e6 is a valid numeric value. Hexadecimal (e.g. 0xf4c3b00c), Binary (e.g. 0b10100111001), Octal (e.g. 0777) notation is allowed too but only without sign, decimal and exponential part." That seems both clear and concise, though the "finds whether the given variable is numeric" should probably be "finds whether the given expression is numeric". Unless "variable" is meant to refer to the function parameter?
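As a rough check of how concise that definition is, here is a minimal Python sketch of the decimal part of the quoted grammar (optional sign, digits, optional decimal part, optional exponent). It is only an approximation: the hexadecimal/binary/octal clause and PHP's whitespace handling are deliberately left out, and real is_numeric() behavior has changed between PHP versions (for example, hexadecimal strings are not treated as numeric in PHP 7 and later). is_numeric_string() is a hypothetical stand-in, not PHP's implementation.

```python
import re

# Decimal "numeric string" grammar from the quoted manual text:
# optional sign, digits, optional decimal part, optional exponent.
# Assumption: hex/binary/octal forms and whitespace rules are ignored.
NUMERIC_STRING = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def is_numeric_string(s: str) -> bool:
    """Return True if the whole string matches the decimal numeric grammar."""
    return NUMERIC_STRING.fullmatch(s) is not None


for sample in ["+0123.45e6", "42", "-.5", "1e", "abc", ""]:
    print(repr(sample), is_numeric_string(sample))
```

The manual's own example, +0123.45e6, matches; "1e" does not, because an exponent marker needs at least one digit after it.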

What is the "essential characteristic" exactly? You need to be careful when you toss out terms like that.

The essential characteristic is that "it's a number".

And gettype() could tell us it's a string even if is_numeric() "says" it's a number. Is one of them lying, or is it relative? Which one is ignoring this magical "essential" thing?

PHP is a Category D1 language (see TypeSystemCategoriesInImperativeLanguages) which means types are associated with values, so gettype() returns the name of the type associated with the value of its operand. For is_numeric(), if the type associated with the value of the operand is numeric (i.e., float or integer), it returns true. The is_numeric() operator knows that a string may encode a numeric value as a sequence of characters, so if its operand value is associated with the "string" type, it parses the string to see if it encodes a numeric value as a string of characters and returns 'true' if it does. Otherwise, if none of the above are true, it returns false.
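That description can be sketched as a tiny model. This is a hypothetical Python illustration, not PHP's source: each value carries a type tag and a representation, gettype() reads only the tag, is_numeric() reads the tag and parses the representation only when the tag is "string", and is_bool() (for contrast) reads only the tag, which is why PHP's is_bool("true") returns false. The numeric-string grammar is simplified to the decimal form.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Hypothetical D1 value: a type tag plus a representation."""
    type_ref: str           # e.g. "integer", "double", "string", "boolean"
    representation: object  # underlying data, simplified to a Python object


# Decimal numeric-string grammar only (assumption: hex and whitespace rules ignored).
NUMERIC_STRING = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def gettype(v: Value) -> str:
    """Look only at the type reference."""
    return v.type_ref


def is_numeric(v: Value) -> bool:
    """Check the tag first; parse the representation only for strings."""
    if v.type_ref in ("integer", "double"):
        return True
    if v.type_ref == "string":
        return NUMERIC_STRING.fullmatch(v.representation) is not None
    return False


def is_bool(v: Value) -> bool:
    """Tag-only check: never parses the representation."""
    return v.type_ref == "boolean"


s = Value("string", "123")
print(gettype(s), is_numeric(s))         # string True
print(is_bool(Value("string", "true")))  # False
```

In this sketch gettype() and is_numeric() disagree about "123" without either one lying: they simply inspect different parts of the same value.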

But a D1 language could ignore the parsing step, as is_bool() does (it looks at the tag only). Some language commentators and I believe that to be a mistake, but it is a language design option for a given operator nevertheless. They could very well have done the same with is_numeric(). The colloquial notions don't address these kinds of issues specifically and won't tell us exactly how is_x functions act in PHP or any other dynamic language. QED. Give it up, you lost the colloquial argument yet again. Enough already: pull the plug on your bloated ego and accept defeat, for I'm tired of re-winning the colloquial argument over and over. My shelf can no longer hold all the trophies.

Sure, a D1 language can "ignore the parsing step" and why not? The author of a function can make a function do whatever he or she likes, especially in dynamically typed languages where function parameters are rarely statically typed. Whether some functions are a good or bad idea is a different issue entirely.

I'm not sure what you're congratulating yourself for, or what "colloquial argument" you think you've won.

I do agree that linguistic conventions could be set up such that language alone could perhaps be "good enough" for such issues and we wouldn't need XML models etc. However, that's not the current state of affairs. If you want to pursue that, fine, but until it's been presented and vetted, the tag model is superior because it relies far less on English.

That's impossible to judge, because although how popular imperative programming language TypeSystems work is described at the top of TypeSystemCategoriesInImperativeLanguages, there is no equivalent exposition of TopsTagModel. How can we compare in the absence of that?

Why would I want to copy your vague, poorly-written description or use it as a model?

Have I suggested that you should? It seems somewhat disingenuous to claim that your "tag model is superior because it relies far less on English" without a clear exposition of TopsTagModel so that we can test it to see if it's "superior because it relies far less on English". Since it appears to be spread over many pages, it's hard to tell whether it "relies far less on English" or not, especially as it appears you've expended a lot of English in an attempt to explain it.

Perhaps both approaches are sucking (or are WetWare mismatches). I didn't intend to write so much about it, but for reasons that escape me it's not sinking in with you. I at least attempt to model the processes of concern via reading and writing of a reference data structure (XML). One can see the data structure being analyzed via pointers showing what the model interpreter is "looking at" during a given step and/or changing the structure in a similar fashion. I don't know how to be any more explicit than that without writing a per-language or per-op interpreter that uses the reference structure. Would doing such be helpful?

That's what my WetWare likes to see: a reference representative data structure using familiar industry data structure idioms, such as XML or SQL, and step-by-step rules for how and when these change. The "home" of state information and the relationship to other state information is clearer that way because it's using known industry-standard sub-parts. Your use of English as a substitute confounds the hell out of me and appears inconsistent with other writers of typeness.

{Why it's not sinking in with us is quite clear. It's not sinking in because you won't tell us the important parts. E.g. the rules for setting up your model from a given language. Without those, we can't know how to map the language parts to your model's parts. You appear to have some rules since you've indicated indirectly that certain mappings aren't correct, but I have no idea what those might be. (Especially since those mappings still preserved the relationship between source, input, and output.)}

I cannot offer concrete rules, only suggested patterns of exploration such as "here are some things to try", because every language is different. If a newly-encountered language fits common and known patterns, then the suggestions will be of immediate help. No model can anticipate all possible languages. Sherlocking around will sometimes be necessary. This is back to the Trek planet analysis analogy. So, are you asking for concrete rules, or are suggested tests okay with you? If the latter, most of them have already been given or are obvious extensions (such as testing more than just numbers and strings if a language offers them). I agree they could be cleaned up and organized better, but you've already seen most of it multiple times and didn't seem moved by it the first, second, or third time, so I doubt the fourth repetition will be the magic one.

A model has clearly-defined and identified parts and precisely articulated rules for how they are put together and how they relate to the real world being modelled. If you "cannot offer concrete rules, only suggested patterns of exploration", then what you have is a guide for exploring some aspects of a language. That's fine, and perhaps useful for studying language behaviour, but it's not a model.

Re: "(Especially since those mappings still preserved the relationship between source, input, and output.)" -- Please elaborate.

{It means that given the same code and input, I was able to produce an alternative mapping to the parts of your model that didn't break any of your stated rules and produced the same output as your model. (It did however result in the opposite conclusion.)}

I don't know what you are talking about.

{It's where you complained about shoving the quotes up the value on TypeSystemCategoriesInImperativeLanguagesTwo.}

Again, I was talking about parsing in an informal way there. Also, quotes are not part of the value: you put non-value parts in the value, making "value" lie. Stated another way, in the tag model you don't put declaration quotes (delimiters) in the value. If you want to make an alternative model where you do, be my guest.

{Like I said. You appear to have some rules for how to map the language to your model. Otherwise, how would you tell if the value is "lying"? And how would you handle the language described there where the quotes show up in the output if you concatenate in a different manner, if you don't put the declaration quotes in the value?}

Please illustrate with some pseudo-code.

{You haven't answered the questions yet. How can I illustrate it?}

I meant "concatenate in a different manner".

{Ok. Using the language where you complained about shoving the quotes up the value on TypeSystemCategoriesInImperativeLanguagesTwo. The following code produces the string "123""123" as output.}

 a = "123"
 alert(concatenate(a, a))
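For what it's worth, here is one hypothetical way to reproduce that output, assuming the language keeps a string literal's quotes as part of the value's representation. It's a Python sketch; string_literal(), concatenate() and alert() are stand-ins invented for illustration, not the actual language's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    type_ref: str
    representation: str


def string_literal(token: str) -> Value:
    # Assumption: this hypothetical language keeps the delimiters in the representation.
    return Value("string", token)


def concatenate(x: Value, y: Value) -> Value:
    # Joins the representations verbatim, so the quotes come along.
    return Value("string", x.representation + y.representation)


def alert(v: Value) -> str:
    # Returns the text that would be displayed.
    return v.representation


a = string_literal('"123"')
print(alert(concatenate(a, a)))  # "123""123"
```

Under a different assumption (quotes stripped when the literal is evaluated), the same program would print 123123, which is why the mapping rules matter.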

I have no idea why a language would do that. I don't claim the tag model (kit?) can model every language, but would need more info about this odd language of yours to suggest modelling approaches, including whether the tag model is a poor fit.

What's notable here is that the description at the top of TypeSystemCategoriesInImperativeLanguages has no difficulty modelling it.

In a language that only executes in your head. It's scramble-talk to me. That aside, you still haven't explained WHY one would want a language that behaves in such a way, regardless of how "types" are modeled.

It doesn't matter why. What matters is that the conventional model works to describe it, and yours does not.

Reversification. I went step-by-step through examples and showed how and when the data structures changed. You didn't.

In the conventional model, there is one and only one data structure change: Assignment to a variable replaces the variable's value with another value. That's it.

What about "a=3;print(a);a='3';print(a);" in tagged languages (like PHP)? We don't even need to know or care if "the value" has changed under the hood. It's not necessary to model such a change, so one may choose to skip that in a model without consequences.

Yet, precisely what is happening is that the value in variable 'a' has been replaced with 3, and later replaced with '3'. What understanding about PHP is gained by trying to "skip that"?

Prove it.

(a) Do you really think PHP assigns a new value to 'a' in "$a=3;print($a);$a='4';print($a)" but not "$a=3;print($a);$a='3';print($a)"?

(b) What does this do: "$a=$x;$a=$y;"?

(c) See http://www.php.net/manual/en/language.variables.basics.php where it is written, "... variables are always assigned by value. That is to say, when you assign an expression to a variable, the entire value of the original expression is copied into the destination variable."

We are getting off track. What I meant is that "value" is ambiguous and that there is more than just the value being changed in "a=3;print(a);a='3';print(a);", but it largely depends on how "value" is observed/tested and defined in terms of such tests. If you say getType() "looks at" the "value", then the value has changed (is "different"). However, the output of "print" is identical. How does one empirically verify if the "value" is different between the first "a" and the second "a"? Forget the word "change", and focus on "different" instead. Does getType use the "value"?

No, in "a=3;print(a);a='3';print(a)" only the variable's old value is replaced with a new value. A value consists of a representation and a type reference, and gettype() only uses the value's type reference. That's why "$p=3;gettype($p)" and "gettype(3)" and "gettype(3 + 4)" produce the same result. Furthermore, "gettype(3 + 4)" can only be explained by the existence of values with type references, which corresponds with the definition of a value being the result of an expression, and with a value being composed of a representation and a type reference.
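A minimal sketch of that account, in Python, with hypothetical names: a value pairs a representation (simplified here to a character string rather than a bit string) with a type reference, assignment replaces the variable's whole value, print_text() stands in for print and reads only the representation, and gettype() reads only the type reference. The add() helper shows that "3 + 4" produces a value with a type reference even though no variable is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    representation: str  # simplified: a character string, not a bit string
    type_ref: str         # "integer", "string", ...


def gettype(v: Value) -> str:
    return v.type_ref


def print_text(v: Value) -> str:
    # Hypothetical stand-in for print: uses only the representation.
    return v.representation


def add(x: Value, y: Value) -> Value:
    # Integer addition yields a new value; no variable is involved.
    return Value(str(int(x.representation) + int(y.representation)), "integer")


variables = {}
variables["a"] = Value("3", "integer")  # $a=3
first = (print_text(variables["a"]), gettype(variables["a"]))
variables["a"] = Value("3", "string")   # $a='3' replaces the old value
second = (print_text(variables["a"]), gettype(variables["a"]))

print(first, second)  # ('3', 'integer') ('3', 'string')
print(gettype(add(Value("3", "integer"), Value("4", "integer"))))  # integer
```

The printed text is identical in both cases; only gettype() can tell the two values apart.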

I'd like to rework your XML from TypeSystemCategoriesInImperativeLanguages a bit here for discussion. For one, I prefer that the "parts" have named labels for clarity and discussion reference, so I'm making the "value's value" (for lack of a better term) an attribute. Is this a "reasonable" representation to you?

 <!-- Example PTX02, representation of $a='3' -->
 <variable name="a">
   <value type_reference="string" representation="3"/>
 </variable> 
The differences appear to be (a) converting the representation from content to an attribute, and (b) renaming the "type" attribute to "type_reference". I'm fine with that.

So if we use this structure to model variables, is it okay to say, "Some operators only look at (use) the type_reference attribute of a given operand and some only look at the representation attribute (or at least behave that way when analyzed)"? (I can't think of any that require looking at both, but won't rule it out just yet.)

Yes, that's fine.
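To make that agreement concrete, here is a small Python sketch that loads Example PTX02 with the standard xml.etree.ElementTree module and defines two hypothetical operators: gettype_op() reads only the type_reference attribute and print_op() reads only the representation attribute. These illustrate the convention just agreed on; they are not PHP's actual internals.

```python
import xml.etree.ElementTree as ET

# Example PTX02, representing $a='3'.
PTX02 = """
<variable name="a">
  <value type_reference="string" representation="3"/>
</variable>
"""


def gettype_op(variable: ET.Element) -> str:
    # Looks only at the type_reference attribute.
    return variable.find("value").get("type_reference")


def print_op(variable: ET.Element) -> str:
    # Looks only at the representation attribute.
    return variable.find("value").get("representation")


a = ET.fromstring(PTX02.strip())
print(gettype_op(a), print_op(a))  # string 3
```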

Okay, so when I say "tag", I'm talking about the "type_reference attribute", and when I talk about "value", I'm talking about the "representation". Let's see if that holds up.

But then you're using a PrivateLanguage. Why not simply say "type reference" (or, as we usually do, just say "type") and "representation"? Also, what do you call the association of a "value" and a "type_reference attribute"? I call it a "value".

I believe we've been over that at least 3 times. Colloquial usage of "types" is vague and overloaded. You say it's not, I say it is. I don't want to re-re-re-re-re-re-argue that yet again repeatedly multiple times redundantly. Both sides' arguments are based on anecdotal info about common/popular perceptions/beliefs and we are at an anecdotal impasse. Let it sleep.

Repeatedly, I have responded to your allegation that "colloquial usage of 'types' is vague and overloaded" with the same question: What problem does the alleged "vague and overloaded" use of the term "type" cause that use of the term "tag" solves? I haven't received a clear answer.

The "multiple simultaneous types" issue with isX() kind of functions, their difference/contradiction with typeName()-like functions, and the never-ending CFargument debate are examples.

There is only one type associated with the representation of a value. Values of other types can be encoded in the value. This is explained at TypeSystemCategoriesInImperativeLanguages.

"Encoded" is too subject to interpretation, per TypesAndAssociations.

What does "too subject to interpretation" mean, in this case? Can you give an example of how "interpretation" could be mistaken?


See the pointer diagram on page 98:

http://www.tutorialspoint.com/pascal/pascal_tutorial.pdf?page=98

And more complex variations:

http://www.billthelizard.com/2010/10/sicp-23-rectangles-in-plane.html

Those kinds of diagrams were very helpful for illustrating and modeling pointers, in my opinion. "Types" needs something similar, and the best representation will probably split "type tag" off from "value", such as a two-chambered box.

Diagrams and notations are certainly helpful. Here are some two-chambered (where appropriate) ASCII boxes:

How's that?

That's a start, but I wouldn't call it "type" because "type" is overloaded, as explained a jillion times already, but you never seem to get it, so we repeat the same argument over and over repeatedly multiple times redundantly. And why do you use the term "representation"?

Except that where I use "type", it is a "type" reference in every popular imperative programming language. "Type" will always be int or char or float or double or string or bool or some other type. Why does it matter if the term is overloaded? (Not that I'm saying it is overloaded, but let's assume it is.) How do you expect a model to be useful if it doesn't have recognisable parts?

I use the term representation because every value has a representation. In the vast majority of computer systems, it's inevitably a string of bits, but we can also consider it a string of bytes, words, characters, or whatever we like. The type reference tells the language internals how to interpret the representation, so that (for example) a bit string of 110110 with a type reference to 'integer' means it's an integer value 54, but the same bit string with a type reference to 'character' means it's the ASCII character '6'. A value is the association of a representation with a type. In other words, every value has a type reference, which we usually shorten to simply "every value has a type". In languages where all values have the same type -- usually a string of characters aka a character array -- the type reference may be implicit, as shown above in Category D2.
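The 110110 example can be checked directly. In this Python sketch, interpret() is a hypothetical helper that reads the same bit string differently depending on the type reference it is paired with; the character case assumes ASCII.

```python
def interpret(bits: str, type_ref: str):
    """Interpret one bit string according to a type reference (sketch)."""
    n = int(bits, 2)  # 110110 in base 2 is 54
    if type_ref == "integer":
        return n
    if type_ref == "character":
        return chr(n)  # ASCII code 54 is the character '6'
    raise ValueError(f"unknown type reference: {type_ref}")


print(interpret("110110", "integer"))    # 54
print(interpret("110110", "character"))  # 6
```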

Re: "every value has a type" - Every value can have multiple "types" at the same time by some accounts. In fact, I'd say it's better to say it's up to each operator rather than try to tie it all to some universal central concept. Yes, there are common/typical conventions, but that's not good enough for rigid global declarations about "types".

Every value is associated with one and only one type in popular imperative programming languages, except in those languages that support multiple inheritance. However, it is frequently possible to encode a value of one type in another. For example, the string "true" can encode a boolean value, and the string "123.45" can encode a numeric value, and the number 1448 can encode an even number, and the number 13 can encode a prime number. This fact is fundamental to programming language operation, because a value of type string -- which happens to encode the source code for a program -- is parsed (according to the rules of a grammar) to identify the values of various types, expressions, statements, identifiers and so forth.
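Each of those encodings can be recognised by an ordinary predicate that inspects a value without changing its type. This Python sketch uses hypothetical function names; Python's float() stands in for whatever numeric grammar a particular language actually uses (it accepts some forms, such as "inf", that a given language may not).

```python
import math


def encodes_boolean(s: str) -> bool:
    return s in ("true", "false")


def encodes_number(s: str) -> bool:
    # Assumption: Python's float() grammar stands in for a language's numeric grammar.
    try:
        float(s)
        return True
    except ValueError:
        return False


def encodes_even(n: int) -> bool:
    return n % 2 == 0


def encodes_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


print(encodes_boolean("true"), encodes_number("123.45"),
      encodes_even(1448), encodes_prime(13))  # True True True True
```

The string "123.45" is still a string after encodes_number() looks at it; the predicate only reports what it encodes.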

Sounds great, let's model it in a clear way now such that words like "encoding" are specific actions and visible state changes instead of just words.

The only state change is in assignment, where a new value replaces a variable's old value. An operator can be written to recognise any possible encoding in its operand(s). Both of these statements are true in all popular imperative programming languages.

Depends on how "value" is defined, per above.

The standard definition of "value" is that it is the result of evaluating an expression.

I mean in terms of something observable.

Sorry, not following you.


Note that I have been cavalier about including the dollar sign in some of my code snippet samples. -t

Sometimes, I wish the PHP interpreter were equally cavalier. What earthly reason did they have for requiring a bloody sigil?

{Really? You'd rather PHP showed a lack of proper concern?}

No, I'd rather PHP didn't use sigils. I should have written "cavalier" instead of cavalier.

I prefer Cuban cigils.

EyesRoll


SeptemberThirteen


(Last edited October 9, 2013)