Why Type Syntax

From WhatAreTypes:

Type structure is a syntactic discipline for enforcing levels of abstraction. - JohnReynolds.

A type system is a syntactic method for automatically checking the absence of certain erroneous behaviors by classifying program phrases according to the kinds of values they compute. - BenjaminPierce, TypesAndProgrammingLanguages

I want to nip this one in the bud once and for all. Why do these guys focus on "syntax" for their definition of "types"? Syntax is just decoration for humans. Internally, programs are data structures and can also be viewed as data structures and programming can be done in data structures without "syntax", yet still have "types".

Open and shut logic right there. The connection to "syntax" seems highly silly, arbitrary, and limiting. Yes, it can probably by viewed as syntax, as there are many EverythingIsa to base models off of, but that seems like a poor starting point. -- top

The syntax is there even when you haven't labelled it as such.

Even data structures must have "syntax" in order to have meaning. Otherwise, you've nothing but a pile of bits (See also ThereIsNoObject)'. Specifying a syntax is about specifying the meaningful configurations or orderings of elements. You'll often see discussions of the separation of syntax and semantics. A distinction is being drawn between an abstract concept of meaning (semantics), and a formal means of representing that meaning (syntax). Data structures have meaning, but to make that meaning concrete, to represent it in some fashion, you have to impose an order on it that reflects the meaning - and it doesn't matter whether the elements of your syntax are words and punctuation, or bits, or electrons, or very small rocks. The specification of syntax is intended to be limiting. Without a scheme for what is and is not a meaningful arrangement of your elements, you cannot represent information.

There is no such thing as typeless data. If there are arrangements of data that carry meaning, then those arrangements can be assigned types. If you can recognize or devise a system for composing meaningful patterns, then you can specify how new, potentially meaningful patterns can be formed. Without such a system, you'd be left rearranging bits at random to see if anything meaningful emerged - which just means that you'd have an internal syntax, in your head, that you're matching it to and that you have not tried to describe.

So, there's a late-night shot at explaining it. I hope it made some sense.

I suppose that meets your definition of an "everything is a" model. But isn't that what much of human endeavor is about? Trying to find ways of describing what is. In this case, syntax is an attempt to find systematic ways of describing what are to us meaningful patterns - and then taking that understanding and synthesizing useful formal systems with it. Calling a model or technique "too broadly applicable" can be a compliment, rather than a refutation. -- DanMuller

I don't mean to be rude, but I don't find your explanation has enough precision to be useful. Anything with "rules" can be syntax by your definition, it would seem. Further, if the "meaning" is in somebody's head rather than something objectively measurable, then good bounded definitions are going to be very hard to come by because they are tied to personal psychology instead of math and logic.

What is the difference between a data structure with "syntax" and one without "syntax"?

Further, if they are interchangeable views of the same thing, why doesn't somebody write a data-structure version of Reynold's work?

I would love to see an example - does WhyTypeSyntax imply that there should be no syntactic sugar in languages to assist in the implementation of types? -- AnonymousDonor

No rudeness perceived.

Any rules for structure could at least be considered a candidate for the label 'syntax', yes. Consider where the term originally comes from -- the analysis of natural language. Nobody sat down thousands of years ago and decided what the structure of a natural language was going to be before it was spoken. Syntax is originally an attempt to find rules and structure in things that already exist. (I'm usually leery about taking terms from computer science and relating them explicitly to their original, non-technical meaning. This can often be misleading. But here, I think it's applicable and potentially instructive.)

You're asking circular questions: "What is the difference between a data structure with "syntax" and one without "syntax"?" As I said above, the latter doesn't exist. We don't call random collections of bits "data structures". They have to carry meaning in order to be dignified with the term "data structure", yes? What determines whether they carry meaning, top?

You wrote: "... if the "meaning" is in somebody's head ..., then good bounded definitions are going to be very hard to come by...". Exactly. You are agreeing with the people who describe types in terms of syntax. The entire point of describing a syntax is to avoid this situation; we aren't satisfied with "this looks like a valid data structure to Top and Dan, so it must be a legitimate arrangement of bits". We instead would like to find some rules, preferably simple ones that are easy to reason about in small steps, for how such legitimate arrangements of data are constructed. Syntax.

To write a "data structure version" of Reynold's work, you would have to first devise a way of describing the data structures, yes? And there would be rules and patterns to that description, would there not? Would you then object to calling that collection of rules a syntax?

Where you go desperately wrong is earlier: "A "program" can exist without syntax". It cannot. At the very, very least, a "program" is data arranged in a way that can be executed by a hypothetical processor. The rules for that arrangement are syntax. It's that simple, and yes, the concept of "syntax" is really that general. The arrangement of data into meaningful information always involves syntax; the question is simply whether we choose to explicitly define (or discover) that syntax, or not. There are obvious advantages to making the former choice, as you yourself implied.

I think perhaps you are trying to take a too-narrow view of what "syntax" means. Declaring that you will not use syntax does not eliminate the need for structure and rules of composition. Those rules are then an implicit syntax. Hiding the syntax in this way doesn't eliminate it, it just makes it harder to reason about. -- DanMuller

I appear to have been writing a response while DanMuller was writing his, and it covers the same ground, but I'll toss it in as an additional data point (and because I don't want to throw away the work I've done! :)

In general, "syntax" is a term for representation of rules that express the allowable manipulations of objects, entities, or symbols. It is not limited to text, though the linguistic origins of the term tend to imply this. Of course, even human languages are not limited to text -- sign language and native American whistle speech, for example, both have syntax.

A "data structure version of Reynold's work" would be an implementation of a type system. Since such an implementation would necessarily have to be expressed -- i.e., there would have to be a physical representation that might be visual, textual, or even mechanical -- it would exhibit syntax. As noted above, syntax is not limited to text -- a graphical programming language, for example, has a syntax even though it is not text-based. However, since a formal textual language may be the most effective way (at least for mathematicians) to appreciate and validate syntax, it is common to represent systems in terms of formal, abstract, text-based languages even if that is not how the system will ultimately be implemented.

As for "meaning," I would suggest not using the term as it implies a component of human understanding that is essentially not relevant. That appears to be what you've inferred above. Only the rules are relevant. It is better to think in terms of semantics -- i.e., allowable manipulations or rules which express the relationships between symbols (or whatever, including data structures) in terms of mathematics and logic -- rather than human interpretation or "meaning." -- DaveVoorhis

I would quibble - not outright disagree - with your last paragraph, Dave. Meaning is important to the discussion because it's what motivates the entire activity. Yes, it's a vague term, and that's why we try to describe syntax independently of it, because reasoning about vague terms leads to vague conclusions. But ultimately, if we find that a formal syntax and/or semantics can't describe the concepts, or meanings, that we wish to reason about, we have to go back and revise them. It's important to understand why people build these formal systems that seem at first glance so far removed from what practitioners care about, so that the practitioner can understand why those formal systems exist and why they're actually useful - and so s/he can understand when it's legitimate to question a formal system as being perhaps inadequate to their needs.

The distinction between syntax and semantics is a simplifying convention, attempting to separate meaning from structure in order to make structure simpler to work with. We put the meaning into semantics, the formal structure into syntax, and typically allow semantics to identify syntactic constructs which, although valid according to our simple syntax rules, are not "meaningful" (have ill-defined or invalid semantics). And - no surprise - semantics are notoriously more difficult to describe than syntax, usually relying ultimately on informal or semiformal narrative. We push the problem out a little farther, but we don't eliminate it.

I mention this only because I fear that if you dismiss "meaning" too lightly, you alienate exactly the audience that questions the usefulness of things such as formal syntaxes. I think most of this is in the category of assumptions that usually go unspoken, yet often leave us non-academicians feeling a bit mystified by formal systems.

-- DanMuller

Well, I think the meaning of "syntax" becomes too diluted and overloaded if any structure with rules is called "syntax". To me, the street meaning implies that syntax has to be parsed, while a data structure is already parsed and things are put into "cells" or records. "Atomized" is perhaps a way to describe a data structure versus syntax. A string of syntax does not have built-in atomization. Parsing is needed to produce atomization. I think "syntax" is not the best word for what you are trying to describe. "Structure" may be a better word.

Further, if we view programming languages and/or interpreters/compilers as data structures, then "types" are almost always some kind of "side flag", and we are back to my "flag model" of typing that has been rejected so violently by some Reynold fans. There is a "value" cell and an adjacent "type" cell(s) that tracks which type or type set the value belongs to or can go with. Thus, what you are saying still leads to the flag model it seems.

-- top

Sorry, you can't apply "street meaning" if you're going to opine about academic papers and ideas. You have to use the same terminology as the theories you critique, otherwise communication is impossible. You've made the effort to look at the theory; make the effort count for something by making the effort to understand the terminology.

Then what is the formal definition being used? I cannot find anything that is falsifiable. A definition that is not falsifiable is probably useless (beyond introducing vague notions).

What you're describing is a mechanism for type-tagging. It says nothing about how to identify what types are, how new ones are constructed, or what they mean. It's a far cry from a 'type system' in the sense used by the references at the top of the page. -- DanMuller

The "type system" is merely a regular data-structure traversal to test for substitute-ability. In single inheritance, it is usually a tree-search. As far as "what they mean", that is in the reader's head. If you want to play with specifics, see RunTimeEngineSchema. -- top

I took a brief look at it. The structure imposed by the relational schema implies a syntax - a set of rules for what can and can't validly be composed as a type. If you were to write it out as a syntax using common notation, you might find people more willing to have substantive discussions about it, since most people would find it easier to understand. It would be an interesting exercise, no doubt. You'd probably even find it easier to compare to other discussions on types yourself, since you'd then be recognizably similar notation. -- DanMuller

I find it more precise to talk about it as a relational structure because the rules of relational are generally universal and consistent. Reynolds seems to waste a lot of time inventing a new (confusing) notation/math for his purposes. I don't need to do that because I am piggybacking on relational. If you don't want to view types and compilers that way, then we will probably have a hard time communicating. Some people are linguistic thinkers, others are mechanical/visual thinkers. Presenting types to the widest audience may require using both views. Further, viewing it from a relational perspective may offer insite not readily visible from a notational perspective. It is like seeing your house from an airplane when you have only seen it from the side before (or visa versa). -- top

When you walk into a room full of Chinese FengShui experts to talk to them about their field, do you suggest to them that they all learn English first because it might broaden their horizons?

If they want others in the western world to know and understand their concepts, they should. In practice they often don't because that would invite scrutiny and competition. Hopefully, scrutiny is welcomed here.

I'm quite familiar with relational theory and structure, and yet cannot fathom your attitude. Notation for describing relations and constraints is in no wise standardized, and most forms I've seen have been anything but compact, do not make recursive relationships (which abound in a problem area like syntax description) immediately clear, and rarely include descriptions of constraints other than foreign keys, which are a critical component of a schema. Although it can be both instructive and ultimately practical to cast data in relational form, it's not a form well-suited for this kind of communication, which may speak to why the OO crowd prefers object wrappers. This is a problem worthy of solving in itself, perhaps coming up with a good, compact notation for schema including constraints, but it's a separate problem. And guess what? The solution will require devising a formal syntax! -- DanMuller

Clarity is what I am seeking, not compactness here. If Reynolds needed a whole book to write a definition, it can hardly qualify as compact anyhow. I doubt constraints would be a big part of the solution, but if they are they are usually easy to specify in pseudocode. Example:

  INSERT, UPDATE constraint: A > (B or C);
I doubt a relational-based explanation would take up a thick book because one is not reinventing an approach to describe something. It is called "reuse". And, of course the "OO crowd prefers object wrappers". They wouldn't be OO fans otherwise. The OO view of types still points to "flag theory" anyhow. The class of a variable is the "side flag". All roads are leading to Rome. (RunTimeEngineSchema makes no distinction between types and classes, I would note.) -- top


EditHint: Merge with AreTypesTiedToSyntax (I lost topic, but found it again)


AprilZeroSix


People's minds categoryze & group stuff, in real world.

Types exists without syntax, but, special syntax to declare types in a programming language, allow software developers to express, refinex, extend, implement & apply that concept in a software application.

The categories in my head overlap, and so do real-world, or at least useful categories. Something akin to SetTheory is a better model than "type theory" as often found in existing languages because it's multi-dimensional without ugly contortion. Categorization is useful, and probably necessary, but how to represent and manage it is the hard part.

{The limitations you identify are not of TypeTheory, which abstracts the notion of types in general. The limitations you identify are of particular software implementations of the notions described by TypeTheory. There is nothing that precludes invention and implementation of TypeSystems based on SetTheory, CategoryTheory, or You-name-itTheory. I've seen a fascinating experimental TypeSystem that could easily be described as "multi-dimensional without ugly contortion". I was shown an example using it that described a chess board and pieces (including movements and colour) as types, completely and accurately, in not many more lines of code than there are kinds of pieces. The fellow who invented it told me it was inspired by his experience working in library science.}

Possible, probably. EverythingIsa usually is. Whether most developers would actually relate to it sufficiently is another story.


CategoryLanguageTyping, CategoryTypeTheory


EditText of this page (last edited June 14, 2011) or FindPage with title or text search