Safety And Expressiveness Discussion

From IssuesForLanguageDesigners:

Quoted from below: "I think, however, that it is a relatively uncontroversial assertion that the amount of code in any nontrivial project that needs to be written in an unsafe language is a very small proportion of the total, if it is not zero. I can't prove this, but it has always been true for me in 15 years of doing system programming"

I think this is a starting point of possible agreement, since I agree that one does not need 100% of the code to be unsafe even under the worst of circumstances.
Word to the wise: just skip the temptation to say "in N years of doing..."

Quote from below: "I'm not sure what you're asking for at all. It sounds like what you're asking for is a demonstration that any possible program that could be written in an unsafe language, say C, could also be written in some safe language. You're not going to get any such demonstration: it isn't true, nor does it need to be true"

True. I merely have reservations about you going off and converting your project, whatever it is, to Cyclone, and then saying "ok, that proves the point, nothing left to say" - because although it would be interesting, doubtless there would in fact be more to say.
- I intend to give a more detailed account of how the conversion goes on the CycloneLanguage page. But it will probably be a few weeks before I can get round to trying this.

Quote from below: "The original poster put OCaml in the list of "expressive" languages, and it is also safe. So you can indeed have both."

FYI I wrote that (but obviously did not mean the sweeping conclusion you're taking from that). I originated and wrote most of the text on IssuesForLanguageDesigners, in fact. Not sure if that alters your perception of where I'm coming from. -- dm

Trying to refactor this interesting debate, hope to get some clarification. -- Costin

Let's stipulate that this discussion is in the context of system languages where efficiency is considered paramount. It doesn't make sense to discuss the trade off between safety and expressiveness in high level languages because languages like Scheme, Smalltalk, Java, etc have both.

No, actually, I disagree. High level languages have a huge variation in expressiveness. Agreed --dh. See discussion on PrincipleOfLeastPower and linked pages to see what's up on the topic of "expressiveness".

Now how efficient should a language be to be considered in this context is another matter for debate. For example OberonLanguage and its successor Active Oberon have successfully been used to write OS (including the drivers and all that), whereas they are clearly safe languages (compared to C), and they are at least as expressive as C. Also it puzzles me in what sense is Ada less expressive than C. Also, to correct a matter of fact, ObjectiveCaml is unsafe in the sense that syntactically legal programs can give segment violation (see OcamlTypeSafetyProblem). Of course a modified ObjectiveCaml could avoid by inserting appropriate run-time type information and checks, but it wasn't the choice of Ocaml designers and we don't known if such a modified run-time can perform as well as the current one to be considered (at least theoretically) as a candidate for system programming.

And this is one of my points: "safe" can mean a variety of things.
On some of the type pages people have claimed that "safe" means 100% guarantees from the compiler at compile time, ruling out various kinds of dynamic safety, ruling out type systems for which subtyping is not decidable in general (unlike HindleyMilnerTypeInference), ruling out strongly typed languages that have any way of escaping strong typing and claiming that they just plain aren't strongly typed (which strikes me as a terminological problem driven by ideological purity)
- Agreed.
Efficiency is sometimes important - usually important for system programming - but it can be accepted for the sake of argument that OCaml and Cyclone are generally efficient enough. The ObjectiveCaml page points to a post concerning OCaml speed vs C, which says that simple direct solutions in OCaml are very fast - competitive with or a bit faster than simple direct solutions in C - but that nonetheless "To be honest: given a specific algorithm and hand-tailoring it an eternity you will in most cases be able to beat OCaml and other functional languages using C. However, to come even *close* to the average OCaml-program requires an effort you definitely do not want to pay. Some aspects of C++, for example, (e.g. STL-I/O-libraries) are indeed slow compared to OCaml." <http://groups.google.com/groups?oi=djq&ic=1&selm=an_571405096>. This is a very plausible claim, because there are many programmers who know how to make things run very fast if they work long and hard enough at it. Compilers are not smarter at this than programmers, but compilers are vastly faster and get quite reasonable results.

So let's try to identify precisely language issues where the triangle Safety/Expressiveness/Efficiency involves a trade-off. Maybe give each of them a separate wiki page.

Garbage collection, versus malloc/free, versus middle ground (partially on stack, partially gc, regions, etc). Is this the corner issue?
- Good.
Use of uninitialized data, especially uninitialized pointers, or invalid pointers (pointers that are used after a free() ). Apparently there's no real justification for allowing in a language access to uninitialized individual pointers. Languages like Ml (et. all), NiceLanguage, CycloneLanguage prove that. If the language should prevent pointer usage after a free (in case of manual memory allocation) looks that it may be debatable although Cyclone have a strong claim to the contrary. Also an array of pointers will need to be automatically initialized in a safe language (to NULL or something, to a valid reference in ML and family) and this is a potential source of inefficiency [does it really matter that much?], Or else there's a recent proposal based on linear types to prove that an array will be used only [? incomplete sentence].
- Yes...except note that there's the potential situation of languages that allow variables to be conditionally introduced dynamically, and it may be TuringComplete to determine lack of initialization without extra mechanisms like single assignment. (I'm not advocating anything, I'm just enlarging the scope of this issue.)
  - Just require the variable to be null if it hasn't been definitely assigned. This isn't a complex problem.
array bounds checking. Is the trade-off still valid? Can a special syntax for cycling through arrays be enough to solve a lot of cases. There will always be some algorithms that may jump through arrays. How expensive bounds checking really is in modern processors?
- Good example. There are always some cases where you definitely don't want it to happen (never say never; my trump card is inner loop on weather simulation), and ideally have verified that the index will never go out of bounds, whether the compiler believes that to be true or not.
pointer arithmetic, unlimited integer to pointer conversion, etc. It doesn't look that it's relevant anymore.
- Mmmm...I think that's overly hasty.
  - Setting up a pointer at a particular machine address - for example to do memory-mapped I/O - can be useful, but this kind of thing should be considered part of the "very small proportion" of code that doesn't need to be done in a high-level language.
Anything else? TypeInference looks like a side issue here.

Safety Versus Expressiveness

Is the language primarily going to protect programmers from themselves by making certain categories of mistakes impossible (Java, Ada, Pascal, BondageAndDisciplineLanguages), or is it primarily going to be as expressive as possible to maximize the power of the programmer (Lisp, Ocaml, C/C++)?

There is a possibility that this trade-off is not necessarily inherent, and that a way to have safety without losing any expressiveness (even in terms of systems programming) may be found. The experimental CycloneLanguage is such an attempt.

Part of the reason that this is the traditional tradeoff is that expressiveness is often valued by self-styled pragmatists, while safety is most often stressed by self-styled purists/ideologues; the latter have frequently said quite explicitly that loss of expressiveness is acceptable, or even desirable, if it leads to greater safety, while the former camp have frequently explicitly expressed the opposite.

This means that attempts to have maximum safety combined with maximum expressiveness have been very much in the minority compared with efforts to simply trade one for the other.

It's also true that it is relatively rare for maximum-safety advocates to address e.g. SystemProgramming needs.

[...]

Safety versus Expressiveness is a false dichotomy - you can have both. Compare ObjectiveCaml with CeePlusPlus: OCaml obtains expressiveness without compromising safety, while C++ obtains it by throwing away safety. The latter is just bad design. -- DavidSarahHopwood

It is not a false dichotomy, you are just blinded by the shining light that is the glittering pure perfection of OCaml, the single exception to this, and all other tradeoffs. Faster than assembler, more OO than Smalltalk, more expressive than English, safer than a mother's kiss - what's not to like?

But other than OCaml, yes, it's a tradeoff. You utterly misunderstand C++, for instance. I know it's everyone's favorite whipping boy these days, and there are things I hate about it myself, but the point of C and its direct descendents is systems programming, and the lack of safety in the C family is not bad design in that regard, and anyone who says otherwise is an ideologue, as is proven by the simple fact that there is no adequate replacement language for systems programming.

In general, I mean. Things like gets() should obviously be removed from the language.

Systems programming does not necessarily require loss of safety. I write embedded programs all the time, and 99% of the code does not make any necessary use of unsafe language features. The 1% of the code that does is written in assembler, not C or C++ (in fact, it couldn't be written in C/C++).

So? Lisp and Smalltalk have been used for systems programming in certain environments, that doesn't make them generally appropriate.

As it happens, most of these programs are written in C. But that is not because of any technical attributes of the C language; it's because of the lack of support for higher level language implementations that target the embedded systems I use. I've been seriously considering switching to CycloneLanguage, which is also both expressive and safe (and which can cross-compile to and interoperate with C, solving the compiler and library support problems). -- DavidSarahHopwood

So hypothetically other languages work, but for whatever reason, you use C. Not a strong case.
- I explained why I use C. It has nothing to do with language design, it has to do with the fact that many embedded platform vendors expect you to use C, and won't support anything else.

If you look, not at OCaml (which, despite my sarcasm, is unusual in some regards), and not at C++, but at Java, what you see is a language whose major selling point is safety, which it achieves by making many dangerous operations illegal. Pointer manipulation, for instance. Does expressiveness suffer because Java lacks pointers? Hell, yes; people get used to it, and many enjoy and approve of this, but there are many famous classic algorithms that can't be directly written in Java; one must do little workaround idioms. The question isn't whether that is good or bad, the point is just that it is a tradeoff.

Any differences between pointer arithmetic and array indexing are completely trivial; I don't consider that to be a loss of expressiveness (anyway, CycloneLanguage does support pointer arithmetic).

This isn't about your tastes, it's about what the issues are. There are many algorithms (even in Knuth) where array indexing cannot substitute one-for-one with pointers, and then you have to start doing workarounds, so this is disingenuous.
- Array indexing can always substitute for pointer arithmetic. (Note that AnsiCee effectively defines any use of pointer arithmetic that is not equivalent to array indexing as UndefinedBehavior.) Whatever; Cyclone is an absolutely safe language, and it does support pointer arithmetic. So it is not the case that a safe language must incur a loss of expressiveness as a result of not having pointer arithmetic, Cyclone being a counterexample. This is not a matter of opinion.
  - Don't be ridiculous! Not only is it a matter of opinion, you haven't backed up your assertion, nor can you, because you are incorrect. Cyclone, for instance, is only hypothetically useful - to date it does trade safety for expressiveness! See URLs I just put on that page. Thus it doesn't serve as an example, only as a hypothetical that remains to be proven if they ever fix their existing problems.
    - As I've already pointed out, this page is about language design. Whether any particular implementation of a design is entirely cooked yet is beside the point.
    - That should be my line. The point is that you claim that there is no tradeoff between expressiveness and safety, and that this is a matter of fact, not opinion, and when pressed for proof of this claim, you cite Cyclone - which makes a far larger claim about it than even its inventors make. Cyclone doesn't prove anything of the sort. Its flaws make it unsuitable for current general use, which means there's a tight bound on its expressiveness, and therefore it does not serve as an example, which leaves you with no evidence on this page of your thesis. Basically I think you should just say "someday we will have a safe language that doesn't sacrifice any expressiveness in any area of systems programming" - i.e. state it as an opinion about the future. In which case I think I would cautiously agree, although I don't think it's a sure bet.
    - What I said was: "Safety versus Expressiveness is a false dichotomy - you can have both." The original poster put OCaml in the list of "expressive" languages, and it is also safe (excluding some specific library modules such as Marshal). So you can indeed have both.
    - What flaws are you talking about, specifically? The design of Cyclone is not unsuitable for general use. The argument relating to garbage collection on the CycloneLanguage page is not convincing at all. (Type systems don't have to be complete, anyway.)
  - And as for array indexing, note that if you implement linked lists in Java, you do so with pointers, regardless of whether Java wants to call them that or not (see related discussion under JavaPassesByValue).
    - This has nothing to do with the issue. "References" have been called that since Algol, and yes, pointers are references. A typical implementation of a simple linked list in any language would not use either pointer arithmetic or array indexing, so I don't know why you brought this up. DeleteWhenCooked.
  - And in other languages: an array index can't meaningfully reference another array, so to completely substitute indexes for pointers, you have to e.g. screw around with objects that reference both the base of the array in question along with the index into that array - which is a workaround - which is absolutely in line with everything I've said, but contradicts your points.
    - Yes, the simulation requires a "fat pointer" object/class. This can either be built into the language (as Cyclone does), or in object-oriented languages with overloading, a user-defined fat pointer class can be made essentially as easy to use as a built-in construct.
  - {May I suggest that you're responding to a different thing here? Your interlocutor mentioned "pointer arithmetic" while you gave an example to disprove "substituting indexes for pointers". Not the same thing.}
    - Hmm. Interesting point. I was assuming that the phrase was intended to mean general pointer manipulation, but if in fact addition and subtraction with pointers was what was meant, that is certainly a narrower subject. In C, array indexing is exactly the same thing as adding the scaled index to a pointer, so this then gets into issues where questions of terminology cloud the underlying issues. In any case, my point remains; attempts to remove pointers from languages decreases expressiveness, whether that is liveable/desirable or not. Cyclone introduces multiple kinds of pointers in order to attempt (not wholly successfully) to retain expressiveness but enforcing safety.
      - What is unsuccessful about the way Cyclone does this? Cyclone supports subtyping between pointer types in precisely the cases where it is statically safe.
      - It is a misconception among many C programmers that C allows general manipulation of pointers as if they were addresses. It does not. Pointer arithmetic in C is defined only for cases in which the resulting pointer points within an allocated object of a compatible type. In other cases, the result is undefined. If you use pointers as if they were addresses, you are relying on properties of the C implementation, not of the language.
  - You seem to keep trying to substitute superficial knowledge of theoretical subjects and hypotheticals for experience. Won't wash. You can't out-intimidate me; I've got theory and experience both, so bullying tactics won't work. Just chill out and discuss rather than throwing your weight around. So far you haven't explained much, mostly only made assertions. If you are sure you are right, explain why, and if it checks out, I will admit I was wrong and you are right. I am more interested in learning than in sticking to a hopeless position.

Type safety is another such issue; it is achieved by preventing the programmer from doing certain kinds of things. Functional languages rely on HM type inference, chosen because it is not TuringEquivalent, meaning it is decideable, meaning it prevents programmers from certain areas of expressiveness.

This is not a controversial issue. It's an old, well-known, tired issue.

The fact that it's an old issue doesn't mean that the most common opinion about it is right. If I can have both expressiveness and safety by using OCaml or Cyclone, why should I use C++? For the area to which C++ is supposedly targeted - systems programming - I don't find the design decisions it embodies to be well-motivated at all.

So what? This page is about issues, not one person's judgement about the one true way to settle any given issue.
- Saying that it is an "old, well-known, tired issue" suggests that there's nothing new to discuss. That is clearly wrong; in fact it is a controversial issue, highly relevant to the design of new languages.
  - Ok, I didn't mean it that way. I already explained what I meant: I want some evidence to support your assertion, not just the bare assertion itself.
The fact that it is generally accepted to be a tradeoff means that you have a much heavier burden than just asserting to the contrary: it is appropriate for you to say why so that we can judge whether the standard industry opinion for decades has nonetheless been incorrect. It is pointless to leave out discussion of such a thing. Give enough info so that we can verify and learn.
- I agree. I will try converting some of my existing C programs to Cyclone, as a practical test of how expressive it is.
- That sounds cool! But that's not what I'm asking for, you understand. Anecdotal reports of what does work is not the same thing as an assessment of the whole universe of what doesn't work.
  - I'm not sure what you're asking for at all. It sounds like what you're asking for is a demonstration that any possible program that could be written in an unsafe language, say C, could also be written in some safe language. You're not going to get any such demonstration: it isn't true, nor does it need to be true.
  - I think, however, that it is a relatively uncontroversial assertion that the amount of code in any nontrivial project that needs to be written in an unsafe language is a very small proportion of the total, if it is not zero. I can't prove this, but it has always been true for me in 15 years of doing system programming.
- Nonetheless, if you have success, that would be encouraging, on an interesting subject - again, not the same thing as thorough evidence on the topic at hand (I favor Lisp for some things...)
- Which reminds me, since you seemed to be an OCaml advocate, the other day I left a note on your homepage asking you to talk about OCaml some more, but I haven't noticed any reaction to that. I've studied some Haskell but haven't gotten to OCaml yet.

Anyway, this page is about issues for language designers - i.e. of new languages. Someone designing a new language should certainly look at OCaml and Cyclone as examples of the degree of safety and expressiveness that are simultaneously achievable, not C++.

Certainly. So you should discuss that. Probably on their respective language pages, and then give pointers here. I've already asked you for more info about using OCaml for systems programming, but I guess now you're saying that Cyclone is actually more appropriate.
- The region-based memory management used in Cyclone is more appropriate. In a new language design, that approach to memory management would be relatively independent of other features except for the type system: it requires TypeInference to be practical.
- TypeInference is generally a good thing, certainly in new languages. But elsewhere on IssuesForLanguageDesigners I had previously pointed out the tradeoffs involved with it.

See also spin-off page OcamlTypeSafetyProblem.