String Variables Considered Harmful

On WhenIsManifestTypingConsideredaGoodThing, AlistairCockburn wrote:

You will probably be surprised at the number of times future work shows your assumptions about the type of the variable to be incorrect. I've seen some very nifty program improvements where someone said, "Hey, this doesn't need to be just a String, in fact it can be anything that supports printing itself as a string!" The first programmer (would have) made a blooper by declaring that variable as "only a" String. -- Alistair

The problem here is not having to declare the type of the variable; it is declaring what implementation the variable is referring to, namely String or a subclass of it. This precludes the use of the variable for some other object which behaves like a String, but has a totally different implementation.

So, should all code be rewritten :-) to look like this:?

 interface Printable { ... }

 class String implements Printable { ... }
 class SomeObjects extends SomeOtherObject implements Printable { ... }

 public static void main(Printable args[]) {
     Printable greetings = new String("Hello World");
     System.out.println(greetings);
     greetings = new SomeObject( ...  );
     System.out.println(greetings);
 }

More generally, is having variables refer to a class, instead of an AbstractInterface, a broken hack? Discuss! :-)

-- AdvocatusDiaboli=FalkBruegmann

I hope not. SmellsLikeJava. I shudder at the thought of all code looking (smelling) like Java.

I certainly believe we'd be better off with something slightly more document-oriented, something like:

  interface Printer {
    push(Transform); // tab-offsets, ordered and unordered lists, 
                     // display classes for Printer stylizations 
                     // (bold, colored, italics, underline, number base, 
                     //  user-defined, etc.)
    pop(int N);      // 
    print(char);
    print(int);
    ...
  }

  interface Printable {
    print(Printer);
  }

  class String implements Printable { ... }
  class ConsoleOut implements Printer { ... }

One of the things that continuously annoys me with common string handling is keeping track of tab-offsets and other transforms. We have decent subsets of display primitives now... perhaps languages should be based off those rather than deriving from the delightfully stupid line printers.

If we want to be able to say

 String greetings = ...

and still avoid the problem mentioned above, the language would have to allow something similar to this:

 class String { ... }
 class MyString extends String { ... } 
     // inherits String's implementation and type (interface)
 class SomeObject extends SomeOtherObject implements String { ... }
     // inherits the type (interface) embodied in the String class,
     // but not its implementation

 public static void main(String args[]) {
     String greetings = new String("Hello World");
     System.out.println(greetings);
     String greetings = new SomeObject( ... );
     System.out.println(greetings);
 }

-- FalkBruegmann

Um... JavaLanguage does that:

 class SomeObject extends SomeOtherObject { ... }
    // inherits the necessary type from the Object class

 public static void main(String args[]) {
     Object greetings = new String("Hello World");
     System.out.println(greetings);
     Object greetings = new SomeObject( ... );
     System.out.println(greetings);
 }

near enough?

In fact, basic "string-like" attributes such as being printable are fundamental enough that the JavaLanguage puts them in class Object. If all you care about an object is that it is printable, you should accept Object instead of String.

OTOH, in more general terms, I think this page makes a good point; just the specific example is actually covered in Java already.

-- AndrewMcGuinness

Remember that in a declaration we reveal our intention so that the computer can reason more effectively about our program while it is checking for errors and generating code. How precise should our declarations be? In JavaLanguage, for example, we could create an interface for every method we ever implement and from these we could create interfaces for every combination of methods we in fact use. Then we could declare variables precisely by specifying exactly what messages they will be sent by choosing carefully from the tens of thousands of available types. Would such efforts be rewarded with flexibility, clarity and efficiency? I think not. Instead we are left searching for a compromise where the burden on our recall is balanced with rewards from the compiler. My own belief is that the rewards from actually running the program are sufficiently greater that I should allocate almost all of my attention to reasoning about that instead of optimizing types. -- WardCunningham

Note that ObjectiveCaml does declare a new interface for every function. It requires only the methods actually called in the function. This interfaces problem seems to be only in languages that are too ClassOriented?.

The above is not quite true in JavaLanguage. Since you can't specify the set of interfaces that a variable can be you would need to make a new 'type' that is the conglomeration of the ones you expect (as you suggest). However someone else might have the same idea and come up with a different conglomeration. The two types could implement the same set of interfaces and yet not be considered compatible by Java (the compiler will not even consider testing whether they are effectively the same type - since in Java it is the name that specifies the type and not the definition). This highlights one reason why dynamic languages are considered highly desirable; they tend not to muddy the distinction between classes, interfaces and types (both interfaces and types are usually implied and only classes are stated). The only static language that I've seen that offers some way through this confusion would appear to be HaskellLanguage (but I haven't dug down deeply there). -- PeterSumskas See also BenefitsOfDynamicTyping.

While that's an annoying limitation of Java, it doesn't feel fundamentally model-breaking. It would be perfectly plausible to allow a function to take (foo : IBar & IQaz) instead of just (foo : IBarAndQaz). Of course, this is basically just how OCaml works, iirc...

Java Strings are final "for security reasons." (A typical excuse and no fun!) Strings must be immutable and conform to the specification of the String class, or hostile code could change a String after it's been passed into the system. For example, the class loader probably does a check like this:

  if(className.startsWith("java.") { /* system class... */ }

The hostile code could override startsWith() to return something innocuous the first time it's called (or the nth time it's called), loading a system class despite the check.

An interface wouldn't work because it's inherently insecure. In fact, much of the reason interfaces (and subclasses) are useful in the first place is because they can contain callbacks to code that is unknown to the called method.

However, perhaps they should have had a SecureString subclass to be used when security is necessary. Most of the code we write isn't designed to be secure against hostile code in the same virtual machine.

-- BrianSlesinsky

Have a look at RubyLanguage. There, you just create or inherit a method called "to_s", and your class is printable. -- DougKing

Or PythonLanguage. There you create or inherit a method called "__str__". -- ElizabethWiethoff

DuckTyping to the rescue.

I do wish that JavaLanguage and CeePlusPlus string types supported an interface that I could implement myself. That way I could choose whether to use strings or string-like things in my code on a case-by-case basis. -- PhilGoodwin

Since Java 1.4, CharSequence provides exactly that. See below.

As has been pointed out, all objects in JavaLanguage know how to convert themselves toString(), so one approach here is to pass objects around and stringify them only at the last minute.

Another solution, in 1.4 or later, is to use the CharSequence interface. The docs say "A CharSequence is a readable sequence of characters. This interface provides uniform, read-only access to many different kinds of character sequences." Standard classes implementing CharSequence are String, StringBuffer and CharBuffer, and you can easily write new ones.

-- AdvocatusDiaboli=FalkBruegmann

More generally, is having variables refer to a class, instead of an AbstractInterface, a broken hack?

In my opinion, yes. What you want to say is usually: "I need an object that I can use these methods on." When you say, instead, "I need an object of this specicfic class or one of its subclasses.", you're being too restrictive. Of course, it's possible that you really _do_ want the object to be of a specific type, but, in practice, people tend to get it wrong more often than they get it right.

I read one of these short stories here at c2 once, but I can't find it now. I'll retell it, and then maybe someone knows the original:

The novice comes to the Master with a short program: { TreeList asdf = new TreeList(); blabla ... }

He asks the Master: "This is a short and simple program. Surely nothing can be improved?"

The Master looks at the code and changes it: { List asdf = new TreeList(); blabla ... }

The novice was enlightened.

This delightful story certainly deserves more linking: TaoShowedMeTheWay.

This novice, however, remains unenlightened. Without at least two similar subclasses that can be held in a common collection, I see no reason to create a lower level abstraction. The additional level of abstraction shown above creates clutter without value.

If you reference only the interface, you not only ensure that your program is coupled to only the interface, but you also document far more clearly in your code that the behavior you need is generic to any list and TreeList is just the implementation selected (one could improve the code further with DependencyInjection selecting the best of known List implementations based on some annotations for performance or call profile and multi-thread safety and persistence and so on... NewConsideredHarmful, too.)

In answer to the original question: I would answer a provisional "yes". It's a smell, if not a 'broken hack' for variables to reference implementation rather than interface. That said, you can't escape it in some languages (e.g. C++), and manifest interfaces introduce some difficult coupling (e.g. compare Scala and Java, or VisitorPattern's Visitor+Visitable binding pairs).

What would be a practical usage of string-like interfaces? In other words, what kind of practical problems triggered this topic?

I often wonder why Java's toString, for example, has to return a string, rather than get passed a CharacterBuffer? of some sort. It's fairly rare that I actually want something in a string; It's usually just there so I can throw it through a socket or into a file or whatever right afterwards. -- ScottMcMurray

CategoryJava CategoryInterface