Strongly Typed Collection

Most object-oriented languages and frameworks contain classes for managing collections of objects. In many cases, the collections provided by the language or framework will allow any object to be inserted into collections, and when objects are accessed in the collection, they have only the generic "object" interface and must be explicitly downcast to a more specific type.

This leads to a few common problems:

Therefore,

When using collections of objects where all elements must support a particular interface or be of a particular type, define a StronglyTypedCollection class that only allows objects with the necessary type to be added to the collection, and which automatically downcasts the objects as they are accessed.

Implementation of such a collection usually consists of wrapping or subclassing an existing collection class with functions that perform the necessary type checking and casts. It is also desirable to support standard collection interfaces for iteration, serialization, etc., so that the type-safe lists can also be used by code designed for generic collections. Some frameworks are designed to support this (for example, DotNet provides the CollectionBase and DictionaryBase classes for this purpose).

Some programming languages provide support for ParameterizedType?s or GenericTypes. These languages allow programmers to define something like a collection of objects of type T, and then substitute any type for T. Such facilities provide static type checking and sometimes automatic type inference. C++ templates can be used for this purpose.

Strongly typed collection classes tend to all look alike, except for the names of the types involved, so it is easy to automatically generate them using programmable editors, with a programming-language facility such as macros, or with a utility designed for the purpose.

When replacing a generic collection class with a new strongly typed collection class, look for opportunities to utilize the MoveMethod refactoring to move generic collection-usage code from the clients into the new class.

But,

In a language like Smalltalk, where there is no static type checking, strongly typed collections are less necessary, and may even be detrimental to the intended use of polymorphism.

Additional classes add overhead. Extra coding is required, and program size increases. People who create collection classes are likely to try to support every possible collection operation, rather than only those that are needed. Adding all this code simply to provide type-safety may seem like overkill in many circumstances.

This can lead to a lot of extra work while refactoring (as is typical for languages with static type checking).

Example:

This is a simple C# collection class that allows only objects of class Person to be inserted. It implements the standard .NET IEnumerable interface, allowing programmers to write "foreach (Person p in aPersonList) ...". (One might also implement the standard ICollection interface if it is useful.)

 public class PersonList: System.Collections.IEnumerable
 {
     private ArrayList innerList = new ArrayList();

public void Add(Person aPerson) { innerList.Add(aPerson); }

public void Remove(Person aPerson) { innerList.Remove(aPerson); }

public int Count { get { return innerList.Count; } }

// Get/set element at given index public Person this[int index] { get { return (Person) innerList[index]; } set { innerList[index] = value; } }

// Support IEnumerable interface public System.Collections.IEnumerator GetEnumerator() { return innerList.GetEnumerator(); } }
Also known as: TypeSafeCollection?, DomainObjectList

See also UseTheStaticTyping


There was a recent discussion about type safe collections on the TestDrivenDevelopment list:

http://groups.yahoo.com/group/testdrivendevelopment/message/1962

It would seem like the MicrosoftDoneThing? is to derive from CollectionBase? for this kind of stuff, but it seems like even more extra work, although code generation tools are available. I also don't understand the full implications of the name hiding magic going on there, so off I go to read the spec!

I'm writing a little utility to do this kind of generation. I'll share what I find after I figure it all out (if I figure it all out). -- KrisJohnson

Kris, You may find the following links helpful:

http://discuss.develop.com/archives/wa.exe?A2=ind0107C&L=DOTNET&P=R35911 http://samples.gotdotnet.com/quickstart/util/srcview.aspx?path=/quickstart/howto\/samples/CompMod/CodeDom/ListBuilder/listbuilder.src

-- ShaunSmith

Thanks for the links; they were indeed helpful. I've got the thing working: you can check it out at http://kristopherjohnson.net/kj/TypedCollectionGenerator. It generates type-safe collections and type-safe dictionaries. --KrisJohnson


The thing that bugs me most about using generic collections (like ArrayList in DotNet) is that casting from Object is required. A true strongly-typed collection shouldn't be using Object collections at all, even under the covers. It's not actually that hard to build a dynamic collection on top of a typed array (but a tool would make it a snip).


Look, either you want run-time type checking, like Smalltalk, in which case collections work just fine, or you want compile-time type checking. If you want compile time type checking, and you want collection classes then the only sensible option is GenericTypes. All this having to subclass from some base collection class is garbage. I just want to declare something as 'OrderedCollection of Widget' or whatever.

I can't see why this wasn't obvious to the designers of JavaLanguage, perhaps they were hung up on the WriteOnceRunAnywhere bit, but it should certainly have been blindingly obvious to the designers of CsharpLanguage who had no excuses.

Am I missing something, or what? - StephenHutchinson

You might like to hear JamesGoslingOnGenerics


Generics would be lovely (and both Java and DotNet will have them someday, I'm fairly sure), but I don't think generics/templates are as important in an environment where everything is an object, which is possibly why it was low on the list for the designers of both environments, but such a boon for C++ (correct me if I'm wrong, somebody).


But in Java everything isn't an object, some things are primitive data types (yuck!). Anyway, I don't see why generic types aren't just as important in an environment where everything is an object? - StephenHutchinson


Because by treating everything as an object, you achieve genericity. Instead of

  ArrayList<string> list;
  list.Add("Foo");
you do:

  ArrayList list;
  list.Add("Foo");
So you lose strong typing, but you gain simplicity through a common type hierarchy (at least in DotNet you do, by hook or by crook). How would you implement the equivalent of ArrayList in C++ without resorting to untyped pointers or demanding that the elements implemented some interface? (Please tell me, I really don't know and I'm trying to learn C++!)

The question is really "why would you want to implement an untyped ArrayList in C++?" If you don't know what is in the list, what are you going to do with objects you retrieve from the list, without casting? And if you cast you are breaking the type system, so you might as well use a template to avoid the awkward cast syntax and make your program safer. Untyped collections are only useful if you don't need to cast, as in dynamically typed languages.


By using templates, of course:

  #include <vector>
  #include <string>

int main() { std::vector<std::string> messages; messages.push_back("Look ma!"); messages.push_back("No casting!"); }


Uh, exactly (although I think I meant to say "how else would you implement..."). See my point? Having everything as an object means templates in a language are less important, which is why they're not a feature of C# or Java. The only thing templates would bring to those language is less casting (which is reason enough to introduce the feature, but not a show-stopper).

The other advantage is static and dynamic type checking. You can't accidentally put the wrong type of object into a typed collection. I know RealObjectOrientedProgrammers never make mistakes like that, but for those of us who do, it is useful. Also, in C++, templates allow for optimizations that aren't feasible with the "everything is a subclass of Object" paradigm.


But in C++, everything is an object. A C++ object is just a chunk of memory that is typed and has been initialized, either by a constructor or by a primitive language operation (which is undefined for auto variables of primitive types).

That is the C++ definition of "object", but it is not much like the other uses of the term relevant to this page. In particular, there is no Object class with an interface and methods to be inherited (unless you consider low-level stuff like 'operator new' or 'operator &' to be methods), and there is no standardized nor useful way to have a collection of references to untyped "objects". But the point is that this does not matter if you have strongly typed, generic collections. You never have, or indeed *need* references to untyped "objects". E.g. as stated in the paragraph below (which follows on from the one above). I think we are in agreement here - I'm just pointing out that the intended meaning of "everything is a subclass of object" above refers to the Java/.NET/Smalltalk concept of "object".

An untyped collection is only useful if you have true polymorphism. In a statically typed language, the Object base class is not useful, and you have to cast types whenever you query a collection. Generic types make collections much more convenient. For example, compare collections in the EiffelLanguage and those in the JavaLanguage - Eiffel is light-years ahead of Java.


I have seen this recently in a Java application, and it drives people (at least me) nuts, because a PersonList? is not a List (i.e. it does not extends List), therefore you cannot pass a PersonList? to all those neat methods that accepts a List (such as Collections.unmodifiableList()). When applied liberally in the application, you get lots of these "almost like" a list things that you have a hard time remembering which XXXList supports which subset of List methods.

I would very much prefer a ListWrapper? (or whatever better name) that is used like this:

  List personList = new ListWrapper?(Person.class, new ArrayList());
  personList.add(new Person()); // fine
  personList.add("wrong"); // throws an exception by ListWrapper?
It ensures the list contains what you want, even though you still have to downcast when retrieving.


In C++ with STL, those "neat methods" are also generic, solving this problem in style, I think. Martin Bennedik.


I notice someone in the above discussion, defending Java's approach to collections saying So you lose strong typing, but you gain simplicity. Well, that sounds awfully like an argument for Smalltalk. I mean, I don't mind people advocating strong typing, and I don't mind languages that do it properly like Pascal or Eiffel. What sticks in my gullet is when people drone on about how Java is much better than Smalltalk because it's strongly typed when there is a hole in the type system that you can drive a coach and horses through called collections. In fact, you need to regularly drive a coach and horses through it in any real program. - StephenHutchinson

The Java type system also lets you assign any array of objects to a variable of type Object[], which is a hole in the type system. E.g. this is allowed by the type system, but is incorrect:

    Integer[] integer_array = new Integer[10];
    Object[] object_array = integer_array;
    object_array[0] = "ERROR!";

Assignments to array elements therefore have to be type-checked at runtime, and can throw ArrayStoreExceptions.


In ObjectiveCaml I can do this:

 class point (x : float) (y : float) =
   object
     method x = x
     method y = y
   end

class polar_point (r : float) (phi : float) = object method x = r *. cos phi method y = r *. sin phi end

let pointlist = [new point 1.0 2.0; new polar_point 3.0 1.0]

let print_x obj = print_float obj#x; print_newline ();;

List.iter print_x pointlist

ObjectiveCaml is fully statically typed; however, the type inference algorithm figures out the maximal interface shared by a class "point" and a class "polar_point". It also figures out the type of the function print_x, which happens to be
 < x : float; .. > -> unit
In other words, a function taking as argument an object that implements at least a method x with type float. The type system allows me to express covariance and contravariance.

-- StephanHouben

How can it tell that a method name in one class means the same thing as the same method name in another?

In general, you cannot. Of course, unit tests may help ;-)

Although I'm sure it's correct in most cases, matching by name only is going to let some type errors slip through the static check which is why most statically typed languages check types by type-name equivalence, not structural equivalence.

Well, it matches on name and on signature. Of course, type-name equivalence doesn't give any strong guarantees either, since I can always subclass a baseclass and override all methods with nonsensical implementations. More generally, 100% unforgeable contracts cannot be unified with subclassing, so O'Caml doesn't even try. If you want unforgeable contracts, use abstract data types (which O'Caml also provides)


StronglyTypedCollection is, my view, merely a code smell that results from not having a real metastructure. Such a metastructure is present in Smalltalk, can be added to Javascript, Python and Perl, and can even be added to JavaLanguage and CeePlusPlus with a little more work.

The missing element that creates a need for StronglyTypedCollection is the presence of a ClassObject for every object, and a reference to that ClassObject from every object. Once that glaring deficiency is addressed, StronglyTypedCollection is unnecessary (or, equivalently, its implementation is trivial). No "downcasting" is needed -- the consumer just gets whatever collection member is returned. Period.

-- TomStambaugh


EditText of this page (last edited April 14, 2006) or FindPage with title or text search