Java Primitive Types Discussion

In WhyJavaIsGreat (or not), we wondered why primitive types (int, long, float) are treated differently from objects (arrays, strings, user-defined classes).

At the machine level there is a genuine difference between an integer and a chunk-of-memory-treated-as-object, and Java cares enough about efficiency to reflect that. This is kind of similar to C.

But Java is a strongly typed language, and should have enough information at compile time to provide an efficient implementation of primitive types that fit into the Java object model. Smalltalk manages it.

You're right. This is clearly a wart on the language. The reason stated above for this is the correct one, and if you bear in mind the original situation of the language it makes sense: a language written by one guy to support one project group. Efficient implementation of primitives-as-objects is much harder. The language purist in me recoils from this; the guy who writes Java code every day is amazed at how rarely it's an issue. But it is an issue. --GlennVanderburg

I find it an issue every time I need to store primitive types in containers.

Oh, definitely. But for some reason I don't find myself doing that very often. If you'd asked me how big a problem this would be just after I read the language spec, and before I had actually written much Java code, I would have seriously overestimated the problems.

I (very) often want to use primitive types as keys for Maps. This might be because I'm writing a lot of network protocol code.

Hmmm ... having acknowledged that it's a wart, let's turn to workarounds. In this particular case, it seems that it wouldn't be too hard to write a generic wrapper implementation of Map that contains overloaded get(int) and put(int, Object) methods to make this easier. Additionally, my guess is that for this purpose your keys are probably ints or longs, which actually makes it sound more like a List. I think I've got a good SparseList implementation lying around somewhere ... would that help?
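For what it's worth, the wrapper suggested above might look something like the following. This is only a minimal sketch: IntKeyMap is a made-up name, not a JDK class, and it leans on generics (which postdate this discussion) instead of the put(int, Object) signature mentioned above. The keys are still boxed internally; the wrapper just hides that from the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: IntKeyMap is an illustrative name, not part of the JDK.
final class IntKeyMap<V> {
    // The keys are still boxed inside; callers simply never see it.
    private final Map<Integer, V> delegate = new HashMap<>();

    // Overload taking a primitive key; returns the previous value or null.
    V put(int key, V value) {
        return delegate.put(Integer.valueOf(key), value);
    }

    // Returns the value stored under key, or null if there is none.
    V get(int key) {
        return delegate.get(Integer.valueOf(key));
    }

    boolean containsKey(int key) {
        return delegate.containsKey(Integer.valueOf(key));
    }

    int size() {
        return delegate.size();
    }

    // Small usage example with made-up port numbers as keys.
    public static void main(String[] args) {
        IntKeyMap<String> services = new IntKeyMap<>();
        services.put(80, "http");
        services.put(443, "https");
        System.out.println(services.get(443));
    }
}
```

This saves typing at the call site but not memory: every key still costs one Integer object.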

Note, too, that it's one of the goals of the ongoing work on parametrized types to handle this case, so that (to use C++ syntax) creating a Map<long, Socket> would do what you want and avoid the need for you to handle all the wrapping and unwrapping.

It seems like there are actually two possible levels. One could get rid of the idea of primitives altogether and allow people to even extend Integer and so on. There's no reason not to, but it will slow things down a bit to do virtual dispatch even on fundamental operations.

The other is that primitive types are treated like instances of Object, but they can't be extended. This is kind of like the way arrays are implemented in Java, and also like PythonLanguage. This would let us, for example, use integers as map keys. The JIT can have a special case for these classes so that method dispatch can be fast.

AFAIK, in newer Python versions, one can extend all built-in types including integers. The newly introduced booleans are in fact a subclass of integer with the values constrained to 0 and 1. See www.python.org/doc/2.2.3/whatsnew/sect-rellinks.html and www.python.org/doc/2.3.4/whatsnew/section-bool.html --MarkusSchaber

However, there's still going to be a fair amount of overhead. Java objects typically have about 16 bytes of overhead (more than Smalltalk?) and so that's going to bloat primitive types considerably. Furthermore, in a primitive array we store the elements inline, but if they were objects we'd have to allocate N+1 chunks of memory, and it all gets complicated. Then we have to garbage-collect integers, and so on.

I guess it'd be possible to special case integers, but still make them look to the programmer like they're objects -- store them inline with the low-order bit set to say they're not a pointer, and so on. This is what Smalltalk does, isn't it? That would be a nice language, but it's not Java. Perhaps it could be Java 3.0, but the language designers are so conservative I wouldn't count on it. Talk about a GreatLeapBackward.

Python, if I remember correctly, creates genuine immutable integer objects, and caches singletons for the commonly used values [-1, 100], though there are also ways to get low-overhead native types if you're storing a lot of them and need them to be small. But then Python is more concerned about keeping the programmer happy than about squeezing out every last byte. -- MartinPool

Smalltalk, if I also remember correctly, makes integers look like normal objects at the programming level, but treats them differently at the implementation level. Values can be either object references or integer values, and a single bit (the highest- or lowest-order bit, depending on the memory model of the platform) distinguishes the two. Method dispatch checks the bit and either dispatches a message to an object through a reference or performs an efficient integer operation. --NatPryce
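The tag-bit idea can be sketched in Java itself, purely as an illustration. Everything here is an assumption made for the example: TaggedWord is a hypothetical name, a long stands in for a machine word, a list stands in for the heap, and the low-order bit is arbitrarily chosen as the tag. It is not how any particular Smalltalk VM or JVM is actually written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a tagged word; the names and layout are illustrative only.
final class TaggedWord {
    // Stand-in for real memory: an "object reference" is an index into this list.
    private static final List<Object> HEAP = new ArrayList<>();

    // Small integer: value shifted left one bit, low-order bit set to 1.
    // (Only 63 bits of range survive; overflow is ignored in this sketch.)
    static long fromInt(long value) {
        return (value << 1) | 1L;
    }

    // Object reference: heap index shifted left one bit, low-order bit left as 0.
    static long fromObject(Object o) {
        HEAP.add(o);
        return ((long) (HEAP.size() - 1)) << 1;
    }

    static boolean isInt(long word) {
        return (word & 1L) == 1L;
    }

    // Arithmetic shift drops the tag bit and keeps the sign.
    static long intValue(long word) {
        return word >> 1;
    }

    static Object objectValue(long word) {
        return HEAP.get((int) (word >> 1));
    }

    // "Dispatch" of +: check the tag bits first, fall back to a real send otherwise.
    static long add(long a, long b) {
        if (isInt(a) && isInt(b)) {
            return fromInt(intValue(a) + intValue(b)); // fast path, no allocation
        }
        throw new UnsupportedOperationException("would send + to the receiver object");
    }

    public static void main(String[] args) {
        long sum = add(fromInt(20), fromInt(22));
        System.out.println(intValue(sum));
    }
}
```

The point of the sketch is that the integer never becomes a separately allocated object: its value lives in the word that would otherwise hold the pointer.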

RubyLanguage also does this, I believe. So it's amusing to hear Java-centric people say stuff like "Every integer is an object? Oh my Lord, that must cause tremendous bloat!" Of course it doesn't. You can call methods on integers like they were objects:

 200.times { |i|
   # do something
 }

... but that doesn't mean that a primitive has the object bloat associated with Java objects. It just means that Ruby has really great SyntacticSugar. And just what are those 16 bytes of overhead in Java for, anyhow? -- francis

My understanding is that JamesGosling put PrimitiveTypes into the language to make it a useful and practical language that would run on devices and not consume memory; he wasn't thinking about developing a pure language for academics. Is this correct? --ChanningWalton

That's basically it, but I'm curious about this assumption that a "pure language" is necessarily for "academics". :-) --GlennVanderburg

I have to strongly disagree that "pure" languages are for "academics". A language with simple syntax and semantics is significantly easier to use than one with warts. This translates into faster development of more reliable software, because the code is easier for developers to understand. Java is an example of this: companies that moved over to Java mainly did so because it is much easier to develop with than C++, because it removed many of the complexities and inconsistencies of that language. --NatPryce

I think this issue is about to hurt us badly. It looks like we've found a bug in the floating point implementation in the Microsoft VM, so we'd really like to add some temporary tracking and checking stuff to our code to track it down (no, we can't afford the performance hit of leaving it in the production code). In Smalltalk, we could change the method code for the basic floating point types. In C++, we could write a substitute class with operator overloading (having, of course, used typedef for all our basic types). In Java we can, um, er.... step through the code in the debugger, or rewrite large chunks of it :-(

Were you forced to use MS-specific stuff? If not, you can simply switch VMs.

The Java treatment of primitive types is just a hack, pure and simple. The original (BlueBook) Smalltalk VM integrated 15-bit integers very neatly into a 16-bit object pointer -- the machines of the time were restricted to 16-bit object references. This year, I mocked up a Smalltalk-style VM that does the same hack -- so that the basic object reference ("OOP") is long enough to include any of the eight basic Java primitive types as well as an object reference. It isn't that hard, and it lets the VM deal with all the primitive types without all the warts. I built a Smalltalk-style primitive dispatch mechanism that handles all eight primitive types. It just isn't that hard, and imposes no particular performance penalty on the VM. Instead of 1 tag bit (a la Smalltalk), I used a tag byte. Instead of a 15-bit field that swings both ways, I used a 64-bit double (so that all the types could be represented). I made the size of an object reference 128 bits. Current machines are far more than 8 times bigger in both memory and disk than the Dorado, and there are lots of other hacks to throw at the problem if object reference size ever gets to be problematic -- but all of those hacks can happen under the covers if the basic object model is kept clean and pure.
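To make the tag-byte idea a little more concrete, here is a rough sketch of what such a wide reference and its primitive dispatch could look like. It is not the VM described above: WideRef, the tag constants and the payload encoding are all assumptions made for illustration, only three of the eight primitive types are shown, and a Java object can only imitate the layout (a real VM would pack the tag and payload into a raw 128-bit slot).

```java
// Hypothetical sketch of a wide, tagged reference; names and layout are assumptions.
final class WideRef {
    static final byte TAG_OBJECT = 0;
    static final byte TAG_INT = 1;
    static final byte TAG_LONG = 2;
    static final byte TAG_DOUBLE = 3;

    final byte tag;     // says how to interpret the payload
    final long payload; // 64 bits: raw primitive bits, or (not shown) an object handle

    private WideRef(byte tag, long payload) {
        this.tag = tag;
        this.payload = payload;
    }

    static WideRef ofInt(int v) {
        return new WideRef(TAG_INT, v);
    }

    static WideRef ofLong(long v) {
        return new WideRef(TAG_LONG, v);
    }

    static WideRef ofDouble(double v) {
        return new WideRef(TAG_DOUBLE, Double.doubleToRawLongBits(v));
    }

    boolean isNumber() {
        return tag == TAG_INT || tag == TAG_LONG || tag == TAG_DOUBLE;
    }

    // Widens any numeric payload to a double.
    double asDouble() {
        if (tag == TAG_DOUBLE) {
            return Double.longBitsToDouble(payload);
        }
        if (tag == TAG_INT || tag == TAG_LONG) {
            return (double) payload;
        }
        throw new IllegalStateException("not a number");
    }

    // Primitive dispatch for +: inspect the tags, then pick the arithmetic.
    static WideRef add(WideRef a, WideRef b) {
        if (!a.isNumber() || !b.isNumber()) {
            throw new UnsupportedOperationException("would send + to the receiver object");
        }
        if (a.tag == TAG_DOUBLE || b.tag == TAG_DOUBLE) {
            return ofDouble(a.asDouble() + b.asDouble());
        }
        if (a.tag == TAG_INT && b.tag == TAG_INT) {
            return ofInt((int) (a.payload + b.payload)); // wraps like Java int addition
        }
        return ofLong(a.payload + b.payload);
    }

    public static void main(String[] args) {
        WideRef sum = add(ofInt(1), ofDouble(0.5));
        System.out.println(sum.asDouble());
    }
}
```

Whether dispatch like this really costs nothing compared with typed bytecodes is exactly the claim under dispute here; the sketch only shows the mechanism, not the measurement.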

There is no need to treat primitive types separately from objects and no need to break the object paradigm to handle these types efficiently. It wasn't that hard to do in 1991, and Gosling and the Sun guys certainly knew how to accomplish the feat. The whole primitive-types-for-performance stuff is pure, unadulterated horsemanure -- a perfect, canonical example of PrematureOptimization as well as a clear demonstration of VoodooChickenCoding that an entire generation has swallowed hook, line and sinker. The Java team put primitive types in the language because they were sure it would speed it up -- with no experience, no data, and no testing to support the "optimization".

And, parenthetically, strong typing is a language and compiler issue -- as demonstrated by IBM, who used the same VM for both Smalltalk and Java quite successfully. The presence of Java-style strong typing is independent of the question of the alleged need for primitive types.

If they "certainly knew how to accomplish the feat" in 1991, then there definitely must have been a different reason to keep primitives. Increasing the object reference size from 32- or 64-bit to 128-bit is pretty significant; don't forget Strings are arrays of (16-bit) chars... I'd love to see your VM evaluated against real Java software.

CategoryJava CategoryDiscussion