Generating Cpp From Smalltalk

AlistairCockburn wrote:

I found myself wondering, on this topic, on behalf of a C++ project, about 'designing' in Smalltalk, and generating C++ for implementation. Not that I can imagine any self-respecting C or C++ shop letting me try that on them. In my thought experiment, I found C++ a more tractable target than Java. Has someone else considered or tried this (or am I just going loony?).

I've thought about this a little bit and here is what I've come up with.

The simplest (and silliest) way to do it might be to scan an image for every method that is ever declared, and then generate a CeePlusPlus class named Object which has a virtual function for each of those methods. All methods would accept Object arguments and all methods would, by default, issue "message not understood." Each class in the image would have Object somewhere back in its inheritance chain. Each class would override the methods which it implements, and appropriate code could be generated for them.

This extremely simple approach would probably only work for very small images. I can picture most C++ compilers choking on the class definition of Object. Vtables which cross page boundaries may not be too cool :-) And, the whole thing would have to be built on each compile (no separate compilation).
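As a rough illustration of the simple scheme described above, here is a minimal C++ sketch. The selectors (setTopLeft:, area), the Rect class, and the notUnderstood helper are all made up for the example; a real generator would emit one virtual function for every selector in the image.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical generated root class: one virtual function per selector
// found in the image. Each default reports "message not understood".
class Object {
public:
    virtual ~Object() = default;

    // In a real image there would be thousands of these declarations.
    virtual Object* setTopLeft(Object* /*aPoint*/) { return notUnderstood("setTopLeft:"); }
    virtual Object* area() { return notUnderstood("area"); }

protected:
    // Always throws; the return type only exists so callers can "return" it.
    Object* notUnderstood(const std::string& selector) {
        throw std::runtime_error("message not understood: " + selector);
    }
};

// A generated class overrides only the selectors it actually implements.
class Rect : public Object {
public:
    Object* setTopLeft(Object* aPoint) override {
        topLeft_ = aPoint;
        return this;
    }

private:
    Object* topLeft_ = nullptr;
};
```

Sending area to a Rect compiles fine and fails only at run time, which is the Smalltalk behavior the scheme is trying to keep. It also shows why every change to any selector forces Object, and therefore everything, to be recompiled.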

Moving away from that simple approach, I see two paths:

Simple Interpretation
TypeInference

Simple interpretation would be any tack that would take Smalltalk code and represent it as interpretable data in the generated C++. For example, the simple approach above can be made less drastic by having a single virtual function on the generated C++ Object class. The function would accept some method identifier as an argument, plus a list of Object pointers. Efficiency and legibility would drop but it would probably compile.
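A minimal sketch of the single-entry-point variant, assuming selectors are passed as strings. The function name perform and the Counter class are hypothetical; nothing above fixes how the method identifier would really be encoded.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

class Object {
public:
    virtual ~Object() = default;

    // The one virtual entry point: a selector plus a list of Object pointers.
    virtual Object* perform(const std::string& selector,
                            const std::vector<Object*>& args) {
        (void)args;  // the default implementation ignores its arguments
        throw std::runtime_error("message not understood: " + selector);
    }
};

// A generated class dispatches on the selector inside its single override.
class Counter : public Object {
public:
    Object* perform(const std::string& selector,
                    const std::vector<Object*>& args) override {
        if (selector == "increment") {
            ++count_;
            return this;
        }
        return Object::perform(selector, args);  // fall back to the default
    }

    int count() const { return count_; }

private:
    int count_ = 0;
};
```

Object now compiles once and never changes, but every send pays for a selector comparison and the compiler can no longer check anything about the call.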

Type inferencing would involve a lot of front-end analysis to discern the types of formal parameters to methods in SmalltalkLanguage. It seems like it would be very tough to do without hints from the developer, and probably impossible in the general case. Moreover, since it is possible for one object to fill another's role without being related by inheritance, the problem becomes very thorny.

Also, it seems like there are a couple of cases. In one, you just want to work in Smalltalk and generate C++, in which case it is just a Smalltalk compiler. In another, you want to generate C++ that you will hand extend and maintain independently of Smalltalk.

Any other ideas? Echoing Alistair: has anyone actually done this?

Interesting side question: will any scheme that allows most of the capability of Smalltalk be no faster than an interpreter?

-- MichaelFeathers

The approach I am (momentarily) thinking of is roughly the following. I put myself in the mind of a project in which real delivery of a nasty system is required in C++ in the usual time frame. There are two small teams: a "detailed design team" (DDT for short?), who program / prototype in Smalltalk, and a code-generator team, who write the code generator. All know that Smalltalk does not convert naturally to C++, so they follow special rules to make that easier. They do not use Smalltalk's rampant polymorphism feature, but use straight class-typing as the C++ code will have to. They use special comments to help the code generation: specifically, within a method declaration, a comment of the form ": classname " is a typing hint to become a type declaration. Just these two conventions allow most of the code to be legally transformed to C++. Other things could be done with special comments, but that's enough to start.
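To make the typing-hint convention concrete, here is one guess at what a hinted Smalltalk method and its generated C++ might look like. The Smalltalk fragment in the comment, the Rect and Point names, and the choice to pass by const reference are all assumptions made for illustration.

```cpp
// Hypothetical Smalltalk source, with the typing hint in a comment:
//
//   topLeft: aPoint ": Point "
//       topLeft := aPoint
//
// One possible C++ rendering, with the hint turned into a declaration.
struct Point {
    int x;
    int y;
};

class Rect {
public:
    // Generated from topLeft: with the hint ": Point " on the argument.
    void topLeft(const Point& aPoint) { topLeft_ = aPoint; }

    // Accessor assumed to exist in the Smalltalk class as well.
    const Point& topLeft() const { return topLeft_; }

private:
    Point topLeft_{};  // instance variable; its type would need a hint too
};
```

The sketch also shows a gap in the convention: instance variables need types as well, so the hints probably cannot live only in method declarations.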

The next convention must be to avoid heterogeneous collections, or come up with a programming convention to let the code generator generate those 'cast's from collections. Another is to deal with Smalltalk's anonymous functions (blocks), turning them into C++ function objects.
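A small sketch of the block-to-function-object idea, assuming a homogeneous collection of Points. The Smalltalk block in the comment, the SumX class, and the doEach helper are invented for the example.

```cpp
#include <vector>

struct Point {
    int x;
    int y;
};

// Hypothetical translation of the block [:p | total := total + p x].
// The captured variable `total` becomes a reference member.
class SumX {
public:
    explicit SumX(int& total) : total_(total) {}
    void operator()(const Point& p) const { total_ += p.x; }

private:
    int& total_;
};

// Stand-in for Smalltalk's do: on a collection that holds only Points.
template <typename Block>
void doEach(const std::vector<Point>& points, Block block) {
    for (const Point& p : points) {
        block(p);
    }
}

// Equivalent of: total := 0. points do: [:p | total := total + p x]. ^total
int sumOfX(const std::vector<Point>& points) {
    int total = 0;
    doEach(points, SumX(total));
    return total;
}
```

Because the collection is homogeneous, no cast is needed anywhere. A heterogeneous collection would force the generator to insert a cast inside operator().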

The end point of the exercise is to think about code generation time: The code generator I described is not perfect, so that manual labor is needed to get the running C++ system. Future maintenance will have to be done in C++, since there is no round-trip facility. Although that sounds bad, this is no worse than the miserable code generators we have today, so we are operating in the same problem space, and, also, that is not the goal. The idea of the idea is to enter the C++ programming moment with a detailed design that has been stress-tested for a while. The Smalltalk team can probably continue to run for a while paralleling the design changes, but I think there is a moment when they have to give up. I would prefer that the DDT people are C++ designers who move into Smalltalk and then back into the C++ group. Having a mono-lingual Smalltalk group that goes around doing this job strikes me as a social timebomb.

p.s. I did interview a project team who had prototyped in Smalltalk and implemented in C++, and they said they couldn't simply convert over; the C++ design was considerably different from the Smalltalk one. The difference seemed to come from the polymorphism rules, which is why I would make the Smalltalk team program to C++ typing rules. --AlistairCockburn

And there is also the garbage collection. I suspect that it is very easy to write code with GC that does not work without GC. For example, imagine how the design of the Money/MoneyBag classes in ErichGamma and KentBeck's JavaUnit would have to change if they were written in C++.

There are a couple of things that I wonder about in using the compilation approach in the context of a project. I tend to believe that there are some hard pervasive qualities of systems which should be present at all stages of development. For instance, if I am making a threadsafe system, I should be testing its resilience all through development. If I am making a distributed system, I should have the distribution working immediately and then build the domain logic on top of it. The idea is that there are system aspects which are part of the design that require incredible rework to put in later and tend to buffalo an iterative process. Shifting languages may be one of these, but I'm just intuiting and I could be completely wrong.

What about TypedSmalltalk?

-- MichaelFeathers

I've used a number of translation programs and have never found one to be satisfactory. The code they produce is generally unmaintainable, yet it generally needs to be customized.

I'd suggest implementing in Smalltalk to learn what the objects are. Then, when the program is working, one might possibly stop. (In some companies, don't mention this possibility to the powers that be.)

Then using the Smalltalk implementation as a model, reimplement in the target language, essentially by hand. If you can automate anything, swell, but don't spend more than a few days on it.

Regarding system-level changes that might change everything, I'm less concerned about that than Michael seems to be. Maybe I've just been lucky, but I think that generally, objects tend to encapsulate the sorts of things that need changing when you go distributed or multi-threaded. Recently we made C3 run on multiple CPUs in a day, and the task of actually building it to do multiprocessing is under 5 engineering days. --RonJeffries

(Composed the following offline and by the time I'd come back to paste it in Ron had said much of it. Oh, well.)

I have yet to see cross-language code-generation work well (i.e. write code in language A and run that through tool T to generate code in language B that is then to be maintained). It's OK if you write an A compiler that happens to use B as an implementation language (consider all the compilers out there that generate C output; consider then having to maintain the emitted C), but conversion tools tend not to produce idiomatically correct, maintainable code. (I've tried to do such rather-simpler things as convert from Pascal to C, and opted for manual rather than automatic translation).

I do think it could work to prototype in Smalltalk, capture the design at some point and then reimplement it in C++. That's probably 90+% class-by-class method-by-method reimplementation (i.e. capturing much of the "low-level" design), but I would avoid automated tools. E.g. if the class protocol suggests methods to insert/retrieve from a container, the Smalltalk original could implement that with heterogeneous collections, while the C++ translation would use the more natural (in that context) typed version, but using the programmers' heads to do the translation.

GarbageCollection needn't be such an issue; there exist GCs for C++ (see GarbageCollectionInCpp). I haven't seen much code using that natively, though many higher-language implementations use that to implement their own GC (e.g. SatherLanguage). It's not uncommon to have subsystems manage their own memory in various ways (shared memory, DB-mapped in various ways...), so importing one that relies on GC shouldn't be too bad.

All of this assumes some good reason to prototype/design in one environment and develop in another. I've seen some such cases, e.g. where the target platform is itself in some way what's being designed. On the whole, though, I suspect one may be better off sticking with Smalltalk into implementation, or starting with C++ from the beginning. --JimPerry

Two caveats on building directly in, e.g. C++ because you want to wind up there. One: why do you want to wind up there? This is often a telltale sign of CartHorseInversion.

Two: if the project has lots of learning to do, it's harder to learn in, e.g. C++, than in Smalltalk. The more learning-intensive the project, the greater the benefit from "prototyping" it first. --RonJeffries

Why do you want to end up in C++? Let's leave that out of this discussion. What I'd reiterate at this point is that converting good, standard Smalltalk to good, standard, OO C++ won't work because of the polymorphism / type-checking business. If you intend to convert / generate, your Smalltalk ought at least to follow the same inheritance hierarchy rules as the target language (I think C++ would be easier than Java, because good Java will make use of interfaces). Also, the point isn't to maintain in Smalltalk, the point is to stress test the design. As Ron points out, the learning is faster in Smalltalk. --Alistair

(edited this one too, but then Wiki wouldn't talk to me for awhile...)

As I said, all of this assumes Alistair's reasons were well-founded. Starting in one environment and switching to another adds a step of complexity and expense you want to think twice about. So if there's a good reason to end up in C++, then there's reason to consider the costs vs. benefits of starting there too.

Learning is too subjective a process for me to accept a simple assertion that one environment makes it "easier" or "harder" than another. YMMV. --JimPerry

Right. I am thinking right now of a contractor who just won an 80-person or 150-person DoD contract to be implemented in some combination of C++ and Java. They have experienced C and PL/I [PliLanguage] programmers by the hundreds, and a couple of people who have done some C++ or Java programming. Smells like many flavors of trouble to me. Having just read TheDeadline, my mind is running along the lines of how to let them get some design evaluation while their new C++/Java people are learning (colorful and bloody metaphors running rampant in my mind as I type this, but leaving all that aside...). So I am playing with this "low-level design in Smalltalk" idea. The argument for design evaluation in Smalltalk is that some of us find Smalltalk just so much faster to move and change designs in, and it is executable. Could all be idiotic, and perhaps that's where this page ends up. But it could be usable, and that's how this page got started. So I am curious for more insight or disagreement. --AlistairCockburn

Makes sense. I just wonder whether using Smalltalk with the type annotations that you mention (or TypedSmalltalk) would be substantially different from Java. Sure, the syntax is simpler, but you'd still have to meander through the code and change declarations. If you were dropping typing, would it make sense to just get developers to do SpikeSolution-s every step of the way in Smalltalk and then translate to C++? If I get what you are saying, you need to do something drastic to get people thinking in objects. It just seems that you'd need someone with severe C++ experience to raise their hand every once in a while and say "uh, that's great for Smalltalk, but if you do it this way you'll avoid trouble later." In particular the insights that people like RobertMartin and JohnLakos have on C++ development are things that I suspect people really do not have to consider in Smalltalk. It seems that the biggest risk in your scenario is not enough C++ and Java mentorship to go around, even if everyone starts thinking in objects. -- MichaelFeathers

Some people would like to run Smalltalk on the JavaVirtualMachine, which means compiling Smalltalk to Java ByteCodes. This turns out to be not significantly harder than generating Java high level language. It's harder than doing C++, because you have no unchecked casts. On the other hand, these people don't much care about maintaining the Java bytes. They just want to take advantage of the (allegedly) ubiquitous and fast JVM.

Some of their art may be worth looking at. For example, instead of a single huge Object class that includes every method, you can have a million little interface classes, one per method. You inherit from the interface for each method you implement. A message send:

aRect setTopLeft: aPoint

becomes something like:

try {
    ISetTopLeft temp = (ISetTopLeft) aRect;
    temp.setTopLeft( aPoint );
}
catch (ClassCastException e) {
    aRect.messageNotUnderstood( aPoint );
}

You can then try to simplify this. Merge the interfaces into common groups, hoist the class casts where they are common subexpressions, use type inference or programmer hints to determine where the casts cannot fail, etc. As it is, it leads to bulky code (although a fine-grained JustInTimeCompiler needn't generate machine code for it all), but perhaps not much slower than a naive Smalltalk implementation. -- DaveHarris
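For comparison with the C++ target discussed elsewhere on this page, the same one-interface-per-selector idea can be sketched with a checked dynamic_cast in place of the Java cast and catch. This is only an illustration: the sendSetTopLeft function is invented, and the other names simply mirror the Java example.

```cpp
#include <stdexcept>
#include <string>

class Object {
public:
    virtual ~Object() = default;
    virtual void messageNotUnderstood(const std::string& selector) {
        throw std::runtime_error("message not understood: " + selector);
    }
};

// One tiny interface per selector, as in the Java version.
class ISetTopLeft {
public:
    virtual ~ISetTopLeft() = default;
    virtual void setTopLeft(Object* aPoint) = 0;
};

class Point : public Object {};

// A class inherits the interface for each selector it implements.
class Rect : public Object, public ISetTopLeft {
public:
    void setTopLeft(Object* aPoint) override { topLeft = aPoint; }
    Object* topLeft = nullptr;
};

// Generated code for the send `aRect setTopLeft: aPoint`.
void sendSetTopLeft(Object* aRect, Object* aPoint) {
    // Cross-cast from Object to the selector interface; null if unsupported.
    if (auto* temp = dynamic_cast<ISetTopLeft*>(aRect)) {
        temp->setTopLeft(aPoint);
    } else {
        aRect->messageNotUnderstood("setTopLeft:");
    }
}
```

Where type inference or a programmer hint proves the receiver is a Rect, the generator could call setTopLeft directly and skip the cast altogether.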

Dave, that is a very cool approach. How is performance? -- mf

I've no idea. I've not tried it myself; it's just something that I've seen talked about. -- DaveHarris

A HundredPersonProject to be done using object technology delivered in C++/Java by a huge team with no previous object experience definitely "smells like many flavors of trouble" just as it stands; adding Smalltalk as well seems even more odorous.

My guess is that the Smalltalk prototype approach as described in any of the proposed variants here would only work with a design/implementation team with strong Smalltalk and C++ ("real" OO C++, not just better C) experience and mentoring ability, as well as an implementation team adept in the target language.

I think it is hard enough to teach object thinking. To try to train the leaders/designers of the group in a Smalltalk spin on that (including learning to use the class library and development environment effectively) will be tricky. To train the larger group in the C++ (or Java) spin on that is also tricky. (But of course both of these are regularly attempted these days). But to try to do both from a standing start sounds nearly suicidal. --JimPerry

In my experience, you spend a good deal of time telling novice Smalltalkers "Pay no attention to that man behind the curtain." In particular, pay no attention to where the memory for objects comes from, and where it goes. This violates the expectations of experienced C and PL/I programmers.

By contrast, you tell novice C++ers "You are the man behind the curtain, and you must know what you're doing." When you explain the difference between malloc and new, for instance, you have to talk about the stack, and about how it is used. Similarly, the difference between pointers and references (which I still haven't fully absorbed) can't be understood unless you understand the implementations of each. Thinking about memory management is second nature for experienced C++ers and PL/Iists.

My gut feeling is that if you design in Smalltalk, then ask somebody else to implement your design in C++, the implementer is going to be frustrated and angry. "But you can't DO that!" will be heard throughout the cubicles. (One of my C++ coworkers, who has been invaluable in helping me cross the chasm, keeps accusing me of 'magical thinking'.) The implementer is going to have to understand enough Smalltalk to translate the design from Smalltalk idioms to C++ idioms, which means that the implementer needs to simultaneously understand two mindsets. That is asking a lot of somebody who is new to objects.

--BetsyHanesPerry

P.S. I agree that Smalltalk is a much more forgiving language/environment for experimenting: "How would that work?" "I don't know, let's try it out" is nearly painless. I wish I had an insight into how to encourage experimentation and rework in an environment where the cost of recompilation can be substantial.

I'm not talking about designing in Smalltalk and asking someone else to implement in C++. I'm talking about one (team/person) designing in Smalltalk and then implementing in C++. I've developed for both and feel quite certain that if you have your objects right, good Smalltalk will generally move smoothly to C++. (Good Smalltalk doesn't hand blocks around, doesn't change the method dictionaries on the fly, doesn't use (many/any) #perform:s, ...) IMO, most of the things that you can't do in C++ but can in Smalltalk are well-hidden in good code, and you'd just do the same functionality in C++ style, object by object. Mind you, this is just an opinion, I'm not volunteering to do it.

My main thought, though, is that I can learn so much more quickly in Smalltalk that it will pay off. If I can avoid even ONE major retry in C++, I'll probably buy all the time back that the Smalltalk prototype takes. And as described above ... once it's running, you may not need to do it in C++ at all. After all, Smalltalk is write-once-run-everywhere. --RonJeffries

I think that the translation to C++ may not always be straightforward. Because of the differences in type rules, C++ developers may end up with some pure abstract classes (interfaces) which describe roles or capabilities/qualities of objects. C++ developers would probably need them to deal with some of the natural polymorphism in Smalltalk that doesn't translate well into strongly typed languages.
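A minimal sketch of such a role interface, assuming two otherwise unrelated classes that in Smalltalk would simply both answer a description message. Describable, Invoice, Customer and describeAll are hypothetical names chosen for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical role: anything that can describe itself.
class Describable {
public:
    virtual ~Describable() = default;
    virtual std::string description() const = 0;
};

// Two classes with no common domain ancestor that both play the role.
class Invoice : public Describable {
public:
    std::string description() const override { return "invoice"; }
};

class Customer : public Describable {
public:
    std::string description() const override { return "customer"; }
};

// Client code is typed against the role, not against either class.
std::string describeAll(const std::vector<const Describable*>& items) {
    std::string out;
    for (const Describable* item : items) {
        if (!out.empty()) {
            out += ", ";
        }
        out += item->description();
    }
    return out;
}
```

In the Smalltalk original the role exists only as a convention; in C++ it has to be named and inherited, which is one way the class hierarchy ends up looking different.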

DaveHarris's example above reinforces a belief I've had for a while: that interface-centric design, and technologies like COM, are really ways of moving OO design closer to what can be done naturally in Smalltalk.

-- MichaelFeathers

In Smalltalk, type checking is done dynamically. That example shows a Java idiom for doing the same. I don't see the interface-centric part as being significant. It's more the dynamic part. -- DaveHarris

If I read Alistair's contributions correctly, he is talking about designing in Smalltalk and asking somebody else to implement in C++. Alistair, did I misread you?

--BetsyHanesPerry

We can follow a number of trails to try to get to a useful endpoint or learn something interesting. In my case, TomDeMarco's statement, "almost nobody ever does a design that gets close enough to the actual code to allow sensible review" (which I agree with) got me thinking about how to get a design close enough to the actual code to allow sensible review. Now, I think of design documents as "promissory notes" - promises about futures, and there is a place for them. But they don't fit the current need. Smalltalk is the one language I know which expresses "design" directly (LispLanguage and AplLanguage are the two other candidates I know of). So I was wondering if Smalltalk could be used as an active design medium on a project with long turnaround times..... as Ron said, "I can learn so much more quickly in Smalltalk that it will pay off. If I can avoid even ONE major retry in C++, I'll probably buy all the time back that the Smalltalk prototype takes." As Betsy said, "I wish I had an insight into how to encourage experimentation and rework in an environment where the cost of recompilation can be substantial." ...and as Michael said, "translation to C++ may not always be straightforward, because of the differences in type rules" (strengthen 'may not' to 'will not').

So now what? Well, I met one DoD crew who designed in APL2. Since APL2 was x5 or x20 or x100 faster than AdaLanguage, the prototype gave them excellent design feedback without extending the total schedule. Could do this in Smalltalk - just prototype and discard. But, could one prototype and generate a draft? One answer up above says, No. I don't know if that is the final word, but it is certainly defensible. I am exploring how much one can generate with how little extra clutter. Messy subject. I know the type checking changes the class hierarchy, hence I suggest the Smalltalkers write using C++ typing rules.

If 'yes', however, then would it be the same team or different people? The APL2 team was different from the Ada team, and I suspect there was some fair friction between them. I imagine the same with the Smalltalk / C++, plus there is the difficulty of transferring the design knowledge between the heads...

As PeteMcBreen would probably say here, "So transfer the heads, instead!", i.e., have the Smalltalk designers be the C++ team leaders or active designers.

The test case for this thought experiment is the DoD company I recently encountered. I don't know if their other project setup characteristics, with 100 novice OO people learning C++ and Java, affect the design feedback answer or only just make us wince. Betsy perhaps has another real case one could use as an honesty constraint on the solution. --AlistairCockburn

Unfortunately, the only parallel real case I worked on (50+ coders new to objects) was in Smalltalk, and crashed and burned. A good post-mortem on that project is Doug Barbour's "What Makes a Good Project Fail?", http://www.dsbconsulting.com/artGoodProj.html

--BetsyHanesPerry

Using different people as Smalltalkers and C++ coders is bound to create friction. But -- I suspect -- using the same people to create the Smalltalk and the C++ will do the same, albeit in potentially more complex ways. I figure 90% or more of the developers that know both languages will be solidly in either the Smalltalk or C++ camp. The staffing of your project will therefore fall along a spectrum. At one end, you have Smalltalkers who happen to know C++; at the other, C++ evangelists who know Smalltalk. And in between, you have a discordant mix.

With a "balanced" mix, you may well get a "balanced" lack of progress throughout the project as religious battles are fought every step of the way. With C++ evangelists, you could have slow progress developing the "design" code as fish try to fly (though the right people might avoid that). With Smalltalkers, you have your best chance: when the "design" is good enough, they might just stop! --KielHodges

"Smalltalk is the one language I know which expresses 'design' directly" is an interesting statement, but I'm at a loss as to what precisely it means for a language to allow ExpressingDesignDirectly, such that Smalltalk and maybe LISP or APL do so but (one assumes) C++, Java, or other candidates do not.

The expression "<language X> expresses intent directly" (where language X is the preferred language of the speaker) is one of the hoariest in the field, but rarely justified by the passage of time. I'm sure something deeper than that is meant here, but I'd appreciate enlightenment on the issue. --JimPerry

Right. So my assumption for this page has to do with ExpressingDesignDirectly. If you don't agree with those assumptions, then this thought exercise has little meaning. --AlistairCockburn

Alistair -- If you're looking for an easily-mutable prototyping language, have you considered Perl? The following quotation from Salon (http://www.salonmagazine.com/21st/ as of 8/13; "The Joy of Perl") made me think of this discussion:

'LarryWall believes that this evolutionary process mirrors how the real world works.

"Perl does a lot of hand-holding," says Wall, "and gives very good feedback on what it thinks is wrong with your program, so there is a very rapid turnaround if you are trying to develop something quickly. You try something and it breaks and you fix it, you just grow it, evolve it. This is how I program, this is how I write. This is how a lot of people write, this is how they think." '

--BetsyHanesPerry

DaveHarris said (much earlier): Some people would like to run Smalltalk on the JavaVirtualMachine, which means compiling Smalltalk to Java ByteCodes. This turns out to be not significantly harder than generating Java high level language.

The easiest way to accomplish this is to buy VisualAge/Java & VisualAge/Smalltalk from IBM. Both use the same VM, both connect to the same repository, and their mutual co-existence demonstrates the semantic equivalence of the two virtual machines.

BetsyHanesPerry said (also much earlier): In my experience, you spend a good deal of time telling novice Smalltalkers "Pay no attention to that man behind the curtain." ... by contrast, you tell novice C++ers "You are the man behind the curtain, and you must know what you're doing."...Thinking about memory management is second nature for experienced C++ers and PL/Iists.

The GC issue is deeper than how programmers want to think; the issue is that the basic semantics of C++-style object destruction preclude efficient garbage collection. We discovered this at ComponentSoftware and the Java team discovered it again with Java. It isn't the "thinking about memory management" that is the rub...most experienced Smalltalk VM developers think about memory management. The issue is that C++ destructor semantics require the program to explicitly declare a moment when an object is destroyed, and *then* perform one of its methods (namely its destructor). This has several terrible consequences: the GC must invoke methods on objects that are known to be dead (else the GC shouldn't be destroying them), the GC must find and touch every dead object instead of touching just the live ones (making the GC overhead proportional to the size of the object space instead of the number of live objects), and each object creator must know when the object becomes dead (introducing deadly overhead for the most common kind of Smalltalk object: dynamic, short-lived objects). The problem is that incremental, efficient garbage collection *can't* be implemented in C++ while conforming to destructor semantics, no matter how experienced or dedicated the programmer.
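The destructor semantics at issue can be seen in a few lines. This sketch only shows that C++ fixes the moment and order of destruction and runs code on each dying object; the Tracked class and the destructionLog helper are invented for the illustration and say nothing about any particular collector.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records the order in which destructors run.
std::vector<std::string>& destructionLog() {
    static std::vector<std::string> log;
    return log;
}

class Tracked {
public:
    explicit Tracked(std::string name) : name_(std::move(name)) {}

    // Runs at a moment the language defines, on an object that is dying.
    ~Tracked() { destructionLog().push_back(name_); }

private:
    std::string name_;
};

void scopeDemo() {
    Tracked a("a");
    {
        Tracked b("b");
    }  // b's destructor runs exactly here, not "eventually"
    Tracked c("c");
}  // then c, then a: reverse order of construction
```

A program may legitimately depend on that order, which is why a collector that visits only live objects and reclaims the rest in bulk cannot honor these semantics.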

I'd like to offer a different direction for GeneratingCppFromSmalltalk: add the metaframework and reflective capabilities that are already present in Smalltalk, define a "scripting" language that expresses the fundamental semantics of the environment, then write generators from the new metaworld that this defines into C++ (or any other target).

Consider a world where everything is an object, and every object is an instance of the C++ class "CppObject". Every CppObject is defined to contain a dictionary of attributes, keyed by attribute, and whose value is another CppObject. By convention (smile), every instance of CppObject contains an attribute referencing its "Class" (another CppObject). The world comes with some CppObject instances already in place..."Class", "Attribute", "Method", "Process", "StackFrame", etc. Some special CppObjects support protocol for invoking behavior in external dll's or so's, sort of like platform functions. A collection of these defines a set of environment primitives, which are guaranteed to be present in every system (but can be readily extended or restricted). All you need to do now is write a little compiler that turns statements in the "scripting" language into sequences of calls on these primitive behavior objects. Voila: A self-sustaining OO environment...suspiciously like Java, Smalltalk, and some Lisps.
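A bare-bones sketch of the attribute-dictionary object described above, assuming attributes are keyed by name and shared by reference counting. The set/get methods, the ObjectRef alias, and the makeInstance helper are made up for the example; methods, processes and primitives are left out entirely.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

class CppObject;
using ObjectRef = std::shared_ptr<CppObject>;

// Every object is just a dictionary from attribute name to another object.
class CppObject {
public:
    void set(const std::string& name, ObjectRef value) {
        attributes_[name] = std::move(value);
    }

    // Returns a null reference when the attribute is absent.
    ObjectRef get(const std::string& name) const {
        auto it = attributes_.find(name);
        if (it == attributes_.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    std::map<std::string, ObjectRef> attributes_;
};

// By convention, every instance carries a "Class" attribute.
ObjectRef makeInstance(const ObjectRef& cls) {
    auto obj = std::make_shared<CppObject>();
    obj->set("Class", cls);
    return obj;
}
```

Reference counting is only a convenience here: a Class object whose "Class" attribute pointed back at itself would form a cycle and never be freed, so a real meta-engine would need its own collector.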

Such a meta-engine can be fully implemented in C++, and so of course it will run really fast *smile*. Isn't this what you had in mind? smiling, trying not to bite my tongue while it's in my cheek. --TomStambaugh

Several people have suggested above that the only sensible way to deal with the proposed situation is to prototype in SmalltalkLanguage and then re-implement in CeePlusPlus without (or largely without) the aid of automatic translation. I'd have to agree and say that this is the best practical approach. To be successful, however, the same programmers that implement the prototype must also implement the final product. Obviously such a project is likely to end in disaster if you start with Smalltalk programmers that don't know C++. It might be more practical to start with C++ programmers and have them learn Smalltalk, but that's no doubt a risky approach as well. The only sensible approach for this kind of project is to do it with people who are reasonably good at Smalltalk and C++ to begin with. I'd suspect finding such programmers might be kind of difficult.

Of course, the real reason for starting with any other language than CeePlusPlus at all is that C++ is not necessarily the best language for prototyping. This led me to the following conclusion: Prototype the project in JavaLanguage, then re-implement in C++. Finding programmers that are reasonably proficient in both languages is likely to be much easier, and, speaking from practical experience, it's not hard for a C++ programmer to pick up Java. Of course this raises the question of whether automatic translation from Java to C++ would make sense, even if Smalltalk-to-C++ translation doesn't... -- CurtisBartley

A coupla small points...

Perhaps CeeLanguage is a better target than CeePlusPlus? It's simpler and you can write your own object environment if you want to. Isn't that what SqueakSmalltalk does? Also, I've had good experiences with PythonLanguage for this kind of prototyping. It doesn't have such a nice environment as Smalltalk, but it does have interpreters, and its syntax is more AlgolLanguage-like, which some C++ people find easier to grasp. Of course, whether this is a good solution depends on whether you secretly intend to ship the prototype. -- SteveFreeman