Automated Code Generation

Falls into three categories (arguably!):

Templating - source code is generated to be edited. One-shot code generation. Often supported to some extent in IDEs and editors.
Round-Tripping - source code is generated to be edited, but the edited code may be re-imported into the generator to affect future generate-edit cycles. Most often found in CASE tools e.g. RationalRose, TogetherJ.
Compiling - source or binarycode is generated but is not intended to be edited. Eg JavaServerPages, compilers of all kinds.

My own personal preference is for compiling, as long as the compiled code supports the OpenClosedPrinciple to some extent - often the 'compiled' code will not do everything you need it to do out of the box. For this reason, many projects will develop their own code generation systems in order to gain control over what is being generated.

Typical uses of automated code generation are in CASE tools for 'roughing out' a system and in interface layers where a small amount of information can be used to generate all of the repetitive code required (eg IDL->CORBA code, WSDL->SOAP support, OO-RDBMS mapping, Tables->Forms)

Many problems can be eliminated with InstallableCodeGenerators. Instead of a tool generating code or most of the code, instead the the parse tree is made available and problem specific code generators can be run against the specification. For example, a FiniteStateMachine description can be generic, but the code generated can be customized by target environment. In my FSM generator I have installable code generators for C, C++, and for different OSs and different middleware layers. In my dreams I think developers would work at the specification layer with a few select others working at the code generation layer. -- AnonymousDonor

In circuit simulation tools, evaluating the state of the circuit is often the bottleneck. So my circuit simulator reads in the circuit description, parses it, generates code, compiles it and dynamically links with it. Code generation at runtime is great! The funny thing is: the simulator is written in Python and it is blindingly fast, since all the computation happens in code that is completely special-purposed C code for the problem at hand. -- StephanHouben

I was at a talk on Friday about a system where you could generate IDL. So you type in some stuff, generate IDL, generate Java from the IDL, and compile. The presenter proudly announced that from 3000 lines of input they could generate 28000 lines of source code. I had to ask why would you want to - if you could possibly get away with 3000 lines, you should take that option. I think this is not the simplest thing that could possibly work, I think it is a case of WeightMakesRight?. -- JohnFarrell

"X thousand lines of code automatically generated for you" strikes me as a waste, and misses the point: Why would you want X copies of Y lines of code to maintain (X*Y), instead of calling Y lines of truly reusable code?

Code generation is often little more than CopyAndPasteProgramming.

Consider the conversion from C to assembler - "X thousand lines of assembly code automatically generated for you from C" - you don't maintain the assembly, just the C. Where do you get (X*Y)? I doubt you would end up needing to edit 84 million lines of code. Aren't they saying you worked with the 3000 lines, and treated the 28000 lines as C treats assembly?

I subscribe to the notion that code generation is often a language smell - we need code generation because our language does not have all the types of abstraction we want. Because the language has a smell, though, does not always mean we should not use it. If we decide it is the best compromise for a task, and code generation would make it better, then that is our best option. Since there is not and never will be a language with all the abstraction capabilities we would like.

Actually, Lisp is that language, since macros are built-in code generators that allow you to create any new syntax you want.

I imagine there will always be a place for code generation to improve on a language.

There's another purpose for code generation, though, and that's generating the same code in different languages. If we, for instance, extract all our abstractions for a UI into metadata, perhaps, we can interpret the metadata directly on the fat client, but that's not practical for the HTML front-end, so we can use the same metadata to generate the HTML.

-- Steve Jorgensen

Instead of generating code from some input file, your program can interpret the instructions in that input file. An interpreter is easier to write, maintain, and deploy than a code generator. You can squeeze better runtime performance out of code generation, but few projects require that extra performance.

[Like most generalizations, there are exceptions. Code generators are, at times, easier to write, maintain, and deploy than are new interpreters. This is especially the case when the new language would inherit a lot of features from an existing language, or if the program must integrate with existing OperatingSystems or libraries. Compiling from high-level code to high-level code is especially common: VHDL to C++, Haskell to C, RealMacros to their host language, etc. Anyhow, once you start dealing with PartialEvaluation, CompileTimeResolution, JustInTimeCompilation, and runtime specializations, the line between 'interpreter' and 'compiler' tends to vanish.]

Compilation of a major programming language entails a totally different economy than does AutomatedCodeGeneration for a single project. A extraordinary amount of effort and expertise goes into creating a C compiler, and everyone who uses that compiler leverages the result. That's very different from AutomatedCodeGeneration that's used only a handful of times.

Further similar criticism at CodeGenerationIsaDesignSmell.

So, where does AutomatedCodeGeneration end, and Compiling begin?

AutomatedCodeGeneration doesn't end where Compiling begins. If you meant to ask where Compiling begins, it's right at the point you promise to never hand-edit the generated code. This is an important promise, because it allows the generated code to be regenerated at any time in the future in a fully automated pipeline.

CategoryAutomated