Code Generation

CodeGeneration is the process by which some (semi?)automated tool (the CodeGenerator) is used to turn higher level (more abstract) input into lower level (less abstract) output.

I disagree. Reading database schemas via JDBC and generating Java code is not a higher-level-to-lower-level issue. We need a better definition that encompasses this sort of horizontal business.

How about: "to turn an abstract input, or a terse input, or a high level input (or any other existing input) into either a less abstract output, or a more verbose (less terse) output, or a lower level output, or, basically, some other output that doesn't yet exist."

''No, this CodeGeneration definition is good; we need a better definition of higher level/lower level. In your example, the database schemas read via JDBC are higher level because you chose them to be higher level. If you update database schemas from Java code, then the Java code becomes the higher level and the database schemas the lower level. The highest possible level is human input. A lowest possible level does not exist, because it's always possible to generate a next level from any other level. Usually the lowest level of an interactive program is the set of windows on the user interface screen, but even this level can be used to generate something else by looping through window handles.'' -- AlekseyPavlichenko
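To make the schema-to-code case in this exchange concrete, here is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for JDBC. The table name, the TYPE_MAP mapping and the generate_class helper are all assumptions made for illustration, not anything from the discussion above.

```python
import sqlite3

# Hypothetical mapping from SQLite column types to Python type names.
TYPE_MAP = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}


def generate_class(conn, table):
    """Return Python source for a class mirroring the columns of `table`."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    args = ", ".join(
        f"{name}: {TYPE_MAP.get(ctype.upper(), 'object')}"
        for _, name, ctype, *_ in columns
    )
    lines = [f"class {table.capitalize()}:", f"    def __init__(self, {args}):"]
    for _, name, *_ in columns:
        lines.append(f"        self.{name} = {name}")
    return "\n".join(lines) + "\n"


# An in-memory database with one example table (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
source = generate_class(conn, "message")
print(source)
```

The same metadata could just as well be produced from a class and used to emit a CREATE TABLE statement, which is the point made above about the "higher" level being a matter of choice.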

For Consideration as a definition ...

CodeGeneration is the process of transforming code from one representation to another.

Often, this is from a higher to a lower level:

Other times this is a side effect of DontRepeatYourself.

RefactorMe -- BevanArps

See an example of code generation in MdefExample.

The output of the CodeGenerator invariably needs further processing before it is complete. This further processing may or may not be fully automated; hence the distinction between ActiveCodeGeneration and PassiveCodeGeneration.

To make it more concrete: Programs that write other programs are doing CodeGeneration.

Yes, but this is not the only form of code generation... some code generators simply produce large (and often internally redundant) data tables for other code to work with. The original definition does not artificially restrict the scope of the concept.
These are 'Text Generators' or 'Data Generators'. If these don't generate code then they're not Code Generators (the clue is in the name).
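For readers who have not seen one, a table generator of the kind being argued about might look like the following sketch. The table name SINE_TABLE, its size and the choice of C as the output language are assumptions for illustration. Whether this counts as CodeGeneration is exactly the question in dispute above.

```python
import math


def generate_sine_table(size=8):
    """Return C source defining a precomputed sine lookup table."""
    values = [math.sin(2 * math.pi * i / size) for i in range(size)]
    body = ",\n".join(f"    {v:.6f}" for v in values)
    return (
        "/* Generated file: do not edit by hand. */\n"
        f"const double SINE_TABLE[{size}] = {{\n{body}\n}};\n"
    )


print(generate_sine_table(4))
```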

And if you do CodeGeneration, you'll want to know HowToDoCodeGenerationWell.

It may serve you well to keep in mind the anti-CodeGeneration opinions of CodeGenerationIsaDesignSmell and RuntimeReflectionIsaDesignSmell. They'll help you avoid taking a good idea to excess.

Here's a hello, world code-generator:

  #!/bin/perl -w

  # read input
  my $msg = join(" ", @ARGV);

  # output code
  print <<"HERE";
  #include <stdio.h>

  int main () { printf("$msg\\n"); return 0; }
  HERE

We can then include it in the build process:

  MSG = hello, world

  test: hello
  	test "`./hello`" = "$(MSG)"

  hello: hello.o
  	gcc -o $@ $^

  hello.o: hello.c
  	gcc -o $@ $^ -c

  hello.c:
  	perl > $@ "$(MSG)"


  % make

Yes, it's easy to break it, but I say YAGNI to input validation. -- DaveWhipp
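For the case where you do need it: one way the generator breaks is a message containing a double quote or a percent sign, since the message is pasted straight into a printf format string. A minimal sketch of an escaping generator follows, written in Python rather than Perl. The helper names are assumptions, and the escaping covers only backslashes, double quotes and newlines.

```python
def c_string_literal(text):
    """Escape text for a C string literal (backslash, quote, newline only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def generate_hello(msg):
    """Return C source that prints msg, passing it as data, not as a format."""
    literal = c_string_literal(msg)
    return (
        "#include <stdio.h>\n\n"
        f'int main(void) {{ printf("%s\\n", {literal}); return 0; }}\n'
    )


print(generate_hello('say "hi" 100%'))
```

Passing the message as an argument to "%s" instead of using it as the format string means a percent sign in the message is printed literally.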

CodeGeneration QuickQuestions

Q Do people consider use of GenericProgramming and TemplateMetaprogramming as examples of code generation?

A I don't think so, since CodeGeneration can create huge duplicate snippets of code, whereas GenericProgramming and TemplateMetaprogramming won't. -- AlexBetis

Disagree, they are in fact widely considered examples of code generation. As justification for why, they take general code and produce more specific code. I don't think it's relevant whether they produce duplicate code snippets, but actually that's not entirely true either. They usually can, and do, especially since you used the word "snippets". They do not produce duplicate copies of entire functions. -- DougMerritt

The ClarionLanguage RAD tool is an excellent example of code generation. Based on a data dictionary (database definition) and a set of highly configurable templates (browse, forms, reports, process), it can generate a fully functional desktop or web database-driven application WithoutWritingaSingleLineOfCode. -- SergioCastillo

I've seen that phrase or something like it somewhere on wiki before, as a warning.


I am hoping the 80/20 rule applies to code generation, but I'd be happy with even 60/40. I work for a consulting company. As such, we're constantly writing new applications. They are all data driven. They all have different database schemas unique to their solution, but the code to access the database is always the same grunt work again and again. We chose the code generation route. It seems we need more senior programmers writing templates than we need junior programmers performing CutAndPasteProgramming for all the grunt work.

If we can achieve good, working, usable code for 80% of the project, that leaves us with the bulk of the time to focus on the unique (and generally more interesting) 20%.

Generated code should never be edited by hand. Either class extensions or decorators should be used to augment or change behavior of the generated code. These pieces, written by hand, are outside of the package structure of the generated code. This allows the code to be generated again.
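A minimal sketch of that separation, in Python. All class names here are hypothetical, and in a real project the generated and hand-written parts would live in separate modules or packages rather than one file.

```python
# --- generated code (hypothetical generator output; overwritten on each run) ---
class MessageBase:
    def __init__(self, author, body):
        self.author = author
        self.body = body

    def summary(self):
        return f"{self.author}: {self.body}"


# --- hand-written code, kept outside the generated package ---
class Message(MessageBase):
    """Class extension: adds behavior without editing the generated class."""

    def is_empty(self):
        return not self.body.strip()


class ShoutingMessage:
    """Decorator (wrapper object): changes behavior of whatever it wraps."""

    def __init__(self, inner):
        self._inner = inner

    def summary(self):
        return self._inner.summary().upper()


msg = Message("ann", "hi")
print(ShoutingMessage(msg).summary())
```

Regenerating MessageBase leaves Message and ShoutingMessage untouched, as long as the generated interface they rely on stays the same.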

Consider all the code required just for simple database persistence in a web app:

Overall, a "simple" web application will have many classes just to be able to save a message to a forum.

All of this can be created from one XML metadata file. I suppose the real question, though, is what would be simpler?
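As a rough illustration of what "one XML metadata file" could drive, here is a small sketch in Python. The element and attribute names (entity, field, name, table) and the generated class shapes are invented for the example. A real generator would emit far more than two classes.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata format, invented for this example.
METADATA = """
<entity name="Message" table="message">
  <field name="author"/>
  <field name="body"/>
</entity>
"""


def generate(xml_text):
    """Return Python source for an entity class and a DAO that saves it."""
    entity = ET.fromstring(xml_text.strip())
    name, table = entity.get("name"), entity.get("table")
    fields = [f.get("name") for f in entity.findall("field")]
    args = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    values = ", ".join("obj." + f for f in fields)
    lines = [
        "# Generated file: do not edit by hand.",
        f"class {name}:",
        f"    def __init__(self, {args}):",
    ]
    lines += [f"        self.{f} = {f}" for f in fields]
    lines += [
        "",
        f"class {name}Dao:",
        f'    INSERT = "INSERT INTO {table} ({args}) VALUES ({placeholders})"',
        "",
        "    def save(self, conn, obj):",
        f"        conn.execute(self.INSERT, ({values},))",
    ]
    return "\n".join(lines) + "\n"


# Usage: load the generated source and save one message to an in-memory database.
source = generate(METADATA)
generated = {}
exec(source, generated)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (author TEXT, body TEXT)")
generated["MessageDao"]().save(conn, generated["Message"]("ann", "hi"))
rows = conn.execute("SELECT author, body FROM message").fetchall()
print(rows)
```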


Last edited November 10, 2014.