Code Generation

CodeGeneration is the process by which some (semi?)automated tool (the CodeGenerator) is used to turn higher level (more abstract) input into lower level (less abstract) output.

I disagree. Reading database schemas via JDBC and generating Java code is not a higher-level to lower-level issue. We need a better definition that encompasses this sort of horizontal business.

How about: "to turn an abstract input, or a terse input, or a high level input (or any other existing input) into either a less abstract output, or a more verbose (less terse) output, or a lower level output, or, basically, some other output that doesn't yet exist."

''No, this CodeGeneration definition is good; what we need is a better definition of higher level and lower level. In your example, the database schemas read via JDBC are the higher level because you chose them to be the higher level. If you instead update the database schemas from Java code, then the Java code becomes the higher level and the database schemas the lower level. The highest possible level is human input. A lowest possible level does not exist, because it is always possible to generate a next level from any other level. Usually the lowest level of an interactive program is the set of windows on the user interface screen, but even this level can be used to generate something else by looping through the window handles.'' -- AlekseyPavlichenko

For Consideration as a definition ...

CodeGeneration is the process of transforming code from one representation to another.

Often, this is from a higher to a lower level:

From UML diagram to code stubs
FourthGenerationLanguage into CeeLanguage

Other times this is a side effect of DontRepeatYourself.

Generating SQL scripts to create a database from an XML representation of the schema;
Creating DataAccessObjects by examining a database using JDBC

RefactorMe -- BevanArps
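To make the DontRepeatYourself case concrete, here is a minimal sketch of generating a SQL script from an XML description of a schema. The XML element and attribute names (schema, table, column, primary_key, nullable) and the function names are assumptions invented for this illustration, not an established format.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema description; the element and attribute names are
# assumptions made for this sketch, not an established format.
SCHEMA_XML = """
<schema>
  <table name="users">
    <column name="id" type="INTEGER" primary_key="true"/>
    <column name="email" type="VARCHAR(255)" nullable="false"/>
  </table>
  <table name="messages">
    <column name="id" type="INTEGER" primary_key="true"/>
    <column name="body" type="TEXT"/>
  </table>
</schema>
"""


def column_sql(column):
    """Render one <column> element as a column definition."""
    parts = [column.get("name"), column.get("type")]
    if column.get("primary_key") == "true":
        parts.append("PRIMARY KEY")
    if column.get("nullable") == "false":
        parts.append("NOT NULL")
    return " ".join(parts)


def generate_sql(schema_xml):
    """Return a CREATE TABLE script covering every <table> in the schema."""
    root = ET.fromstring(schema_xml.strip())
    statements = []
    for table in root.findall("table"):
        columns = ",\n  ".join(column_sql(c) for c in table.findall("column"))
        statements.append(f"CREATE TABLE {table.get('name')} (\n  {columns}\n);")
    return "\n\n".join(statements)


if __name__ == "__main__":
    print(generate_sql(SCHEMA_XML))
```

The XML file stays the single source of truth; the same description could just as well feed a second generator that emits the data access classes.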

See an example of code generation in MdefExample.

The output of the CodeGenerator invariably needs further processing before it is complete. This further processing may or may not be fully automated; hence the distinction between ActiveCodeGeneration and PassiveCodeGeneration.

It's not true that it "invariably needs further processing"; there have always been e.g. compilers that produce machine code directly rather than assembly. Also, whatever finally does produce the machine code typically can be considered to be a "code generator" (consider macro assemblers if you have any doubt), so that claim is necessarily false; the recursion has to stop somewhere.

To make it more concrete: Programs that write other programs are doing CodeGeneration.

: Yes, but this is not the only form of code generation... some code generators simply produce large (and often internally redundant) data tables for other code to work with. The original definition does not artificially restrict the scope of the concept.
: These are 'Text Generators' or 'Data Generators'. If these don't generate code then they're not Code Generators (the clue is in the name).

But those data tables can be (and often are) regarded as just a different representation of code/programs.
... such as the lookup tables generated by lex/flex when generating lexical scanners: the most constructive interpretation of those tables is that they represent the specification of a certain finite automaton - i.e., a program; followed by yacc/bison's generating the code for implementing a context free grammar.
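A small sketch of the "tables are programs" view, loosely in the spirit of what lex/flex do but not a reproduction of their output. The token specification, the state names and all function names are hypothetical. The generator emits nothing but data, yet that data fully determines what the generic driver accepts.

```python
import pprint

# Hypothetical token specification: (state, character class) -> next state.
# It describes unsigned integers and simple decimals such as "42" or "3.14".
SPEC = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "frac_start",
    ("frac_start", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}


def char_class(ch):
    """Map a character to the class names used in SPEC."""
    if ch.isdigit():
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def generate_table_source(spec, accepting):
    """Emit Python source text that holds the automaton as plain data."""
    table = {}
    for (state, cls), target in spec.items():
        table.setdefault(state, {})[cls] = target
    return (
        "# Generated file: do not edit by hand.\n"
        f"TABLE = {pprint.pformat(table)}\n"
        f"ACCEPTING = {pprint.pformat(sorted(accepting))}\n"
    )


def run(table, accepting, text, start="start"):
    """Generic driver: the recognizer's behaviour lives entirely in the table."""
    state = start
    for ch in text:
        state = table.get(state, {}).get(char_class(ch))
        if state is None:
            return False
    return state in accepting


if __name__ == "__main__":
    source = generate_table_source(SPEC, ACCEPTING)
    print(source)
    generated = {}
    exec(source, generated)  # stands in for importing the generated module
    for sample in ("42", "3.14", "3.", "abc"):
        print(sample, run(generated["TABLE"], generated["ACCEPTING"], sample))
```

Changing SPEC and regenerating changes what the driver recognizes without touching the driver, which is the sense in which the table is a program.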

And if you do CodeGeneration, you'll want to know HowToDoCodeGenerationWell.

It may serve you well to keep in mind the anti-CodeGeneration opinions of CodeGenerationIsaDesignSmell and RuntimeReflectionIsaDesignSmell. They'll help you avoid taking a good idea to excess.

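Here's a hello, world code generator:

  #!/bin/perl -w

  # read input: the message is all command-line arguments joined by spaces
  my $msg = join(" ", @ARGV);

  # output code: emit a C program that prints the message
  print <<"HERE";
  #include <stdio.h>

  int main ()
  {
      printf("$msg\\n");
      return 0;
  }
  HERE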

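We can then include it in the build process:

  #!/usr/bin/make -f
  MSG = hello, world

  # recipe lines must begin with a tab character

  # run the program and compare its output with the message
  test: hello
	test "`./hello`" = "$(MSG)"

  hello: hello.o
	gcc -o $@ $^

  hello.o: hello.c
	gcc -o $@ $^ -c

  # regenerate the C source whenever the generator changes
  hello.c: hello_gen.pl
	perl hello_gen.pl > $@ "$(MSG)"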

Finally:

  % make

Yes, it's easy to break it, but I say YAGNI to input validation. -- DaveWhipp
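For anyone who does end up needing it, here is a rough sketch of where the generator breaks: a message containing a double quote, a backslash or a percent sign. This is a Python rendering of the same idea, not a change to the Perl script above. The function names are hypothetical and the escaping is deliberately not exhaustive (it ignores trigraphs and non-ASCII text, for instance).

```python
import sys


def c_string_literal(text):
    """Return text as a C string literal, escaping the usual troublemakers."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}
    body = "".join(escapes.get(ch, ch) for ch in text)
    return f'"{body}"'


def generate_hello(msg):
    """Emit a C program that prints msg followed by a newline."""
    # puts() appends the newline itself and, unlike printf(msg), does not
    # treat '%' in the message as the start of a conversion specifier.
    return (
        "#include <stdio.h>\n"
        "\n"
        "int main(void)\n"
        "{\n"
        f"    puts({c_string_literal(msg)});\n"
        "    return 0;\n"
        "}\n"
    )


if __name__ == "__main__":
    print(generate_hello(" ".join(sys.argv[1:])), end="")
```

Whether this is worth the extra lines is exactly the YAGNI question; for a build that only ever passes "hello, world", it probably is not.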

CodeGeneration QuickQuestions

Q Do people consider use of GenericProgramming and TemplateMetaprogramming as examples of code generation?

A I don't think so, since CodeGeneration can create huge duplicate snippets of code, whereas GenericProgramming and TemplateMetaprogramming won't. -- AlexBetis

Disagree; they are in fact widely considered examples of code generation, because they take general code and produce more specific code. I don't think it's relevant whether they produce duplicate code snippets, but the claim that they don't is not entirely true either: they usually can, and do, especially since you used the word "snippets". What they do not produce is duplicate copies of entire functions. -- DougMerritt

The ClarionLanguage RAD tool (www.softvelocity.com) is an excellent example of code generation. Based on a data dictionary (database definition) and a set of highly configurable templates (browse, forms, reports, process), it can generate a fully functional desktop or web database-driven application WithoutWritingaSingleLineOfCode. -- SergioCastillo

I've seen that phrase or something like it somewhere on wiki before, as a warning.

Also see http://codegeneration.net

I am hoping the 80/20 rule applies to code generation, but I'd be happy with even 60/40. I work for a consulting company. As such, we're constantly writing new applications. They are all data driven. They all have different database schemas unique to their solution, but the code to access the database is always the same grunt work again and again. We chose the code generation route. It seems we need more senior programmers writing templates than we need junior programmers performing CutAndPasteProgramming for all the grunt work.

If we can achieve good, working, usable code for 80% of the project, that leaves us with the bulk of the time to focus on the unique (and generally more interesting) 20%.

Generated code should never be edited by hand. Either class extensions or decorators should be used to augment or change behavior of the generated code. These pieces, written by hand, are outside of the package structure of the generated code. This allows the code to be generated again.
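A minimal sketch of that rule, assuming a toy generator that only builds SQL strings. The generator function, the class names and the table layout are all hypothetical. The point is only that the handwritten subclass survives regeneration because it lives apart from the generated text.

```python
# Hypothetical generator. In a real project its output would be written to
# its own file or package and regenerated whenever the schema changes.
def generate_dao_source(table, columns):
    """Return source for a DAO class that only builds SQL strings."""
    class_name = f"Generated{table.capitalize()}DAO"
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return (
        "# GENERATED CODE: do not edit, regenerate instead.\n"
        f"class {class_name}:\n"
        "    def insert_sql(self):\n"
        f"        return 'INSERT INTO {table} ({column_list}) VALUES ({placeholders})'\n"
        "\n"
        "    def select_all_sql(self):\n"
        f"        return 'SELECT {column_list} FROM {table}'\n"
    )


# Stands in for importing the generated module.
generated = {}
exec(generate_dao_source("user", ["id", "email"]), generated)
GeneratedUserDAO = generated["GeneratedUserDAO"]


# Handwritten code, kept outside the generated package.
class UserDAO(GeneratedUserDAO):
    """Class extension: changes behaviour without touching generated code."""

    def select_all_sql(self):
        return super().select_all_sql() + " ORDER BY email"


if __name__ == "__main__":
    dao = UserDAO()
    print(dao.insert_sql())
    print(dao.select_all_sql())
```

If a column is added to the table, only the generator's input changes; the ORDER BY customization is untouched.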

Consider all the code required just for simple database persistence in a web app:

1. UI - html/jsp/asp/php
2. Server-Side Controller - to handle the form in #1
3. Model object (say, User.java or Order.cs or Message.php)
4. Data access object interface - UserDAO is an interface defining save/update/delete. We should always write to the interface.
5. Standard implementation of UserDAO - persists data to and from the database on every request
6. StandardUserDAODecorator - Wraps the standard implementation of UserDAO. Allows ease of customization of the generated code without having to actually modify the generated code directly.
7. Generate unit tests for each DAO implementation. It'd be nice to know if the generated code worked.
8. Optionally generate the ServiceLayerPattern from Fowler's PatternsOfEnterpriseApplicationArchitecture. I rather like this pattern, actually, as it's a natural fit for the business rules, rather than the application logic.
9. Generate unit tests for the service layer.
10. Optionally generate decorators for the service layer. You don't want to hand edit generated code, but the default business rule may well be insufficient for your application.
11. Generate factories for all classes you plan on decorating. This factory reads from a config file. You choose your decorator by editing the file.
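The DAO interface, standard implementation, decorator and factory pieces of that list might fit together roughly as follows. This is a sketch under stated assumptions: the JSON config format, the "user_dao_decorators" key, the lowercase-email rule and every class name other than UserDAO are invented for illustration, and an in-memory list stands in for the database.

```python
import json
from abc import ABC, abstractmethod


class UserDAO(ABC):
    """The interface that callers write to."""

    @abstractmethod
    def save(self, user): ...


class StandardUserDAO(UserDAO):
    """Stands in for the generated implementation (in-memory for the sketch)."""

    def __init__(self):
        self.rows = []

    def save(self, user):
        self.rows.append(dict(user))


class LowercaseEmailUserDAODecorator(UserDAO):
    """Handwritten decorator: normalizes the e-mail, then delegates."""

    def __init__(self, inner):
        self.inner = inner

    def save(self, user):
        self.inner.save({**user, "email": user["email"].lower()})


# Registry of decorators that a config file is allowed to name.
DECORATORS = {"lowercase_email": LowercaseEmailUserDAODecorator}

# Stands in for the contents of a config file (hypothetical format).
CONFIG_TEXT = '{"user_dao_decorators": ["lowercase_email"]}'


def create_user_dao(config):
    """Factory: wrap the standard DAO with the decorators named in config."""
    dao = StandardUserDAO()
    for name in config.get("user_dao_decorators", []):
        dao = DECORATORS[name](dao)
    return dao


if __name__ == "__main__":
    dao = create_user_dao(json.loads(CONFIG_TEXT))
    dao.save({"id": 1, "email": "Alice@Example.COM"})
    print(dao.inner.rows)
```

Callers only ever see UserDAO, so swapping or stacking decorators is a config edit rather than a code change, and the generated class can be regenerated freely.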

Overall, a "simple" web application will have many classes just to be able to save a message to a forum.

All of this can be created from one XML metadata file. I suppose the real question, though, is what would be simpler?

CategoryCoding