Code Generation Isa Design Smell

"It's such a great product," says the PointyHairedBoss, "just look at all the AutomatedCodeGeneration it does for you!"

"Dude!" I say, "you just don't understand: The need to do lots of AutomatedCodeGeneration is a flaw, not a feature."

The input to the code generator is the higher abstraction that is being converted to a lower abstraction: the output code. Why is the input to the code generator not sufficient in itself? Plus, the input and the output are a form of OnceAndOnlyOnce violation, since they have to be in synch or duplicate the same info, just in different formats.

It's not true. Maybe we need to clarify the definition of OnceAndOnlyOnce: it is OnceAndOnlyOnce of human input. A program can have as much duplication as necessary, as long as all copies can be regenerated without human intervention. Those automated duplications do not violate OnceAndOnlyOnce. Consequently, code generation does not violate OnceAndOnlyOnce.

More than that, the primary purpose of code generation is to follow OnceAndOnlyOnce, because most modern systems/languages/libraries do not let programmers follow it, and force them to violate OnceAndOnlyOnce if code generation is not used.

Legitimate Exceptions: Although code generation is often a sign of poor software engineering (human-to-machine and human-to-human interface), there are some limited reasons to use code generation:

Subsumed Reasons:


Code generation smells because ...

Who says the code is that similar? Generated code shouldn't be touched by hand, ever. If you need to specialize generated classes, inherit from them. This separates hand written code from generated code.

I wrote a system to generate code from a schema (back before there were tools to do this). Methods on generated classes always returned references to the manually edited subclass. The generator would generate "stub" subclasses if there was no manually edited subclass. We used separate source directories for manual and generated code. At any time, you could blow away the generated code and regenerate it - which is what the ant full build always did. It did not need to be maintained and was not kept in source control. -- PaulMurray
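A minimal sketch of the arrangement described above, written in Python rather than whatever language the original system used. The schema format, the "generated" and "manual" directory names, and the "Base" class naming are all assumptions made for illustration; the trick of having generated methods return references to the manual subclass is not shown.

```python
import tempfile
from pathlib import Path

# Hypothetical schema: class name -> field names.
SCHEMA = {"Customer": ["name", "email"], "Order": ["number", "total"]}


def generate(schema, gen_dir, manual_dir):
    """Regenerate every base class; create a stub subclass only if none exists."""
    gen_dir.mkdir(parents=True, exist_ok=True)
    manual_dir.mkdir(parents=True, exist_ok=True)
    for cls, fields in schema.items():
        args = ", ".join(fields)
        body = "\n".join(f"        self.{f} = {f}" for f in fields)
        # Generated code is always overwritten, never edited by hand.
        (gen_dir / f"{cls.lower()}_base.py").write_text(
            f"class {cls}Base:\n    def __init__(self, {args}):\n{body}\n"
        )
        stub = manual_dir / f"{cls.lower()}.py"
        if not stub.exists():  # hand-written subclasses are never touched
            stub.write_text(
                f"from generated.{cls.lower()}_base import {cls}Base\n\n\n"
                f"class {cls}({cls}Base):\n    pass\n"
            )


root = Path(tempfile.mkdtemp())
generate(SCHEMA, root / "generated", root / "manual")
```

Because the generator only ever writes stubs into the manual directory when a file is missing, the generated directory can be deleted and rebuilt at any time without losing hand-written work.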

Not if those templates are highly customizable via the parameters. Code generation is mandatory in a language without macros.


Code generation is good because ...


Is this code generation like Lex/Yacc/JavaCC etc? Or like MFC, UML frameworks? For the former, I would not like to call a subroutine to process a grammar every time a program is invoked. -- RobertField


So is using the C++ or Lisp language. You are copying and pasting the code into your computer. It is not your code. Using C++ or Lisp to make programs is using code that is automated for you. It's not yours.

This analogy is not accurate. Using C++ or Lisp, you use and invoke C++/Lisp code, not the generated machine code. When using code generation, like some ORMs, you use the generated code directly, and that in my opinion is where the smell is coming from.


I normally think that CodeGeneration is a sign of an insufficiently expressive language. However, I find myself on a project that has reams of hand-written identical code. Right now, ActiveCodeGeneration is smelling like lilacs. -- AndrewMccormick

Exactly, it's something we have to do to get around the fact that we don't have runtime macros à la Lisp.


If the above statement, "The need to do lots of AutomatedCodeGeneration is a flaw, not a feature," were true, we'd all be writing our own operating systems (think snob or irrational) instead of stealing other people's TEMPLATE. Linux, Windows, X, Unix are all templates that you took, or bought, or got somehow. It was automated for you. The operating system is a template. Lisp, C++: those were all created by someone else. They are automating your work. You were born because of automation. Everything is based on automation.

No automation? Why not just stop using the Internet altogether, and send people typewriter notes in the mail? Email is automation. It automates delivery. You do not have to code every time you send email. That is automated for you. Automation is good. Automation can be bad, because of poor design or a general lack of need for automation, but saying that automation is bad in all instances (a generalization) is unfounded. If you use your typewriter, you are still using automation. If you use a pen, you are still using automation.

Yes, there are programs that generate code that are not good. But since everything is based on automation (even text editors and your keyboard hardware), arguing about anything to do with automation, and calling everything to do with automation bad, is unfounded.

When automation is bad: For example, in order to post on this message board, it would take less time to just type out some plain text, hit save, than to get a program that automated the task for you - since your brain is doing the automated work of thinking up a post. But since you are still using a keyboard, to post to the message board you are still using automation.

Even the command line is a form of automation. You do not go inside your computer and directly send electricity into it with your hands from your brain. It is automated through several devices: keyboard, processor, etc. (that you did not build yourself). People that bash automation need to figure out that everything, including life, is based on automation. You wouldn't be using an oven, would you? How about a fire? They are all forms of automation. You don't start a fire without a spark or a match. The spark and the match are a form of automation, since you cannot start a fire with your bare hands.

The future involves automation. Otherwise, you might as well live in a cave, and then start right over with the rocks, fire, tools, and hunting automation. Oh, but you'd still be using automation. So, your only solution is to do nothing.

Great job tearing into that strawman. And just think: I was in danger of not realizing that an argument against code generation is an argument against all automation, ever. Thanks for clearing that up.

{Perhaps it's a matter of what is being automated. Code generation may simply be an unnecessary step. In other words: yes, it's automation, but maybe automation of an unnecessary step.}


[Comments from WhiteBoxFramework, which originally appeared on RefactoringWithaFramework.]

By "WhiteBoxFramework," do you mean that application programmers can see the framework code, but are not allowed to change it? (...or are you talking about sample or wizard-generated code?) -- JeffGrigg

[...] I definitely do not mean generated code! -- MartinLippert

Any system based on code generation is not a framework. A framework allows one to write their own code using well-defined interfaces in any manner they want. Required code generation is a smell. It means they couldn't create something good, so they worked around it.

Yes, I've often said that code generation is used by people who don't know how to call subroutines. I've met many managers and programmers who think code generation is a good thing; the more the merrier. I say that maintenance is the largest long-term cost of any project, making code generation generally a bad thing. -- jtg

MicroSoft's IDE for MFC and ATL "generates code" for you. And most CORBA ORB implementations "generate code" from IDL. Yet I would still consider these frameworks.

But, in these cases, the good sense of "generate code" is that they create a minimal "shell" or "skeleton" within which you put your code. Or, they generate standard proxy code, as an implementation technique, and you'd never change that code; if it needed change, you'd just delete and regenerate it.

-- JeffGrigg


It all depends on what type of code generation is being performed. With ActiveCodeGeneration, this is definitely false, but with PassiveCodeGeneration this is probably true. -- MikeRettig

I strongly agree that the generation of lots of boilerplate code that then needs to be customized is bad, especially when such code needs to be manually re-customized as changes are made to whatever was used as the basis for the original generation. But automated generation of an intermediate form which is then directly processed into the final form is often helpful, especially for tools such as yacc, lex, and RPC/IDL compilers. I'm not sure the terms "active" and "passive" are very descriptive of the distinction; maybe BoilerplateCodeGeneration? (or WizardCodeGeneration?) and IntermediateCodeGeneration? would be better? -- KrisJohnson

I stuck with the only definitions that I've seen in a published work. I stole ActiveCodeGeneration and PassiveCodeGeneration from ThePragmaticProgrammer. I don't think the terms are succinct, but I can't think of anything better. Although, BoilerplateCodeGeneration? (or WizardCodeGeneration?) and IntermediateCodeGeneration? provide specific examples, I don't think the phrases provide a generalized distinction. -- MikeRettig

If you delete all the generated code and regenerate it all from sources every time, then CodeGeneration is just a performance optimization or implementation technique (possibly for portability). In that case, CodeGeneration is not a design smell. (So, the C code generated by lex and yacc are not a design smell.) -- JeffGrigg


Unavoidable code generation is a language smell. One is forced to generate code when the abstraction mechanism of the language (if any) isn't powerful enough to remove duplication. (Compilers do this for assembly language. The calling sequence conventions and idiomatic instruction usage are captured in the compiler's code generator.)

Unavoidable code generation is a system smell too. One is forced to generate configuration scripts when lots of packages need the same information but can't read the same files. (This smell is so pervasive that we just assume computers are supposed to work this way. Can you imagine a world where you just refactor the package to read one file? I can, but not in my lifetime.)
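To make the configuration case concrete, here is a small Python sketch in which one set of settings is rendered into two formats for two packages that cannot read the same file. The setting names and the two target formats are assumptions chosen for illustration.

```python
import json

# Hypothetical single source of truth for settings shared by several packages.
SETTINGS = {"db_host": "localhost", "db_port": 5432}


def to_env(settings):
    """Render settings as KEY=value lines for packages that read env files."""
    return "\n".join(f"{k.upper()}={v}" for k, v in sorted(settings.items()))


def to_json(settings):
    """Render the same settings as JSON for packages that read JSON."""
    return json.dumps(settings, sort_keys=True)


env_text = to_env(SETTINGS)
json_text = to_json(SETTINGS)
```

The two outputs duplicate each other, but only SETTINGS is ever edited by a person; the refactoring wished for above would remove even the generated copies.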


Comments wanted: I have a representation of the project's database schema. Using a home-grown tool, from this representation I generate

Whenever we need to make changes to the schema, I update the source and then re-run the tool to crank out all the new stuff, and then build the software and run our tests.

Now, is this a good thing or a bad thing? (I think it's a good thing, but I'd be glad to hear better ways of addressing this problem.)

-- KrisJohnson

Are these examples of what you are talking about?

I wouldn't worry about it too much, if it works for you.

But...

Good ideas. But it's easier to generate scripts than it would be to write programs that directly manipulate the database. It's also easy for Oracle DBAs to examine and use the scripts, rather than learning how to use the tool (and deciding whether to trust it). And it's easier to create and deploy a static HTML page than it is to install ASP/JSP/CGI. The class-generation thing is a sore point with me - I think it is unnecessary and even detrimental, but others in the company want it. -- KrisJohnson

One could argue that today's DBA departments are a ProceduralSmell? - an indication of a dysfunctional organization. And why does an organization make it so difficult to deploy a web application? -- DevilsAdvocate

And there you have human attention being split between two separate points in the process: the programmers are looking at the Oracle code generation, but the DBAs are reading the generated scripts. So if the DBAs are in a hurry, and they find an error with the script, are they more likely to a) walk over to the programmers and tell them, and wait for them to fix it, or b) fix the Oracle script by hand, forget to tell the programmers, and then trip over the same problem a week later? Better, perhaps, to focus the thinking in one place, and since code is more versatile than Oracle scripting, well ... (obviously, I'm not a DBA.)

And HTML isn't so relevant to this discussion; it's not really "code" in the sense of "code generation". It's more properly viewed as formatted output, like CSV, PDF, or XML. (See also JustAnHtmlCoder.)

Oracle PL/SQL is not a very powerful language, it seems. It does not have a lot of meta ability, partly because Oracle often seems more concerned about machine performance than programmer or DBA productivity so that they can win the benchmark contests. Then other vendors are forced to copy their practices in order to keep up on speed. I would like to see a vendor push a "dynamic RDBMS" as a productivity and adaptivity tool. It could have dynamic columns, as described in MultiParadigmDatabase, and a dynamic DB scripting language as its native language for stored procedures, etc. Code generation would rarely be needed. It may not do very well on performance, but it may change the way people think about RDBMS, and some niches may benefit greatly from its dynamism. The current "static" RDBMS are like only seeing Java and Eiffel, but never knowing anything like SmallTalk or Python existed.


I'll have to take issue with "Anything you can do by calling data driven subroutines, I can do by generating code." from the lists above.

CodeGeneration fixes the data into executable code at CodeGeneration time, while data driven code can change its behavior at run time, with just a change of the input data. Yes, you can do CodeGeneration of anything that can be done in a data driven way. But many languages' compile and link requirements, and many organizations' release control requirements will prevent you from making this approach as flexible as a data driven approach.

(Caveats: It's not unheard of to compile code at runtime. Java virtual machines have popularized this concept. And while the data that drives a data driven application is not subject to the same release controls as the source code, a good argument can be made that it should be.)

The point of that comment was that the "data-driven subroutines" and "generated code" are not really very different from one another in terms of capabilities, and there is no reason to say that one is always preferable over the other. Obviously, one cannot hard-code data that won't be known until run-time, but when the data is known at compile-time, then hard-coded solutions may be better. I agree that data-driven programs are usually more flexible than hard-coded programs, but there is often a cost in terms of performance, development time, and other factors, and those other factors sometimes make code the better choice. Use of generated code often passes the DoTheSimplestThingThatCouldPossiblyWork test, whereas data-driven programs are often unnecessarily generic and more difficult to test thoroughly. -- KrisJohnson
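The trade-off in this exchange can be sketched in Python. Both validators below come from the same hypothetical field spec (field name mapped to maximum length); the spec, the function names, and the use of exec as a stand-in for a compile step are all assumptions for illustration.

```python
# Hypothetical field spec shared by both approaches: field name -> max length.
SPEC = {"name": 30, "rank": 10}


def validate_data_driven(record, spec):
    """Consult the spec at run time; changing the spec changes behavior at once."""
    return all(len(record[field]) <= size for field, size in spec.items())


def generate_validator_source(spec):
    """Emit source with the sizes fixed at generation time."""
    checks = " and ".join(
        f"len(record[{field!r}]) <= {size}" for field, size in spec.items()
    )
    return f"def validate_generated(record):\n    return {checks}\n"


namespace = {}
exec(generate_validator_source(SPEC), namespace)  # stands in for a compile step
validate_generated = namespace["validate_generated"]

record = {"name": "Kris", "rank": "Lieutenant"}
```

If SPEC is changed after generation, validate_data_driven picks up the change immediately, while validate_generated keeps the old limits until it is regenerated. The generated function, in exchange, is plain straight-line code with nothing generic left in it.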

Data driven isn't nearly as flexible as code generation, at least in a compiled language. Code generation can give you objects, data driven can't.


It all depends on what sort of code is being generated.

On the one hand, you may have a code generator to produce skeletons. Perhaps your team has a standard for writing classes (maybe you must explicitly create constructors and destructors), and your generator produces empty classes in normal form. In this case, you don't re-generate the class; you generate it once, and work on it manually. This doesn't add to maintenance costs.

On the other hand, you may have a code generator that takes in source from another language and produces what you need, without requiring you to touch the converted code. Consider LexAndYacc. These two tools take in their own special language, produce correct and inscrutable C code, and make parsing simpler than straight C. This works well, but you have to be careful in your configuration management. In this case, the input code to the Lex and Yacc programs are source code, but the C source that they produce should not be stored as source; it is an intermediate, like an object file. Because the Lex and Yacc code is so robust, it doesn't matter that you can't hand-hack it; if there's a problem, you fix it at the Lex or Yacc input level, where it belongs.

On the gripping hand, there are two particularly nasty types of generated code. The first kind is generated by a program that takes interactive (point and grunt) input, rather than an input file in a higher language (like Lex). The second is generated code that is almost, but not quite, what you want. The former often leaves you with nasty code that you can't wrap your head around, and thus cannot safely change. The latter often requires you to regenerate, then hand-hack, every time you want to make a change. For maintenance reasons, consider avoiding either of these types of generated code.

-- RobMandeville

Generating code that is almost what you want may not be quite so bad if you can bridge the gap to what you need by inheriting from the generated classes. This only works well when you can regenerate without needing to touch your inherited classes, of course. -- FalkBruegmann

The LexAndYacc example should be on your Gripping Hand. As suggested above, it is evidence of a Language Smell. C is inadequate for parsing, and LexAndYacc generate code for it instead of providing a library for C functions to call. Why do LexAndYacc generate code instead of compiling parsers into a library that you can link directly with? Because they aren't expressive enough to allow you to define a competent interface, since it's written in C. It's an Ouroboros of language smelliness.

["compiling parsers into a library you can link directly with" is in fact almost exactly what LexAndYacc do. Producing source rather than a binary is a feature - they compile to C code instead of binary, that's all. You don't run Lex once and then hand-tweak and maintain the code (which is when CodeGeneration smells), you maintain the grammar as the source to the output.]


It also depends on the nature of the code generation process.

If you treat the code generator, and the files that it reads, as part of your source code then you'll soon know if it's good or not. You DoTheSimplestThingThatCouldPossiblyWork, and then apply OnceAndOnlyOnce. If you've refactored the source code, and the code generator still exists, then there's no problem with it. I've been working on RefactoringCodeGenerators? recently, so I'll create a page to summarize my results. -- DaveWhipp


[See ReflectionVsCodeGenerationArticle]

In part, the article provides strong support for the idea that CodeGeneration is a PerformanceOptimization?: about six times faster, in his example.

I would say CodeGeneration can be a PerformanceOptimization? or CodeGeneration can often facilitate optimizations. The problem I solve in the article has nothing to do with performance. The statistics were one of the last additions to the article. -- MikeRettig

One could do type checking and better error handling on the runtime reflection side. System stack traces normally don't tell you much about the data you were working on at the time.

In the example stack trace, it appears that 'SimpleFileLoadManager?$1.load' is failing while reading an integer from the stream. One assumes (from the following stack trace) that it was trying to load an array of integer LineItems? to build an instance of the PurchaseOrderLoader? class. Instead of letting the 'java.lang.Integer.parseInt' NumberFormatException? bubble up, the SimpleFileLoadManager? object could catch it and reformat the exception into an application defined exception that includes the class and property it was working on at the time. The more informative application defined exception could be logged or displayed, as an aid to tracking down the error.

The point is that you can't rely on stack traces to provide all debugging information you could possibly want. Like, when I receive a database error while loading records from a file, I want to know more than where in the program the error occurred: I want to know where in the input file the bad data occurs. -- JeffGrigg
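The suggestion above, catching the low-level parse error and re-raising it with the input position attached, might look like this in Python. The article's example is Java; the LoadError class, the function name, and the one-integer-per-line input format here are hypothetical.

```python
class LoadError(Exception):
    """Application-defined error carrying the input position and field."""


def load_line_items(lines, field="LineItems"):
    """Parse one integer per line, reporting where bad data occurs."""
    items = []
    for line_no, text in enumerate(lines, start=1):
        try:
            items.append(int(text))
        except ValueError as exc:
            # Re-raise with the context a bare stack trace would not show.
            raise LoadError(
                f"line {line_no}: bad value {text!r} for field {field}"
            ) from exc
    return items
```

The original ValueError stays attached as the cause, so the stack trace is still available; the message adds where in the input the bad data sits.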

Very true. Increased error handling could be added to either the GeneratedCode? or the RuntimeReflection? solution. IME, functionality is easier and safer to add in a CodeGenerator. My point in the article is that the GeneratedCode? gives you an informative StackTrace? for free. Sure you can use reflection to trap and handle errors, but this raises the complexity of the solution. -- MikeRettig

See RuntimeReflectionIsaDesignSmell.


I once worked on a project where all code was created using CodeGeneration. It was a complete mess - jumps everywhere, different calling conventions for different functions and all sorts of crazy things. I told the team they were crazy - how could anyone work with this? They replied that I shouldn't be programming in assembly and I should let the compiler take care of it.

CodeGeneration as "just" a shifting of the boundaries between SourceCode and MachineCode?

That anecdote demonstrates a common failure in code generators: for some reason we seem to throw the basic principles of software design out of the window when doing code generation. Projects based on code generation are not spaghetti of necessity. ExtremeProgramming appears to be an ideal methodology for code generation because it focuses on an OnsiteCustomer. The customer of the code generator is the project itself, so it is necessary to work with a customer who is intimately coupled with the engineering tasks. -- DaveWhipp

I think the anecdote is intended to show that compilation is code generation. Yes, it can produce messy (assembly) code, but it does raise the level of abstraction and can make the job easier and quicker. Remember, the compiler is just another program!

I agree with you regarding the intent of the anecdote's author. I have been known to make that argument frequently myself. More recently, however, I have discovered the benefits of applying a refactoring strategy to code generation. If the generated code is a mess, then the code-generator will be hard to maintain. There are some principles you can safely abandon in the generated code, but you must be careful (redundancy is OK; but spaghetti is generally not). It is too easy to generate code that is more complex than required, and then justify the complexity on the basis that "it's just like assembler". Unless you have a debugger that allows you to debug at the level of your source code (i.e. not the generated code), then you should err on the side of readability. -- DaveWhipp

Yes. In my view, CodeGeneration is a useful technique if we remember that whatever we generate code from has become source code. From language users we have become language maintainers. That's a more difficult job.

See HowToDoCodeGenerationWell for continuation of this discussion.


CodeGeneration may be TooDeepIntoTheBagOfTricks - and will earn you a PropellerBeanie.


What about generating code that is hard to write, like with the ANTLR parser generator? It would seem to make the case that writing a grammar and generating the code to parse it is much preferable to writing the grammar AND the code to parse it. -- StevenNewton

yacc, bison and other compiler-compilers could, instead of generating code, generate and interpret their state-transition tables at run time, and call application processing routines using function pointers that you provide. But in a number of applications where these tools are used, performance is fairly important, so doing code generation as a performance optimization makes a lot of sense. -- JeffGrigg
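The alternative described above, interpreting a transition table at run time and calling application routines supplied by the caller, can be sketched in Python. This is a toy finite-state recognizer for whitespace-separated integers, not yacc's LALR machinery; the table layout and action names are assumptions for illustration.

```python
# Hypothetical transition table: (state, input class) -> (next state, action).
TABLE = {
    ("start", "digit"): ("number", "begin"),
    ("number", "digit"): ("number", "extend"),
    ("number", "space"): ("start", "emit"),
    ("start", "space"): ("start", None),
}


def classify(ch):
    """Map a character to the input class used as a table key."""
    return "digit" if ch.isdigit() else "space" if ch.isspace() else "other"


def run(text, actions):
    """Interpret TABLE at run time, calling application routines by name."""
    state = "start"
    for ch in text + " ":  # trailing space flushes the last number
        key = (state, classify(ch))
        if key not in TABLE:
            raise ValueError(f"unexpected {ch!r} in state {state}")
        state, action = TABLE[key]
        if action is not None:
            actions[action](ch)


tokens, buffer = [], []
actions = {
    "begin": lambda ch: buffer.append(ch),
    "extend": lambda ch: buffer.append(ch),
    "emit": lambda ch: (tokens.append(int("".join(buffer))), buffer.clear()),
}
run("12 345", actions)
```

Every input character costs a dictionary lookup and an indirect call here, which is the overhead that generating code for the table is meant to avoid.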


Code generation is an effective bridge between visual design tools & algorithms written in programming language. These are different levels of the software process, and code generation is the hand-over of skeleton structure so the code functionality can be attached. -- ThomasWhitmore

Why couldn't the graphical models be executable? Or, could the code be attached as attributes to the graphical model? -- DevilsAdvocate

My graphical models are executable. The mechanism that I use to execute them is CodeGeneration :-). -- DaveWhipp

My project is using Rational Rose to design a framework and generate C++ skeletons. Once our designs settle down, we add functionality in protected regions of the code. We can do round trip (both forward- and reverse-) engineering; if we change the model/design, we regenerate the code, but Rose preserves the stuff we added. This works just fine for our needs; it just takes a little getting used to. I don't see how anyone can make a blanket statement like "code generation is a design smell"...


Code generation is wonderful within (at least) the following framework:

We design a new language more appropriate to a particular problem than the target language of the code generator. The input to the code generator is now the source code. The output of the code generator is merely a convenient intermediate form, and despite being humanly readable it is not source code and should never be hand-modified.

The generated output is chosen to be, for example, C-language code primarily to leverage the existing optimizer and machine-code emitter of our C-language compiler. Again, that generated C-language code should not be considered source code.

(Hint: we never check generated code into our revision tracking system.)

Yes, to the author above, rather than generate code in another language we can usually merely interpret the source language. We could also run the rest of our system in an interpreter, but we usually don't. The reason why intermediate language code generation is often preferable for our special new language is the same reason why machine code generation is preferable for our general purpose language. Efficiency.

There's no reason to be frightened of code generation, and it is emphatically not a smell. There are, however, stupid uses of code generation, just as there are stupid uses of an "if" statement. Nobody calls conditionals a smell.

I'd like to put some of the above in list form and add to it a bit. Proper CodeGeneration requires that:

Anything that should be added?


I don't think listing attributes of 'Proper CodeGeneration' can ever be right, because it's trying to either bundle at least three things into one, or to argue that only one of the things can be right. Over on AutomatedCodeGeneration I proposed that there were three kinds of code generation: one shot, round trip, and 'compiled'. I totally agree with your assertions when applied to what I called 'compiled' (aka ActiveCodeGeneration here). I would add, as others have pointed out, that best practice in this case may also be to report error positions with respect to the source, not the generated code.

However JeffGrigg argues above that (one-shot) generation of minimal 'shells' of code is also a good thing in some circumstances. And different rules must apply in this case - the developer edits this shell so it must be checked in and the code generation step happens way before the build step. This is clearly a different beast, and different criteria for using it must apply. I would accept the 'cut and paste' criticism for this kind of code generation, if it generated lots of repeated code. -- BrianEwins


In my opinion, code generation or not isn't the design smell issue. Whether you can or can't repeat the generation step, and how much it costs you if you need to repeat it, and how likely you'll be needing to repeat the code generation step - those are the design smell issues I'd worry about.

Pardon me if I state the obvious. -- AndreasKrueger


(PageAnchor: attribute_repeat)

Most generated code looks to me like a "database dump". Why not leave it in the database? It is easier to manage a bunch of parameters as a database than as code, at least for me. I can use database browsers and query languages to customize my view of that info. Would you rather edit a spreadsheet as a linear text dump? I surely wouldn't.

Performance is the only reason I can think of, but I have seen some pretty fast desk-top database engines, although they went belly-up IIRC. Most current DB engines seem optimized for million-record tables instead of the hundreds range, which is understandable because that is where the money is. Most current tools seem to assume that one is using code as a lite-duty database instead of where it should be. I think the pendulum will swing back the other way one of these days such that it will be "put as much as possible in the database" instead of "put as much as possible in code". Sure, some people may not like that, but I do. It fits my thinking patterns better. No trend/fad favors everybody. (DataStructureCentricViewDiscussion, DataDictionary)

Codifying such also is a violation of OnceAndOnlyOnce it appears to me. Example:

As a table:

  Field     Type    Size  onValidate
  -----     ----    ----  ----------
  Name      String  30    nameValid()
  Rank      String  10    rankValid()
  Serial    Number  10    foo()
  Location  Number  5     bar()

As Code:

  unit Name {
    Type = String
    Size = 30
    onValidate = nameValid()
  }
  unit Rank {
    Type = String
    Size = 10
    onValidate = rankValid()
  }
  unit Serial {
    Type = Number
    Size = 10
    onValidate = foo()
  }
  unit Location {
    Type = Number
    Size = 5
    onValidate = bar()
  }

(This example is subject to frequent TabMunging. In fact the whole goddam page is. Tabs be damned!)

The tabled representation does not repeat the attribute names over and over again (a OnceAndOnlyOnce sin). For example, "Size" appears only in one place in the table representation, while it appears 4 times in the coded version. Plus, I find it much easier to see row-wise and column-wise patterns in the tabled version.

The reason I keep mentioning "to me" is because some have admitted to preferring the code version for some odd reason. It may be subjective, I don't know. Tablizing it just works for me.

However, you can achieve minimal duplication using OO:

  class unit:
      def __init__(self, Type, Size, onValidate):
          self.Type = Type
          self.Size = Size
          self.onValidate = onValidate

  Name = unit(String, 30, nameValid())
  Rank = unit(String, 10, rankValid())
  Serial = unit(Number, 10, foo())
  Location = unit(Number, 5, bar())

Your "nodes" are not connected, I would note. And, it would be easier to type it into a table browser in my opinion. For one, the columns line up automatically. Note that a variation of the above could look like:

  function unit(Type, Size, onValidate) {
    r = createArray();
    r['type'] = Type;
    r['size'] = Size;
    r['onvalidate'] = onValidate;
    return(r);
  }
  Name = unit(String, 30, nameValid());
  Rank = unit(String, 10, rankValid());
  Serial = unit(Number, 10, foo());
  Location = unit(Number, 5, bar());
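For comparison, a runnable Python sketch that keeps the table shape inside code: each attribute name is written once, in the record definition, and the rows carry only values. The validator functions are hypothetical stand-ins for nameValid(), rankValid(), foo() and bar(), and they are passed as references rather than called at definition time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Unit:
    field: str
    type: str
    size: int
    on_validate: Callable[[str], bool]


# Hypothetical validators standing in for nameValid(), rankValid(), foo(), bar().
def name_valid(value):
    return bool(value.strip())


def rank_valid(value):
    return value.isalpha()


def serial_valid(value):
    return value.isdigit()


def location_valid(value):
    return value.isdigit()


# Each attribute name appears once, in Unit; the rows carry only values.
ROWS = [
    ("Name", "String", 30, name_valid),
    ("Rank", "String", 10, rank_valid),
    ("Serial", "Number", 10, serial_valid),
    ("Location", "Number", 5, location_valid),
]
UNITS = {row[0]: Unit(*row) for row in ROWS}
```

Whether ROWS lives in source code or is loaded from a database table is then a separate decision; the structure is the same either way.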

- - -

What about the headache of putting the stuff into "the" (not! multiple instances) database, and the configuration management of doing so, and the "DBA may I have one or two tables" of doing so? I've worked on maintaining several projects that wanted to put everything possible into the database, no matter how non-volatile the information was. My thought after one or two updates was "Welcome to Hell".

This may be a case of DbasGoneBad. I have used "nimble table" tools, such as FoxPro, that made making such tables a cinch. Unfortunately the OOP fad apparently killed off the acceptance of such wonderful tools and techniques. People are afraid of tables these days because of the human overhead they now have. It's a crying shame. --top

There was a really good comment (or two) above about "generator's input is THE SOURCE" / don't check in generated output, and that it's faster.

OTOH, about the 7th time you change something, maybe you should put it into a database, AND give the user a proglet to tune whatever it is themselves.


The one instance that I have run into lately where code generation can be a GoodThing: Creating EJB interfaces with XDoclet. (http://xdoclet.sourceforge.net). This has been good, because generating interfaces is not truly important work, and I don't even version control the interfaces. Rather, ANT generates the interface code every time I build the application. While I consider this to be good, most J2EE tool vendors and specification leads could also realize that CodeGenerationIsaDesignSmell.... -- ChadThompson


Code generation can be a very good thing when dealing with language boundaries. For example, generating a Java proxy for a database stored procedure or SOAP interface. -- JeffDrost

A mirroring wrapper is perhaps a violation of OnceAndOnlyOnce or YagNi. If you look at such code, most of it repeats a theme over and over. The duplication itches like insufficient abstraction. The duplication should be compressed out somehow. I think dynamic languages are better at this, but this risks starting a HolyWar about static versus dynamic typing. Perhaps we should study some generated code together.


This page touches on many of the issues found in EffectiveCodeGeneration.


I spend a lot of time in some projects in building code generators to take away the grunt work of programming. Examples would be generating stored procedure templates from tables, generating calling code for stored procedures, etc. The amount of time thus spent is always worth the savings the team gets from the generated code.

In maintenance projects I also write a lot of scripts to generate scripts. I need to do this when I do not have direct access to the system I have to maintain and I need to run scripts based on some configuration.

A perfect example: In one project I needed to log all inserts/updates/deletes for some tables in an audit table. I wrote a script that will accept the table name as a parameter and generate the trigger based on the table metadata. I spent about 3 hours writing and testing it. The three hours was worth it as there were 80 tables where this trigger was required and furthermore I was able to use it in subsequent projects when required.

In projects I make a concerted effort to identify areas that will benefit from code generation. -- HemantSahgal
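A trigger generator of the kind described above might be sketched roughly as follows. This is only an illustration, not the script mentioned: the function name make_audit_trigger, the audit_log table and its columns, and the generic trigger syntax are all assumptions. A real version would use the trigger dialect of its database and would read the key column from the catalog rather than take it as an argument.

```python
def make_audit_trigger(table, key_column, audit_table="audit_log"):
    """Return SQL text for one audit trigger per operation on `table`.

    The SQL is generic, illustrative syntax (an assumption); adjust it
    for the trigger dialect of the database actually in use.
    """
    statements = []
    for operation in ("INSERT", "UPDATE", "DELETE"):
        # Deleted rows are only visible through OLD; the others through NEW.
        row = "OLD" if operation == "DELETE" else "NEW"
        statements.append(
            f"CREATE TRIGGER {table}_{operation.lower()}_audit\n"
            f"AFTER {operation} ON {table}\n"
            f"FOR EACH ROW\n"
            f"  INSERT INTO {audit_table} (table_name, operation, row_key)\n"
            f"  VALUES ('{table}', '{operation}', {row}.{key_column});\n"
        )
    return "\n".join(statements)


if __name__ == "__main__":
    # Hypothetical table names, standing in for the 80 real tables.
    for name in ("customer", "invoice"):
        print(make_audit_trigger(name, "id"))
```

Only the table name and key column vary per table; everything else lives in the generator, which is what makes it reusable across projects.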

Ideally, there would be an "event level" that would be all tables. It is analogous to an "on_key" GUI event which can listen to key/mouse strokes for any widget on a form, or even have it at the application level, not just the form level. A status structure of some sort often tells which widget/form/table/etc. is the target of the event.

But lacking that, can't you call a stored procedure or some central routine with only parameter values that are different per table? It sounds like you might be dealing with a limited language if you have to truly replicate the entire logging function for each target table. I agree that limited languages or tools may force one to replicate to some extent, but I don't know your particular environment.


The code that you use to code your program is automated. In order to get away from automation, you'd have to build your own computer from scratch. Otherwise you are automating. And there's no way you'll be able to just go and buy a keyboard at the store to get this computer of yours running. You have to build it yourself. Get real: automation is the reason everything exists.

There is a difference between repeating a concept and repeating code. If you have 20 sections of code that are all 80% identical, that is a strong hint that more abstraction is needed in your design, or perhaps your language is not "powerful" enough. Why can't you create a subroutine or method with parameters instead of repeating the code 20 times? The point at which duplication of similar patterns is no longer tolerated varies per individual.
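As a small illustration of the "subroutine with parameters" point, here is a sketch in Python. The field names and size limits echo the Name/Rank example earlier on this page; the function names check_length and validate are made up for the example.

```python
def check_length(label, value, max_size):
    """Return an error message, or None when the value fits."""
    if len(value) > max_size:
        return f"{label} is longer than {max_size} characters"
    return None


# Only the parts that differ are listed; the shared logic lives in one place.
FIELD_LIMITS = {"name": 30, "rank": 10}


def validate(record):
    """Apply the same length check to every listed field of a record."""
    errors = [check_length(f, record[f], n) for f, n in FIELD_LIMITS.items()]
    return [e for e in errors if e is not None]
```

Adding a twenty-first field here means adding one table entry, not another copy of the check.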

[The language not being powerful enough is probably the main reason for using code generation as a technique. Repeating similar code is generally necessary only when the differences can't be factored out in the language you're using - for instance, type differences in early C++. Since C++ compilers have started really implementing templates, I've had much less use for code generation in C++. In a language like Lisp, I'd think that you would never need a code generator, since the macro feature is so flexible.

Of course, in a very real sense, both C++ templates and Lisp macros are code generators - written using powerful built-in features of the respective languages. Hmmm. -- DanMuller]


Code generation of interfaces that other code depends on especially smells.

Let's say I automate generation of interfaces and default implementations for classes representing types I define in an XML Schema. I just give it the schema, and the tool takes care of the rest. Dependent code talks just to the interfaces.

Now let's say I use ActiveCodeGeneration for the interfaces. They are wiped and recreated for each build.

Here's the kicker: what about IDEs that passively validate and compile code as you type, providing UI artifacts displaying lists of methods, parameters, return types, fields, etc.? As soon as you start writing the code that depends on the generated interfaces, the IDE will complain that the type does not exist. No auto-completion will be possible.

Sure, you can generate the interfaces only, but every time you do a clean, you have to remember to regenerate these interfaces. Such a task is not doing a build, per se. It's an intermediate step.

If the generated interfaces go into an archive, it's even worse. Some IDEs on some platforms will not let you overwrite archives you specified as a dependency for the project. The auto-completion process has some sort of file lock on the archive. Thus, when you complete the dependent code and want to do a build, the build will ultimately fail due to an I/O error.

Surely this indicates a bad smell coming from the IDE?


Code generation is a way of writing a compiler for a MiniLanguage without most of the work that goes into writing a compiler. One problem which can arise with it lies in poorly defined interfaces between the code in the MiniLanguage and the code in the main language. If the generated code gets used as if it was code in the main language, it creates hopelessly hairy interactions. It's like compiling chunks of C where the resulting assembly gets macro-substituted into assembly files. Defining the possible interactions between the generated code and the main code strictly makes a big difference.

Consider an interpreter instead of a compiler then.


I just sat through a 2-hour presentation on a 'software delivery process methodology' that shall remain nameless. Needless to say, it was huge (4+ Mb of HTML or four big ring-binders) and had all the usual process diagrams, roles, list of deliverables (one that made the button on the scroll bar shrink in a frightening way ... and one that didn't seem to include executable code as far as I could tell) and templates for those deliverables.

One of the project managers in the room noticed me rolling my eyes. He knows I'm a bit of an 'extremist' and asked me afterwards if I thought it was all bullshit. I said I did and he agreed with me. "But", he said, "managers of companies we consult to expect to see this sort of stuff. They don't read it but they do want to see it. It may be bullshit, but if we adopt it, we don't have to write that bullshit ourselves. Even better, these templates allow us to automatically generate the resultant bullshit the method requires. So the customer's managers are happy because they have the requisite weight of bullshit and we're happy because we don't have to do much to produce it."

An example of DoTheSimplestThingThatCouldPossiblyWork, I wonder? ;-)

-- PaulDyson

I had a manager once who told me there are two types of managers: one is concerned with the progress of the checkmarks, the other is concerned with the quality of the work that was produced to satisfy a checkmark. Most are the first type. (see CargoCult)

Code generation is being used to satisfy the red-tape b.s. that many companies require. No one reads it, but clients want it, so code generation comes to the rescue. You can generate b.s. documentation just as well as anything else. The design smell isn't code generation; it's the bureaucracy that requires all that unnecessary paperwork. Code generation isn't a design smell. Bad code generation is a design smell, and there's a world of difference between the two. This is a completely different topic, and though I'm not its author, I agree with its point: you simply can't do business with many big companies without producing lots of paper, because it's the paper they're paying for. It gives all their mid-level managers a way to justify their jobs and keeps them busy with meetings and revisions; they have to have something to do. -- RamonLeon

Isn't that called "boilerplating"?

EditHint: I moved the main points together, but the discussion could be moved to a separate page.


Alternatives

Code generators get you working code fairly quickly and early. Maybe a good framework would alleviate the need for such, but good frameworks are usually difficult to get right the first time and require a lot of domain knowledge. Alternatives include:

To do functional style programming in .NET (usually in static framework methods that operate on families of classes) involves heavy use of reflection. Reflection is a horrendous performance bottleneck and code-genning up the exact 'lightweight' object with all the properties you need is far faster and is done at compile-time (instead of run-time). I view code-gen as a substitute for reflection. Why slowly reflect on a big object and pick out what you need at runtime when you can compile exactly what you need? --BrianG

Perhaps there are other ways to do something similar. May we request a UseCase?


I used to try to use DataDictionary techniques to map DB columns to forms or variables or maps for in-app processing. However, the languages, team familiarity, and tools just don't support that technique very well, especially with regard to handling those fields that don't map one-to-one. Thus, I've moved to using existing schemas to generate conversion function calls, similar to those under HelpersInsteadOfWrappers. Suppose something like this is generated:

  ...
  makeColumn(handle, obj.sourceX, "destinationX", myType ....);
  makeColumn(handle, obj.sourceY, "destinationY", myType ....);
  makeColumn(handle, obj.sourceZ, "destinationZ", myType ....);

If I need to custom-diddle something, I can just do this:

  ...
  makeColumn(handle, obj.sourceX, "destinationX", myType ....);
  temp = obj.sourceY . fiddleWith(obj.sourceB);   // append stuff
  makeColumn(handle, temp, "destinationY", myType ....);
  makeColumn(handle, obj.sourceZ, "destinationZ", myType ....);

It is possible to do such fiddling with DataDictionary frameworks, but requires pretty fancy frameworks that confuse newbies. The above is generally newbie-friendly and doesn't require a new hook be put into the framework. Thus, in an imperfect world, sometimes code generation is not a bad thing, as long as the code it generates is not unnecessarily verbose. One row per field is usually sufficient.
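A generator for lines like the ones above could be quite small. The following is a rough sketch, not the tool actually used: the function name generate_make_column_calls and the SCHEMA list are hypothetical, and the extra arguments elided with "...." in the listings above are simply left out.

```python
def generate_make_column_calls(columns, handle="handle", obj="obj"):
    """Emit one makeColumn(...) line per (source, destination, type) triple."""
    lines = []
    for source, destination, type_name in columns:
        lines.append(
            f'makeColumn({handle}, {obj}.{source}, "{destination}", {type_name});'
        )
    return "\n".join(lines)


# Hypothetical schema listing, standing in for real database metadata.
SCHEMA = [
    ("sourceX", "destinationX", "myType"),
    ("sourceY", "destinationY", "myType"),
    ("sourceZ", "destinationZ", "myType"),
]

if __name__ == "__main__":
    print(generate_make_column_calls(SCHEMA))
```

Because the output is one plain call per field, a hand-fiddled line such as the sourceY case above stays easy to spot and easy to write.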


My company uses code generation, and I feel they made the right decision. The problem is that we made an engine which runs a program in a new language, a very domain-specific language. We sell the engine and allow customers to write programs in this language. We have the engine written in C++ and we have most of the tools written in Java. We want the language to be expressible as XML, as an object graph in C++, and as an object graph in Java.

It's basically the serialization problem except that we want the objects to be in Java and C++, and code generation from something like a Rose model to Java code and to C++ code is the only sane way to do it. (The key IMHO is keeping the model as small as possible, something which isn't being done, but that's a discussion for another time.)
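The idea of one model feeding two target languages can be sketched in a few lines. This is only a toy, assuming a made-up MODEL dictionary and type map; it is not the Rose-based setup described above, and it ignores XML, methods, and object references.

```python
# Hypothetical model: class name -> list of (field name, abstract type).
MODEL = {"Rule": [("name", "string"), ("priority", "int")]}

# How each abstract type is spelled in each target language (assumed mapping).
TYPE_MAP = {
    "java": {"string": "String", "int": "int"},
    "cpp": {"string": "std::string", "int": "int"},
}


def generate_java(class_name, fields):
    """Return Java source for a plain data class."""
    body = "".join(f"    public {TYPE_MAP['java'][t]} {n};\n" for n, t in fields)
    return f"public class {class_name} {{\n{body}}}\n"


def generate_cpp(class_name, fields):
    """Return C++ source for an equivalent struct."""
    body = "".join(f"    {TYPE_MAP['cpp'][t]} {n};\n" for n, t in fields)
    return f"#include <string>\n\nstruct {class_name} {{\n{body}}};\n"


if __name__ == "__main__":
    for name, fields in MODEL.items():
        print(generate_java(name, fields))
        print(generate_cpp(name, fields))
```

The model is the only thing edited by hand; both outputs are regenerated from it, so the Java and C++ object graphs cannot drift apart.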


Generating code that has to be subsequently edited by hand is bad.

E.g. Microsoft Wizards are bad. The generated code is often hard to understand. Worse, if the code generator is changed, and the automatically generated code cannot be regenerated because of the hand changes made... that's bad.

Hand editing or customization of generated code is not so bad if it does not interfere with regenerating code, e.g. if the hand edits or customizations are not blown away by the regeneration.
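One arrangement that keeps hand edits safe is the one mentioned near the top of this page: the generator always rewrites a base class, and writes the hand-edited subclass only when it does not exist yet. A minimal sketch, assuming hypothetical file naming and a trivial class body:

```python
import os
import tempfile


def regenerate(out_dir, class_name):
    """Rewrite the generated base class; create the hand-edited stub only once."""
    module = class_name.lower()
    base_path = os.path.join(out_dir, f"{module}_base.py")
    stub_path = os.path.join(out_dir, f"{module}.py")

    # The generated file is always overwritten; nobody edits it by hand.
    with open(base_path, "w") as f:
        f.write(f"class {class_name}Base:\n    pass  # generated, do not edit\n")

    # The subclass holds the hand edits, so it is written only if missing.
    if not os.path.exists(stub_path):
        with open(stub_path, "w") as f:
            f.write(
                f"from {module}_base import {class_name}Base\n\n"
                f"class {class_name}({class_name}Base):\n    pass\n"
            )
    return base_path, stub_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        print(regenerate(out_dir, "Customer"))
```

Running regenerate again after editing the subclass file leaves that file untouched, which is the property the paragraph above asks for.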


So many of y'all are ragging on code generation. Why use code generation when a denser, more abstract description would do the job?

Because your boss doesn't want to hire people with specialized skillsets. They want to stick with JavaProgrammingLanguage or CeePlusPlusLanguage, because there are lots of programmers out there who can read and maintain it. Unfortunately, neither of those languages would be described as "dense" or "abstract" without having to build a few layers of classes first. So, either sit down and start banging out boilerplate code (copied and pasted from other projects, tweaked as appropriate for this project) or write something dense and compact and let a code generator write it. Ideally, you'd revision-control the dense, compact version, but you may not want your boss knowing that your code generator created that pile of code.

I was once ordered to stop factoring at an org I contracted for, because it was claimed that factoring made it too difficult for PlugCompatibleInterchangeableEngineers to come and go, and they cited past examples. It's the way of the industry, for good or bad. Rather than fight it, perhaps we should embrace it and find the best way to leverage code generation. If you can't beat 'em, join 'em, and join 'em using the best techniques possible.


See also CodeGeneration and AutomatedCodeGeneration, MetaRefactoring, CodeAvoidance, HowImportantIsLeanCode, LispMacros


CategoryCodeSmell, CategoryDiscussion, CategoryAbstraction


Last edited November 27, 2013