Incremental Integration

For a small project, the root principle of ContinuousIntegration works great. The script that integrates runs all your tests first. It won't integrate if any test fails.

As your project gets larger, that system breaks down. You can't pour liquid nitrogen on every CPU and overclock it, so your integration gets slower. That makes developers pause before integrating, and that, in turn, makes them less Extreme and Agile. You should integrate every single time you want a "checkpoint" to revert to. Ideally, you should integrate every line-item of every refactor.

So you install CruiseControl (or the equivalent), and you tune your integration scripts to do these things:

But this leads to a new problem - WhoBrokeTheBuild?!? Handle build breaking like this:

The "IndexOfSuspicion?" is your mental model of whether to IncrementallyIntegrate, or GrandWazooIntegrate. If you changed a common resource that all tests use, if you renamed a global variable, if you added a migration to the database, or if your colleagues have (as usual!) broken the build, you must GrandWazooTest? to integrate.


TODO: give this stuff a different name because it's not II - it's just the root principle of VersionControl itself...

You can't afford the cost of IntegrationHell. But neither are your developers skilled and disciplined enough to adopt FrequentReleases. Most SCM tools exacerbate the problem by requiring you to lock your source, and to coordinate your versioning on a file-by-file basis; the former means only one person can be dealing with integration issues at one time, and the latter means integration requires a lot of intrusive record-keeping in or out of the source.

Therefore,

Use a merging, project-oriented SCM system rather than a locking file-oriented one. Two good examples of this are ClearCase and CyclicCvs. The merging tools pretty much require developers to keep up to date with the changes their peers introduce into the source. They still permit baselines and controlled integrations, but they let developers see exactly who is doing what to which parts of the code, they keep all those integration records in their own databases, and they allow you to version whole projects in a wieldy manner while still allowing code sharing between multiple projects.

This doesn't replace the discipline in FrequentReleases ... but it is an easy path toward it. -- PeterMerel

Even on conventional SCMs, the best practice seems to me to be that each developer does a small amount of work, then integrates on his local machine, then releases to the group. The effect is that everyone is only a version or two out of phase on most everything. I've never done this with the net of unit tests we have on C3, but even without it worked pretty well. I'd definitely try this approach, with zillions of unit tests, next time I'm faced with a real SCM. --RonJeffries

I didn't know it at the time, but even using conventional languages and SCM, I've always guided my projects toward frequent integration and frequent releases. I guess I have high sensitivity to IntegrationHell. To me, those days are so bad, and so unproductive, that they're worth avoiding. And if you have solid ongoing testing, I see no upside to waiting longer between integrations. Does anyone else? --RonJeffries

One case springs to mind. CyclicCvs gives you the ability to branch your entire development from some past stable point ("MagicBranchTag?" ... go figure). You'd do this when, for example, some customer of a long-ago release asked you to please fix this one little bug but don't force them to upgrade to the current release. Or the current release isn't stable yet ... you know the drill. What you want to do is plan to merge the work back into the main line of development, and make certain you plan when you're going to do this. That way you'll still experience a little IntegrationHell, but you can scope just which circle of hell this will occur on.

I should say, though, that CyclicCvs is not an unconventional SCM - it's very popular both in freeware (linux, freebsd, ...) contexts and in most of the large and small corporate developments of my experience. It supports distributed development, secure checkin via ssh, anonymous checkout, overlapping projects, email notification and so on, runs on anything that opens and shuts, and it scales well too - a chap on the CyclicCvs mailing list claims to be using it to control a military development that involves more than a Gig of source - 22,000 source files. Your tax dollars at work. Don't know if I believe that one, but FreeBSD is something like 40 Meg of source with hundreds of developers around the world, and CyclicCvs seems to handle that fine. -- PeterMerel


The sentiment "IntegrateEarlyAndOften?" is an oft-repeated and hard won piece of advice preached by almost all "experts" and people "in the know" on this subject. It is pretty much as Ron describes: you do a small amount of work that is logically cohesive and coherent, and then integrate (though not necessarily on the local machine - that can depend on various aspects of the SCM tool and how it manages workspaces and repositories and when things in them become visible to whom).

This can be achieved both with and without locking, with and without branching (which I think is what is meant above by "merge-oriented", see http://acme.bradapp.net/branching/ for more details about branching) and with and without numerous other SCM tool features like: change-tasks (also change-sets and change-packages), viewpathing, dynamic version selection, activity streams/folders, ad infinitum.

The various SCM tools (like ClearCase, or CyclicCvs, or PerForce?, or Continuus, or other SCM tools) and their features can make it more or less convenient to increment early and regularly. Whichever way you do it, the SCM tool features used to achieve it can have a dramatic effect on the development style regarding builds and baselines and releases and testing.

This can be good or bad (or just plain different) depending on your personal POV, but I dont know of many who would debate the wisdom of incrementing as early and as frequently as possible. The main issues are finding suitable granularities for what constitutes a "logical change" and the "period" (or interval) between integrations/releases to set the "project rhythm," as GradyBooch and JimMcCarthy refer to it. This will almost always be context-sensitive to the given project and organization.

Sometimes integration will be daily, as in DailyBuild or NightlyBuild. Sometimes it will be weekly, or even bi-weekly. Sometimes it may even be multiple times in a day (maybe even for geographically distributed remote development).

Sometimes a logical change is a single fix or feature; Sometimes it is a patch, or single edit to a single file; Sometimes it is all of the above.

Whatever the "right" change-granule and integration-interval are, we have to strive to find it for each project, make it as small as manageable, and remember to keep tabs on it, tweaking it as needed. Their presence and regularity is one valuable indicator of project health and serves as a forcing function that drives the project forward (at least GradyBooch, JimMcCarthy, SteveMcConnell, AlanDavis?, and many other published "experts" seem to think so ;-). --BradAppleton


See also ExtremeVersionControl


EditText of this page (last edited January 29, 2010) or FindPage with title or text search