Extreme Version Control

A lot of developers dislike VersionControl systems. They see them as necessary evils that are to be used only to help reign in chaos between multiple developers or as a safety net when something goes wrong.

I don't see things that way. For me, the VersionControl system, like UnitTests, is a tool one uses constantly so that one can work faster. Proper use helps one to understand the code, and helps one to monitor and improve the development process. It is indispensable to me; I've gotten to the point that I can't work well without one.

Here is a set of practices I'm calling ExtremeVersionControl, because I think it follows the spirit of ExtremeProgramming (XP) and fits well with its practices. Like XP, it is most appropriate for small-to-medium-sized development teams on average projects. It focuses on empowering programmers to do programming, and de-emphasizes documentation, configuration management policies, and other activities that are not directly related to creating working software. It may not be appropriate for huge teams nor for those with specialized documentation and record-keeping requirements.

The most important practices are

Commit Changes Frequently (CommitEarlyAndOften)
Review Differences Before Each Commit
Make Changes Fearlessly

The other practices support those basic practices, or are desirable practices that naturally arise as a consequence of the others.

This is what I do with VersionControl, and why.

-- KrisJohnson

Use CVS

I use ConcurrentVersionsSystem (CVS), because it is free (in terms of both price and freedom), because it is lightweight, because it runs on multiple platforms, because it works over the Internet, and because it is not tightly integrated with other development tools.

These attributes are important because I need to install and use the system wherever and whenever I want it. I don't want to argue with a manager about the cost, nor wait a few weeks for the purchasing department to get me a license.

CVS is easy to set up, and easy (for me) to use. Its feature set is minimal, and it doesn't surprise me. That's what I want.

I also like the fact that CVS does not lock files. Some other version-control systems require that users lock files, preventing others from modifying them - this does not work well in conjunction with CollectiveCodeOwnership. (CVS does provide "reserved checkouts" for those who want it, but the feature is disabled by default.)

(Because of my CVS-centric view, the remainder of this article uses CVS's terminology and operations. If you want to apply ExtremeVersionControl to another system, adapt accordingly.)

Or, Don't Use CVS

The converse of this is that CVS may not always be the best choice, and even in the many cases when it is the best choice it may not be ideal. So don't just assume that CVS is always the system to use. Survey what else is out there, work to make better version-control products, and continue to improve them.

CVS embodies a lot of assumptions about software-development and version-control processes, and makes some operations much more difficult, risky, or slower than need be. CVS does not have good support for refactoring, especially in languages (such as Java or Python) which encode content information in filenames and directory structures.

One must manually and explicitly add files to or remove files from the repository. CVS will not examine the set of files in your working directory to determine what belongs in the repository. If one uses emacs, this is actually not such a problem; it's easy to get it so adding new files is just a few keystrokes.
CVS has little support for renaming or moving files between directories. It can be done, but it is non-trivial. (SubVersion solves this problem. -- ApoorvaMuralidhara)
There is no way to identify a discrete set of changes to a project's files. One can tag before-and-after versions, and put comments on the individual files, but a "transaction" is not identifiable.

(SubVersion solves this problem too! You can checkout/export the repository as of any checkin. . -- DanNovak (my favorite feature)

CVS seems based on the premise that disk space is expensive and bandwidth is cheap. In modern development, the opposite is often the case.

In Any Event

I don't want to have some machines with version control and some machines without. I also want to be able to put it on customers' machines when appropriate. So a licensing model (or a large enough purse) that allows the client software to be installed everywhere is essential.

And I don't want to mess around with a "suite" of tools that force me to adopt an unnatural work-flow model or development methodology. Integrating version control with the bug-tracking system and/or project scheduling package always ends up being more annoying than useful. And I don't want to worry that the company making the product will be bought by Microsoft or Rational, and will turn the product into another bloatware item in their marketing strategy.

Commit Changes Frequently

For some people, "frequent commits" means once a day (or even once a week). That's not what I mean - I mean frequent. I commit after every passage of a UnitTest, after any individual refactoring, or at any other point that the code is working but differs from what is in the repository. It's not uncommon for me to commit changes every five minutes during a heavy refactoring session. Even when going slowly, I commit several times per day.

If I go for more than an hour without being able to commit any changes, I treat that as a "smell". Something must be very wrong with the code, or very wrong with my approach.

I never check bad code into the repository. I only check in code that I know is at least as good as what is already in the repository, meaning that it passes all UnitTests it did before my changes and it has no new flaws.

This practice goes well with ContinuousIntegration and IncrementalIntegration. When my colleagues and I are all committing changes several times per hour, integration conflicts are found and addressed very quickly. In those cases where frequent commits or big refactorings would interfere with others, I'll do my work on a branch and merge it into the trunk when appropriate.

CheckDiffsBeforeCommit

Before checking in, I always do a "cvs diff" operation. This shows all the differences between what I am about to check in, and what is already in the repository. I carefully examine those differences before doing a "cvs commit".

This review is helpful in that it lets my PairProgramming partner and I see, all at once, what we've done. It lets us analyze all the changes, which increases the probability that we will remember what we did later, and that we'll notice any mistakes or omissions. It helps catch those "TODO" items that sometimes get left lying around. It catches unintentional changes. And it reminds us of the changes so that we can provide a useful and complete log message when we commit (I always need help remembering exactly what it was I did five minutes ago).

This practice is very valuable when using tools that feature "wizards" or "designers" that will change or corrupt your code without you noticing it.

When reviewing, I pay particular attention to any changes to interfaces or changes to functionality that other team members are relying upon. If the changes could break other peoples' code, I talk to them about it before committing.

Most books about version control espouse the virtues of being able to find the differences between your January build and your August build. But I find the set of differences between now and half-an-hour ago to be much more interesting and more understandable.

"This will take too much time and slow things down," you say? Yeah, well, so do UnitTests. Do it. Always. It will pay off.

This practice, in conjunction with Commit Changes Frequently, is the heart of ExtremeVersionControl. It is an automated way of taking notes, and reviewing those notes continuously. It helps me to pay attention to what I am doing.

See also ReviewBeforeCheckin. That page recommends a full-fledged code review before each check-in, which may improve quality but would also slow things down even more.

Make Changes Fearlessly

The great thing about a version control system is that, no matter what I change or to what extent I change it, I can always change it back.

So I'm never afraid to change any files. I just do whatever I think I have to do to whatever files I think need changing. If it doesn't work out, undoing my mistakes is trivial. This encourages attempts at BigRefactorings.

Keep the Repository Open

Many organizations put the repository under the control of a SoftwareConfigurationManagement team or other gate-keeper, who will enact draconian policies to restrict access to the repository. Such organizations require that programmers get permission before checking out or modifying files, and may require an odious amount of record-keeping.

"... who MAY enact draconian policies ... MAY require that programmers get permission": CM professionals and teams are not necessarily synonymous with locking, restricted repositories, unnecessary record-keeping, etc.

There may be times when this makes sense, but I've never seen one. CollectiveCodeOwnership by the developers has always worked well.

I don't like "locks" or "reserved checkouts" or any other feature that prevents developers from accessing or changing files. Many developers have a fear of overlapping changes, requiring a manual "merge", but in practice I've never seen this be a serious problem.

I do enable whatever logging capabilities the repository server has, to keep a record of who has done what. Every user of the repository should have a unique account name.

Notwithstanding the above, some organizations may have security concerns or other reasons to keep some individuals from accessing portions of the repository. When this is the case, assign access rights on a "need to know" basis, and not based upon whether the person is deemed smart enough to use the repository correctly. (Don't give stupid employees read-only access. Just fire them.)

I'm stuck with a client right now that does exactly this. Of course, you and I think, "If you have stupid employees, fire them." but that's a little above and beyond these folks. The client uses MKS Source Integrity which, on the whole, isn't too bad. However, developers can only add, check-out, and check-in. We can't move, rename, delete, or in any other way alter the structure of a project. Also, the ominous CM group you allude to above must do all label and baseline operations which is a total pain in the rear; especially when writing Java code. My directory structures change frequently during the beginning of a project.

So, here's the $64,000 question: How does one go about convincing a company like this that these practicies are counter-productive? Obviously, there's the factor of people possibly losing their job if it should ever be shown that the CM team isn't really needed. -- JeffPanici

One approach: keep notes about when you are losing time, and how more open access would have helped. Get other people to do the same, if you can. Of course, if the CM team aren't going to listen to the facts, you're hosed and should probably go work with a different client that isn't sabotaging itself.

What a coincidence! I started doing this [tracking lost time] right after I posted here. My client lost nearly 24 hours to their own arcane policy. This is time lost asking for moves, renames, and deletes alone. An additional annoyance is that "normal" developers cannot do builds. They have a cute little web form that one must fill out to schedule a build. This client is nowhere near large enough [from a development standpoint] to require this kind of procedure. Further, these are builds into the development area! I looked into their official build procedures document and discovered I could automate 100% of it via Ant. After three hours of work I showed a few of the developers and managers. Their response? "Oh, we don't do that here. The CM team knows how to do builds." When I pointed out it completely eliminated the need for CM to do builds, their eyes glazed over. They also didn't seem to care they lost 24 hours for things that should have taken me no more than five minutes.

Obviously, your final bit of advice is being strongly considered. -- JeffPanici

Don't Put Change-related Comments Into Source Files

One can use the repository to find the differences between file versions, the dates and user account and log messages recorded with each commit, and anything else you need to know about what the changes. So I don't clutter source files with a "Revision History" section or with comments describing what has been changed or who did it. Especially that it may make the source files that contain the "Revision History" recompile after every commit/update.

Use Plain Text Files Whenever Possible

I'm a big believer in the PowerOfPlainText, and ExtremeVersionControl amplifies that belief. With plain text files, I can get meaningful diffs between file versions. And in that nightmare scenario where CVS stops working for some reason, everything will be fine because CVS stores its version-controlled plain text files as plain text files (with some other plain-text bits added), so you'll have little trouble getting at your files.

This leads me to gravitate toward development tools that use plain text files exclusively, and to stay away from tools that use binary files or which don't store source in separate text files. This has kept me away from some otherwise-useful tools for professional work (SqueakSmalltalk and QuartusForth, for example), but the benefits of plain text files are just too great for me to give up.

(Note: QuartusForth stores its sources files in plain text. Hate to say it, but ANSI Forth requires plain-text sources. I think what you're looking for is a revision control system that works with Palm Pilot's database files as though they were plain-text. You're conflating how Palm OS implements its persistence mechanism with what Forth expects.)

(Note: SqueakSmalltalk has an awesome tool called a ChangeSorter? that when combined with MethodDoesOneThing? makes integration a breeze. Most integration complexity comes from separating the changed from the unchanged.)

DolphinSmalltalk can keep class source in individual text files, for version control and differencing. And, as you'd expect with Smalltalk, there are browser-based tools that can show you the history of the current method. e.g. This is because DolphinSmalltalk has its own version history system, where every change you make to your image is recorded in a text file. This gives you much finer granularity than file-based systems. It's literally possible to ask your change log "What changes were made to the structure of class X between these two dates?"

Use One Version-Control Repository for All Files

I don't use the repository only for source code. I also put specs, documents, diagrams, configuration files, test scripts, and any other file that is related to the project into the repository. Having everything in one place, and one set of tools for saving and retrieving those items, is very useful.

It is especially valuable to keep UnitTests alongside the programs they test.

I put everything into a single repository. Putting Java code in one repository and C++ code in another, or putting Unix code in one and Windows code in another, is counterproductive.

At my current project, we can take a clean machine, install all relevant software, do a full get from source safe, and hit compile and deploy. The entire product of our project (including discovery documents) is in a VSS repository. Which is backed up. One place for everything.

I must disagree with this practice, as stated. Yes, a single project should have everything kept in a single repository. However, you should not keep multiple projects, even if they're related, in a single repository. Where I work, this is a perpetual problem. Keep your projects separate and modular, just as you keep your classes separate and modular. Rely on the interfaces between projects. It makes ConfigurationManagement go much more smoothly, particularly when you have multitudes of different configurations you need to test/deploy/sell/...

I think the issue here is what a repository is. In some SCM systems a repository holds multiple projects that are 100% seperate from each other, just like a Database Server holds multiple databases. In that case using the repository for everything is sound advice. ConfigurationManagement becomes concerned with which of those projects to check out, and which Tag or Version of each do we check it out at.

If on the other hand the repository does not work like that, then it is perfectly reasonable to use multiple repositories, but they should probably be hosted by the same server, or otherwise easily discoverable.

Use Version-Control to Synchronize Machine Contents and Configurations

In my work, I tend to use several different machines at the same time. CVS helps a lot with this. After making changes to source files on one machine and committing them, I can just go to each of the other machines, type "cvs update; make", and then all the machines have the same software.

In addition to synchronizing software, I also use CVS for configuration files and all those little useful scripts I have lying around. Like I said, I use a lot of different machines, and life is easier when they all work similarly. So when I have a new machine, I install CVS, I check out my personal files (my .emacs file, .bashrc, a few Tcl, bash, and batch files, etc.), and then the machine is set up the way I like it. And then when I modify one of my files (adding a new macro to .emacs, for example), I can easily propagate that change to all the machines.

Use the Command Line

I sometimes use the CVS web interface to view a file in the repository that I don't want to copy onto my hard drive. And I'll sometimes use a GUI client like WinCvs if I can't remember the command-line options for some operation I want to perform. But in general, I use the command-line client interface to CVS.

I've found that the GUIs are more difficult to configure, and are more likely to lock-up or crash during operations, than are the command-line tools.

I also like the feedback provided by the command-line tools, because it is easier for me to tell exactly what is going on. This increases my confidence that bad magic is not happening in the background. And I can pipe the output to a file for later viewing.

Integration with editors and IDEs is a nice little bit of chrome, but not really very useful. Emacs's VC mode is nice, but I can live without it.

[I use TortoiseCVS, an Explorer shell extentsion, on Windows. I find it a lot more convenient than mucking with the command line. I've had a ton of problems with the WinCVS GUI and generally consider it to be crap, TortoiseCVS is fast, stable, and feature rich. It's also an excellent way of getting people who're scared of the complexity of CVS to use it.]

Share These Practices With Other Developers

As with ExtremeProgramming, these practices can backfire if you are the only one doing them. If other developers expect only "release code" to be in the repository, are unnerved by seeing so many frequent check-ins, or get confused by seeing a bunch of non-source files, then these practices will hinder progress. So I try to get everyone on the team to understand and adopt these practices.

If there are team members who are uncomfortable with VersionControl systems, then I show them how easy it is to use them.

Use Other Best Practices

There are some other obvious practices to put into play, which I have not included above. But because they may not be obvious to anyone but me, here is a short list of things I assume everyone knows:

Backups -- Back up the repository files to tape/CD/whatever at least once per day. When you are changing things so often, you increase the chances that some hiccup will corrupt the repository. It is also a good idea to check out the entire contents of the repository on at least one machine, so that if the CVS server machine dies, you have at least one online copy of everything.
Give Useful Log Messages -- When checking-in a file, one can provide a textual comment describing the change. Provide useful information describing what feature was added, what bug was fixed, or what refactoring was done. Do not provide information that is otherwise available from the repository (e.g., which lines where changed, who made the change, the date, etc.)
Labels -- Affix meaningful labels to all files at significant points in the development process.
Always Build Software from Pristine Repository Sources -- Whenever building an "integration build" or "release build", ensure that it is built from scratch from unmodified source files, recently retrieved from the repository. Affix labels to the source file versions used for any identifiable build. Do not build on the same machine that is used for development, or at least build in a separate directory from the one you use for development (and rename or hide the development directory somehow when doing builds).
Put All Source Files Under Version Control -- Nothing required to build, install, or configure the system should be excluded from version control, for any reason.
Do Not Put Derived Files Under Version Control -- Don't put the "target files" (object files, executables, output from IDL compilers, yacc/flex, etc.) under version control. Any file that can be constructed from another is not a source file, and there is no reason to clutter the version control system with them. As an exception, you may want to put the "release builds" into the CVS repository, but I find it is best to store them elsewhere.
- In some cases, putting derived files under version control makes sense. If a derived file changes rarely, is expensive (in time) to rebuild, and doesn't depend on other parts of the system; this is an appropriate choice for versioning. If an object is delivered by a different team with different tools (or a completely different build environment), this is also an appropriate choice for versioning. Examples of derived files that might be versioned include: FPGA files for embedded systems (may take hours to rebuild), third-party libraries, even if source is available; components developed by other parts of the organization (so you can control when to integrate them). Any versioned derived object ought to be noted in project documentation
Communicate With Other Developers -- The version-control system is not a substitute for communication. Don't assume that anyone else will notice the changes you have checked in nor that they will understand all the ramifications. Talk to people about what you are doing.

''There are extensions you can make to your version-control system to strengthen its role as a communication tool; see the chapters on Watches and Commit Emails in the CvsBook.''

At TwistedMatrixLabs? we use CvsToys, configured to not only mail a set of diffs with every commit, but to announce them in real-time on the developer's chat channel as well. We've found there's a significant social aspect here; it's gratifying to see your completed work announced to the forum in which it may have been just discussion before, and reaction from your fellows is often immediate. Project leader GlyphLefkowitz half-jokingly said that this means we don't need to make the version-control system automatically run the unit tests now, that "everyone should just update and run the tests as soon as they see a commit," and if the tests fail, you know who to yell at.

-- KevinTurner?

A more general version of this can be found at http://cia.navi.cx/ . There are more than 300 projects using this service now, and it provides both IRC notification as well as statistics and RSS feeds. The public service is really only appropriate for open-source software, but it's possible to set up privately too. The RSS feature has proven to be particularly useful. I keep finding users who talk about monitoring the CIA feed for their particular favorite piece of software

* Put All Source Files Under Version Control -- Nothing required to build, install, or configure the system should be excluded from version control, for any reason.

Does this include the compiler and OS? :) After all, to recreate a build (perhaps to track down a build-specific bug), the exact version of the compiler might be essential. And in fact at SierraOnLine?, where we had our own language, each project would indeed keep a copy of the compiler .exe in source control.

Disadvantages

While I think the above practices help me a lot, there are some problems to deal with:

Dependent upon the network -- If I'm on a machine that is not connected to the network where the repository is located, then I can't follow my practices. This happens when I'm at a customer's site, or when working on my laptop in a hotel room. When in this situation, my only recourse is to back up my source tree every once in a while. Since this is such a pain-in-the-ass, I slow down and I get grumpy. And somehow, whatever changes I make offline have to get merged back into the repository. The other potential problem is being unable to access something needed in the repository, so I always make sure I check out the entire source tree to my laptop if I'm going to be traveling.
Binary files -- CVS does not handle binary files well. They clog up the repository, because CVS stores the entirety of each revision, and CVS cannot give useful diffs. As described above, I work around this by avoiding binary files whenever possible. I try to save MicrosoftWord files as RTF. For some files, like icon bitmaps, there is really no way around use of binary, but those files tend to be small and not really amenable to diffing anyway.
Performance -- For these practices to work well, you really need a well-performing server and network. This has generally not been a problem for me, but when things do bog down, I start to cut corners.

Note, however, that all of these disadvantages are addressed beautifully by the use of DistributedVersionControl systems.

If I'm on a machine that is not connected to the network where the repository is located, then I can't follow my practices.

I often export from the remote repository and import the export to a local repository, work for awhile, then when able to, copy over to a working copy of the remote repository and commit that. You lose some granularity of changes in the overall history, but you've still got 'em all if you need to deep dive into them, and you get to keep up your regular practices (most important - frequent commits and freedom to make changes). Can't imagine working on a machine without local version control.

See comments at ExtremeVersionControlDiscussion

An interesting update of CVS is SubVersion. They are trying to write a replacement for CVS that fixes its well-known problems.

I am now using SubVersion. Its interface is very similar to that of CVS. One advantage, from an agile perspective, is that it allows you to seamlessly and transparently rename/move files, so that you don't have to decide "up front" (a la BigDesignUpFront) exactly where everything has to go and what it will be named.--ApoorvaMuralidhara

I have seen somewhere recently (and I suspect in this Wiki) an admonishment to avoid making bulk trivial changes to files under source control, for example reindenting or doing tab conversions (touched on in TabsVersusSpaces).

The rationale is that it adds very little, is more likely to cause conflicts and makes diffing much harder. The suggestion was to make local changes where you touch the code, and perhaps when code is imported from a different indentation regime.

I can't find that page. Could you answer me (answered now, so removed badge) please? Thanks. -- MatthewAstley

I don't know where the page is, but it is pretty common practice to adopt a common set of formatting conventions, and make everyone adhere to them, for precisely this reason. I personally tend to reformat files to the standards whenever I have to touch them. I consider this to be a FixBrokenWindows activity. If there are no standards, then I'll try to conform to whatever style is already used in each file I touch. If the file has inconsistent formatting within, then I'll make it consistent.

One rule I try to follow is that if I find a file with bad formatting, I'll fix the formatting first and check that in before I make any other changes (or make all my changes first, check them in, and then fix the formatting). That way, it is easy to isolate "significant changes" vs. "formatting changes" in the file's history. --KrisJohnson

There are what appear to be well-thought-out rules for when and how to reformat at http://svn.collab.net/repos/svn/trunk/doc/user/svn-best-practices.html.

The common practice I've seen is to keep formatting patches (which should have no code effect, so you can test by comparing binaries) separate from behavior changing patches.

While it was not written as an XP book, we had XP and agile practices in mind when we wrote SoftwareConfigurationManagementPatterns, so there may be some useful lessons in there for those who are wrestling with these issues. -- SteveBerczuk

See ExtremeVersionControlDiscussion, XpVersionControlRoadmap, BondageAndDisciplineVersionControl, VersionControl, SoftwareConfigurationManagementPatterns, DistributedVersionControl

CategoryVersioning