Extreme Version Control Discussion

Discussion of ExtremeVersionControl:

This is the first time I've ever heard of programmers disliking version control systems. The very thought of that amazes me. I think it would be crazy to even consider working without one for even the most trivial projects. That said, I think the ways most organizations use version control systems are awful, and cause as much harm as good. That must be the source of any programmer aversion to version control. Mediocre organizations - the norm - add ridiculous complexity to the use of what is a very simple and important tool. I see two intertwined issues that lead to this complexity.

First, I attribute most of this nonsense to the appearance of so-called "Configuration Managers" - non-programmers who specialize in administering version control systems. They have no incentive to arrange things as simply as they should, because that would put them out of a job. They frequently confuse data with information, so the signal-to-noise ratio of the revision history is very low under their regimes, which negates an important reason for version tracking. Non-programmers who administer source code management systems usually add difficulty to a process that they (as non-programmers) cannot truly understand, namely, creating software. A good team doesn't need them, and is usually hurt by their presence.

Actual, as opposed to so-called, Configuration Managers know that part of their job is to arrange things (builds, checkins, etc.) as simply as possible (procedures, documentation, access to tools, etc.). In any profession, I'm sure there are some folks who put more emphasis on job security than on doing their jobs well. There are good teams who understand CM and version control well enough for daily work, who can also use someone to focus on version control tool arcana, getting good integration builds, branching and parallel development, generating useful change reports and working with QA and project management, and other CM functions, while they're happily off coding. It's called division of labor. It's a shame you've been burned by poor practitioners. -- SarahElkins

As a "configuration manager" that you so heartily disdain, let me tell you that programmers make the very worst project organizers - especially in high-traffic, highly volatile environments. Every developer I've ever met, with the exception of one, was totally willing to botch an entire development process to put in one well-meaning "fix" that would cause an unintended ripple effect through the rest of their code.

1. You are correct that source code should be very simple to find and use - and new developers end up spending far too much time trying to learn the environment when they could be writing code and being productive. As a config manager, I find that even when I make build environments so simple a monkey could run them (hell, a monkey wrote it!) one developer on a different level than all the rest can so confuse and change an environment that no one will ever put it back again.

2. Your build environment (source, libraries, structure) should be defined by your configuration manager. One simple checkout from my two project hierarchies gets you all of those things. Break your environment? No problem - check it out again. Everybody starts out with the same setup, every time. Customize all you like - but you always have a "home" to go back to when things get mucked up. Your config manager ought to know the environment better than all the developers put together. Config managers wear all the hats - developer, sysadmin, QA, and support.

3. Configuration management is far too important a detail to be given to developers who find the routine scripting, project maintenance, and countless other tasks (shared development environment maintenance, QA, unit testing) a boring routine. Developers make the very worst maintenance and QA staff, bar none.

Someone below noted that the use of PVCS really takes a person who's learned how to exploit the tool fully - totally agreed. Otherwise it's just a slow network share. -- Ed.

Knowing how to manage source code well is the duty of every good programmer. It is easy to do well and should be done simply when done well. Kris' advice below is great. (I personally don't like CVS much, but my reasons are rather silly, and I would take CVS over most commercial "process-oriented" garbage for the same reasons Kris likes it.)

Managing source code is one of those things which anyone qualified to do well (a programmer) would be completely bored doing as a full time job, and almost anyone who would accept it as a full time job (a non-programmer) has almost no chance of doing well. Rather like that old joke about politicians...

There are configuration managers who do their job well and are not bored by it (the job usually consists of more than just managing the source code). It's like how some folks prefer system administration to programming, even though some programmers can't understand why. It's fun to work with good teams who want CM help: improving their scripts, processes, general knowledge, tool leverage, etc. -- SarahElkins

Second, there are some horrendous version control products out there. For instance, PvcsVersionControl.

Can somebody please explain to me why it is that this Wiki seems to breed people who have a problem working with PVCS?!? I have used it on and off at multiple gigs for over 15 years now and I have always been able to get my work done. What is the problem here? Why pick on PVCS when there are much better candidates for venom out there? Oy! Get a life, people!

OK, I'll try. The problem is that if a development team installs PVCS Version Manager and then expects it to administer itself, they get out of it what they put into it: not much (a guaranteed recipe for disaster). And if you have an administrator who is capable only of using the GUI, there will definitely be some limitations to its use, especially where there are lots of projects to create. The best administrator is someone who understands what version control is all about, has real programming/systems-administration skills, and is given the time to put them to use. Admittedly, PVCS has some quirks (what doesn't?), but a determined administrator can overcome them. I consider these the four levels of PVCS usage for any organization with a serious need for software version control: (1) intolerable (no administrator; forget it), (2) barely tolerable (a GUI-only administrator who doesn't really understand what version control is about, or any part-time administrator; forget this also), (3) tolerable (a full-time administrator who understands version control and has some systems-administration skills), and (4) very tolerable (a full-time administrator who understands version control, has some systems-administration skills, and is capable of putting his programming skill to use [code talking to PVCS]). Any programmer knows that the limit on what you can do is up to you. PVCS VM is massively configurable, so you need an administrator who is willing to take the time to learn the possibilities. Give that person the time; it should pay off. Sizable databases tend to need full-time DBAs; an organization with lots of software libraries needs a full-time library administrator. -- Scott Macfarlan

I imagine it's because it's one of the more prevalent commercial tools out there, and it does have some flaws. ClearCase is another prevalent commercial tool, but I've not worked with it myself. -- SarahElkins

...so I always make sure I check out the entire source tree to my laptop if I'm going to be traveling. If you're using a system like Perforce (and, I thought, CVS as well), you don't actually have to "check out" the files, just "get" the latest version before you leave. Or are we actually talking about the same thing? -- MikeSmith

Same thing. "Check-out", "get", "update", etc. all refer to essentially the same operation with CVS. "Check-in" and "commit" are synonymous as well.

Why check-in every 5 minutes? What's the point? It's like calling your mom every 5 minutes to let her know you are ok.

The point is that you always have the "best version" checked in. So you can Make Changes Fearlessly and not worry that you will make a mess of things. (And I don't understand the "calling your mom" analogy. What does saving your work have to do with letting someone else know your status?)

Check-in can be a pain so some people put it off. A better strategy is to do it more often. This alone may cure the pain. But if not, you will have more motivation to seek the automation that must be missing.

You can also use version control as a better 'undo' operation, with changes batched together in meaningful ways. If you don't want to step on anyone else's toes, use a private branch and merge it when you would normally commit. I sometimes use SCCS on top of CVS for this undo capability: branches in CVS are too painful for minor changes, and CVS doesn't have a "private check-in" concept.

You should check in when you have completed a feature and that feature is scheduled to be in the release stream. If you are worried about merging, then maybe your merge process is broken. If 100 people updated every 5 minutes, I would do nothing other than sync people's changes. Let me get some of my own work done. If you have a private branch, check in as often as you want. But checking into a release branch puts requirements on others, so you should be polite.

If you are working on a project with 100 people, then ExtremeVersionControl is probably not for you. More formalized coordination will be necessary.

Just because it's most recent doesn't mean it's the best. Even if it is, so what? What people are running is usually good enough for their purpose which is to develop their feature. I would not be happy if every "best" change in win2k was downloaded to my system. It's really no different than for local development.

It's very different for local development. If the code you're using doesn't match what the rest of the team is using, then you are just delaying the discovery of incompatibility. ContinuousIntegration is a good thing. It's better to find a compatibility problem as soon as possible, fix it, and move on, than it would be to work for a few hours (or days) and then discover you have to redo your work to fit others' changes. If others are checking in incompatible code so often that it has a significant adverse effect on your productivity, then that is a "process smell" that needs to be addressed.

I agree. After you check out source code, it starts to go out of sync with the repository version - you're modifying your copy, and others are committing to the repository. You should integrate (update/merge, verify tests pass, commit) as often as possible (when tests pass), so the developers are as much in sync as possible. Every five minutes would be great; I usually achieve more like every twenty. -- ApoorvaMuralidhara
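A minimal sketch of the integrate cycle described above (update/merge, verify tests pass, commit), in Python. The three hooks `update`, `run_tests`, and `commit` are hypothetical callables, not any particular tool's API; you would wire them to your own version control and test commands.

```python
from typing import Callable


def integrate(update: Callable[[], bool],
              run_tests: Callable[[], bool],
              commit: Callable[[], None]) -> bool:
    """Run one integration cycle: update/merge, verify tests, commit.

    Returns True if the change was committed, False if it was held back.
    All three callables are hypothetical hooks supplied by the caller.
    """
    # Pull in everyone else's changes first; a failed merge means stop here.
    if not update():
        return False
    # Only commit if the merged result passes the tests (no RedCode).
    if not run_tests():
        return False
    commit()
    return True
```

The point of the ordering is that the tests run against the merged code, not just your private copy, so each commit is known to work with what the rest of the team has already checked in. Running this every few minutes is what keeps the drift small.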

If you are working on a project with 100 people, then ExtremeVersionControl is probably not for you. More formalized coordination will be necessary.

If you're working on a project with 100 people, its chances of success are close to zero, or so says the Standish Group. No coordination is really necessary - it will die anyway.

Then I guess I should be dead. This is not the first time someone has said that.

Non-programmers who administer source code management systems usually add difficulty to a process that they (as non-programmers) cannot truly understand, namely, creating software.

From the point of view of at least some SCM administrators, the administration job is not part of the process of creating software - at least not directly. The function of the administrator is more like that of an archivist, or asset manager. There are places where contractual or regulatory constraints require source code versions to be kept, identifiably, for months, maybe years. In the insurance business, for example, part of the auditing every insurer must pass includes being able to show a source repository for three years back (don't quote me on the exact duration). This is only tangentially related to creating software, but the SCM administrator is responsible for it. To meet this responsibility, the SCM administrator may implement processes that add complexity to an otherwise simple tool. In a case like this, explore ways that the development teams can do what they need to do with minimal extra steps imposed by the SCM administrator, while leaving things in a state that lets the SCM administrator meet the non-software needs.

Good programmers who use version tracking systems well can already show you which source code was used in (at the very least) any released build within the entire lifetime of a project, and they do so without adding any complexity to the process of producing code. Nobody else is needed to achieve the goal you describe.

I like to think of VersionManagement, testing, etc. as part of the development process, whether they are done by different people or by the same people. Projects work better if you focus on the project rather than on departments. (See also SoftwareConfigurationManagementPatterns.)

-- SteveBerczuk

While it's true that CVS is a necessary evil, I find that it's far too focused on procedural programming à la C. With modern ObjectOriented programming languages CVS still works, but not quite as well, especially once you start refactoring. The problem is that when refactoring, certain classes may have their names changed, some may become obsolete, and others may be created. In Java the problem is amplified because refactoring may move classes from one package to another. This is not very compatible with CVS, as you'll constantly be creating new files on the server even though they are really the old files moved to a new location. Perhaps a new system should be designed that lies closer to the ObjectOriented paradigm. -- ChristophePoucet? (feel free to refactor my words to add links to the appropriate places or in case of mistakes.)

It's got very little to do with object-oriented programming languages. The problem is that CVS (and RCS, and others) manage your source code at the file level. This means that if you move/rename a file, you lose the history for that file. There are other revision control systems (e.g. SubVersion) which can track history across a file move, and that deal with multiple changes at once. I will concede, however, that if you're using an object-oriented language, you'll get bitten by this more, because it's a natural instinct to keep your filenames in sync with their contents, and merciless refactoring often requires changing the names of things. -- RogerLipscombe
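To make the file-level vs. project-level distinction concrete, here is a toy Python model. It is not how CVS or SubVersion are actually implemented. `FileKeyedRepo` keys history by path, so a rename starts a fresh history; `IdKeyedRepo` gives each file a stable id and maps paths onto it, so history survives a move. Both class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileKeyedRepo:
    """Toy model of file-level history: history is keyed by path."""
    history: dict[str, list[str]] = field(default_factory=dict)

    def commit(self, path: str, content: str) -> None:
        self.history.setdefault(path, []).append(content)

    def rename(self, old: str, new: str) -> None:
        # A "move" is really delete + add: the new path starts with only the
        # latest content, and the old history stays behind under the old name.
        self.history[new] = [self.history[old][-1]]

    def log(self, path: str) -> list[str]:
        return list(self.history.get(path, []))


@dataclass
class IdKeyedRepo:
    """Toy model of rename-aware history: each file has a stable id."""
    paths: dict[str, int] = field(default_factory=dict)
    history: dict[int, list[str]] = field(default_factory=dict)
    next_id: int = 0

    def commit(self, path: str, content: str) -> None:
        if path not in self.paths:
            # First time we see this path: allocate a new file id.
            self.paths[path] = self.next_id
            self.next_id += 1
        self.history.setdefault(self.paths[path], []).append(content)

    def rename(self, old: str, new: str) -> None:
        # Only the path changes; the id, and therefore the history, is kept.
        self.paths[new] = self.paths.pop(old)

    def log(self, path: str) -> list[str]:
        file_id = self.paths.get(path)
        return list(self.history.get(file_id, [])) if file_id is not None else []
```

After renaming `Foo.java` to `Bar.java`, `FileKeyedRepo.log("Bar.java")` shows only the latest version, while `IdKeyedRepo.log("Bar.java")` still shows the whole history. That is the refactoring pain described above in miniature.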

See http://www.eclipse.org/stellation/

Question: You've just isolated a bug in the existing code-base, and you've written a test case to prove it. You now have a failing test case. Do you commit now, or do you wait until you've fixed the bug and all the UnitTests pass again? You seem to be in a dilemma:

If you commit now, you're violating the rule against checking in RedCode.
If you wait, you're violating the rule to Commit Changes Frequently (the section where this question was originally posed).

Discussion:

ItDepends. I would probably fix the bug first before checking in, but if the fix is not something I could do right away, then I would probably go ahead and check in the new failing test case. Failing test cases are valuable and need to be saved and shared with the rest of the team. -- KrisJohnson

Working on a team, it's important for everyone to trust everyone, hence the rule is you can't check in with a failing test. Working alone, a broken test can be a nice reminder of where to start working again. -- AnonymousDonor

I agree that checking in RedCode is forbidden, but the situation described above is one where the production code has not changed - it is already broken and we've simply proven that fact. Checking in the new test without the fix isn't necessarily violating anyone's trust, but it does depend on how your team works. It also depends on what the assigned task is: if it's my responsibility to fix the bug, then I wouldn't check in the test until I've fixed the bug, whereas if I've just been asked "Is this a bug in the code?", then writing the failing test completes that task and so it should be checked in before moving on to the next task. -- KrisJohnson

For the sake of argument, Kris, let's assume that it is your task to fix this bug. Assume further that the bug is non-trivial, and points directly to some weak spots in the code where you see plenty of opportunity for refactoring. Your other team members are also actively working on the code, so it is in the team's best interest to get these refactorings committed so everyone can benefit from them. How do you proceed? -- AnonymousPoser?

Well, ideally what would happen is this: I fix the bug in the simplest possible way that doesn't break any other tests, and check that in. Then I do a simple refactoring step, verify it through tests, and check that in. Then I do the next simple refactoring step, and so on until I'm happy. Along the line, if I need to change something that will conflict with others' work, I'll communicate and coordinate with them as necessary. -- KrisJohnson

How about introducing a new concept: YellowCode. This is indicated by a test case that fails, but which is expected to fail in the committed code base. The testing harness would have to support this concept. These tests are essentially stories and/or bugs yet to be addressed. This allows ExtremeVersionControl to work more effectively.

In the real world, problems are found that are not cost-effective to fix immediately. We want to capture the known problems with test cases, but not expend the effort to fix them now (or perhaps ever, but that decision may not be made now). Obviously, when the cost to fix is high, the likelihood of encountering the problem is low, the impact is low, and a reasonable workaround exists, there is little incentive to fix it compared to other work demanding your time. For some bugs it may be less obvious when, or whether, it is appropriate to fix them.

I've often been working on one bug or feature and in the process stumbled upon other bugs that are too complex to address immediately and fall into the category described above. A failing test case is a much better starting point for dealing with the problem later than a bug report that attempts to describe how to reproduce the problem, which often has some ambiguity. If coding the test case is simple enough, I can do that and be assured it won't fall through the cracks.

Our home-grown test harness has support for a test case to report "I failed, as expected". -- JohnVriezen

My own method for handling this is just to let them fail. The team should be working closely enough together that everybody knows which tests are expected failures. If the number of expected-to-fail tests is so large that people can't remember them, it's time to fix them or remove them. Another method is to put all the expected-to-fail tests into their own "suite", and don't run that suite if you don't want to see any red. -- KrisJohnson
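Python's standard `unittest` module happens to support the "I failed, as expected" idea directly through the `unittest.expectedFailure` decorator. A small sketch, where `parse_price` is a hypothetical function with a known, not-yet-fixed bug:

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical production code with a known bug: it does not
    handle thousands separators such as "1,000.50"."""
    return float(text)


class PriceTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_price("12.50"), 12.5)

    # A "YellowCode" test: it documents the known bug and is expected to fail,
    # so the runner reports an expected failure instead of a red result.
    @unittest.expectedFailure
    def test_thousands_separator(self):
        self.assertEqual(parse_price("1,000.50"), 1000.5)
```

Expected failures don't make the run fail, so they don't block check-in. If someone fixes the bug, the test is reported as an unexpected success, and since Python 3.4 `wasSuccessful()` then returns False, which is a nudge to remove the decorator. Putting the known failures in their own test class or module gives you the separate "suite" approach instead.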

I would expand on the YellowCode approach, as follows: Tests are marked either 'new' or 'familiar'. New tests fail with status yellow, whereas familiar tests fail red. When a test is first created, it's new (and typically fails). Since the status is yellow (as opposed to red), it doesn't prevent check-in. Any new tests that pass are then marked familiar, provided that all familiar tests still pass. The last piece is important: If your new code (to pass the new test) breaks something, you should be able to back out your changes. If the new test was promoted the first time it passed, then after backing out it would be a familiar test that failed (red).

Of course, this requires that the testing system store and retrieve the 'familiarity' setting of each test. My tools don't currently support testing at all, but when I add that, this is how I'll do it. -- JoshuaJuran
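One possible reading of the new/familiar scheme, sketched in Python. It assumes a test run produces a map from test name to pass/fail, and that the familiar set is persisted somewhere between runs (the storage is left out). `Status` and `evaluate` are hypothetical names, not part of any existing harness.

```python
from enum import Enum


class Status(Enum):
    GREEN = "green"
    YELLOW = "yellow"  # only new tests failed: allowed, does not block check-in
    RED = "red"        # a familiar test failed: blocks check-in


def evaluate(results: dict[str, bool], familiar: set[str]) -> tuple[Status, set[str]]:
    """Classify one test run and decide which new tests to promote.

    results maps test name -> passed?; familiar is the stored set of
    familiar tests. Returns the overall status and the updated familiar set.
    """
    familiar_failed = any(not ok for name, ok in results.items() if name in familiar)
    new_failed = any(not ok for name, ok in results.items() if name not in familiar)
    if familiar_failed:
        # Something that used to work is broken: red, and promote nothing,
        # so backing out the change leaves the new tests still "new".
        return Status.RED, set(familiar)
    # All familiar tests pass, so every passing new test becomes familiar.
    promoted = {name for name, ok in results.items() if ok and name not in familiar}
    status = Status.YELLOW if new_failed else Status.GREEN
    return status, familiar | promoted
```

Withholding promotion on a red run is the "last piece" mentioned above: a new test that passed only alongside a change that broke something else stays new, so backing out that change turns it yellow rather than red.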

The use of a CommitEmailList can provide another form of communication.

But such a form of communication loses effectiveness as people commit more often: people just ignore the e-mails.

On the subject of checking in bad code: When I am working on a project on my own, I check in very frequently. I check in whether it works or even compiles or not. When working on a shared project, I don't want to check in bad code, and occasionally I find that an irritating restriction. Most often, the problem is that I have tried to do something, it hasn't worked, and I want to try a different way. On my own, I would check in my bad code, then wipe out the bad stuff and try again. If I later decided to go back to the first approach, the code is still in version control and I can get it back.

What's extreme about that?

What would be cool would be a two-level version control, where I can check in stuff that won't be picked up by other people until I've got it working. PerforceVersionControl makes this pretty easy to do.

It should be possible to rig this in CVS by creating a "private" branch, and then merging it onto the trunk as the "second-level" commit, but working out CVS branches always makes my head hurt. I wonder if anyone has ever encapsulated this concept in simple script wrappers? -- AndrewMcGuinness

Just out of curiosity, how do you do the two level thing in Perforce?

The PerforceVersionControl documentation suggests creating a branch off the main line. The branch would be a temporary (but version-controlled) copy of the source tree that you can mess around with until you're done with your changes. Any check-ins/commits/submits would only affect your branch of the code. When you're done, you use the merge functionality in PerforceVersionControl to integrate your changes back into the main line, much as if you'd done all the work in one large change. The branch is then terminated and a new one is opened for the next isolated lengthy task. (This method is the basis for various strategies used to handle patches and management of old product versions, and is something that the CM people mentioned above probably would be the ones to set up. IMHO though, the strategy should be limited in complexity, allowing the programmers themselves to work within it without hampering development speed.) -- AndreasAxelsson
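A toy Python model of the two-level workflow: private check-ins that nobody else sees, a revert that brings back an earlier attempt without losing history, and a single publish step that reaches the shared line only when tests pass. `PrivateBranch` and its methods are hypothetical and do not mirror Perforce's or CVS's commands; merging against concurrent trunk changes is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PrivateBranch:
    """Toy model: many private check-ins, one shared commit on publish."""
    checkins: list[str] = field(default_factory=list)

    def checkin(self, snapshot: str) -> None:
        # Private check-ins may contain broken code; nobody else sees them.
        self.checkins.append(snapshot)

    def revert_to(self, index: int) -> str:
        # Version control as a better 'undo': bring back an earlier attempt
        # as a new check-in, so the abandoned attempts stay in the history.
        snapshot = self.checkins[index]
        self.checkins.append(snapshot)
        return snapshot

    def publish(self, trunk: list[str], tests_pass: Callable[[str], bool]) -> bool:
        # Second-level commit: only the latest snapshot reaches the trunk,
        # and only if it passes the tests (no RedCode on the shared line).
        if not self.checkins or not tests_pass(self.checkins[-1]):
            return False
        trunk.append(self.checkins[-1])
        return True
```

Other developers only ever see the published snapshot, which matches the "as if you'd done all the work in one large change" view of the main line, while the branch keeps every intermediate attempt.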

What is the value in checking in non-working code? The problem lies not in how often one should check in, but the size of the problem one attempts. My personal rule of thumb is that if I don't have something that works after about 4 hours (one morning or one afternoon session), then scrap it, go back to the previous version, and start again. Don't worry about checking code in based on some arbitrary egg timer, check in when there is a result worth saving. This may be 1 minute, 5 minutes, 1 hour, 4 hours. Attack the problems incrementally, and one will never be very far from either a check in or a roll back.

It may be useful to distinguish between different disciplines, all of which may happen to use version control.

Most of this discussion is talking about committing changes that are on the main line of development. This supports ContinuousIntegration by only committing working code (no RedCode) that other developers can use. It also captures the IncrementalValue?. The terminology I'm familiar with calls this the development branch.

The release, or production, branch reflects versions of the project that are in use (presumably having passed quality control, the release process, etc.). Work can, and often does, continue to fix bugs in this version, independently of the main development branch; see almost any large open-source project for examples. Work is done on a branch of the release, then merged back into the production branch.

Some people like to capture changes so they can roll back. I do this because it makes me feel safe, but this is a different goal from no-RedCode version control. It suggests a different tool, or a different discipline. You can use version control systems to help with this, but as noted, it might conflict with the goal of no RedCode. In fact, I almost never roll back, because I avoid committing until things work, but I do want to roll back speculative edits while I'm trying to get things to work.

However, you can still use version-control to hold your edit history if you commit to a personal branch. When things work, you merge or commit to the development branch.

I've tried multiple times to use branches in CVS, and I haven't found a way that works well. The concept seems clear enough, but the tools that CVS provides have always led me into confusing, tedious, and error-prone cleanup. It has never been worth it for frequent changes/merges. It does seem to work, if carefully controlled, for the major branches (development and release). Some causes of the pain are when a file changes on both branches, or when a file/directory is moved or renamed.

I'm currently experimenting with GnuArch. Its main disadvantages are its poor command-line interface and documentation that doesn't convey a useful mental model to me (I can't guess how to do things). However, it has several key features:

DistributedVersionControl so I can work on my laptop
Branches are not second-class citizens, they are all equal
It understands merging the right way
It is project-oriented, not file-oriented, so it understands rename/move.

-- AlanGrover?