Windows File System Organization For Java Development

Yes, this is a rather mundane issue, but one that I seem to have to reconsider every time I start a new Java development project. Why don't we try to collect on this page some (best) practices for organizing a Windows file system for Java development. I'm using "file system" here in a general sense, to mean the entire tree of drives and directories under "my computer" (i.e., not to mean FAT vs NTFS, etc.).

Let me start with some context. You're using a Windows machine to host a development environment of some Java-based application. The application depends on some third-party products, e.g., Apache, WebLogic, Oracle, the JDK (of course), and maybe some things like JakartaAnt, JavaUnit, and some jar files from JavaSoft. You check out the application's codebase from a CM repository onto your Windows box, and you use some editor or IDE to create and edit the files comprising the codebase, which you then deploy into the servers on your box for testing and debugging purposes.

Now some forces in the context. By Microsoft conventions it seems third-party software programs are expected to be located under "C:\Program Files". Some installers respect this convention with their default location, but others don't. And you may further wish to locate all of the "components" of the application and its development environment on a single logical drive that is separate from the "system" drive. Some third-party software distributions come with installation wizards, while others don't (you simply extract a zip archive to some directory). You may also wish to group third-party software distributions, and keep them separate from the codebases of the application(s) you're working on. And you may wish for your filesystem organization to reflect the versions of the third-party products you're using.

The problem? How to organize the filesystem. Some balance must be struck in the conflict between the Microsoft convention, non-compliant installers, and the desire for a single, non-system logical drive. Some directory structure must be used to separate application codebases, and different versions of third-party software distributions, from each other.

Over the years I've seen a number of solutions. One shop chose the separate-logical-drive solution, created a "Program Files" directory on the "development" drive, and overrode non-compliant installers (e.g., Oracle, WebLogic) to put the software under "Program Files" instead of at the root level of the drive. Other shops let the stuff get sprinkled around wherever, according to installer defaults. Yet another shop ported Unix conventions to the Windows environment, and created a high-level directory structure of "\usr\local\bin".

A lot of shops aren't real careful to let the path to a jar file reflect the API version represented by that jar file - one of my favorites is jaxp.jar (several times I've had to reverse-engineer which JAXP version the project is using by looking at the file size and mod date of the jaxp.jar file). This plays hell when different codebases assume different JAXP versions but only one jaxp.jar can be on the classpath.

Here's a set of ProtoPatterns that I've arrived at:

Dedicated Development Drive - keep the "components" of the application and its development environment, but nothing else, on a single logical drive separate from the system drive. Suggestion: We're on windows here, use "subst x: c:\appl"

Major Products Under Program Files - install all major third-party products under a "\Program Files" directory on the development drive, if for no other reason than to preserve some semblance of compliance with Microsoft conventions.

Minor Products and APIs Under "\java\dist" - install minor third-party product distributions (jar files, etc.) in a "\java\dist" directory on the development drive ("dist" for "distribution").

Three-Level Directory Structure - for third-party software distributions, use a three-level directory structure to keep track of product versions and group products by vendor. The first level is the vending organization's name. The second level is the product name. The third level is the product version. This results in paths like "\java\dist\JavaSoft\JAXP\v1.1", etc.

Application Codebases Under "\java\projects" - locate application codebase(s) under "\java\projects" on the development drive, with a separate subdirectory for each project.

Standard Project Directory Structure - each project has a standard set of subdirectories, e.g., bin, conf, data, doc, lib, src, test, etc.

Explicit Classpath Management - don't define a classpath environment variable. Every time you need to invoke a tool that requires a classpath, pass the classpath that is appropriate for the purpose at hand. I know, it's kind of a separate topic.

One thing I'm not real sure about is whether there is any legitimate reason to separate "\java\dist" from "\Program Files". The distinction between "major product" and "minor API" is admittedly arbitrary.

Comments, criticisms, and alternatives are encouraged. Thanks. -- RandyStafford


For me, location independence is the major force. This arises from a need to deal with other developers who organize code differently, a need to sometimes work with 2 copies of the source, a need to work on machines which you can't repartition (this happened on a customer site), and so on. Location independence implies that you need to manage your classpath (and all other paths) explicitly. It is worth separating internal paths (paths to things inside your code tree) from external paths (to third party products), and to manage external paths together, so that developers achieve localization via changing one file, and changing that file has no other effects. -- BrianEwins


I agree, Brian. I still like to strive for a "standard environment setup", but people invariably put things in different places, so location independence (for build scripts, etc.) is valuable. JakartaAnt facilitates this quite nicely. It usually boils down to a set of environment variables that have to be defined, to point to the install locations of major third-party products. And then there are other parameters (database logins, etc.) that vary on a user-by-user basis, and that you don't want to keep in a common file checked in to CM. One of my main concerns is achieving some level of consistency within a development team, so that you can bring new people up to speed quickly, and so that you don't have to waste time chasing problems that turn out to be due to configuration differences. -- RandyStafford

Environment variables are a curse. You really should get these put in a file under configuration control. The one time we didn't do this, we had pain on every machine as folk constantly forgot to set the variables. -- Brian

So the answer is to document the procedure for administering the development environment, including the procedure for installing and configuring the third-party components, and including what environment variables have to be set (and how). If you have documentation (and automation) then you don't need a "guru" per the remarks below - new people can come on and "serve themselves" as far as getting their computer up. This is the goal I'm aiming for, and if they don't follow the procedure it's their fault.

I think it would be hard to get away from using environment variables. I see some difficulties with instead storing the values they contain in a file under configuration control. The purpose of the environment variables is I'm thinking of is to point to the installed location of things, so that the build scripts can have location-independence. If different developers used different values for the variables, then the file would be changing all the time, or each would have to have his own copy of the file kept in CM, and on checkout everyone would also get everyone else's copy which they wouldn't use, so it seems wasteful and overkill. For other types of variables (database logins, etc.) I agree they should be kept in files instead of environment variables, but I still wouldn't bother keeping that file in CM, for the same reasons. I've also noticed that a lot of third-party products depend on the existence of at least one environment variable: JAVA_HOME. -- Randy Stafford


I take an different approach than the ones described here. My goal in organizing a project is to enable a new developer on a new machine to be up and running as soon as possible. I put everything I can into revision control. My goal is to be able to type 'checkout' and 'make' and have a working system. All software gets installed under the project's 'bin' directory, and checked into RC.

 project/bin
 project/src
 project/lib 
 project/script

We did this on my last project and it worked reasonably well. It was easier to write scripts when we knew where the binaries were and could count on specific versions being available. We didn't have any JAR file versioning issues either. The one exception was Oracle; we didn't try to put that into revision control as it was so large.

-- JimLittle

So I assume that people would checkout "bin" only once? (else it could be expensive checking out installers every time). Or were the "make" scripts smart enough to avoid re-installing? Thanks. -- RandyStafford

The revision control software (ConcurrentVersioningSystem?) was smart enough not to check out files that hadn't changed. All of the tools in the 'bin' directory were command-line and didn't need an installation program. The scripts set all of the necessary environment variables automatically, and the configuration files were stored in RC.

GUI-based tools, such as IDE's, profilers, etc., required manual installation. Their installers were stored in 'project/resources', also in RC. The goal, though, was to be able to check out, build, and test the system from a command-line without the normal configuration hassles you see on most projects. We never achieved that goal, but we got pretty close. -- JimLittle

Have you looked at CruiseControl? It's an automated build system based on Ant; it gets a copy of the current code base from configuration control then builds it (and tests it if you have automated that). If you aim for an automated build environment like that, you end up having to go a long way towards Jim's position - all 'guru knowledge' in the build process has to go in the script, which ends up making it easier for new developers to get up and running. There is no-one behind the curtain. -- Brian

Not sure who your question is directed to, Brian, but yes - I'm familiar with CruiseControl. It's value-add is the "continuous" in ContinuousIntegration, and it happens to use JakartaAnt to automate administration of its environment and running of test suites. Note that one can still automate those things (via Ant or otherwise) without adding on CruiseControl to continuously re-administer a test environment. The goal, as you note, is encoding the knowledge of how to build the system in the build script. I claim that is necessary but not sufficient to make it easier to get new developers up and running. What's missing is the procedure for how to install and configure on the new developer's machine all of the third-party components that the application and its development environment depends on. That procedure has to make some assumptions about where to install these components, which is what I'm focussed on here. -- RandyStafford

I think it would be somewhat more fair to CruiseControl if you would claim that Ant keeps knowledge about HOW to build the system, while CruiseControl keeps knowledge regarding WHEN to build it. Doing what CruiseControl is doing is not easily doable with vanilla Ant. The CruiseControl metaphore is almost like a crontab with knowledge of SCM tools. That said, I don't think CruiseControl will solve any problems with WindowsFileSystemOrganizationForJavaDevelopment. --NiclasOlofsson


See also OrganizingTestCases


EditText of this page (last edited December 4, 2003) or FindPage with title or text search