Refactoring Cpp To Reduce Dependencies

Here's my response to Michael's stab at ExtremeProgramming rules for CeePlusPlus: instead of trying to head off dependency problems at the pass assume that you aren't gonna need to worry about dependencies and do the simplest thing that could possibly work... until it stops working. Then refactor mercilessly with an eye toward reducing dependencies. It would be nice to have a full catalog of dependency reducing refactorings after the model of MartinFowler's book. However all I have is the following humble offering: (note: Please allow me to apologize, on behalf of the C++ language, for the need for some of this stuff. Most of it becomes second nature and easy to use - honest.)

Start by writing everything inline in the class declaration ala Java. (do ConstCorrectness -- you'll need it and it's very expensive to add later).
Use AlternateHardAndSoftLayers, via CppHeresy, to bust up modules before they become a problem
Notice dependency and partitioning problems:
- Inconveniently long compiles (dependency)
- Namespace collisions (dependency/partitioning)
- Test suites become hard to compose -- what do I test first? (dependency)
- Test suites are too big and take too long to run. (partitioning)
- It's hard to figure out which files to include. (partitioning)
- You are surprised by the existence of a dependency.(dependency/partitioning)
- Every new bit of code breaks some other bit of code. (dependency/partitioning)
Reduce dependencies first:
- Get rid of nested #includes (for classes that depend on other classes)
  - Change #include to ForwardReference.
  - Change value fields to pointer or reference fields.
  - Move affected method bodies to .cpp.
- Insulate method implementations (for classes that are depended on)
  - Move method bodies into .cpp
  - Move private method declarations into .cpp as standalone static functions.
    - Only if really necessary
    - Use friendship to grant access to data
    - Or use a pimpl parameter (see below) to access the data
- Insulate data
  - Use an abstract base class and a factory
  - Or use a pimpl (PimplIdiom)
    - Forward declare a struct to hold private data
    - Have exactly one data member (typically called pImpl) which is a pointer to this struct.
    - In the .cpp define the struct to hold all the data your class needs.
    - Allocate the pimpl in your constructor
    - Deallocate it in your destructor
- For emergencies only: when you must inline for performance but you cannot because of dependencies.
  - Create an "inl" file -- _inl.h or just .inl. Put all of your method bodies in this file.
  - Add an INLINE declarator to each method.
  - At the end of your header:

#ifndef DEBUG // inline in release mode
#define INLINE inline
#include "XXXX_inl.h"
#undef INLINE
#endif

In the body of your .cpp:

#ifdef DEBUG // don't inline in debug mode
#define INLINE 
#include "XXXX_inl.h"
#undef INLINE
#endif

During development define DEBUG and you'll get insulated but slow code. At release time undefine DEBUG and you'll get fast but uninsulated code. Don't do this until you know you need the optimization. It makes the debug and release versions of the code quite different.

Then partition if needed:
- Use facades to break dependency chains
  - Facades should be fairly well insulated if not fully insulated
- Use opaque handles to manipulate objects behind the facade. (look out, this starts to get procedural and clunky pretty quick)
- Use packages to group code together
  - Each package gets its own subdirectory
  - Only insulated (usually facade/handle) classes are exposed outside of the package.
  - High coupling pulls classes into the same package.
    - Dependency loops within a package are fine but those loops must be forever confined by the package -- absolutely no exceptions!
  - Diverse clients push classes out of the package. All clients of a package must pay for any change to the package. Therefore they should all get some benefit from most changes.
  - Package dependencies must, must, MUST form a directed acyclic graph.
- Release packages as revisioned subcomponents of the larger project.
  - Do this to increase stability in the project or to support multiple projects.
  - Allows asynchronous integration. Which is a pain, but can be valuable for large projects.
  - Compile the classes in the package into a static or dynamic library. Or possibly into a server executable in extreme cases.
  - Expose the package to the rest of the project as a binary and a set of headers.
  - To integrate:
    1. Use the latest version of each leaf package.
    2. Test each of these packages.
    3. Bring all client packages up to date so that they can use the latest tested packages.
    4. Test these packages.
    5. Go back to step 3 using the newly tested set of packages as the base packages.

For goodness sake don't start doing this stuff all at once at the start of a project! Go from simplest to most complicated. If the next step up in complication doesn't sound like a relief then you don't need it badly enough to do it.

A note on packages: I think that this is the way that XP can break the project size barrier. Split a big team into smaller teams with each team being responsible for a package or set of packages. Make the teams customers to each other for purposes of the planning game. There ought to be cards for integrating package revisions and negotiations about whether features get completed before the integration or after. I think that this can work but I've never tried it. -- PhilGoodwin

This rule is wrong: Start by writing everything inline in the class declaration ala Java.

The opposite of that rule is so right it supersedes the various "Simplest Thing" and "Only Once" rules. Develop the muscle memory to bang out the method prototype here in the .H file and the body there in the .CPP file (or get a C++ pseudo-Refactoring Browser that does this for you, like the Add Method twitchy in VC++).

That rule's wrongness perverts everything else in the list, but if we remove it and ripple this change thru the list everything else falls into place. Writing a few lines of extra code to support proper structure (and to cover details like physical encapsulation that VHLLs do for you) is not the same as adding extra functions that need UnitTests to support them.

So put the function bodies in the implementation files first. This leads to always preferring forward declares to nested includes, owning things by pointer instead of actual, and other physical dependency management techniques.

With the first half of the list taken care of, the rest just falls into place. You can now skip ahead to the package dependency rules. (Oh, and nobody should ever think they can have only one class per header. But you don't know that until a quick RefactorMercilessly session points out what's cohering and what's coupling.) -- PhilipCraigPlumlee

ExtractImplementationFromHeader is such a simple refactoring that I now never start building a c++ class in two separate files. I've been doing this for some time now with no ill effects (unless you count the fact that I can now write and change code much faster than I used to be able to do). I don't mind being perverse as long as it's practical. Are there any practical reasons why I shouldn't be doing this?

As far as one class per header: I create a new header file for every new public type I create. Often even typedefs. Private classes live in the implementation file of what ever class they are helping with. The practical reason for this is that it helps me keep track of where classes are implemented and allows me to move them easily if I want to. Its a OnceAndOnlyOnce issue with me. Maybe it's only stylistic, but I'll bet that most XP teams will converge on making them separate. -- PhilGoodwin

Ask JamesKanze?. I'm not gonna try to take your position with him, for the same reason as I don't box with Mike Tyson.

James is a smart cookie. I'd love to see his reasoning on this subject. I know he hangs out on comp.lang.c++.moderated. Is there a subject line that I can do a search on to see what he has to say? -- PhilGoodwin

Here's the eXtreme view (maybe sometimes):

    Guru of the Week #4: Class Mechanics
    [1] http://www.deja.com/[ST_rn=ps]/getdoc.xp?AN=224661047

And here's the "No premature optimization or premature compile-time coupling" view:

    Guru of the Week #33: Inline
    [2] http://www.deja.com/[ST_rn=ps]/getdoc.xp?AN=339574418

The deal is OAOO and DRY and DTSTWCPW apply after you have dealt with your language's technical requirements. Slamming out a Java-style class in C++ is real cute up front, but your crows will not come home to roost enough for you even before the time comes to upgrade or profile or optimize. Just get a proto-Refactoring Browser for C++ like http://www.genitor.com/genitor/ocs.htm and live with lots of text duplication and manual "Small Design Up Front" to manage dependencies. -- PCP

I don't inline to optimize the speed of the running code - I do it to optimize the speed with which I can write the code. If I had different tools I might behave differently. As it is, what I do works for me - but only, perhaps, because I do all of what I do. One thing I do is to put compilation firewalls between packages. Another is to make my packages small. Since I expect to recompile the entire package no matter what I do (because it's so cohesive that it's likely that any interesting change in functionality will mean changing most of the files in the package anyway) inlined classes don't make any difference in my compile times. Dependency management seems to be the only reason to ExtractImplementationFromHeader. Coding speed is really the only reason not to. Given the right set of tools I could easily see myself going the other way. I don't mind the existence of duplicate text if that's what the compiler needs - I just don't want to spend any time at all maintaining it. -- PhilGoodwin

One major point of that 'Inline' thread - 'inline' changes performance profiles. More often than not it speeds something up. But it can also slow things down. I was not accusing you of thinking it automatically meant fast. But if you believe in compilation firewalls, use the most basic one; the H file CPP file paradigm. -- PCP

Right, when I use them that's the first thing I do.

Defining method implementations within the class definition does not have to mean that the method is physically inlined in users of the class. Just about every C++ compiler has a compiler switch to prevent such inlining. On GnuCpp it is -fno-default-inline, and some related switches. It is a pity that there is no language defined mechanism to be more explicit - something like

    class complex {
        float re, im;
        !inline void print(ostream& os = cout) const {
            os << '(' << re << ',' << im << ')';
        }
    };

In this, as in many other areas, C++ violates one of the basic principles of language design: there should be ways to explicitly turn and and off a feature, as well as a default (e.g. signed char, unsigned char, char).

Defining method implementations within the class definition is mainly just a convenience for the coder.

When you say "muscle memory" I say "tool".

Extracting .h header files automatically is pretty easy. It's especially easy when you have a coding standard for indentation. I suppose that there is a need for a general tool that comprehends the language, as opposed to relying on indentation. (Hmmm... HowWouldRefactoringGoFromIndentationToParsing?) With C++, of course, you are required to generate a .h interface file, and a .c object source code file, and the original "fat" class definition file should not be directly compiled and linked, courtesy of the One Definition Rule.

At some point, of course, your class definition gets so bulky that you, the programmer, want to move implementation bodies out of the class definition just to make the class easier to read. That is a useful refactoring - useful in the sense of being human oriented, rather than tool oriented.

-- AndyGlew

Start by writing everything inline in the class declaration ala Java.

This rule increases dependencies! If you keep your header file simple, most external classes can use forward declaration in lieu of including the relevant header file. If you actually use the class in the header file, you are forced to include the header file. -- WayneMack

PhilGoodwin realizes that, but takes the stand that this is the SimplestThing (in the short term). Further he says that with his tool environment, he finds it very easy to refactor (ExtractImplementationFromHeader). Each person's tool environment may have different strengths and weaknesses of course. I too would have thought this would be a stupid thing to do, ignoring years of experience, but he makes a good case. The fact that he takes this stand at all is enough to make me intrigued and curious to try it out on my next C++ project. He's right that it's easy to fix later, and he's also right that it is the simplest thing to do at the very beginning. OptimizeLater.

Writing everything into the header increases compile time dependencies. Moving much to the implementation moves those dependencies to be at link time.

CeeCeePlusPlus does permit "link time polymorphism", ie., replace the dependended on symbols by mock versions at link time.

Forcing everything into the header makes the dependencies explicit at compile time. Putting things into the implementation creates two almost orthogonal dependency trees, the compile #include dependency structure and the link time symbol definition and reference structure. -- JohnCarter

Start by writing everything inline in the class declaration ala Java.

When I'm writing code test-first for a new class, I declare the class in the same .cpp file that my tests are in (with inline code), and then move the class to its own .h and .cpp files as the last refactoring I do before checking in the code.

In a VisualCeePlusPlus 6.0 project with a hundred or so headers/classes, just moving the inline methods from ten or so of my most popular .h files into .cpp files reduces compile time from 20 minutes to 17 minutes. Doing other compile-speed optimizations, such as pImpl idiom, can shave another minute or two. Using containers implemented as templates (BoostLibraries and stl) seems to really increase compile time.

Oy vey! Not only does SmalltalkLanguage keep looking so attractive for its developer-speed, but my "statically typed" refactored C++ code was silently compiled to doing the wrong thing as int and bools silently converted to each other - changing a constructor argument list from (unsigned char, bool) to (bool, int) did not give me any compile-time errors or warnings.

-- KeithRay

WouterVanOortmerssen agrees with a lot of these ideas:

http://wouter.fov120.com/rants/javaclassesincpp.html

His motivation is as much to reduce redundancy and aid refactoring as it is to reduce dependencies.

See LargeScaleCppSoftwareDesign in DefinitiveCeePlusPlusBooks. -- PaulChisholm

See CppDependencyAnalysis for a tool to analyze C++ header dependencies

CategoryCpp