Algorithms That Demand Garbage Collection

Some algorithms are greatly simplified by using GarbageCollection. As a nigh-trivial example, you can remove the last k items in a list by simply changing one pointer if you can rely on a garbage collection system to clean up after you. But are there any algorithms for which alternatives to a full garbage collection system are too messy or inefficient to contemplate? What are the qualities of these algorithms, and do any RefactoringPatterns address this problem?

WilliamPugh's paper on concurrent maintenance of SkipLists (ftp://ftp.cs.umd.edu/pub/skipLists/) uses garbage collection to allow concurrent changes to the list structure and accesses to data held in the list. In this case, it would be 'possible' to use reference counting. Each node in the list would need a reference count. Modifying this count to track every client's motion through the list would be an interesting exercise for someone interested in NonBlockingAlgorithms? (LockFreeSynchronization), but it would also add a good deal of overhead to each motion. More motions than deletions occur in most programs, so a garbage collection algorithm actually looks 'more' efficient than the alternatives.

Other possible changes to the algorithm may exist, but I (JasonRiedy) cannot think of any that do not involve low-level atomic operations. The garbage collection solution seems to combine portability and efficiency quite well.

In my mind, reference counting is garbage collection, but implemented manually. The problems with it are (a) if implemented naively, it's inefficient; (b) how to cope with circular references; (c) it's manual, which means it's hassle. None of these are real killers. I expect any algorithm which needs garbage collection can use reference counting, at some sacrifice of efficiency and/or cleanliness. If you want exceptions, probably (b) is the place to look. -- DaveHarris

Ref-counting is not necessarily manual. Perl and Python both use ref-counting to do memory management, without requiring any special work on the part of the programmer. C++ smart pointers can also automate ref-counting, but, of course, you have to remember to use the smart pointer. In my experience (a) and (b) are not nearly as significant as (c), so automatic ref-counting is a big win over manual ref-counting (which really sucks).

Also I should point out that ignoring leaks due to circular references, ref-counted memory management will be more memory efficient than classical GC even if it is slower, since an object is freed as soon as the last reference to it goes away. -- CurtisBartley [[ Not necessarily. You have to allocate space for the reference count, of course. If the average object size is quite small, then the waste due to the reference count becomes significant. Also, most ref-count implementations use a fixed heap, which have known and serious problems with waste due to fragmentation. Compacting GCs do not. -- ArlieDavis ]]

Perl and Python both use ref-counting to do memory management, without requiring any special work on the part of the programmer.

I think you may be MixingLevels here. There's no special work on the part of the programmer when writing in Python. However, when dealing with the C that implements Python or creating C extensions to Python you will need to handle ref-counting yourself.

In my years of doing C++ programming, I have never encountered a situation that required GarbageCollection. However, I've hit situations where automatic GarbageCollection would have been darned convenient.

Generally, I've found that memory management issues make it difficult to write highly generic code that is also efficient.

-- JeffGrigg

When you're trying to write code that is all of these...

Highly Generic / Reusable
Efficient
has a clear definition of object ownership (for reason of simplistic memory management)

You'll have trouble: Code will get complex, or you'll have to compromise one of the above.

Garbage collection tends to be inefficient because the release of the memory is not tied to the action that makes the memory unneeded. In C++, one can release memory when it is no longer needed by the code. In a garbage collection system, there is a set point in time (typically when memory is exhausted) at which the garbage collecter runs and tries to determine whether memory can be freed. I have seen garbage collection systems churn almost to a halt when they ungergo a spurt of temporary object creation. The solution? Add more memory.

Generating arbitrary LambdaExpressions requires GC of some sort (including reference counting), IMHO. Lambda expressions are not used in LIFO fashion, making them a poor candidate for stack allocation--thus they must come from the heap. The usage pattern of these is such that manual management of them would be difficult.

This is one reason it is often claimed that FunctionalProgramming requires GarbageCollection.

I don't think that argument works. The order the expressions are used in doesn't really affect the strategy for managing their allocation. To be more specific, the vast majority of function objects I've encountered can easily be implemented as copyable value objects. Where the [lexical] context is too large to make this a reasonable approach, I agree that some sort of shared_ptr is called for. But never underestimate the power of not passing by reference.

The problem isn't the order the arguments are used; it's when the lambda becomes no longer alive. Use of ValueObjects is one solution, if you know in advance how big the lambda is. If you don't, then you're back to square one.

Regarding ReferenceCounting. In addition to the issue reclaiming cycles; it also has problems in multithreaded programs; as the reference count updates must be atomic. If each change to a ref count is protected by a semaphore, the impact on performance is terrible. Even more "lightweight" synchronization methods still run into problems.

Also regarding ReferenceCounting. If I haven't made a serious error in my thinking, ReferenceCountingCanHandleCycles - but it's probably pretty slow... -- KarlKnechtel

A bigger problem with reference counting is the circular reference issue. The simplest is when two objects hold a reference to each other, often a parent and a child.

In a previous life, I did a lot of performance analysis on multithreaded apps on medium-scale x86 SMP systems (4-8 processors). The apps used reference counting quite a lot, using interlocked integer access to synchronize access to reference counts. Empirical analysis (profilers, bus traces) showed that interlocked integer access was the #1 killer of SMP scalability. It only gets worse when you add more processors, since each interlocked access by a single processor can potentially (and frequently does) stall *all* processors. The trend toward deeper pipelines only makes each stall all the more painful, although that trend appears to have slowed. -- ArlieDavis

Would WeightedReferenceCounting? have helped ? (see http://en.wikipedia.org/wiki/Reference_counting ). Weighted reference counting was designed for multi-processor systems. Or is there something even better ? -- DavidCary

There probably aren't any algorithms which demand GC, because you could always just never delete anything, and use an infinite amount of memory.

HansBoehm found a situation where not having GC would have made life more difficult, and the code much slower: <http://www.hpl.hp.com/personal/Hans_Boehm/gc/example.html>. BrokenLink as of NovemberFourteen

Continuations seem to require GarbageCollection or at least another equally complex structure (e.g. support for CactusStack).

See ValueObjectsRequireGarbageCollection

CategoryAlgorithm CategoryGarbageCollection