Caches Do Not Affect Big Oh Time Complexity

Cached data can be accessed much faster than data from the lower levels of the memory hierarchy, such as hard disks. The BigOh notation, used to formally quantify the time complexity of algorithms, ignores improvements which only increase the speed of the algorithm by a constant factor, relative to the size of the input.

It can be argued that caches do not affect BigOh time complexity, because each cache miss causes a constant time penalty. This penalty is the time it takes for reading a block from the disk. Depending on the current position of the head and on the position of the block on the disk, this operation may require time proportional to the size of the disk in the worst case, but this size does not depend on the size of the input... unless your input is larger than your disk.

Thus, caches do matter for AlgorithmsDealingWithMassiveData, but you can probably ignore the issue of locality when analysing your latest version of QuickSort (unless you want to take constant factors into account, and we often do).

The issue is similar to the common approximation that multiplication takes O(1) time: this approximation is only valid for integers which fit into your machine's registers. If you need to use BigIntegers, then you shouldn't assume that multiplications take constant time.

The rest of this page is a ViolentAgreement. All participants here agree that CachesDoNotAffectBigOhTimeComplexity, but they all think the other is wrong in their analysis.

For an introduction, see: http://www.dunkel.dk/thesis/thesis.pdf. It explains a lot more than is required here, but it is a good introduction to the subject.

Before you go reading a 133 page PhD thesis on new work to extend old theory, you need to go brush up on your old theory. There is obviously a large amount of work remaining for the future; what you don't seem to understand is that the old theory is not wrong.

Sorry, I shouldn't have said wrong. I should have said "does not currently apply to memory hierarchies built using HardwareCaches". Even the fact that they are HardwareCaches is probably irrelevant, since the important aspect is that algorithms are CacheOblivious?. -- GuillermoSchwarz

The short scoop is that BigOh assumes theoretically infinite scaling, but caching is by definition fixed (limited). The only place I see it making a BigOh difference is perhaps if locality is used to look-up similar things and that locality (distance) does not scale the same as the "outer" factor.

No, it applies there too; with or without hardware cache, one can do a traditional complexity analysis of any sorting algorithm and get a correct result that does not depend on whether you have a hardware cache, or which processor you're running on.

You are correct that there are additional speedups/slowdowns due to cache (and cpu, and other factors), but this is all on top of the basic O(n) analysis itself, and all of those things are very well known, and have been known for a very, very long time.

You are mistaken in thinking that these concerns that you are raising somehow supersede traditional complexity analysis. They do not, even though they are important. I've worked for decades on speeding things up, and I can assure you I've always been very concerned with caching issues. But only after I've resolved the O(n) issues.

Consider a specific example. Let's say you have a single memory operation with cost of 10. Let's say that, with a cache hit, its cost is only 1. So there's a 10-fold speedup on that one operation. That's a multiplicative constant. That's a very important difference, but not as important as the difference between e.g. an O(n) versus an O(n^2) algorithm.

Now consider a variation, where that memory operation is inside a triply nested loop, each one doing 1000 iterations. Now the cost is multiplied by a billion, so one will care a lot about the final outcome. A speed cost of 1 billion compared with one of 10 billion is very important. But it's still a factor of 10.

The basic O(n) analysis tells you that, the most important thing, *intentionally* glossing over the small details (additive and multiplicative time/space constants) in order to simplify the analysis enough to get a first approximation.

If speed (or space) is still an issue after that, then one proceeds to examine more details, such as whether there is caching. Or how to fit an algorithm into the existing code and/or data cache. Etc. This is traditional, not new.

You also have mentioned a threshold for the value of n. Yes, true, but that is also very well known, not a new issue. If you look at the C library implementation of quicksort on any platform, it is a certainty that it is not using quicksort when operating on small ranges, where n < 10 or something, because it has been tuned.

-- DougMerritt

[I've often read such advice, but never seen suggested code for small values of n, or concrete evidence on the appropriate value for n.]

That's because this is where actual timing tests become important to find out the effect of the particular cpu, caching, happenstances of coding, etc. In one circumstance quicksort may be used down to a range of N=8, while another implementation on a different machine in a different language may find that it should only be used down to N=100.

Complexity analysis establishes the overall framework, e.g. that perhaps quicksort should be used for large N, something else like insertion sort should be used for small N, and then after that has been established, one sets the threshold of N empirically.

Ok, I will assume you are right. Let us suppose for a minute that BigOh analysis can take caches into consideration, as if it were as simple as considering that whenever a[i] is accessed, it takes 10, but when a[i+x] is accessed thereafter, it takes only 1. For simplicity let us suppose that x * sizeof(a[0]) is the size of the cache, of course that means that a[0] is accessed in 10, a[1] is accessed in 1 and even a[x-1] is accessed in 1, but when a[x] is accessed it takes 10 again. For simplicity, let us suppose that only accesses to a are cached, so other variables can be accessed any time without affecting our analysis.

Now we want to analyze BinarySearch. The first access is in the middle of the array, it takes 10 because the cache is empty, then x elements are in the array. What's the probability that the next access into the array will hit the HardwareCache? If n <= x, we know that the probability is 1. If n > 4 * x then we know that the probability is zero. So we can do the analysis considering that fact. What's the result: exactly the same as if there were no HardwareCache: O( log n ). So you see CachesDoNotAffectBigOhTimeComplexity.

The argument need not be so complicated. Think of it this way: if cached accesses are 10 times faster than cache misses, then the constant factor of an algorithm employing a cache could be as small as 1/10 that of an algorithm without a cache. Realize that that does not affect the asymptotic performance of the fundamental algorithm. Imagine that all accesses hit the cache properly, so that all operations were 10 times faster. Obviously, this would still have the same asymptotic performance as the slower algorithm. So having operations that vary between 1 and 10 times faster cannot affect the BigOh analysis.

To affect asymptotic performance, you cannot just speed up operations by a constant amount. You have to speed them up by a variable amount - by some factor involving n. Constant-time operations cannot be improved, obviously, but they could be worsened. So it would be possible to slow down BinarySearch if accesses took O(n) time, for example. That would affect the asymptotic performance of BinarySearch. But if you don't somehow involve n in the cost of a particular operation, you haven't changed the BigOh time. -- JohnKugelman

Moved from MergeSortDiscussion:

(When analyzing SchwarzMergeSortAlgorithm?, an in-place MergeSort which allegedly performs better than QuickSort)

Well, on modern architectures (1999+) it turns out that Knuth's opinion that InsertSort was closer to BubbleSort than to QuickSort was not accurate because at the time DonKnuth made his analysis HardwareCaches were not as widely used as today, and he admitted the analysis would need to freshen up. -- GuillermoSchwarz

He's not wrong, and he didn't admit it, and asymptotic measures do not change when minor architectural changes are made. Cacheing and RISC vs CISC, etc, change additive and multiplicative constants, that's all.
- Caches affect program locality performance. The time it takes depends on how you access elements on the array. If you access elements which are close enough, the data will be in the cache, because chunks of the array are loaded in the cache.
  - But a cache affects performance only by a constant factor, which falls out of the BigOh/BigTheta calculations. Say that a cache hit is 1000 times faster than a cache miss. Then at most, you're improving performance by a factor of 1000. This in no way invalidates the algorithmic complexity, which'll dwarf the constant factor for large (>10,000) data sets. What you're really saying is that constant factors matter, which is well known to anyone writing practical programs. People choose linked lists over vectors all the time, despite the fact that the complexity measurements are strictly worse. For small data sets, it doesn't matter, because the constant is the dominant factor. For large ones, though, a (n log n) will beat a (n^2) all the time. -- JonathanTang

That argument seems inaccurate. Let me explain why. If you access an array in an orderly fashion, you will make efficient use of the cache. If you do a time analysis for that algorithm and a similar one which does access the same elements but randomly, you will see no difference in the time analysis, BigOh and BigTheta will be the same, but actually one algorithm will be much faster than the other. I would say several orders of magnitude faster for larger n. Increasing n will only increase the difference in favor of the algorithm that uses the cache efficiently. -- GuillermoSchwarz

It sounds like we have agreement, though not understanding. I think we agree that:
- Caches do not affect asymptotic complexity measures like BigOh or BigTheta; caches only affect the constant factors
  - I agree that Caches do not affect asymptotic complexity measures like BigOh or BigTheta, but I do not agree that caches only affect the constant factors. I think caches are totally ignored by BigOh an BigTheta. That's a big mistake, in my limited point of view: What I care about is the actual performance of algorithms, so that I can run them and find out that the calculation was accurate. That is not possible with BigOh and BigTheta because they are the wrong abstraction. See AllAbstractionsLie. I think a new abstraction is needed, one in which caches are also considered. And that abstraction will not consider that caches are constants that can be happily omitted. -- GuillermoSchwarz
    - But it's easy to show that caches do affect only the "constant factors", by considering the extreme cases: All memory accesses result in a cache hit, and all memory accesses result in a cache miss. There is obviously a minimal bound on the cache hit time, and in any sound memory system, there will be an upper bound on the cache miss time. The ratio of these two is a constant. So if we go from all-hits to all misses; the numbers of compares and swaps is unchanged (as we haven't changed the algorithm), but the time for each has increased by this particular constant factor. As any mixture of hits and misses must lie between these two extremes, it follows therefore that introducing cache misses only increases time by a constant factor, one unrelated to the input size. And I've given an upper bound on that factor - the ratio of the miss penalty to the hit time.
    - It is agreed that for high-performance applications, things like cache behavior and locality of reference are important considerations; and BigOh and the like become insufficient means for comparing algorithms. Unless your implementing an RDBMS or something similar, BigOh notation suffices. Use of more complicated models without first showing that sorting is a bottleneck is PrematureOptimization.
    - It is a contradiction to claim that something (caches, whatever) "doesn't affect BigOh" but simultaneously affects "more than the constant factors". If something causes a constant factor to be no longer constant (in particular, to grow without bound as the input size grows), then this means that BigOh itself has changed. This is rather simply derived from the definition of BigOh.
    - If you have a memory system where the cache miss time is unbounded; and increasing the input size causes the cache miss time to increase, then you can claim that the cache changes the BigOh notation. If cache miss time is bounded, however, BigOh notation is preserved.
- For small datasets; constant factors matter; so much so that an O(n^2) algorithm might be preferable to an O(n log n) algorithm in many cases. (I use BubbleSort on small datasets all the time. If the dataset grows larger, I can easily switch. DoTheSimplestThingThatCouldPossiblyWork.
  - I would rather use InsertionSort for small datasets as DonKnuth recommended, but as you say that's irrelevant. -- GuillermoSchwarz
- Even on large datasets, constant factors matter when comparing algorithms with the same asymptotic complexity.
  - Constants can be ignored if they appear on outer loops. In inner loops those constants are very important. -- GuillermoSchwarz
    - Uh, insertion/deletion in a linked list is O(1) (assuming one has a pointer/iterator to the insertion point already); insertion/deletion into a vector is O(n). (Insertion at the end of a vector may be amortized to O(1) if you double the buffer size at every reallocation; however this strategy has always struck me as very wasteful of space).
      - If your collection has N word-sized (pointers) elements, then using a resizeable vector will consume 2*N words in the worst case. Doubly linked list will contain 3*N words and singly linked list will contain 2*N words. So the vector will not not be very wasteful.

Cache may noticeably improve performance for small datasets. I don't think anybody denies that. It allows one to be lazy if the datasets will be relatively small.

This has nothing to do with MergeSort, or does it?

I'm not saying that for small n caches are important. For small n, you can even use SillySort and get away with it. It's for big n that caches are important. Let me put an example from loop optimization: You have two loops, one inside another and you want to optimize the whole operation. Which one would you optimize? Optimizing the outer one usually doesn't improve performance so much as improving the innerloop. So you see, improving the performance using caches can make a big difference. -- GuillermoSchwarz

Caching does not change the O() analysis.

Absolutely: CachesDoNotAffectBigOhTimeComplexity. But some algorithms perform faster than others just because of the cache. That's why O() analysis is irrelevant -- GuillermoSchwarz

Obviously the whole reason caches exist is because they often make things run faster. Your notion that that means O() analysis is irrelevant is just wrong. You need to drag out the old textbook and refresh your memory, rather than taking a HostileStudent attitude. This is not a matter of opinion. You are misremembering. Stop wasting people's time.

Ok, I reread about BigTheta and effectively it is not about averages. I forgot the notation. Sorry. But it is about averages that I was trying to make my point. Obviously I do not have the mathematical capacity to do this analysis. Besides I was referring to HardwareCaches, not SoftwareCaches as proposed below. This page is a mess.

Actually, caches can affect BigOh time complexity in some algorithms.

Consider a (doubly) recursive function

  int f( int a, int b)
  {
     int x,y;
     if(( a==0 ) || (b==0))
        return 0;
     if( result already in cache )
        return result;
     x=f(a-1,b);
     y=f(a,b-1);
     compute result using a,b,x and y, and cache it.
  }

With a cache, f(n,m) takes time O(nm) to compute.

Without a cache, it takes vastly longer because intermediate sub results are computed over and over via different paths, exponentially often in fact. The algorithm above is of more than theoretical interest. A variant of it is the basis for one form of the grep algorithm where the best alignment of two sequences is computed from the best alignments of shorter sequences. The recursive formulation, it can be argued, is a lot clearer than the dynamic programming formulation, but is woefully inefficient (without the cache).

-- JamesCrook

A 'cache' is also the basis of the fast fourier algorithm. The fast fourier algorithm can be formulated as a recursive algorithm which recomputes many sub results it doesn't need to, much as the algorithm shown above does. A cache rescues it whilst keeping the reason that the fast fourier algorithm works, i.e. its recursive formulation, clear.

Damnit, memoization is not the same thing as a hardware cache, you're muddying the waters by inappropriately using it as a synonym in the context of this page.

Unfortunately, memoization is a form of "caching" (and is often referred to as such), even though the "caches" referred to on this page are clearly hardware memory caches. And in the case of this page - the operations being cached (compares and copies) are both O(1) in the size of the input, so caching doesn't help here anyway. (Memoization can improve the algorithmic time complexity for some algorithms - for example, converting a naive recursive algorithm for computing Fibonacci numbers from O(2^n) to O(n) - but that does not apply to sorting).

For large data sets, a cache miss may end up going to virtual memory i.e. disk or tape, to retrieve the data. In that case, it can be argued that the cache-miss time ought not be treated as O(1). The larger the set of data to sort, the longer the seek time could be.

O(1) = O(1000000)

Agreed O(1)=O(1000000). But if you are working with an array of n items, the time to look up the item at the end of the array, if the item is already in cache or normal memory is O(1), but it might be sensible to treat it as O(n) when it is out on tape, or on other seekable storage such as hard disk.

With tape, yes, it can become O(n) due to the nature of the medium. With disk, you have to be very, very careful about what to assume and what to claim, because so many issues are e.g. OS or filesystem-specific, and also algorithm dependent.

With tape, you can assume that seeking to byte N takes the same time as reading sequentially up to byte N. But clearly that is not true of disk, where reading sequentially to byte N still takes O(N), but seeking to the same position is... different. Naively, it is still just O(N) with a different constant, so that seek time still varies linearly with position, but in practice, depending on the disk, seeking can be regarded as actually taking constant (O(1)) time.

This is further complicated by the very real issue of the layout of the data on disk. It is simply linear in rather few filesystems these days. Uh... never mind, I'm not sure where I was going with that.

Try this thought experiment. You have an nGb file on a 500Gb hard drive. Your machine has 1Gb of memory. What is the BigOh complexity of reversing the lines of that file? I'd argue that reversing is worse than O(n), assuming that you do not have a second hard drive as well. I also see some validity in the argument that it IS O(n), treating disk seek time as a constant, even though it isn't.

I don't see that. For simplicity, pretend lines are all the same length, and that disk blocks hold an integral number of lines. Swap blocks 1 and N (reversing line order within blocks in RAM can be regarded as zero cost), then repeat for 2 and N-1, etc. The seek time is O(N) for the first swap, O(N-2) for the second, etc.

So the total seek time to reverse the entire file is O(N + N-1 + N-2 + ... + 3 + 2 + 1) = O( N^2 /2 ) = O(N^2).

Seems pretty straightforward.

I think the above thought experiment needs to be stated as either:

   reverse size O(n) file on an O(n) sized hard drive with O(1) cache

   OR:

   reverse size O(n) file on an O(n) sized hard drive with O(n) cache.

Only then it is clear which answer to give.

CategoryPerformance CategoryComplexity