# Big Oh

The Big-O notation (in mathematics named after Landau) describes the behaviour of a function for big inputs. It tries to capture the core of a function.

Let's make a few examples.

```  f(x) = x is in O(x)
f(x) = x^2 is in O(x^2)
f(x) = 100 x^2 + 13 x + 7 + sqrt(x) is in O(x^2)
```
To be a bit more mathematical: O(x) is the set of all functions which grow at most as c*x for some constant c. Thus, we also have
```  f(x) = x is in O(x^2) and O(x log x)
```
In this sense, O(.) is like the greater or equal sign for functions.

The notation is normally used when x is sufficiently large (approaches infinity), but can also be applied when x approaches some finite limiting value.

Programmers use Big-O to get a rough estimate of "how many seconds" and "how much memory" various algorithms use for "large" inputs: "MergeSort is O(n ln n) in time and O(n) in extra space. BubbleSort is O(n^2) in time and O(1) in extra space."

If your algorithm is O(2^n) in time, and your input has a size of n=64 or more, that's a large input. Big O notation immediately tells you there's no point doing a detailed analysis or timing measurement, much less writing it in assembly language -- it's going to take over a hundred years, even if you could get the inner loop to run in 1 clock cycle.

If your algorithm is O(2^n) in space, and your input has size of n=265 or more, that's a large input. Big O notation immediately tells you there's no point doing a detailed analysis or memory usage measurement -- if you could store 1 bit per electron, you would still need more electrons than exist in the universe.

If your input is always small -- 1 or 2 or 3 -- then Big-O notation is irrelevant.

For a humorous description of high-complexity algorithms, especially naive implementations, see OhMyGodComplexity.

(moved from BigOhNotation)

Big-O notation is a ways of talking about the asymptotic performance ("speed") of an algorithm -- basically how it performs for extremely large input sizes. In this arena, a lot of real-world measurements are abstracted away. If you have two algorithms for the same problem, one of which always takes twice as long as the other, they are considered equivalent from a Big-O point of view.

Here is an example of Big-O notation.

The running time of bubble-sort on a set of size n is O(n^2).

This means that for "large enough" sets, bubble sort's worst-case performance will be roughly proportional to the square of the size of the set.

The actual performance of bubble sort might be 2n^2 + 40n + 80 cycles, but the order notation conceals the coefficient of the highest-order term as well as all lower-order terms, and leaves only the (scaled) highest-order term.

The technical definition is:

f(n) is O(g(n)) is equivalent to: There exist constants x and k such that for all n>k, x*f(n) < g(n)

Example: What x and k work for the above example 2n^2 + 40n + 80 is O(n^2)? This means f(n) = 2n^2 + 40n + 80 and g(n) = n^2.

One Possible Answer: x = 1/4, k = 25 . It doesn't have to be tight bounds in order to determine the order.

Example: a + bn + cn^2 + ... + dn^s is O(what)?

Also note that 2n^2 + 40n + 80 is O(n^3). But it is obviously not O(n).

Not all functions are comparable. For instance f(n)=n and g(n)=n^(1 + sin(x)). f(n) is not O(g(n)) and g(n) is not O(f(n)). Also, order notation is only defined for functions that are "eventually nonnegative" -- that is, there exists some k that for all n>k, f(n) is nonnegative.

We can think of O(g(n)) as a set consisting of all the functions that asymptotically grow at the same rate as, or slower than, g(n). But most people use "is" or the equal sign instead of set membership.

O(g(n)) = { f(n) | f(n) is a function : There exist x,k such that for all n>k, x*f(n) < g(n) }

If we have f(n) is O(g(n)) and g(n) is O(f(n)) then we say that f(n) is Theta(g(n)) and g(n) is Theta(f(n)). Theta is slightly preferred over O because it is a more exact and tight measure. But sometimes O is all that can be proved.

When we say that an ALGORITHM is O(f(n)), we mean the running time of that algorithm.

When we say that a PROBLEM is O(f(n)), we mean the running time of the best algorithm for that problem. For instance, "Comparison-based sorting is O(n * log(n) )"

(For more information, see the classic textbook IntroductionToAlgorithms by ThomasCormen?, CharlesLeiserson? and RonRivest?. ISBN 0-262-03293-7 )

Big O notation expresses the asymptotic (for large N) behavior of an algorithm. Clearly this only approximates the actual behavior (time and space) of algorithms coded for real computers. Those who performance tune computer codes are advised to be aware of both asymptotic behavior and actual measures.

Measurement on real computers may expose theoretical analysis to have counted the wrong thing. For example, sorting algorithms are often compared by counting compare operations while measurements could easily be dominated by caching or paging behavior. Here is another example.

Thus, though Big O is a powerful tool that a performance tuner must both acknowledge and understand, life is rarely so simple in practice. Its a bit like learning kinematics and then expecting all cows to be spherical point masses in a vacuum. If large test realistic data sets are unavailable during testing, awareness of how the program will theoretically scale under load is critical. If during testing actual test sets are available that represent both current and future sizes loads then theoretical predictions of Big O are always overruled by a correctly performed measurement. Be aware that providing realistic and worst case loads and datasets is probably at least as arcane a process as computing Big O.

As was pointed out above, even for a simple (sort) algorithm, "which N?" becomes an issue.

In practice,I have needed to compute the Big O for an algorithm in which there were many user configurable free variables, N,M,P,Q,R,S. A basic Big O analysis showed big O was implicitly, obviously, and inescapably, O(c^Q), but Q was always one of (2,3,4, or 5). It was O(N) Linear in N where N = 1..1E6, It was polynomial in M, O(M^Q), It was weird things I could not work out in P,R,S,... Thus even if you held all else still while one variable in the sane range of values (say 1-10,000) some other currently fixed variable was an exponent in the constant part of the expression. Uggh. In the end I settled for the following analysis. "The last 10 times I used computation Algorithm A, on average, the computation Algorithm B would have been 1000 times faster. The process now appears likely to be IO bound."

That example does seem quite complicated. How often does the professional performance tuner encounter such awkward cases?

That is the most complicated example I (as a sample of one) ever met, but it does point out a class of complication invisible to many (would be) part-time performance tuners. Hence my claim for the validity of the statement "life is rarely so simple in practice".

Giving one extreme "counter-example" doesn't show that simplicity is rare. To do that, you would need to show that cases from the "invisible class" substantially predominate.

In my experience, the most common way in which life is rarely as simple as Big-O, is that often N is just not SufficientlyLarge. Often, sufficient improvement is easily achievable by changing the k that is left out of BigO analysis. Often, BigO of 'comparison ops' is not relevant when BigO of 'cache misses' is. Optimizing the inner loop of a dot product speeds up many Machine Learning algorithms by a lot (sufficiently), but makes no change to BigO.

• Your case seems to be primarily against misuse of BigO, rather than against BigO per se. A dot product is not an obvious case for optimization - unless you have (specialized) knowledge of how to do it, and also know it's a principal component of the algorithm(s) used.

• I would say that BigOh in terms of computer algorithms is generally "misused". Most BigOh analysis is done with the assumption of random access machines, but most non-embedded machines are decidedly not random access equivalent. They have many levels of caching, including hardware caching and virtual memory. For example, this makes two compare operations not equal, and thus your BigOh analysis is not (as) reliable. For example, B-trees exist because the important operation is disk access, not comparisons. In my current product, we changed a matrix from column major to row major to improve locality, thus cache hits, thus performance. The speed boost was huge. The moral is BigOh can be a useful tool, but only if you identify the correct function modeling its performance characterist, and this is sometimes nontrivial, so always measure with real data, including worst case and average case.

Also, consider "pruning" and the (MachineLearning) NN back-propagation algorithm; what is the change on the BigO created by "pruning"? What are we interested in measuring the change in? We could count the percentage of Neurones pruned and decide that this simply modified the k and not the BigO. The BigO after all is in terms of numbers of training cases. Even if we measured BigO in terms of Network architecture size, pruning still is only a kind of improvement that may be viewed as simplifying by a factor of k and not changing the BigO. It is, however, well-known that in practice pruning radically increases the performance of a back-propagation algorithm. This happens, I believe, because it increases the rate of convergence.

... and possibly BigO can be applied to that rate of convergence. Whether that is the case or not, the example seems to be far from typical. BigO is not taught as a universal technique with no limitations.

In my limited experience, I expect that for many (most) applications programmers making business systems, the computation of BigO is no where near as difficult as my examples above, but I still have seen it got wrong way too often. On one business system, the case was made that our new GUI front end would bog down the servers. I pointed out that on the use cases we had, the new front end "A" hit the data base fewer times "B". The back end no longer had to run the 3270 terminals and respond to real time user inputs.

You say "the case was made..." - misusing BigO? If so, why not explain the misuse as well as showing improvements in particular cases?

Note also: as has been subtly hinted at above, performance tuning is always evil; just don't do it. The first step in performance tuning is making a measure to prove that you just must do it. You must also then stop tuning as soon as it goes fast enough.

In many cases, near-optimal buffer sizes may be selected by default. When they're not, is it evil to specify near-optimal sizes? And why choose sizes known to be good enough, when other sizes are already known to be better?

It seems odd to introduce BigO, but then concentrate on its misapplication and hint that it's evil anyway, along with optimization in general. I would suggest a typical sequence is "notice (or anticipate) poor performance", then "identify the problem area", and then "alleviate the problem (using BigO if appropriate)".

When I was taught BigOh at Columbia University CS in the mid-1980's, the emphasis was on "tight upper bounds" (see some examples above that aren't so tight), and on classifying a proposed computation in one of three categories: "possible", "feasible" or "efficient". "Possible" covers all that's theoretically computable, but where computation time or space grows exponentially with input size. "Feasible" includes polynomial complexities, while "Efficient" covers the gamut of logarithmic, linear and constant spaces or times. BigOh is "big picture", not intended for sub-optimizations. For guys whose only take on optimization is "write it in assembler", BigOh could be an insightful study. -- WaldenMathews

DeletedUnlessDefended: I trimmed the sentence Note that if f(x) is allowed negative values, the unsigned value must be in O(g(x)) for one to say f is in O(g(x)); g(x) has to be positive. from the description. Since the time and space used by any algorithm is at minimum zero, it seems silly to deal with negative numbers. -- DavidCary

It's about bounding of a function, not just time- and space-complexity

Actually, BigOhNotation isn't defined to be specifically about the time or space that an algorithm takes. In NumericalAnalysis, for example, the same notation describes the error bound of a function that approximates another function. The error bound is itself a function of the size of the interval at which the function to be approximated is sample. A high exponent in the BigOh, e.g. O(h^5), is good because usually the interval width h is less than 1, and a number less than one raised to high power becomes very tiny indeed.

My definition, is that BigOh notation specifies (or gives) abstract run-time performance rather than clock-time performance. "Abstract" here referencing a purely logical evaluation, akin to how number represents an abstract quantity, rather than 1 milliliter, for example. For any particiluar implementation, then on a machine, the performance will be the BigOh operand multiplied by some constant factor peculiar to the specific machine (Intel 1.8GHz Pentium, for example). -- MarkJanssen

To an extent, yes. In ComputerScience, Big O describes the relative time complexity of algorithms given some input n. O(1) means an algorithm whose time is constant relative to n, O(n) means the algorithm's time is related linearly to n, O(n**2) means its time is the square of n and so on. However, given two algorithms -- one O(1) and one O(n!) -- and some n, it's entirely conceivable for the O(1) algorithm to take days to complete (though it will always take the same number of days, regardless of n) whilst the O(n!) algorithm completes in milliseconds. Big O only gives relative performance proportional to n, not absolute performance.