Moved from MergeSort.
There are a number of dubious claims being made here that are not backed up by argument or citation.
I suggest that any further claims that are contrary to the standard literature and standard introductory textbooks should indeed be backed up by either argument or citation.
It's also a very good idea to actually try out an algorithm before making any claims about it.
MergeSort can be done in place. Suppose the array is divided into two sorted halves. The idea is to grab the first sorted half and make it grow at the expense of the second sorted half.
Let i and j be indices pointing to the start of each half. Whenever i is incremented, the final sorted prefix grows. a[i] is the first element of the first half and a[j] is the first element of the second half.
 if (a[i] <= a[j]) i++;            // a[i] is already in its final place
 else { swap(a, i, j);             // bring the smaller a[j] into position i
        insert(a, j, n); }         // re-insert the displaced element into the sorted a[j..n]
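For concreteness, here is a minimal, self-contained C sketch of the merge step as described above. This is one reading of the description on this page, not necessarily the exact SchwarzMergeSortAlgorithm?; the helper names (swap_elems, insert_into_tail, merge_in_place) and the half-open index convention [lo, hi) are assumptions of mine.

 /* In-place merge of two adjacent sorted runs a[lo..mid-1] and a[mid..hi-1].
    A sketch of the scheme described above; no auxiliary array is used. */
 #include <stddef.h>

 static void swap_elems(int *a, size_t i, size_t j) {
     int t = a[i]; a[i] = a[j]; a[j] = t;
 }

 /* Bubble a[j] to the right until a[j..hi-1] is sorted again. */
 static void insert_into_tail(int *a, size_t j, size_t hi) {
     for (size_t k = j; k + 1 < hi && a[k] > a[k + 1]; k++)
         swap_elems(a, k, k + 1);
 }

 static void merge_in_place(int *a, size_t lo, size_t mid, size_t hi) {
     size_t i = lo, j = mid;
     if (j == hi) return;                 /* second half is empty: nothing to merge */
     while (i < mid) {
         if (a[i] <= a[j]) {
             i++;                         /* a[i] is already in its final place */
         } else {
             swap_elems(a, i, j);         /* the smallest remaining element goes to i */
             insert_into_tail(a, j, hi);  /* keep the second half sorted after the swap */
         }
     }
 }

A top-down MergeSort would call merge_in_place where the usual copying merge goes; a driver along those lines is sketched further down the page.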
Did you just make this up on the fly?
No. See SchwarzMergeSortAlgorithm?.
If not, please cite which variation this is (i.e. who published and where, etc), and what its time complexity is.
I have no idea what time complexity it has. Who cares anyway? We all know that BigOh is only the worst case scenario and the interesting scenario is the average scenario. Although I was taught how to calculate both, I recognize that doing average analysis is beyond my limited mathematical capabilities. I prefer to do real tests to back up what I'm saying. -- GuillermoSchwarz
If so, that's unwise; there are often subtle bugs in attempted new sorting algorithm variants.
I would love to see if there are any bugs in my algorithm. See SchwarzMergeSortAlgorithm?.
And the usual simple in-place merge sort costs O(n^2), making it more of a curiosity than a practical tool. At first glance, this looks like what you may be offering up here, but I don't feel like doing a deep analysis of a random algorithm.
I took the time to verify that this algorithm is faster than QuickSort, but only for big n. This is exactly the same thing Knuth said when comparing QuickSort to InsertSort when n=10.
I didn't say it absolutely isn't sound, but I am saying that it's easy to end up with subtle bugs. Have you at least tried it?
Also it's essentially useless to introduce a new (or variant) sorting algorithm without knowing its time complexity. Like I said, in the absence of analysis, there is existing reason to think that such a thing would be O(n^2).
You shouldn't expect readers of this page either to trust it blindly or to do the analysis for you.
Actually I just looked at it a little harder, and it is (1) broken; it omits essential work, and (2) once fixed, it is indeed O(n^2). Give it a try on some worst possible case data.
As it stood, it clearly wasn't intended to be complete. It conditionally inserts the lowest element from the second half into the first half, replacing one element there which is inserted into the second half. It is sound in the sense that it preserves the order of each half and doesn't use additional storage. It also looks inefficient because of the insert process, so I wasn't motivated to establish the time complexity with certainty. Subtle bugs can arise in almost any code, even Knuth's (occasionally).
The insert process can be done fast if:
About Knuth and O (BigOh) notation [Almost every top-level statement in this section is incorrect.]
According to Knuth, InsertSort is almost as bad as BubbleSort; both are O(n*n). QuickSort is much better but still O(n*n). Why is it better, then? Because on average the DividePhase? is fast.
Well, on modern architectures (1999+) it turns out that Knuth was wrong, and he admitted it, because the MIX abstraction assumes that all machine instructions have constant cost: InsertSort is not almost as bad as BubbleSort when you have HardwareCaches; with caches, InsertSort performs almost as well as QuickSort.
For a different opinion, see CachesDoNotAffectBigOhTimeComplexity.
I found this out by myself when, out of boredom, I tested the sorting algorithms. I thought I had done something wrong but couldn't find out what. Then I read that Knuth had given an interview to explain why his work no longer applied to modern architectures. The reason given by Knuth is that the MIX model no longer applies, because MIX assumes constant cost for all its machine instructions. This is no longer true because of the caches: if you access an array in order (for 1..n) it is faster than if you access it randomly. BubbleSort's access pattern is more random than InsertSort's, and it shows for large n. I can't find the link now.
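For anyone who wants to reproduce the effect being described, here is a rough C harness (my own construction, not from Knuth or from the interview mentioned above) that sums the same array once through sequential indices and once through a shuffled permutation. On a machine with HardwareCaches the shuffled pass is typically several times slower, even though the arithmetic is identical.

 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 #define N (1u << 22)                    /* 4M ints: larger than a typical cache */

 static double sum_via(const int *a, const size_t *idx, size_t n) {
     double s = 0.0;
     for (size_t k = 0; k < n; k++)
         s += a[idx[k]];                 /* same work, different access order */
     return s;
 }

 int main(void) {
     int *a = malloc(N * sizeof *a);
     size_t *seq = malloc(N * sizeof *seq);
     size_t *rnd = malloc(N * sizeof *rnd);
     for (size_t k = 0; k < N; k++) { a[k] = (int)k; seq[k] = rnd[k] = k; }
     for (size_t k = N - 1; k > 0; k--) {            /* Fisher-Yates shuffle; slight bias is fine for a demo */
         size_t r = ((size_t)rand() * ((size_t)RAND_MAX + 1) + (size_t)rand()) % (k + 1);
         size_t t = rnd[k]; rnd[k] = rnd[r]; rnd[r] = t;
     }
     clock_t t0 = clock(); double s1 = sum_via(a, seq, N);
     clock_t t1 = clock(); double s2 = sum_via(a, rnd, N);
     clock_t t2 = clock();
     printf("sequential %.3fs  random %.3fs  (checksums %.0f %.0f)\n",
            (double)(t1 - t0) / CLOCKS_PER_SEC,
            (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
     free(a); free(seq); free(rnd);
     return 0;
 }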
According to Knuth, the reason MIX is so important is that if he had used a high-level programming language, he would have had no way to know the real cost of any operation.
About InPlace? MergeSort being broken
Actually I just looked at it a little harder, and it is (1) broken; it omits essential work, and (2) once fixed, it is indeed O(n^2). Give it a try on some worst possible case data.
What's the worst possible data in this case?
Don't know, but you might first want to take a look at the recent literature. A little web searching turns up:
Proximity Mergesort: optimal in-place sorting in the cache-oblivious model. (no direct link, but as of April 2004, available free via AcmWeb)
Practical in-place mergesort - http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html
That in-place MergeSort does the merge recursively. MergeSort is exactly like SillySort, but the Merge phase must be fast. I do not see how double recursion can be better than my approach, which takes between n/2 steps and n/2 steps plus n/2 times the cost of an insert. The cost of an insert is n/2 at most. If we try to calculate averages for random data: there are n/2 increments, and about n/4 of the steps will be swaps + inserts. The cost of the average insert is n/4, therefore the total cost of the merge phase is n/4 * n/4 = n^2/16. This is irrelevant, because it does not consider HardwareCaches. You can try the algorithm and see that it performs better; a harness for doing exactly that is sketched below.
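In that spirit, here is a small self-contained harness for actually trying it: a top-down MergeSort built on the insert-based merge sketched earlier on this page, timed against the C library's qsort on random data. The value of n, the helper names, and the use of qsort as a stand-in for QuickSort are my choices; the numbers will depend heavily on the machine and on n.

 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 static void swap_elems(int *a, size_t i, size_t j) {
     int t = a[i]; a[i] = a[j]; a[j] = t;
 }

 /* Bubble a[j] to the right until a[j..hi-1] is sorted again. */
 static void insert_into_tail(int *a, size_t j, size_t hi) {
     for (size_t k = j; k + 1 < hi && a[k] > a[k + 1]; k++)
         swap_elems(a, k, k + 1);
 }

 /* In-place merge of the sorted runs a[lo..mid-1] and a[mid..hi-1]. */
 static void merge_in_place(int *a, size_t lo, size_t mid, size_t hi) {
     size_t i = lo, j = mid;
     if (j == hi) return;
     while (i < mid) {
         if (a[i] <= a[j]) i++;
         else { swap_elems(a, i, j); insert_into_tail(a, j, hi); }
     }
 }

 static void merge_sort(int *a, size_t lo, size_t hi) {
     if (hi - lo < 2) return;
     size_t mid = lo + (hi - lo) / 2;
     merge_sort(a, lo, mid);
     merge_sort(a, mid, hi);
     merge_in_place(a, lo, mid, hi);
 }

 static int cmp_int(const void *p, const void *q) {
     int x = *(const int *)p, y = *(const int *)q;
     return (x > y) - (x < y);
 }

 int main(void) {
     size_t n = 1u << 15;              /* keep n modest: the insert-based merge can be quadratic */
     int *a = malloc(n * sizeof *a);
     int *b = malloc(n * sizeof *b);
     srand(42);
     for (size_t k = 0; k < n; k++) a[k] = b[k] = rand();
     clock_t t0 = clock(); merge_sort(a, 0, n);
     clock_t t1 = clock(); qsort(b, n, sizeof *b, cmp_int);
     clock_t t2 = clock();
     printf("in-place MergeSort %.3fs   qsort %.3fs\n",
            (double)(t1 - t0) / CLOCKS_PER_SEC,
            (double)(t2 - t1) / CLOCKS_PER_SEC);
     free(a); free(b);
     return 0;
 }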
Geesh, Knuth's work will NEVER be outdated. MIX was not the standard to which Big-O applies. We are talking minimal/average costs for various operations. If you get down to the nitty-gritty of coming up with very precise Big-Os, you just about totally miss the point of Big-O. Most of the good general purpose sort algorithms are on the order of O(N*logN). This is because it has been proven that ANY sort algorithm based on key comparison must do at least N*logN comparisons. -- GeraldLindsly
Well Gerald, you'll see, I'm not claiming that in-place MergeSort is better than O(n * log n). As you say, that's impossible for a comparison-based sorting algorithm. What I'm saying is that MergeSort beats QuickSort; I'm talking in terms of averages here. Please also note that QuickSort is O(n*n) because its worst behavior is quadratic, while HeapSort and MergeSort are O(n * log n) even in their worst behavior.
Also I don't see how you can justify your claim that Knuth will never be outdated. -- GuillermoSchwarz
Gerald is exactly correct here. -- DougMerritt
[GS, please brush up on your theory - making elementary blunders detracts from the value of your valid points.]