Moved from MergeSort.
There are a number of dubious claims being made here that are not backed up by argument or citation.
I suggest that any further claims that are contrary to the standard literature and standard introductory textbooks should indeed be backed up by either argument or citation.
It's also a very good idea to actually try out an algorithm before making any claims about it.
MergeSort can be done in place. Suppose the array is divided into two sorted halves. The idea is to grab the first sorted half and make it grow at the expense of the second sorted half.
Let i and j be indices pointing to the start of each half. Whenever i is incremented, the final sorted prefix grows. a[i] is the first element of the first half and a[j] is the first element of the second half.
 if (a[i] <= a[j]) i++;            // a[i] is already in its final place
 else { swap(a, i, j);             // bring the smaller a[j] into position i
        insert(a, j, n); }         // re-insert the displaced element into the sorted a[j..n]
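For concreteness, here is a minimal, self-contained C sketch of the merge step as described above. This is one reading of the description on this page, not necessarily the exact SchwarzMergeSortAlgorithm?; the helper names (swap_elems, insert_into_tail, merge_in_place) and the half-open index convention [lo, hi) are assumptions of mine.

 /* In-place merge of two adjacent sorted runs a[lo..mid-1] and a[mid..hi-1].
    A sketch of the scheme described above; no auxiliary array is used. */
 #include <stddef.h>

 static void swap_elems(int *a, size_t i, size_t j) {
     int t = a[i]; a[i] = a[j]; a[j] = t;
 }

 /* Bubble a[j] to the right until a[j..hi-1] is sorted again. */
 static void insert_into_tail(int *a, size_t j, size_t hi) {
     for (size_t k = j; k + 1 < hi && a[k] > a[k + 1]; k++)
         swap_elems(a, k, k + 1);
 }

 static void merge_in_place(int *a, size_t lo, size_t mid, size_t hi) {
     size_t i = lo, j = mid;
     if (j == hi) return;                 /* second half is empty: nothing to merge */
     while (i < mid) {
         if (a[i] <= a[j]) {
             i++;                         /* a[i] is already in its final place */
         } else {
             swap_elems(a, i, j);         /* the smallest remaining element goes to i */
             insert_into_tail(a, j, hi);  /* keep the second half sorted after the swap */
         }
     }
 }

A top-down MergeSort would call merge_in_place where the usual copying merge goes; a driver along those lines is sketched further down the page.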
Did you just make this up on the fly?
No. See SchwarzMergeSortAlgorithm?.
If not, please cite which variation this is (i.e. who published and where, etc), and what its time complexity is.
I have no idea what time complexity it has. Who cares anyway? We all know that BigOh is only the worst case scenario and the interesting scenario is the average scenario. Although I was taught how to calculate both, I recognize that doing average analysis is beyond my limited mathematical capabilities. I prefer to do real tests to back up what I'm saying. -- GuillermoSchwarz
If so, that's unwise; there are often subtle bugs in attempted new sorting algorithm variants.
I would love to see if there are any bugs in my algorithm. See SchwarzMergeSortAlgorithm?.
And the usual simple in-place merge sort costs O(n^2), making it more of a curiosity than a practical tool. At first glance, this looks like what you may be offering up here, but I don't feel like doing a deep analysis of a random algorithm.
I took the time to verify that this algorithm is faster than QuickSort, but only for big n. This is exactly the same thing Knuth said when comparing QuickSort to InsertSort when n=10.
I didn't say it absolutely isn't sound, but I am saying that it's easy to end up with subtle bugs. Have you at least tried it?
Also it's essentially useless to introduce a new (or variant) sorting algorithm without knowing its time complexity. Like I said, in the absence of analysis, there is existing reason to think that such a thing would be O(n^2).
You shouldn't expect readers of this page either to trust it blindly or to do the analysis for you.
Actually I just looked at it a little harder, and it is (1) broken; it omits essential work, and (2) once fixed, it is indeed O(n^2). Give it a try on some worst possible case data.
As it stood, it clearly wasn't intended to be complete. It conditionally inserts the lowest element from the second half into the first half, replacing one element there which is inserted into the second half. It is sound in the sense that it preserves the order of each half and doesn't use additional storage. It also looks inefficient because of the insert process, so I wasn't motivated to establish the time complexity with certainty. Subtle bugs can arise in almost any code, even Knuth's (occasionally).
The insert process can be done fast if:
About Knuth and O (BigOh) notation [Almost every top-level statement in this section is incorrect.]
According to Knuth, InsertSort is almost as bad as BubbleSort; both are O(n*n). QuickSort is much better but still O(n*n). Why is it better, then? Because on average the DividePhase? is fast.
Well, on modern architectures (1999+) it turns out that Knuth was wrong, and he admitted it, because the MIX abstraction assumes that all machine instructions have constant cost: InsertSort is not almost as bad as BubbleSort when you have HardwareCaches; with caches, InsertSort performs almost as well as QuickSort.
For a different opinion, see CachesDoNotAffectBigOhTimeComplexity.
I found this out by myself when, out of boredom, I tested the sorting algorithms. I thought I had done something wrong but couldn't find out what. Then I read that Knuth had given an interview to explain why his work no longer applied to modern architectures. The reason given by Knuth is that the MIX model no longer applies, because MIX assumes constant cost for all its machine instructions. This is no longer true because of the caches: if you access an array in order (for 1..n) it is faster than if you access it randomly. BubbleSort's access pattern is more random than InsertSort's, and it shows for large n. I can't find the link now.
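For anyone who wants to reproduce the effect being described, here is a rough C harness (my own construction, not from Knuth or from the interview mentioned above) that sums the same array once through sequential indices and once through a shuffled permutation. On a machine with HardwareCaches the shuffled pass is typically several times slower, even though the arithmetic is identical.

 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 #define N (1u << 22)                    /* 4M ints: larger than a typical cache */

 static double sum_via(const int *a, const size_t *idx, size_t n) {
     double s = 0.0;
     for (size_t k = 0; k < n; k++)
         s += a[idx[k]];                 /* same work, different access order */
     return s;
 }

 int main(void) {
     int *a = malloc(N * sizeof *a);
     size_t *seq = malloc(N * sizeof *seq);
     size_t *rnd = malloc(N * sizeof *rnd);
     for (size_t k = 0; k < N; k++) { a[k] = (int)k; seq[k] = rnd[k] = k; }
     for (size_t k = N - 1; k > 0; k--) {            /* Fisher-Yates shuffle; slight bias is fine for a demo */
         size_t r = ((size_t)rand() * ((size_t)RAND_MAX + 1) + (size_t)rand()) % (k + 1);
         size_t t = rnd[k]; rnd[k] = rnd[r]; rnd[r] = t;
     }
     clock_t t0 = clock(); double s1 = sum_via(a, seq, N);
     clock_t t1 = clock(); double s2 = sum_via(a, rnd, N);
     clock_t t2 = clock();
     printf("sequential %.3fs  random %.3fs  (checksums %.0f %.0f)\n",
            (double)(t1 - t0) / CLOCKS_PER_SEC,
            (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
     free(a); free(seq); free(rnd);
     return 0;
 }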
According to Knuth, the reason MIX is so important is that if he had used a high-level programming language, he would have had no way to know the real cost of any operation.
About InPlace? MergeSort being broken
Actually I just looked at it a little harder, and it is (1) broken; it omits essential work, and (2) once fixed, it is indeed O(n^2). Give it a try on some worst possible case data.
What's the worst possible data in this case?
Don't know, but you might first want to take a look at the recent literature. A little web searching turns up:
Proximity Mergesort: optimal in-place sorting in the cache-oblivious model. (no direct link, but as of April 2004, available free via AcmWeb)
Practical in-place mergesort - http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html
That in-place MergeSort does the merge recursively. MergeSort is exactly like SillySort, but the Merge phase must be fast. I do not see how double recursion can be better than my approach, which takes between n/2 steps and n/2 steps plus n/2 times the cost of an insert. The cost of an insert is n/2 at most. If we try to calculate averages for random data: there are n/2 increments, and about n/4 of the steps will be swaps + inserts. The cost of the average insert is n/4, therefore the total cost of the merge phase is n/4 * n/4 = n^2/16. This is irrelevant, because it does not consider HardwareCaches. You can try the algorithm and see that it performs better; a harness for doing exactly that is sketched below.
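In that spirit, here is a small self-contained harness for actually trying it: a top-down MergeSort built on the insert-based merge sketched earlier on this page, timed against the C library's qsort on random data. The value of n, the helper names, and the use of qsort as a stand-in for QuickSort are my choices; the numbers will depend heavily on the machine and on n.

 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 static void swap_elems(int *a, size_t i, size_t j) {
     int t = a[i]; a[i] = a[j]; a[j] = t;
 }

 /* Bubble a[j] to the right until a[j..hi-1] is sorted again. */
 static void insert_into_tail(int *a, size_t j, size_t hi) {
     for (size_t k = j; k + 1 < hi && a[k] > a[k + 1]; k++)
         swap_elems(a, k, k + 1);
 }

 /* In-place merge of the sorted runs a[lo..mid-1] and a[mid..hi-1]. */
 static void merge_in_place(int *a, size_t lo, size_t mid, size_t hi) {
     size_t i = lo, j = mid;
     if (j == hi) return;
     while (i < mid) {
         if (a[i] <= a[j]) i++;
         else { swap_elems(a, i, j); insert_into_tail(a, j, hi); }
     }
 }

 static void merge_sort(int *a, size_t lo, size_t hi) {
     if (hi - lo < 2) return;
     size_t mid = lo + (hi - lo) / 2;
     merge_sort(a, lo, mid);
     merge_sort(a, mid, hi);
     merge_in_place(a, lo, mid, hi);
 }

 static int cmp_int(const void *p, const void *q) {
     int x = *(const int *)p, y = *(const int *)q;
     return (x > y) - (x < y);
 }

 int main(void) {
     size_t n = 1u << 15;              /* keep n modest: the insert-based merge can be quadratic */
     int *a = malloc(n * sizeof *a);
     int *b = malloc(n * sizeof *b);
     srand(42);
     for (size_t k = 0; k < n; k++) a[k] = b[k] = rand();
     clock_t t0 = clock(); merge_sort(a, 0, n);
     clock_t t1 = clock(); qsort(b, n, sizeof *b, cmp_int);
     clock_t t2 = clock();
     printf("in-place MergeSort %.3fs   qsort %.3fs\n",
            (double)(t1 - t0) / CLOCKS_PER_SEC,
            (double)(t2 - t1) / CLOCKS_PER_SEC);
     free(a); free(b);
     return 0;
 }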
Geesh, Knuth's work will NEVER be outdated. MIX was not the standard to which Big-O applies. We are talking minimal/average costs for various operations. If you get down to the nitty-gritty of coming up with very precise Big-Os, you just about totally miss the point of Big-O. Most of the good general purpose sort algorithms are on the order of O(N*logN). This is because it has been proven that ANY sort algorithm based on key comparison must do at least N*logN comparisons. -- GeraldLindsly
Well Gerald, you'll see, I'm not claiming that in-place MergeSort is better than O(n * log n). As you say, that's impossible for a comparison-based sorting algorithm. What I'm saying is that MergeSort beats QuickSort; I'm talking in terms of averages here. Please also note that QuickSort is O(n*n) because its worst behavior is quadratic, while HeapSort and MergeSort are O(n * log n) even in their worst behavior.
Also I don't see how you can justify your claim that Knuth will never be outdated. -- GuillermoSchwarz
Gerald is exactly correct here. -- DougMerritt
[GS, please brush up on your theory - making elementary blunders detracts from the value of your valid points.]