TimSort Simplified

I don't understand TimSort fully. It has a lot of optimisations that make it fast, but they also make it hard to understand from the source code alone.

This page, however, contains my interpretation of the basic operating principle of TimSort, which is: find ordered runs, then merge them.

EDIT: This algorithm actually came before TimSort, which is an improvement on it.

// Direction of the run currently being collected.
sealed abstract class Direction
case object Asc extends Direction
case object Dsc extends Direction
case object Init extends Direction

import scala.collection.mutable.ListBuffer

object RunSort { // <-- My current idea for the name of this simplified TimSort.
                 // Good idea? Bad idea? Not sure?
                 // EDIT: I checked Wikipedia, this is actually called NaturalMergeSort.

  def sort[T](input: Seq[T])(implicit ord: Ordering[T]) = {
    // true when a may come before b, i.e. a <= b.
    def cmp(a: T, b: T): Boolean = ord.lteq(a, b)

    val runs = new ListBuffer[(Direction, ListBuffer[T])]
    var run = new ListBuffer[T]
    var prev = input.head // NB: assumes the input is non-empty.
    run += prev
    var dir: Direction = Init

    for (curr <- input.tail) {
      val continueRun = dir match {
        case Asc => cmp(prev, curr) // If ascending, look for elements >=.
        case Dsc => cmp(curr, prev) // If descending, look for elements >= in the opposite order.
                                    // This (probably) makes it stable.
        case Init =>                // If neither, find out what the next run will be.
          dir = if (cmp(prev, curr)) Asc else Dsc
          true                      // ALWAYS continue a run with only one element.
      }
      if (!continueRun) {
        runs += ((dir, run))
        dir = Init
        run = new ListBuffer[T]
      }
      run += curr
      prev = curr
    }
    runs += ((dir, run)) // Don't forget the final run.

    // And now, merge as normal.
  }
}
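
The merge phase itself is left out above. Here is one way it could look, as a minimal sketch under my own assumptions: mergeRuns and mergeTwo are names I made up, descending runs are reversed so everything ascends, and neighbouring runs are merged pairwise until one remains (a plain balanced merge, not TimSort's galloping merge).

// A sketch of the "merge as normal" step (my assumption, not TimSort's
// galloping merge): reverse descending runs, then merge neighbouring runs
// pairwise until only one run is left.
def mergeRuns[T](runs: Seq[(Direction, Seq[T])])(implicit ord: Ordering[T]): List[T] = {
  // Reverse descending runs so that every run is ascending.
  def normalise(r: (Direction, Seq[T])): List[T] = r match {
    case (Dsc, xs) => xs.reverse.toList
    case (_, xs)   => xs.toList
  }

  // Standard stable merge of two ascending lists (takes from `a` on ties).
  // Not stack-safe for very long runs, but fine for a sketch.
  def mergeTwo(a: List[T], b: List[T]): List[T] = (a, b) match {
    case (Nil, _) => b
    case (_, Nil) => a
    case (x :: xs, y :: ys) =>
      if (ord.lteq(x, y)) x :: mergeTwo(xs, b) else y :: mergeTwo(a, ys)
  }

  // One pass merges adjacent runs in pairs, roughly halving the run count.
  def pass(rs: List[List[T]]): List[List[T]] = rs match {
    case a :: b :: rest => mergeTwo(a, b) :: pass(rest)
    case _              => rs
  }

  var remaining = runs.map(normalise).toList
  while (remaining.length > 1) remaining = pass(remaining)
  remaining.headOption.getOrElse(Nil)
}

With something like this, the last line of sort could simply return mergeRuns(runs), and the roughly log2(number of runs) passes are where the O(n log n) worst case discussed below comes from.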

Notes:
- Run-finding can be done in place, but that requires mucking about with mutability, and I wanted to show the strategy rather than the implementation details. In essence, you would record the start and end of every run, as in TimSort (a sketch of this follows below). That is O(n) space in general, or O(1) extra space if you can record the run boundaries in a bitset, like SmoothSort.
- Also like TimSort, it performs faster the more ordered the input is.
- However, it's easier to understand than TimSort because it never uses insertion sort; the flip side is that it probably won't cope well with lots of small runs.
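
As a rough illustration of that in-place idea, here is a small sketch under my own assumptions: findRunBoundaries is a made-up name, and it records boundary indices in a list rather than flipping bits in a bitset. It makes one pass over an array, noting where each run starts.

import scala.collection.mutable.ListBuffer

// A sketch (my assumption, not TimSort's actual bookkeeping) of recording run
// boundaries instead of copying elements: boundaries(k) is the index where
// run k starts; each run ends where the next one begins (or at a.length).
def findRunBoundaries[T](a: Array[T])(implicit ord: Ordering[T]): List[Int] =
  if (a.isEmpty) Nil
  else {
    val boundaries = ListBuffer(0) // the first run starts at index 0
    var i = 1
    while (i < a.length) {
      // Walk to the end of the current ascending or descending run.
      if (ord.lteq(a(i - 1), a(i))) {
        while (i < a.length && ord.lteq(a(i - 1), a(i))) i += 1 // ascending run
      } else {
        while (i < a.length && ord.lteq(a(i), a(i - 1))) i += 1 // descending run
      }
      if (i < a.length) boundaries += i // the next run starts here
      // A real implementation would also note each run's direction, so
      // descending runs can be reversed in place before merging.
    }
    boundaries.toList
  }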

Worst-case time complexity: O(n log n), the same as mergesort. However, the merge phase starts with sub-lists of length at least two, so it could be slightly faster, as it needs fewer levels of merging. On the other hand, if comparison is expensive, note that it spends O(n) comparisons on the run-finding pass before it even starts merging the sublists.

Best-case time complexity: O(n), when there's only one run. Average-case time complexity: I have no idea. Somewhere between O(n) and O(n log n), I'd assume.

It depends strongly on the kind of input you typically give it, I expect.
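
One hedged way to picture this is to count the runs in a given input: a single run is the O(n) best case, while r runs mean roughly log2(r) merge passes over n elements, approaching mergesort as r approaches n. The helper below is mine (and, to keep it short, it counts ascending runs only, unlike the bidirectional run detection above).

// Counts maximal ascending runs by counting descents (places where the order
// drops). My own helper for illustration, not part of the algorithm above.
def countAscendingRuns[T](input: Seq[T])(implicit ord: Ordering[T]): Int =
  if (input.isEmpty) 0
  else 1 + input.iterator.sliding(2).count {
    case Seq(a, b) => ord.gt(a, b) // a descent here starts a new run
    case _         => false        // a single element has no descents
  }

// For example:
//   countAscendingRuns(1 to 100)           == 1  -> one run, the O(n) best case
//   countAscendingRuns(Seq(2, 1, 4, 3, 6)) == 3  -> runs (2), (1, 4), (3, 6)
// On uniformly random input, ascending runs average only about two elements,
// so random data behaves much like plain mergesort.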

