Linear Shuffle Summary

Here's an attempt at a summary of the LinearShuffle page. There's lots more discussion on that page, but much of it is redundant and needs to have RefactorMercilessly applied to it.

In short, the objective is to take an array and shuffle it in linear time. Here's an obvious algorithm that's wrong:

Icon:

  procedure shuffle(x)
    every !x :=: ?x
    return x
  end
C:

  void shuffle(int *x, int n)
  {
    int i;
    for (i=0;i<n;i++)
      swap(x,i,rand(n));
  }
Short, elegant, clear, and wrong. These routines do not produce each of the possible n! orderings with equal likelihood. A simple proof goes like this. If n=3 then there are three swaps, each with three possible targets, giving 27 possible results. However, 3!=6 and 6 does not divide evenly into 27, so some of the outputs must be duplicated. Explicit numbers are given on the LinearShuffle page.

A correct version can easily be produced by developing a proof along with the code. It can run forwards or backwards, depending on your inclination.

Here's the forward version. We assume that the elements from 0 to i - 1 inclusive are a shuffled permutation of the elements from 0 to i-1 from the original array. At each stage, we swap the next element, i, with some randomly chosen element from the shuffled sub-array, thus extending the sub-array an element at a time, maintaining the invariant, and eventually terminating.

Could someone explain exactly why the invariant is maintained?

  void shuffle(int *x, int n)
  {
    int i;
    for (i = 1; i < n; i++)
      swap(x, i, rand(i + 1));
  }
And here's the backwards version. From those elements we have, choose one at random and remove it. Repeat until there aren't any left.

  void shuffle(int *x, int n)
  {
    for (;n>1;n--)
      swap(x,rand(n),n-1);
  }
I've deliberately left the proof of the algorithm as a clear indication rather than a laborious step-by-step formal verification to show how a mathematician thinks about proofs. The code needs to be checked against the definition of the algorithm.

In all the above, rand(j) is assumed to return a uniformly distributed random number from 0 to j-1 inclusive. Yes, we could pass in void * and a swap function making this effectively polymorphic but this version works on ints for simplicity.

Question: How would you write UnitTests for the above routines?


To avoid rehashing the same arguments over and over on this page, please add discussion below after reading LinearShuffle.


EditText of this page (last edited August 20, 2006) or FindPage with title or text search