Linear Shuffle

Shuffle a deck of cards in linear time. DonKnuth suggests attaching a random number to each card and sorting. But that would be O(Nlog N), more than linear.

(A good sorting routine might be faster than the algorithms below if the data is stored on disk or swapped out, because sorting usually makes more sequential accesses to the data -- JoergKreienbuehl?)

In fact, DonKnuth provides the correct form of this algorithm, as noted later on this page (that is: starting at the top of the deck, exchange each card with the card at a random position from the current position to the bottom of the deck. Although I may not be phrasing it correctly. This is in Volume 2 of TheArtOfComputerProgramming. -- BrettNeumeier

See also LinearShuffleSummary

(Caution: Cool trick but not completely even shuffling)

[Knuth's algorithm] It's harder to code too. That's why the trick DaveDodson taught me is so cool. Simply exchange each element of the deck with a random element of the deck. "But some elements will get moved more than once, maybe many times," I protested. "Yes," he said, "but each element is equally likely to appear in any position; what better definition of shuffle?"

It works, of course, and is easy to code in just about any language. But here is the best expression I've ever found.

every !deck :=: ?deck

Which reads, "every element-of deck exchange-with random-element-of deck." I wrote this in Icon. It's my small contribution to RalphGriswold's Icon book.

-- WardCunningham

http://www.cs.arizona.edu/icon/progcorn/pc_inl09.htm

The algorithm is cool (I use it), but it is not true that each element is equally likely to appear in any position. Consider shuffling a deck of 3 cards, labeled A, B, C. There are six possible orderings of the cards:

ABC, ACB, BAC, BCA, CAB, CBA.

The algorithm performs 3 random swaps. For each random swap, there are 3 possible outcomes. So there are 27 equally likely outcomes. These 27 possibilities cannot be mapped on the 6 orderings, so that each is equally likely, because 27 is not a multiple of 6. Some ordering must be (marginally) favored over others, with respect to the initial ordering.

The technique is a good choice in most settings, but it is not truly random. The distribution is as fair as it can be. In this case there are three "favored" combinations with probability 5/27 and three "disfavored" ones with probability 4/27. Which are favored depends on how the random number generator is mapped to the index selection. Here's one possible result:

abc 4/27 acb 5/27 bac 5/27 bca 5/27 cab 4/27 cba 4/27

Note that for a deck as small as three, a favored combination has a probability that is 25% better than a disfavored ones. And, e.g., b lands in the first position 25% more often than in third.

It is not hard to write a linear shuffle which produces every possible ordering with an equal probability (assuming a good random number generator).

I'll write it in C, since that is what I know best:

  for(i = 0; i < 52; i++) {
j = i + (random() % (52 - i));
tmp = card[i];
card[i] = card[j];
card[j] = tmp;
  }

The idea is to partition the deck into shuffled and unshuffled sections. The shuffled cards are cards[x], where x < i. Unshuffled cards are cards[x], where x > i. Then, just randomly pick a card from the unshuffled section and add it to the top of the shuffled section.

This is still linear, and as far as I can tell, just requires one additional subtract per loop iteration. It's interesting to compare these statistics to a HandShuffle.

-- DonDodson

Forgive me, but LinearShuffle sounds like a great name for a dance step for line dancing. Imagine applying the algorithm of DonDodson to a group of dancers in a row, each dancer's movement dictated by the outcome of each iteration of the loop. All we need now is a diagram of the footwork! The LinearShuffle consists of two basic moves: the Unshuffled Shuffle and the Shuffled Shuffle. Dancers use the steps to either remain in place, or to migrate to the top of the shuffled section of the line dance. Sounds like a PLoP activity. -- DonOlson

Don, this is a wonderful idea. We also need a mechanism to generate the random component in each iteration. Since this is not a computer activity, the line dance LinearShuffle has the benefit of true randomness, if it so chooses. How would you do that? Rolling dice might impede the footwork, and anyway dice don't generate quite big enough numbers. Shuffling real decks of cards might work, but there's the mess of dropping cards when you're trying to shuffle and shuffle at the same time. I'm afraid I'm stuck on this one. -- WaldenMathews

The Mathematical Argument

The random choice variant of this algorithm is incorrect because it produces a non-equal distribution of the card permutations (as AlistairCockburn points out for the N=3 case). In general, there are N! permutations, and the random choice algorithm selects among them in N^N (equally likely) different ways. The only way this could be a fair shuffle is if N! divides N^N without remainder. This can only happen when N <= 2 (N-1 does not divide any N greater than 2). -- MikeKoss

Well, I did the math last night and confirmed TomCargill's analysis of my algorithm. Working it out made it clear to me what was going wrong. It sure looks like DonDodson's enhancement fixes the problem. Anyway, I coded up both versions and ran 27000 trials each using identical Random streams (r). My version ...

  1 to: a size do: 
[:i | a swap: i with: (r next * a size) floor + 1].

tallied up as follows ...

abc 4024 acb 4973 bac 4976 bca 5055 cab 4012 cba 3960

which agrees closely with Tom's prediction. (4/27=4000 and 5/27 = 5000)

Don's modification, which I coded as ...

  1 to: a size do: 
[:i | a swap: i with: (r next * (a size - i + 1)) floor + i].

and tallied as ...

abc 4510 acb 4480 bac 4544 bca 4515 cab 4455 cba 4496

looks pretty even to me.

I think Alistair missed that an element could be exchanged with itself. Those who count alternatives will note that Don's version reduces the number of choices with each iteration creating a total of N! possibilities.

-- WardCunningham

Actually the ChiSquared of those numbers is 1.044 (expected 6+-4.9), indicating that the spread is to even. Did you pick a good one Ward? Or is ChiSquared malplaced here?

In the spirit of the original (from above):

every !deck :=: ?deck

A fair IconLanguage version:

every i := *deck to 1 by -1 do deck[?i] :=: deck[i]

This also restricts the choice as time goes on; unfortunately, there is no equivalent to the PerlLanguage splice() in IconLanguage; their lists are DoubleEndedQueues, and elements cannot be inserted or deleted from the center.

(I asked DaveDodson to recall our conversation of nearly twenty years ago. This is his reply. It seems I misunderstood him at the time. -- WardCunningham)

The proper algorithm is to exchange the n-th element with a random element in the first n, then decrease n by 1 and repeat the process until n = 1. In Fortran, with random() being a function that returns a random number between 0 and 1, this is:

do i = n, 1, -1
j = i * random() + 1
t = a(i)
a(i) = a(j)
a(j) = t
end do

It would be incorrect for the second line to read

j = n * random() + 1

Hope this helps.

These solutions only serve to illustrate that if you have a perfect random number generator, you have no need to 'shuffle' anything; simply extract a random element from your list of elements as needed. The original problem was:

: Shuffle a deck of cards in linear time.

"Shuffle" could be more realistically defined in terms of human experience as:

: Use an imperfect random number generator to create a distribution which approximates true randomness.

Try doing that in linear time. -- AndyPierce

See ImprovingRandomNumbers.

With shuffling, you can't get the same item twice. With a perfect random number generator, you can. Unless "perfect" means "each number comes up exactly once" which means "not very random at all." -- DaveHarris

One could use a random number generator, *if* one can "extract" (remove without replacing) a random element in O(1) time. The problem is that most extraction schemes are more like O(N/2) time, where N is the number of elements. (One can probably improve the deletion time, but one also has to be able to efficiently choose a random element. Optimizing both of these seems hard.) -- anon.

Actually, this is the insight of the linear shuffle algorithm. If you don't care about corrupting the order of a list (such as the unshuffled cards), you can extract any given element in O(1) time by swapping it out with the final element in the list and decrementing the list size. [This made it all clear - see below.] I wonder if this additional mixing in the "unshuffled" part of the deck could be utilized somehow to decrease the shuffling time to faster than linear? -- AndyPierce

: (Well, you don't have to shuffle the last card, but that only gets you O(N-1) (still linear). I doubt a less-than-linear shuffle is possible unless you have an operation which can move N elements in less than O(N) time. -- anon.)

On the other hand, modern computers are fast enough for random extraction with small N. I recently used a random-extraction loop in Perl to shuffle 8000+ elements. Approximately 16 million (8000 * (8000/2)/2) 32-bit words were moved as elements were extracted. It took a few seconds on a slowish computer. For a 52-element array the same loop should take a few milliseconds at most. -- anon.

Ah! There's an algorithm lurking behind this page, which you need to know to understand what the page is about. It's something like:

bool alreadyPicked[52];
int pack[52];
for (int i = 0; i < 52; ++i)
alreadyPicked[i] = false;
int j = 0;
while (j < 52) {
int card = rand() % 52;
if (!alreadyPicked[card]) {
pack[j++] = card;
alreadyPicked[card] = true;
}
}

In other words, pick a card at random and add it to the pack if it hasn't already been picked. The bool array gives you removal without replacing of a random element in O(1) time. People really do write things like this; it's obvious and fair. The problem is that when you pick the last card, almost all the cards have already been picked, so your chance of hitting the last free card is 1/52. Which makes this O(N*N) instead of O(N) - decidedly non-linear. LinearShuffle is about not making this mistake.

: That is one way to attempt an optimized (but non-linear time) shuffling. In high-level languages with more array/collection support, it is even easier to write an obviously correct shuffle. Here's a sample in Perl:

  # @deck is the old unshuffled array
  @deck = 1..52;
  @shuffled = ();  # an empty array
  $numcards = 52;
  while ($numcards > 0) {
push(@shuffled, splice(@deck, (rand $numcards), 1));
$numcards--;
  }
  # @shuffled contains the shuffled cards, @deck is empty

: The key is that the "splice" command removes the target element from the array. (Each time the while loop executes, @deck contains one less card.) The "push" command then adds that extracted card to the shuffled deck.
: This is not a linear algorithm, since "splice" takes O(N) time to execute. (It needs to move all the array items after the splice.) However, the O(N) operation of "splice" is usually implemented with a memory-move primitive which is extremely fast (especially since the entire deck array should fit in the CPU caches). Here are some possible cases:

Worst case: 52 * (51/2) = 1326 memory moves in "splice" (the first card of @deck is always chosen, and all other cards need to be moved)
Average case: 52 * (51/2)/2 = 663 moves (a card in the middle of @deck is chosen, so half the deck needs to be moved)
Best case: 52 * 0 = 0 moves (the last card is always chosen, so no moves are needed)

: When I first used this to sort 8000+ elements, I didn't do all the analysis above. I was willing to let the program run for several minutes if necessary - I could use the time to catch up on RecentChanges. I was pleasantly surprised that it was so fast. (The whole operation of reading in 8000 lines of text, shuffling them, and writing them to 16 output files took less than 10 seconds.)
: The moral of this story is that the acceptable N for O(N*N) is often big enough for one's application. (Beware if the application's N can grow, however.) -- anon.

After a recent comment, I realized that the linear shuffle was confusing because it combines two optimizations:

The linear-time optimization of extracting a random element, then compacting the array (filling in the hole) in O(1) time.
The space optimization of using the same array for both the initial and shuffled deck.

Here is a modified version of the Perl code which should work in linear time:

  # @deck is the old unshuffled array
  @deck = 1..52;
  @shuffled = ();  # an empty array
  $numcards = 52;
  while ($numcards > 0) {
push(@shuffled, splice(@deck, (rand $numcards), 1, $deck[$numcards-1]));
pop(@deck);  # optional (keeps $numcards == number of cards in @deck)
$numcards--;
  }
  # @shuffled contains the shuffled cards, @deck is empty.

(This presumes "splice" does the obvious optimization of replacing rather than deletion and inserting.) The key is splicing in the last element to replace the selected one, rather than just removing the selected element. In this version, instead of moving all the following cards (average (N/2)/2 moves), only the last card is moved into the "hole" left by the selected card. -- anon.

Or, in mildly-obfuscated Perl:

 # If this is obvious, you need help.
 @d=1..52;@s=();push(@s,splice@d,rand$#d+1,1,$d[$#d]),pop@d until$#d<0;

PerlGolf

 # If this is obvious, you need help.
 @d=1..52;@s=();push(@s,splice@d,rand@d,1,$d[$#d]),pop@d while@d

You don't need to predelare @s, and you can replace $d[$#d] with $d[-1] which is both shorter and slightly more legible!

 @d=1..52;push(@s,splice@d,rand@d,1,$d[$-1]),pop@d while@d

If you allow the hack of declaring @d to be one larger than you need so you can pop it off straight-away, then you can shorten further to:

 @d=1..53;push@s,splice@d,rand@d,1,$d[-1]while pop@d

How about this to shuffle it in place:

 @d=1..52;$r=rand$_,($_,$d[$r])=($d[$r],$_)for@d

You could also use:

 @d=1..52;splice@d,rand@d,0,pop@d for @d

but this suffers from the 4000/5000 split problem from above.

I have this algorithm.

I did a Prolog version of a LinearShuffle about 6 years ago:

http://www.csci.csusb.edu/cs320/prolog/lotto93.plg

You'll find my use of the idea of LinearShuffle as a way to maintain a collection of sets of items in an array in pages 201 to 203 of Software - Practice and Experience, Volume 7 (1977), entitled "Efficient Storage for Amorphous Data".

-- DickBotting

The file seems to have moved. A wroking archived version here: http://web.archive.org/web/20010430185721/http://www.csci.csusb.edu/cs320/prolog/lotto93.plg

How about sublinear shuffling... Here is a bit of Java for a change:

  public class Shuffler {
private int max;
private int[] deck;
public Shuffler(int[] deck) {
this.deck = deck;
max = deck.length;
}

public int next(){
int i = rand(max);
int tmp = deck[i];
deck[i] = deck[--max];
deck[max] = tmp;
return tmp;
}

private int rand(int i){return (int)Math.floor(Math.random()*i);}
  }

This just does what is described above - however it only shuffles on demand which is an advantage in games like blackjack where the rules call for only a part of the deck to be used. -- JanLarsen

Finally. Now my computer won't hang when it's trying to shuffle 52 element lists. -- ShamelessSarcastic?

Fair version of LinearShuffle in Python

import random

def Shuffle(list):
for i in range(len(list)):
n = random.randrange(i, len(list))
list[i], list[n] = list[n], list[i]

counts = {}
for i in range(60000):
list = ['A', 'B', 'C']
Shuffle(list)
key = ''.join(list)
counts[key] = counts.setdefault(key, 1) + 1
print counts

{'BCA': 9929, 'CAB': 10025, 'BAC': 10041, 'CBA': 10065, 'ABC': 9947, 'ACB': 9993}

This implements the correct algorithm where some swaps are actually leave-alones. (See above for implementations in other languages.) -- SteveHowell

Or trivially:

import random
deck = range(decksize)
random.shuffle(deck)

Since the random module now includes a shuffle in place component in the latest python distribution. I can't speak for the correctness or the optimalness of implementation though. :) -- AndyPierce

C++ version:

 #include <algorithm>
 ...
 random_shuffle(deck.begin(), deck.end());

The STL doc page (http://www.sgi.com/tech/stl/random_shuffle.html) states that it runs in linear time and that it's the algorithm described in Knuth volume 2. -- AmitPatel

Original discussion moved to HandShuffle -- BelTorak

If these rand() and random() functions are the standard c library calls, they generate a pseudorandom number in the range 0 to 32767. Performing a modulo on the output of this results in an uneven distribution for almost all modulos. Take mod 10 for example, the contribution of 0-32759 are even, but then 32760 through 32767 boost 0-7 higher than 8 and 9. Similar reasoning applies to any modulo that doesn't evenly divide 32768 (i.e. any non-power-of-2).

Well, of course is is really in range between 0 and RAND_MAX, which happens to be 2147483647 on my system. The same argument still applies, although the larger RAND_MAX is, the smaller the deviation is. More fundamentally, any fixed mapping of 0..RAND_MAX-1 onto 0..9 will be "unfair" in this way, unless RAND_MAX is a multiple of 10. A correct way to get a number between 0 and 10 is the following:

  do {
n = rand() % 16;  /* assuming RAND_MAX is a multiple of 16 */
  } while (n >= 10);

For some reason, which I remember Knuth explains, RNGs create numbers that are more random in the higher bits than in the lower ones. So, i'd write such:

  do {
n = (rand() / (RAND_MAX/16));
  } while (n >= 10);

Of course, if your language/library can return a number in a range, it's probably optimized for these things.

If you had a function p(n) which mapped from numbers in the range 0 ... N-1 back onto itself and which was one-to-one (ie was a permutation), then rather than shuffling and dealing, you could count, apply p to the index, and fetch that item from the list. Furthermore, if this function was parameterisable and random(ish), this would even be useful. Such functions can be found in cryptography; DES, for example, is such a function on the set of numbers in the range 0 ... 2**64 - 1, parameterised by a 56-bit key. If you have 256 elements, you might try an offset modular multiplication: p(n) = (((n + 1) * (k + 1)) mod 257) - 1, for some value k in 0...255. There is probably no point to any of this.

Is there some information-theory reason why the fastest shuffle algorithm is O(N), while the fastest sorting algorithm is O(NlogN)? That is, is there a mathematical proof that sorting must be slower than shuffling? --AndyPierce

Yes. There are N! possible orderings; to distinguish between them you need to do at least log2(N!) comparisons, which is of order N log N. The basic operation in shuffling is moving an item to a particular place in the list, which lets you add log N bits of entropy in a single go. You can think of it as being a little like radix sort.

In other words, pick a card at random and add it to the pack if it hasn't already been picked... The problem is that when you pick the last card, almost all the cards have already been picked, so your chance of hitting the last free card is 1/52. Which makes this O(N*N) instead of O(N) - decidedly non-linear.

While the idea is right, the conclusion is not quite, as far as I can tell. I was taught that big-O is a worst-case bound, and (PRNG limitations notwithstanding) there really aren't such bounds on this process - a 1/52 chance never has to occur.

But even in the average case, this method isn't O(N^2). Since there are 2 remaining cards to choose from in the second-last draw, it will take on average N/2 tries to find one - and so on, so the expected running time is O(N + N/2 + ... + N/N), which is clearly seen to be O(N lg N). (Or it's clear to me, anyway. I just looked at the integral of 1/x, knowing it differs from the sum of 1/N by at most a constant.) -- KarlKnechtel

P.S. Well of course it is O(N^2) by that definition, but it's also better than that. :)

This may be implementation-specific, but I believe there is a bug in the STL implementation of random_shuffle of Microsoft Visual C++ 6. The implementation of <algorithm.h> supplied with VC++6 (dated 1995) has this:

// TEMPLATE FUNCTION random_shuffle
template<class _RI> inline
void random_shuffle(_RI _F, _RI _L)
{if (_F != _L)
_Random_shuffle(_F, _L, _Dist_type(_F)); }
template<class _RI, class _Pd> inline
void _Random_shuffle(_RI _F, _RI _L, _Pd *)
{const int _RBITS = 15;
const int _RMAX = (1U << _RBITS) - 1;
_RI _X = _F;
for (_Pd _D = 1; ++_X != _L; ++_D)
{unsigned long _Rm = _RMAX;
unsigned long _Rn = rand() & _RMAX;
for (; _Rm < _D && _Rm != ~0UL;
_Rm = _Rm << _RBITS | _RMAX)
_Rn = _Rn << _RBITS | _RMAX;
iter_swap(_X, _F + _Pd(_Rn % _D)); }}
template<class _RI, class _Pf> inline
void random_shuffle(_RI _F, _RI _L, _Pf& _R)
{if (_F != _L)
_Random_shuffle(_F, _L, _R, _Dist_type(_F)); }
template<class _RI, class _Pf, class _Pd> inline
void _Random_shuffle(_RI _F, _RI _L, _Pf& _R, _Pd *)
{_RI _X = _F;
for (_Pd _D = 1; ++_X != _L; ++_D)
iter_swap(_X, _F + _Pd(_R(_D))); }

There are two random_shuffle function definitions, with and and without the option to supply your own random-number generator). If this code is tidied up to be legible, it reads:

// TEMPLATE FUNCTION random_shuffle
template<class _RandomAccessIterator> inline
void random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last)
{
if (_First != _Last) _Random_shuffle(_First, _Last, _Dist_type(_First));
}
template<class _RandomAccessIterator, class _Predicate> inline
void _Random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last, _Predicate *)
{
const int _RBITS = 15;
const int _RMAX = (1U << _RBITS) - 1;
_RandomAccessIterator _X = _First;
for (_Predicate _D = 1; ++_X != _Last; ++_D) {
unsigned long _Rm = _RMAX;
unsigned long _RandomNumber = rand() & _RMAX;
for (; _Rm < _D && _Rm != ~0UL; _Rm = _Rm << _RBITS | _RMAX) _RandomNumber = _RandomNumber << _RBITS | _RMAX;
iter_swap(_X, _First + _Predicate(_RandomNumber % _D));
}
}
template<class _RandomAccessIterator, class _PredicateFunction> inline
void random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last, _PredicateFunction& _R)
{
if (_First != _Last) _Random_shuffle(_First, _Last, _R, _Dist_type(_First));
}
template<class _RandomAccessIterator, class _PredicateFunction, class _Predicate> inline
void _Random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last, _PredicateFunction& _R, _Predicate *)
{
_RandomAccessIterator _X = _First;
for (_Predicate _D = 1; ++_X != _Last; ++_D)
iter_swap(_X, _First + _Predicate(_R(_D)));
}

I believe this iterates too few times.

...but once you've picked 51 of the cards, you already know what the 52nd is... Here's how the algorithm works: Take the _D'th card. Swap it with a card before or at position _D (picked randomly). Continue through all cards. It's pretty clear that _D can start at 1 rather than 0 with no loss of correctness.

... well, OK - but the algorithm shown above definitely gives the wrong results! See below.

For example, if you take a vector {1,2} and call random_shuffle(vector.begin(), vector.end()), then you *always* obtain {2,1}. If you shuffle {1,2,3}, you obtain {3,1,2} or {2,3,1} only, and so forth. To implement Don Dodson's algorithm (above), the following seems to give better results:

// TEMPLATE FUNCTION better_random_shuffle
template<class _RandomAccessIterator> inline
void better_random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last)
{
if (_First != _Last) _better_Random_shuffle(_First, _Last, std::_Dist_type(_First));
}
template<class _RandomAccessIterator, class _Predicate> inline
void _better_Random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last, _Predicate *)
{
const int _RBITS = 15;
const int _RMAX = (1U << _RBITS) - 1;
_RandomAccessIterator _X = _First;
_Predicate _N = _Last - _First;
for (_Predicate _D = 0; _X != _Last; ++_D, ++_X) {
unsigned long _Rm = _RMAX;
unsigned long _RandomNumber = rand() & _RMAX;
for (; _Rm < _D && _Rm != ~0UL; _Rm = _Rm << _RBITS | _RMAX) _RandomNumber = _RandomNumber << _RBITS | _RMAX;
std::iter_swap(_X, _X + _Predicate(_RandomNumber % (_N - _D)));
}
}
template<class _RandomAccessIterator, class _PredicateFunction> inline
void better_random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last, _PredicateFunction& _R)
{
if (_First != _Last) _better_Random_shuffle(_First, _Last, _R, std::_Dist_type(_First));
}
template<class _RandomAccessIterator, class _PredicateFunction, class _Predicate> inline
void _better_Random_shuffle(_RandomAccessIterator _First, _RandomAccessIterator _Last, _PredicateFunction& _R, _Predicate *)
{
_RandomAccessIterator _X = _First;
_Predicate _N = _Last - _First;
for (_Predicate _D = 0; _X != _Last; ++_D, ++_X)
std::iter_swap(_X, _X + _Predicate(_R((_N - _D))));
}

The code adds a few "std::" namespace handles (but could be written with a "using namespace" call instead).

Rudolf Cardinal (rudolf@pobox.com), 29 July 2003.

I was thinking about this problem and it seems that when shuffling a deck of cards and then comparing the shuffle to the previous deck of cards, there should be, on average, 1 card that matches its previous position in the deck (each card has a 1/52 chance of being in its original position and there are 52 cards).

I created the following Java program to test this out. Notice that the "bad" algorithm does not do a good job (> 2 matches per shuffle), the original suggested algorithm does "too good" of a job (< 0.9 matches per shuffle), and the modify algorithm hits about 1.0 matches per shuffle (for 10,000,000 shuffles):

[14:02] kregan@kregan ->java ShuffleTest

Bad shuffle # matches: 20106808 (2.0106808 per shuffle)

Good shuffle # matches: 8999434 (0.8999434 per shuffle)

Even better shuffle # matches: 10002152 (1.0002152 per shuffle)

Here is the program:

 import java.util.Random;


 public class ShuffleTest {


private static final int NUM_ITERATIONS = 10000000;
private static final int DECK_SIZE = 52;


public static void main(String[] args) {

//
// Create the deck
//

int[] deck = new int[DECK_SIZE];

for (int i = 0; i < deck.length; i++) {
deck[i] = i;
}

//
// Count the number of matches for various shuffles.
//

Random random = new Random(System.currentTimeMillis());

int badShuffleMatchCount = 0;
int goodShuffleMatchCount = 0;
int evenBetterShuffleMatchCount = 0;

for (int i = 0; i < NUM_ITERATIONS; i++) {

//
// Bad shuffle
//

int[] deckCopy = copyDeck(deck);

for (int j = 0; j < 100; j++) {
int randomIndex1 = random.nextInt(deck.length);
int randomIndex2 = random.nextInt(deck.length);

int tmp = deck[randomIndex1];
deck[randomIndex1] = deck[randomIndex2];
deck[randomIndex2] = tmp;
}

badShuffleMatchCount += countMatches(deckCopy, deck);

//
// Good shuffle
//

deckCopy = copyDeck(deck);

for (int j = 0; j < deck.length; j++) {
int randomIndex = random.nextInt(deck.length);

int tmp = deck[j];
deck[j] = deck[randomIndex];
deck[randomIndex] = tmp;
}

goodShuffleMatchCount += countMatches(deckCopy, deck);

//
// Even better shuffle
//

deckCopy = copyDeck(deck);

for (int j = 0; j < deck.length; j++) {
int randomIndex = j + random.nextInt(deck.length - j);

int tmp = deck[j];
deck[j] = deck[randomIndex];
deck[randomIndex] = tmp;
}

evenBetterShuffleMatchCount += countMatches(deckCopy, deck);
}

System.out.println("Bad shuffle # matches: " + badShuffleMatchCount +
" (" +
(badShuffleMatchCount / (double)NUM_ITERATIONS) +
" per shuffle)");

System.out.println("Good shuffle # matches: " + goodShuffleMatchCount +
" (" +
(goodShuffleMatchCount / (double)NUM_ITERATIONS) +
" per shuffle)");

System.out.println("Even better shuffle # matches: " +
evenBetterShuffleMatchCount +
" (" +
(evenBetterShuffleMatchCount /
(double)NUM_ITERATIONS) +
" per shuffle)");

} // main


private static int[] copyDeck(int[] deck) {

int[] deckCopy = new int[deck.length];

for (int i = 0; i < deckCopy.length; i++) {
deckCopy[i] = deck[i];
}

return deckCopy;

} // copyDeck


private static int countMatches(int[] deck1, int[] deck2) {

int numMatches = 0;

for (int i = 0; i < deck1.length; i++) {
if (deck1[i] == deck2[i]) {
numMatches++;
}
}

return numMatches;

} // countMatches

 } // ShuffleTest

Kevin Regan (kregan@amazon.com), 1 June 2004

Using n¡ to represent the subfactorial function Sum[k=0..n](-1)^k/k!: n¡ is the number of shufflings of n items in which none are in the same place as their starting positions. If precisely one card in the pack of 52 is unmoved by a shuffle, the rest can be in any one of 51¡ possible orders, giving a total number of "one-card-unmoved" orderings of 52×51¡. The probability that 52 randomly-shuffled cards ends up in one of these arrangements is therefore 52×51¡/52! = 51¡/51! ≈ 37%. (As the equality shows, this is the probability that after shuffling 51 cards, none of them will be in their original location). As n increases without limit, the probability approaches 1/e.

Why do a linear shuffle? How about constant time shuffle? Look at an interesting solution at http://blogs.mastergaurav.com/2011/12/26/card-and-deck-deal-and-shuffle/

Code inlined for completeness.

  class Deck
  {
private int[] cards = new int[52];
private int index;

public Deck() {
for(int i = 0; i < cards.Length; i++) {
cards[i] = i + 1;
}
Shuffle();
}

public int Remaining {
get { return index; }
}

public void Shuffle() {
index = cards.Length;
}

public int Deal() {
if(Remaining > 0)
{
//Get the random card
int dealIndex = new Random().Next(index);
int card = cards[dealIndex];

//Now swap
int tmp = cards[dealIndex];
cards[dealIndex] = cards[index - 1];
cards[index - 1] = tmp;

//Update the "end point" => cards remaining / dealt
index--;

//Deal
return card;
}
throw new InvalidOperationException?();
  }

}

LIKEABOSS! Gaurav Vaish (gaurav-dot-vaish~AT~gmail-dot-com), December 26, 2011