Unit Testing Non Deterministic Code

What strategies are people using to UnitTest code that is non-deterministic?

Some example problems below.


I've been writing a MarkovChainer and would like a repeatable way to test certain results. I try passing it a sort of MockRandomizer and controlling it externally, but the randomness is buried so deep inside the chainer that it's extremely hard for me to think about how to test the chainer. (There's also an issue of a CombinatorialExplosion, I suppose; for almost any markov chainer there's an infinite number of possible chains you can receive.)

Why is it difficult to mock the randomizer? Some consider it to be a CodeSmell when they encounter code that is difficult to unit test.

I agree that under most circumstances, code that's difficult to UnitTest is a smell. But I have the sneaking suspicion that non-deterministic code is far more difficult than other types of code, even if it's well-factored.

I tried mocking the randomizer, and it was really quite difficult. Maybe I'm doing something wrong:

In short, yes, you can UnitTest it. But I can't think of a way to do it that doesn't introduce more of a mess than it cleans up.

You're trying to do too much. I think you're mixing UnitTests and AcceptanceTests. UnitTests are used to test interactions between a few (at most) classes. MockObjects are used to reduce the number of complex classes interacting. If you want to test the overall behaviour of the system, such as the final results, that's an acceptance test and you shouldn't use MockObjects for that. UnitTests are tools that help you write code. A side effect is that they verify behaviour, but except at the lowest level, that is not their main purpose.

Example: You want to test that the state transitions occur with the proper probability. Write a MockRandom class that returns [0-1] based on a predefined list of numbers. If the probabilities are 0.2, 0.3, and 0.5, set up the mock random to return boundary conditions 0.0, 0.19, 0.2, 0.49, 0.5, 1.0. Then assert that the proper state transition occurs in each case. If it works, you're done. Now, writing the acceptance test to see that you get the proper overall behaviour is a different beast. You'll want to keep the randomness and test things like averages and variances.


Writing a hashCode() method for a Java class. Two instances of this class are equal if they match across three String fields and a Date field. (It's for eliminating duplicates in a database.) So hashCode() uses the four fields to determine a proper hashCode. But after I programmed it the simplest way I can think -- get the hashCodes of all the values, and sum them, not worrying about integer overflow -- I realized that it would be possible for two non-equal instances to return the same hashCode, if their fields had different hashCodes that happened to come to the same sum. A proper UnitTest would come up with those two non-equal instances and test with them. But since hashCode() is defined non-deterministically, I can't imagine how I would do that.

Take courage: The requirements for hashCode(), according to Sun, are not as vexing.

http://java.sun.com/products/jdk/1.2/docs/api/java/lang/Object.html#hashCode()
Of particular interest: "It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer rsults." In other words, it is not hashCode() that you need to test, but whatever uses hashCode(). And that can be tested with a mock object which returns a deterministic hashCode.

In other words again: "equal() objects must have the same hashCode()" - that's all you need for hashCode() to be correct. So your test could just be to create some equal objects and check that they have the same hashCode()s.

The idea is that if two objects have different hash codes then you know they aren't equal, but if they have the same hash code then they might be the equal (use equal() to find out for sure). For hashCode() to be good it should usually return different values for non-equal() objects, but that's another kettle of fish!

Well, then let me redefine the problem. I, as the creator of the class, want non-equal methods to return non-distinct hashCodes. Especially since the only reason I'm defining hashCode() in the first place is to use the class as an element in a HashSet, and I have absolutely no control over what HashSet does with the elements internally. So my requirements for hashCode() are more rigid than Sun's requirements. They're still reasonable requirements, though.

Would it be easier to write your own Set? You may note that the lexical key of your members form a total ordering of the class. That is given u=(s1,s2,s3,d), v=(S1,S2,S3,D), u<v if (s1<S1 or (if (s1==S1) s2<S2 or (if (s2==S2) ... )))). Then you can use any of the comparison based implementations of a set (e.g. red-black tree) to maintain O(logn) efficiency instead of the crappy O(n) effficiency of a naive Set implementation.

I'm not sure why I should redefine my own Set. That's not even the class I'm trying to test. The HashSet in the JDK 2 probably works fine, if it's passed an element of a class with a functioning hashCode() function.


If the code has any real purpose, then there must be a way to describe the characteristics of the result. These may be in terms of ranges of acceptable values and distributions within the range. This may require the test code run multiple passes through the operational code to collect a variety of samples. A simple check for a normal distribution might use five result ranges. You might decide the high and low ranges should have 0 results, the 2nd and 4th ranges should fall within 10% of each other, and the middle range should have more hits than either the 2nd or 4th ranges. If you need a greater degree of precision, simply expand the number of sample ranges and increase the number of samples taken.

I suppose this would work most of the time, given a large enough sample range. Maybe I'll end up doing this. But since you're dealing with non-deterministic stuff, you still have the chance of creating a false negative -- or, even worse, creating a false positive. Normally I think of UnitTests as being an ironclad guarantee -- I'd like to feel the same amount of security when testing non-deterministic code.

And in particular, there is a meaningful way to describe the characteristics of the result -- they just aren't guaranteed to be consistent between different invocations of the same program. hashCode(), for example, has a well-defined meaning, but you can't write code like

 assertEquals(1756294, myObject.hashCode())

and be guaranteed that the test will always work.


I fail to see what's non-deterministic about hashCode(). I'm not even sure that's relevant...

hashCode is non-deterministic because the results are not guaranteed to be constant between separate invocations of the same program. It seems quite relevant to me, given the title of this page.

I'm pretty sure there's no randomness involved in hashCode(). I could be wrong, of course... I don't have the source to java.lang.String handy, for instance, but I seem to remember all it does is XOR a few characters together. The details of the implementation might vary from one VM vendor to another, of course : the specification can rightly said to be non-deterministic, insofar as it does not fully constrain the implementation.


Here's how I usually approach the question, when there's something I don't quite know how to test. Alice and Bob, another pair in the team, just told me they've completed the hashCode() method. (Chet and I have been working on the rest of the class.) You know Alice and Bob... they always forget to test for the most obvious, the most important thing that could possibly break. Chances are, their code has that bug - you know the one I'm thinking of. It won't be too hard to code up a situation in which that bug badly crashes the program, right? If we have a Platypus instance with this name and that birthdate, and this other instance is set up like that, they won't be summed right and frob() will raise an exception. (That's your unit test mapped out, the rest is "a mere matter of coding".)

Maybe I'm not being clear enough: Below is a further restatement of the problem.

I have a class I'll call Person, which implements hashCode(). To do so, it simply sums the hashCodes of four different member variables, not worrying about integer overflow.

 int hashCode = firstName.hashCode() + lastName.hashCode() + country.hashCode() + birthday.hashCode();

99% of the time, this will work. But I'm sure that for any given invocation of this code there exists a separate instance, with different member variables, which will return the same hashCode with this algorithm. (Someone better than me at the math could probably draw up a quick proof of this, in fact.) Or, stated somewhat more formally, that there exist some different variables firstName2, lastName2, country2, birthday2 such that

 (new Person(firstName, lastName, country, birthday)).hashCode() == (new Person(firstName2, lastName2, country2, birthday2)).hashCode()

The problem is this: Between different invocations of the same program, there is no consistency between what values should be chosen for the second set: firstName2, lastName2, country2, birthday2. That's because hashCode is not guaranteed to be deterministic. So this prevents me from writing test code like this:

 Person person1 = new Person("John", "Doe", "us", new GregorianCalendar(1972, 5, 16).getTime());
 Person personWithSameHashcode = new Person("ha8734hg", "m6d76*", "324tjhb", new GregorianCalendar(1999, 6, 21).getTime());
 assertTrue(person1.hashCode() != personWithSameHashcode());

... or, more precisely, I can write this code, but since the test member variables are a moving target, almost every time I run this test it will return a false negative.

What's interesting about this is that the code for the hashCode() function itself is fairly trivial. My code is only about 17 lines of code -- it stores all hashCode values in a static Hashtable, and then when it generates a new hashCode it enters a loop to make sure that there isn't already a object that has returned such a hashCode for an unequal object. The code itself isn't very difficult, even; without UnitTests I'm mostly sure it does what it's supposed to. And yet I recognize the value of UnitTests, and would love to write a UnitTest for this. -- FrancisHwang

I still don't see why there is a problem there. That the hashCode() method will return equal values for non-equal instances isn't a possibility - it's a certainty, since there are zillions more distinct instances of almost any conceivable Person-like class than there are values of 'int'.

In fact, you could perfectly well DoSimpleThings and have each and every hashCode() method you write return 0. This will satisfy the contract for hashCode(). Classes that use your hashCode() might make assumptions about that contract, such that they will break with the version that just returns 0. But then it's these classes that are broken, and need a UnitTest for whatever they do; not your hashCode() implementation. (You'll definitely want to have good unit tests for your implementation of equals(), which of the two is far more critical.)

The version that just returns 0 is correct, and cannot possibly break. It doesn't need a unit test. The next step might be to write a UnitTest that forces you to use a "real" hashCode() implementation. The only reason you'd want to do that is performance : it will make Hashtable lookups or Vector comparisons much, much faster. So the "natural" UnitTest to write would be one that compares execution times for these operations with a stupid hashCode() against execution times with a working hashCode(). (Performance tests aren't fully "deterministic" either, due to multitasking. But you can take steps to make this a negligible factor.)

Or you could write a test that checks for characteristics of the distribution of hashCode values for "random" instances, as suggested above, since that is what the optimizations of Hashtable et al. rely on. (Note that you would then be introducing randomness into your test to check for a fully deterministic property of hashCode() - not testing something that relies on randomness.)

Am I still missing something?

Ah, you're quite right, aren't you? It hadn't occurred to me that every class that already relies on hashCode() -- including java.util.HashSet -- already has to take into account the possibility of unequal objects hashing the same. So I dropped the requirement, and changed my Person code to forget about trying to find non-uniqueness. Problem solved, thanks.

My brief exposure to the possibility, though, gives me the nagging suspicion that somewhere out there, there exists a requirement like the one I was trying to impose. But that's probably best left as a bridge not to be crossed yet -- and maybe nobody ever has to cross it, who knows. -- fh.


Something that is commonly overlooked, but quite useful, is to create a StringBuffer containing the unique data you'd like to hash on, and then getting the hashCode() from the resulting String from that StringBuffer. This is deterministic, because the algorithm that String uses to produce its hashCode is known and consistent. In addition, the hashCode produced using this techinque is guaranteed to be unique (so far as the data is concerned) across virtual machines. For those of you who are performance worry warts, simply cache the hashCode in a member variable called fHashCode or _hashCode or whatever. In the constructor of your class, set this member to 0 or -1 or something you can check for in hashCode() and, if needed, create the StringBuffer, fill it, and then populate your hashCode value. This technique works well for identity type tests (equivalent to a database key), but not for weak comparisons. -- JeffPanici


This is a JavaUnit test I wrote for a tic tac toe game. It found a mistake for me once while moving this behavior from Board to Player. It also signaled false alarm once or twice, but never twice in a row. If it was the only test that kept me from GreenBar then I just pushed the button again. This test relies on the fact that in the limit the distribution of a random variable isn't random. -- WardCunningham

public void testChoosing() {
player.board = new Board (" xx xx xx");
int f[] = {0,0,0,0,0,0,0,0,0};
for (int i=0; i <1000; i++)
f[player.choose()]++;
assertEquals(0, f[1]);
assertEquals(0, f[2]);
assertEquals(0, f[4]);
assertEquals(0, f[5]);
assertEquals(0, f[7]);
assertEquals(0, f[8]);
assertTrue(f[0]>300);
assertTrue(f[3]>300);
assertTrue(f[6]>300);
}


It looks like there is a pattern here trying to crawl out. EncapsulateRandom?, perhaps.

Deck.shuffle() depends on a random number generator to arrange the cards. So we refactor the RNG into it's own object, and use a mock to replace it in the UnitTests

testShuffle ()
{
deck = new Deck();
deck.rand = new MockRand();

// call shuffle, and verify the results. We // should be able to predict the exact ordering // of the deck, because we can predict the exact // sequence generated by MockRand }

Poker.getHand() depends on Deck.shuffle(), but the randomness is buried so deep inside the Deck that it's extremely hard for me to think about how to test it. So instead, mock the Deck!

testGetHand()
{
poker = new Poker ();
poker.deck = new MockDeck();

// call getHand, and verify the results. Again // we can predict the exact results, because we // can predict the behavior of Deck }

MatryoshkaDoll?, rinse, and repeat.

Ward's test would end up looking like

public void testChoosing() {
player.board = new Board ("xx xx xx");
player.rand = new MockRand() ;

int f[] = {0,0,0,0,0,0,0,0,0}; for (int i=0; i < 1000; i++) f[player.choose()]++;

// and because we know what the right answers are.... int expected[] = {347,0,0,321,0,0,332,0,0} ;

for (int i=0; i < 9; i++) assertEquals( expected[i], f[i] ) ; }

I prefer this approach, though it is more work, because this is a UnitTest, a safety harness to ensure that refactoring the code doesn't change its behavior.


Ah, this is much much more helpful, thanks. I can definitely see the value of a strategy like this. Maybe it can be stated like this:

-- FrancisHwang


See UnitTestingRandomness, NonDeterministic


CategoryTesting


EditText of this page (last edited September 2, 2009) or FindPage with title or text search