Before I begin let me start by saying that I have committed all of these errors at some point(s) in time. Sadly, I will probably continue to commit some of them, but it's a work in progress.
At Attivio we strive to deliver high quality software and part of our strategy involves Junit tests. Currently we have 18,000+ tests that run as a part of our continuous integration. While many tests are straightforward (2+2=4) we do have some very complex tests that test multiple processes, threads, and their interactions.
Over time we've struggled with 'flaky tests' that fail 'sometimes'. These are even worse than the usual 'it works on my machine' type scenarios since it will likely work on most people's machines and even the build machine 99% of the time. Worse still, we often times find that the test itself is the problem and not the underlying code. This blog post is meant to cover some of the issues we've encountered to help us spot bad tests.
If you've ever used Thread.sleep(someAmountOfTime) in any unit test you're probably had that test fail at some point for no apparent reason. If you're running your tests in a hosted or virtualized environment, they probably fail more often. The problem with waiting 'long enough' is that any number of events outside your control could cause this sort of testing to fail. For example a garbage collection could be triggered at just the wrong time or another process on the machine could grab all the resources, making your timing-based test useless. In a virtualized environment your own test in another VM on the same physical machine might even be the culprit! For example, we've had code similar to this fail in our continuous build:
long start = System.currentTimeMillis();While seeing Thread.sleep() in a test doesn't necessarily mean your test will be flaky, it does mean you should take another look at the test itself. If there isn't a while (someConditionNotMet)} block around your {{Thread.sleep() you should be even more cautious.
Thread.sleep(1000);
long stop = System.currentTimeMillis();
Assert.assertTrue(5000 > stop-start);
Pop quiz: What is the order of the output for this program?
Thread t1 = new Thread() { public void run() { doSomethingThatTakes30seconds(); System.out.println("t1 done"); } }.start();
Thread t2 = new Thread() { public void run() { sleepFor10seconds(); System.out.println("t2 done"); } }.start();
Thread t3 = new Thread() { public void run() { // do nothing System.out.println("t3 done"); } }.start();
If you said the output was:
t3 done
t2 done
t1 done
You'd be right, most of the time. If you said
t1 done
t2 done
t3 done
You would also be right.... but very rarely. Using threads and relying on them to 'start doing stuff' when you call start() is dangerous. Anytime you're using threads, you need to make sure you're using something from java.util.concurrent, synchronization or just not checking any state until you've called join() on all threads, and then make sure the threads don't affect each other. You also need to stop and think about your code and test code carefully as this stuff is almost never as straightforward as it seems. In many of our tests, the logic and data structures used for testing a block of multithreaded code are far more complex than the actual code itself.
If you've ever created a socket server or read from a local file system outside your source tree, you've probably had tests fail randomly on someone's machine. They failed because there was another process running on that port (maybe another test session running) or because you keep your test data in c:\mydata instead of c:\somedata. Either way, your test is brittle and can be affected by other non-code related issues. At Attivio, we use a system property for a baseport that defaults to a standard value. Anyone running tests locally gets the default port value but the continuous integration builds set the system property to a different value for each plan to avoid conflicts. Further, any files we go after are either in the source tree or pulled from maven by the build system. This also helps us to enforce good coding practices to make sure someone doesn't accidentally hard code a default configuration value or file path into production code.
The easiest way to ensure that your unit tests take a long time to run and are brittle to other up or downstream changes is to pull in other unrelated functionality into your tests and then rely on that logic for your test to succeed.
If you have a component that moves data from Point-A to Point-B, make sure it only tests that. Don't add in Point C that wraps A to B logic and is easy to test, but not a part of the code you are currently working with. Since our system is built on top of an ESB, we often see developers want to put their component into a workflow in a running system and then feed content through it and the rest of the default workflows to make sure it can perform it's job.
The worst case of this is pulling in a larger framework than you need to test your code. For example, you don't need to load a spring context to test your bean. Even if it implements Spring lifecycle methods like InitializingBean, you can still call afterpropertiesSet() yourself and avoid the spring overhead.
I'm sure there are many other bad practices, but the ones I've listed above represent some of our worst offenders. As I said in the intro though, everyone makes these mistakes. The key is being able to track why a test is failing so you can start working on a fix.