A coworker of mine is getting more into the code side of making software work. I was talking to him recently about increasing the test coverage on the project he’s coming onto, and he asked me that eternal question:
So, what do I test?
The answer, of course, is “everything.”
Less glibly, software testing is a pretty deep topic. There are a lot of kinds of testing you can do, some very time consuming, others very quick, and they vary just as widely in how effective they are at finding bugs. Of course, there are other reasons to test (maybe you don’t care about all bugs and just want to be able to make some performance guarantees, for example). But most of the time, when people talk about software testing, they want to get an idea of how buggy the software is: how close its behavior is to what is expected or desired. In this blog post, I’ll talk about a few broad categories of testing. But first, a quick testing glossary:
- verification: Examining a product to make sure that it does what it was designed to do (what do the blueprints say?)
- validation: Examining a product to make sure that it does what it is supposed to do (what did the customer want?)
- reliability: How likely is it that a product will function when it is needed?
- assertion: Generally, automated tests perform some number of operations with the system, and then check some property of the system afterward. That check is called an assertion, and the term is used pretty universally across testing frameworks. It is common to talk about a test or an assertion being “strong” or “weak” based on how specific its requirements are (x > 2 is usually a weaker assertion than x == 6).
- mocking: The practice of creating an object that *acts* like some component of the system, but without being a full implementation. This is useful in testing because you can guarantee the return values from methods on a mock object will be appropriate for your test. It’s also nice to not actually make that bank withdrawal when testing your bill-paying application. You can mock up objects (i.e., instances of a class in some programming language), but it is also possible to mock up an external service, or a user account, or whatever environment is needed for testing.
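To make the last two glossary entries concrete, here is a minimal sketch in Python using the standard library's unittest.mock. The BillPayer class and its withdraw-based bank interface are made up for illustration; the point is only that the mock hands back a canned value and no real withdrawal happens.

```python
from unittest.mock import Mock


class BillPayer:
    """Hypothetical class under test: pays a bill through a bank object."""

    def __init__(self, bank):
        self.bank = bank

    def pay(self, account, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        # The actual money movement is delegated to the bank collaborator.
        return self.bank.withdraw(account, amount)


def test_pay_withdraws_from_bank():
    bank = Mock()                    # stands in for the real bank
    bank.withdraw.return_value = 6   # canned return value, chosen for the test

    result = BillPayer(bank).pay("checking", 94)

    assert result > 2                # weak assertion: many values would pass
    assert result == 6               # strong assertion: exactly one value passes
    # The mock also records how it was used, so we can check the interaction.
    bank.withdraw.assert_called_once_with("checking", 94)


test_pay_withdraws_from_bank()
```

Both assertions pass here, but only the strong one would notice if pay started returning 7.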
Smoke tests just exist to verify that, upon running your software, the magic smoke does not escape from your computer. They generally make very weak assertions about the system, like, for example, that a homepage in a web application has a title. But, even in this example, a passing test shows a lot of good things about the system: the templates are configured properly, the system successfully binds to a port, etc.
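As a rough sketch of the homepage example, the test below starts a tiny server from Python's standard library and only checks that the page comes back with some title. HomepageHandler is a made-up stand-in for a real web application, so treat this as an illustration of the shape of a smoke test rather than how any particular framework does it.

```python
import http.client
import re
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HomepageHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real web application."""

    def do_GET(self):
        body = b"<html><head><title>Home</title></head><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def smoke_test_homepage():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HomepageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        status = response.status
        html = response.read().decode("utf-8")
        conn.close()
    finally:
        server.shutdown()
        server.server_close()

    # Deliberately weak assertions: the page loads and has *some* title.
    assert status == 200
    assert re.search(r"<title>.+</title>", html)
    return html


smoke_test_homepage()
```

The assertions say nothing about what the title actually is, which is what makes this a smoke test and not a specification of the page.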
Unit tests are intended to test a single component of a system. While the precise definition of a unit is up to individual teams, it is pretty common to discuss unit tests as ones that operate on a single class (in Java code, for example). Ideally, in a unit test, all objects except for the one under test should be mocked.
Integration testing verifies that components in a system interact correctly. This is the type of testing that you are doing when you set up that server and just check that the homepage comes up. Again, the line between unit and integration testing is kind of fuzzy.
Black Box Testing
Black box testing uses only the parts of a system that are intended for use by external components. For example, if you were black-box testing a Queue class, then you would not make any assertions about the internal storage, only about the results of the standard Queue operations. You also wouldn’t test or modify any helper classes used in the implementation of the Queue, and code inspection would not be an option. Black box testing encompasses techniques like:
- fuzz testing, using randomly generated garbage data as input
- equivalence classes, inspecting a component’s specification to identify ranges of input that should be functionally equivalent (thereby allowing the tester to avoid redundant effort)
- user testing, simply handing software to a user and asking them to identify places where it works unexpectedly or incorrectly
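Here is a small sketch of what black-box testing a Queue with random input might look like. It is a randomized comparison against a reference model (Python's collections.deque), done in the spirit of fuzz testing; the Queue class, its enqueue and dequeue methods, and the choice to raise IndexError when empty are all assumptions made up for the example.

```python
import random
from collections import deque


class Queue:
    """Hypothetical Queue under test; the test never looks at _items."""

    def __init__(self):
        self._items = []

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from empty queue")
        return self._items.pop(0)

    def __len__(self):
        return len(self._items)


def fuzz_queue(seed, steps=1000):
    rng = random.Random(seed)  # seeded, so a failing run can be replayed
    queue, model = Queue(), deque()
    for _ in range(steps):
        if rng.random() < 0.6:
            # Throw assorted junk at the queue: None, strings, ints, floats.
            value = rng.choice(
                [None, "", "x" * rng.randint(0, 50), rng.randint(-10**6, 10**6), rng.random()]
            )
            queue.enqueue(value)
            model.append(value)
        elif model:
            assert queue.dequeue() == model.popleft()
        else:
            try:
                queue.dequeue()
            except IndexError:
                pass  # assumed behavior for an empty queue
            else:
                raise AssertionError("dequeue on an empty queue should fail")
        # Only the public interface is used to compare against the model.
        assert len(queue) == len(model)
    return True


for seed in range(5):
    fuzz_queue(seed)
```

Every check goes through enqueue, dequeue, and len, so the internal list could be swapped for a linked list without touching the test.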
White Box Testing
White box testing assumes access to the code and internals of a system. It allows techniques that modify and inspect the code, things like:
- static analysis, where a tool like lint or FindBugs inspects code to heuristically detect errors
- code coverage analysis, where a tool like Cobertura instruments the system during a run to identify sections of code that were never executed
- mutation testing, making small, often random, modifications to the code to gather statistics about how many of those changes the automated tests actually catch
- code review (by real live programmers!)
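Mutation testing is the least obvious item on that list, so here is a hand-rolled sketch of the idea. Real mutation tools generate the modified code automatically; in this example the "mutant" is written out by hand, and is_adult and both test suites are invented for illustration.

```python
def is_adult(age):
    """Original code under test (hypothetical)."""
    return age >= 18


def is_adult_mutant(age):
    """The same function with one small change: >= became >."""
    return age > 18


def weak_suite(fn):
    # Only checks values far from the boundary.
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    # Also checks the values on either side of the boundary.
    return weak_suite(fn) and fn(18) is True and fn(17) is False


def mutant_killed(suite, original, mutant):
    assert suite(original), "the suite must pass on the unmodified code"
    # A mutant is "killed" when the suite fails on the modified code.
    return not suite(mutant)


print(mutant_killed(weak_suite, is_adult, is_adult_mutant))    # False: mutant survives
print(mutant_killed(strong_suite, is_adult, is_adult_mutant))  # True: mutant is killed
```

A surviving mutant is a hint that the tests never exercise the behavior that was changed, which here is the boundary at 18.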
I guess I still haven’t answered the question: what should we test? It’s going to vary a lot between projects. A lot of shell scripts I write don’t get tested at all (well, except for the one time I run them), but I’d like to know that the guys working on the software for the next plane I’m on have a pretty solid testing system. And there are a lot of points in the space between those extremes. In general, I would say that getting basic tests in place at the start is a good idea, and as a project goes on, problem points can be identified and given more attention. Automated tests are good for ensuring old behavior doesn’t change unintentionally, but there is a balance to strike between setting up that safety line and tying the whole project down, because (especially in web programming) things do actually change from time to time.