Tracking the number of tests written, running, and passing at a story level is one way to show a story’s status. The number of tests written shows the progress of the tests that are driving development. Knowing how many tests aren’t passing yet gives you an idea of how much code still needs to be written.
After a test passes, it needs to stay “green” as long as the functionality is present in the code. Graphs of the number of tests passing and failing over time show whether there’s a problem with regression failures and also show the growth of the code base. Again, it’s the trend that’s important. Watch for anomalies.
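To make the trend concrete, the daily counts can be logged automatically and scanned for drops. The following is a minimal sketch, not any particular team’s tool; the file name and CSV format are arbitrary, and the pass/fail counts are assumed to come from your test runner.

```python
import csv
from datetime import date
from pathlib import Path

LOG = Path("test_trend.csv")  # arbitrary file name for this sketch

def record_daily_counts(passed: int, failed: int) -> None:
    """Append today's pass/fail counts to a simple CSV log."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "passed", "failed"])
        writer.writerow([date.today().isoformat(), passed, failed])

def flag_anomalies() -> None:
    """Warn when the number of passing tests drops from one day to the next."""
    with LOG.open() as f:
        rows = list(csv.DictReader(f))
    for prev, curr in zip(rows, rows[1:]):
        if int(curr["passed"]) < int(prev["passed"]):
            print(f"{curr['date']}: passing tests dropped "
                  f"from {prev['passed']} to {curr['passed']}")
```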
These types of measurements can be reported simply and still be effective.
Lisa’s Story
My team emails a color-coded calendar every day showing whether the “full build,” with the full suite of regression tests, passed each day (see Figure 15-10). Two “red” days in a row (the darkest color) are a cause for concern and are noticed by management as well as by the development team. Seeing the visual test results helps the organization pull together to fix the failing tests or any other problems preventing the build from running, such as hardware or database issues.
—Lisa
Figure 15-10 Full build result email from Lisa’s team
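A calendar like the one Lisa describes doesn’t require special tooling. Here is a minimal sketch of how the email body might be generated; the status values, colors, and HTML layout are all invented for illustration.

```python
# Minimal sketch: render daily build results as a color-coded HTML strip.
# The status names and colors are invented; substitute your own build data.
RESULT_COLORS = {"passed": "#7fbf7f", "failed": "#bf4040", "no_run": "#d9d980"}

def build_calendar_html(results: dict[str, str]) -> str:
    """results maps ISO dates to 'passed', 'failed', or 'no_run'."""
    cells = "".join(
        f'<td style="background:{RESULT_COLORS[status]}">{day[-2:]}</td>'
        for day, status in sorted(results.items())
    )
    return f"<table><tr>{cells}</tr></table>"

print(build_calendar_html({"2009-03-02": "passed", "2009-03-03": "failed"}))
```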
There are different ways to measure the number of tests. Choose one and try to stay consistent across all types of tests; otherwise, your metrics may get confusing. Measuring the number of test scripts or classes is one way, but each one may contain multiple individual test cases or “asserts,” so it may be more accurate to count those.
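For a Python code base, the difference between the two counting styles is easy to see in code. This sketch assumes pytest-style tests that use plain assert statements; unittest-style self.assert* calls would need different handling.

```python
import ast
from pathlib import Path

def count_tests_and_asserts(test_dir: str) -> tuple[int, int]:
    """Count test functions and the individual asserts they contain."""
    tests = asserts = 0
    for path in Path(test_dir).rglob("test_*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
                tests += 1
            elif isinstance(node, ast.Assert):
                asserts += 1
    return tests, asserts

tests, asserts = count_tests_and_asserts("tests")  # "tests" is a placeholder path
print(f"{tests} test functions containing {asserts} asserts")
```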
If you’re going to count tests, be sure to report the information so that it can be used. Build emails or build status UIs can communicate the number of tests run, passed, and failed at various levels. The customer team may be content to see this information only at the end of each sprint, in the sprint review or in an email.
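As one illustration of such a report, the counts can be mailed out by the build itself. This sketch uses Python’s standard email support; the addresses and mail host are placeholders.

```python
import smtplib
from email.message import EmailMessage

def send_build_summary(run: int, passed: int, failed: int) -> None:
    """Email a one-line test summary; addresses and host are placeholders."""
    msg = EmailMessage()
    msg["Subject"] = f"Build: {passed}/{run} tests passed, {failed} failed"
    msg["From"] = "build@example.com"
    msg["To"] = "team@example.com"
    msg.set_content(f"Tests run: {run}\nPassed: {passed}\nFailed: {failed}")
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)
```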
Whatever metrics you choose to gather, be sure the team buys into them.
Janet’s Story
I started a new contract with a team that had been doing agile for a couple of years, and they had developed a large number of automated functional tests. I started keeping track of the number of tests passing each day. The team didn’t see a problem when the trend showed fewer and fewer tests passing. The unit tests were maintained and were doing what they were supposed to do, so the team felt confident in the release. This seemed to happen with every release, and the team would spend the last week before the release making all of the tests pass. It was costly to maintain the tests, but the team didn’t want to slow down to fix them. Everyone was okay with this except me.
I did not see how fixing the tests at that late date could ensure the right expected results were captured. I felt that we ran the risk of getting false positives.
At the start of the next release cycle, I got the team to agree to try fixing the tests as they broke. It didn’t take long for the team to realize that it wasn’t so tough to fix the tests as soon as we knew they were broken, and we found a lot of issues early that previously hadn’t been caught until much later. The team soon set a goal of having 95% of the tests passing at all times.
We also realized how brittle the tests were. The team made a concerted effort to refactor some of the more complex tests and eliminate redundant ones. Over time, the number of high-level tests was reduced, but their quality and coverage increased.
We started out measuring passing rates, but we ended up with far more.
—Janet
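A goal like the 95% that Janet’s team adopted can be enforced by the build itself rather than left to memory. The following is a minimal sketch with made-up counts; it fails the build step whenever the pass rate dips below the goal.

```python
import sys

PASS_RATE_GOAL = 0.95  # the goal Janet's team chose; adjust to suit

def check_pass_rate(passed: int, total: int) -> None:
    """Fail the build step if the pass rate falls below the goal."""
    rate = passed / total if total else 0.0
    print(f"{passed}/{total} tests passing ({rate:.1%})")
    if rate < PASS_RATE_GOAL:
        sys.exit(f"Pass rate below {PASS_RATE_GOAL:.0%} goal; fix the tests now.")

check_pass_rate(passed=188, total=200)  # made-up counts for illustration
```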
Don’t get so caught up in the actual measurements that you fail to recognize other side effects of the trending. Be open to adjusting what you are measuring if the need arises.
Code coverage is another traditional metric. How much of our code is exercised by our tests? There are excellent commercial and open source code coverage tools available, and these can be integrated into your build process so that you know right away if coverage has gone up or down. As with most metrics, the trend is the thing to watch. Figure 15-11 shows a sample code coverage report.
Figure 15-11 Sample code coverage report from Lisa’s team. “Ghidrah” is the new architecture; “Whitney” is the legacy system.
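Watching the coverage trend can be automated in the same spirit. This sketch assumes the coverage percentage has already been extracted from whatever coverage tool you use; the history file name is arbitrary.

```python
from pathlib import Path

HISTORY = Path("coverage_history.txt")  # arbitrary file name for this sketch

def check_coverage_trend(current: float) -> None:
    """Compare the current coverage percentage to the last recorded value."""
    if HISTORY.exists():
        previous = float(HISTORY.read_text())
        if current < previous:
            print(f"Warning: coverage dropped from {previous:.1f}% to {current:.1f}%")
        else:
            print(f"Coverage steady or improving: {current:.1f}%")
    HISTORY.write_text(str(current))

check_coverage_trend(current=82.4)  # value would come from your coverage tool
```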