At the lowest level are developer functional tests that exercise specific functions via an API and verify the results through a separate read-only user API. There are currently over 24,000 of these tests, implemented in JUnit, and every developer must run them before “checking in” changes to the source code.
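To make the shape of these tests concrete, here is a minimal sketch of one. FacilityService and FacilityQuery are hypothetical stand-ins for the application's write API and its read-only user API, not PAS's actual interfaces.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class CreateFacilityTest {

    // FacilityService and FacilityQuery are hypothetical stand-ins for the
    // application's write API and its separate read-only user API.
    private final FacilityService service = new FacilityService();
    private final FacilityQuery query = new FacilityQuery();

    @Test
    public void createdFacilityIsVisibleThroughTheReadOnlyApi() {
        service.createFacility("BATTERY-01", "Oil Battery");

        // Verify through the read-only API, not the API under test.
        assertEquals("Oil Battery", query.facilityName("BATTERY-01"));
    }
}

The point is the separation: the test never trusts the API it is exercising to report on itself.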
The next level is a set of GUI tests that exercise the marshalling of user data back and forth between the GUI and the API, particularly around “master-data” creation and updates. There are currently over 500 of these tests, implemented in JUnit using Watij (an open source library similar to Watir but written in Java), and they run multiple times a day.
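A sketch of what such a round-trip test might look like, following the style of Watij's published getting-started examples; the URL, element locators, and the FacilityQuery verification are assumptions made for illustration.

import static org.junit.Assert.assertTrue;
import static watij.finders.SymbolFactory.*;

import org.junit.Test;
import watij.runtime.ie.IE;

public class MasterDataRoundTripTest {

    @Test
    public void facilityEnteredInTheGuiReachesTheApi() throws Exception {
        // Drive the GUI with Watij; the URL and element names are illustrative.
        IE ie = new IE();
        ie.start("http://pas.example/facility/new");
        ie.textField(name, "facilityName").set("Oil Battery");
        ie.button(value, "Save").click();
        ie.close();

        // Verify the round trip through the hypothetical read-only API.
        assertTrue(new FacilityQuery().facilityExists("Oil Battery"));
    }
}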
The final level of testing is a set of integration tests created by the users that run in a Fit-like harness. Users identify dense test cases that reflect real-world scenarios and cover many of the functions that work together to produce financial and regulatory outputs. These test cases are transcribed into import templates and processed using a domain language that mirrors the way end customers think about their processes.
For example, after an end customer has created the configuration of facilities and contracts they wish to exercise in their test, they work with a developer to use the domain language to process their facilities in the correct order. The end users also supply a set of expected outputs, which are then verified through a read-only API. These outputs can contain thousands of numbers, any of which can change for seemingly minor reasons in an evolving product, so it is a constant challenge to distinguish legitimate business changes from defects. There are currently over 400 integration tests; they run twice a day, providing feedback to the end customers and developers.
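Because a single run can assert thousands of numbers, a harness like this has to report every mismatch rather than stopping at the first. A minimal sketch of that comparison step; the map-based shape, the names, and the tolerance are assumptions for illustration, not details of the actual framework.

import java.util.Map;

public class OutputVerifier {

    // The tolerance is an assumption; real financial and regulatory outputs
    // may need exact matching or per-field rules.
    private static final double EPSILON = 0.005;

    // Compares the users' expected outputs against actual values fetched
    // through the read-only API, reporting every mismatch.
    public static int verify(Map<String, Double> expected, Map<String, Double> actual) {
        int failures = 0;
        for (Map.Entry<String, Double> e : expected.entrySet()) {
            Double got = actual.get(e.getKey());
            if (got == null || Math.abs(got - e.getValue()) > EPSILON) {
                System.out.printf("MISMATCH %s: expected %s, actual %s%n",
                        e.getKey(), e.getValue(), got);
                failures++;
            }
        }
        return failures;
    }
}

Reporting all failures in one pass is what lets the users and developers sort legitimate business changes from defects in a single review rather than over many reruns.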
Exploratory testing is done continuously throughout the development cycle and is intensified at the end of each release.
Our first attempt at PASFIT (our name for the functional test framework) was a spreadsheet of color-coded inputs and outputs, from which we generated Java code, based on the color of each cell, to create the data in PAS. That proved difficult to maintain, partly because the application was in major flux at both the GUI and database levels.
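As one illustration of the general idea (not the original generator), reading cell colors with Apache POI and emitting Java might look something like this; the color convention and the generated createFacility call are assumptions.

import java.io.FileInputStream;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class SheetToCode {

    // Assumed convention: green cells are inputs to create in PAS. The real
    // tool's color coding isn't documented here.
    private static final short INPUT_COLOR = IndexedColors.GREEN.getIndex();

    public static void main(String[] args) throws Exception {
        Workbook wb = new HSSFWorkbook(new FileInputStream(args[0]));
        Sheet sheet = wb.getSheetAt(0);
        for (Row row : sheet) {
            for (Cell cell : row) {
                if (cell.getCellStyle().getFillForegroundColor() == INPUT_COLOR) {
                    // Emit a line of Java; createFacility is a hypothetical call.
                    System.out.printf("pas.createFacility(\"%s\");%n",
                            cell.getStringCellValue());
                }
            }
        }
        wb.close();
    }
}

The fragility is easy to see: generated code couples the spreadsheet to both the GUI and the schema, so any flux in either breaks the tests.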
Our next iteration of PASFIT didn’t begin until nearly a year after that first attempt. Once we had a more stable GUI and set of database views, we were able to create an engine that used a simple imperative language (i.e., a script) to perform actions with arguments against the GUI (e.g., Go to Balancing Page; Balance Battery: Oil, Water). The script evolved to follow the thought process of a production accountant and became a domain-specific language. The engine was written in Ruby and Watir, and each instruction in the script was essentially a Ruby method invoked dynamically, which made the engine easy to update.

After the script ran, the framework loaded a snapshot of the views the test wished to compare and did a simple row-by-row, cell-by-cell comparison of the asserted values against what actually happened. Eventually the spreadsheet was enhanced with pivot tables so that users could focus on only the results they wished to assert for their test. All in all it has been quite successful, although the requirements of our application mean that 300 tests take about 12 hours to run, which is a long time.
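The engine itself was Ruby, where an instruction mapped onto a dynamically invoked method (via something like Ruby's send). Purely to illustrate that dispatch idea, here is a Java analogue using reflection; the instruction names are hypothetical.

import java.lang.reflect.Method;
import java.util.Arrays;

public class ScriptEngine {

    // Each domain-language instruction is just a method; adding an
    // instruction means adding a method, much like the Ruby engine.
    public void goToBalancingPage() { /* drive the GUI here */ }

    public void balanceBattery(String... products) { /* e.g., "Oil", "Water" */ }

    // Dispatches one script line such as "balanceBattery Oil Water".
    public void execute(String line) throws Exception {
        String[] parts = line.trim().split("\\s+");
        String[] args = Arrays.copyOfRange(parts, 1, parts.length);
        Method m = args.length == 0
                ? getClass().getMethod(parts[0])
                : getClass().getMethod(parts[0], String[].class);
        m.invoke(this, args.length == 0 ? new Object[0] : new Object[] { args });
    }
}

The appeal of this style is that the script vocabulary and the engine never drift apart: the vocabulary is simply the set of methods that exist.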
Getting the business more involved in maintaining the regression tests has also been difficult, but when it happens, the results are very good. Currently, the business users and the developers meet in a 15-minute stand-up to pick up any of the scenario tests that are breaking that day. It is quite effective: people often arrive at the stand-up already knowing what they might have broken the day before. Future enhancements are likely to include asserting against actual user reports instead of the views and running a migration each night against the scenario script.
PASFIT achieved a balance between letting business experts write tests in a DSL and automating those tests against a highly complex application. Success came with some trial and error. Teams that write their own test frameworks need time to experiment to find the right solution for both the business and the development team.
Strategies for Writing Tests
The best tools in the world won’t help if you don’t use them wisely. Test tools might make it very easy to specify tests, but whether you’re specifying the right tests at the right time is up to you. Lisa’s team found that too much detail up front clouded the big picture to such a degree that the programmers didn’t know what to code. This won’t be true for every team, and at some point we do need details. The latest time to provide them is when a programmer picks up a coding task card and starts working on a story.