As we develop an application, we change fields, add columns to the database, and remove obsolete ones. Different teams tackle this in different ways. Some teams re-create the database with each new build. This works for new applications, because there is no existing data yet. However, once an application is in production and has accumulated data, this approach won’t work.
The data is part of the product, and the application needs to take it into account. As with so much in agile development, successfully releasing database changes requires a joint effort by the database experts, programmers, and testers on the team. Janet has seen a couple of different tactics for dealing with data conversion and backward compatibility. Developers or database administrators can create database scripts as the team makes changes; these scripts become part of the build and are continually tested. Another option is for the team to run “diffs” on the database after all of the database changes have been made.
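To make the first tactic concrete, a team might keep each change in a small, numbered update script that the build applies and tests along with the code. The sketch below is only illustrative; the table, column, and version-tracking names are invented, and the exact syntax varies by database:

    -- 012_add_customer_email.sql: one small change, applied by the build
    ALTER TABLE customer ADD COLUMN email VARCHAR(255);

    -- Record that this script has run, so a schema that is missing a
    -- change is easy to spot (schema_version is a hypothetical table).
    INSERT INTO schema_version (version, applied_on)
    VALUES (12, CURRENT_TIMESTAMP);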
If you’re a tester, ask your database administrator or developer to help your team keep schemas consistent among the production, testing, and staging environments. Find a way to guarantee that every change made in the test environments will also be made in staging and production during the release. Keep the schemas matching (except for the new changes still under development) in terms of column names, triggers, constraints, indices, and other components. The same discipline applied to coding and testing should also be applied to database development and maintenance.
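One way to check for drift is to let the database report the differences itself. The query below is a sketch that assumes a database exposing the standard INFORMATION_SCHEMA views and supporting EXCEPT, with both schemas reachable from one connection; if the environments live on separate servers, you might instead dump each schema’s definition and compare the files:

    -- Columns that exist (or are typed differently) in the test schema
    -- but not in production; the schema names are examples.
    SELECT table_name, column_name, data_type
      FROM information_schema.columns
     WHERE table_schema = 'test'
    EXCEPT
    SELECT table_name, column_name, data_type
      FROM information_schema.columns
     WHERE table_schema = 'production';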
Lisa’s Story
We recently had a bug released to production because some of the test schemas, including the one used by regression tests, were missing a constraint. Without the constraint in place, the code didn’t fail in our test environments. This triggered an effort to make sure the exact same update scripts get run against every schema to make changes for a given release.
It turned out that the test schemas had small differences, such as old columns still lingering in some and columns appearing in a different order in others, so it wasn’t possible to run the same script in every environment. Our database administrator led a major effort to re-create all of the test schemas so that they were perfectly compatible with production. He now creates one script each iteration with all of the necessary database changes and runs that same script in the staging and production environments when we release. This seems simple, but it’s easy to miss subtle differences when you’re focused on delivering new features.
—Lisa
Automating data migrations enhances your ability to test them and reduces the chance of human error. Native database tools such as SQL and stored procedures, data import tools such as SQL*Loader and bcp, shell scripts, and Windows command files all work well for automation because they can be cloned and altered easily.
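For example, a conversion step written so that it can be run repeatedly is easy to put into a script, check into the build, and test in every environment. The fragment below is just a sketch; the table and column names are made up, and string concatenation syntax differs between databases:

    -- Backfill a new column from existing data; the WHERE clause makes
    -- the step safe to re-run if the script is executed more than once.
    UPDATE customer
       SET full_name = TRIM(first_name || ' ' || last_name)
     WHERE full_name IS NULL;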
No matter how the database update and conversion scripts are created or maintained, they need to be tested. One of the best ways to make sure all of the changes have been captured in the update scripts is to test with the customer’s data, if it is available. Customers have a habit of using the application in weird and wonderful ways, and their data is not always as clean as we would like. If the development team cleans up the database and puts extra restrictions on a column, the application at the customer’s site might blow up as soon as a query touches a piece of data that doesn’t satisfy the new restrictions. You need to make sure that any changes you’ve made are still compatible with the existing data.
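A simple way to check is to ask the data the same question the new restriction will ask, before the restriction exists. In the sketch below, the table, column, constraint name, and validation rule are all assumptions made for illustration:

    -- How many existing rows would violate the planned rule?
    SELECT COUNT(*)
      FROM customer
     WHERE email NOT LIKE '%@%';

    -- Only when that count is zero is it safe to tighten the column.
    ALTER TABLE customer
      ADD CONSTRAINT chk_customer_email CHECK (email LIKE '%@%');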
Lisa’s Story
My team uses the staging environment to test the database update scripts. After the scripts are run, we do manual testing to verify that all changes and data conversions completed correctly. We also run some of our GUI test scripts to cover a subset of regression scenarios. This gives us confidence about releasing to production, where our ability to test is more limited.
—Lisa
When planning a data conversion, think about data cleanup as part of your mitigation strategy. You have the opportunity to take data that was entered in some of the “weird and wonderful” ways we mentioned earlier and massage or manipulate it so that it conforms to the new constraints. This kind of job can take a long time, but it is often very worthwhile in terms of maintaining data integrity.
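Such a cleanup pass might normalize the values it can and set aside the ones it can’t, rather than letting the conversion fail partway through. The statements below are only a sketch; the table, the columns, and the review table are hypothetical:

    -- Normalize values that are merely untidy.
    UPDATE customer
       SET email = LOWER(TRIM(email))
     WHERE email <> LOWER(TRIM(email));

    -- Set aside rows that still do not conform so the customer can
    -- review them, instead of silently changing or dropping them.
    INSERT INTO customer_email_review (customer_id, email)
    SELECT customer_id, email
      FROM customer
     WHERE email NOT LIKE '%@%';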