Using an empty database (learn from your mistakes)

I’ve been working on various MySQL-related issues and maintenance procedures, some of which have not gone according to plan. Here is a recipe that may help you avoid wasting a lot of time, especially if your database is large.

In order to do some of these tests, set up a server configured identically to the one you plan to work on, but with no data. That is, the mysql system database needs to be complete, but the other databases should be loaded from a dump made with the --no-data (or -d) option. Don’t forget to also include any triggers and stored routines.
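A minimal sketch of such a dump, assuming hypothetical host and database names (mysqldump includes triggers by default, but stored routines need --routines):

    # Dump the mysql system database in full (users, grants, etc.).
    mysqldump -h prod-db -u root -p --databases mysql > mysql_full.sql

    # Dump the application databases schema-only: table and trigger
    # definitions plus stored routines, but no rows.
    mysqldump -h prod-db -u root -p --no-data --routines \
        --databases app_db1 app_db2 > schema_only.sql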

Now run the “procedure” on this “empty instance”. As it has no data, most things run very quickly, so if you hit issues you can repeat the procedure in no time. Restoring the instance is also easy, as it’s tiny. This makes the whole procedure scriptable, and you can be confident in the results.
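A rough sketch of one such cycle, assuming the dump files above and a hypothetical test host:

    # Rebuild the empty instance from the dumps...
    mysql -h test-db -u root -p < mysql_full.sql
    mysql -h test-db -u root -p < schema_only.sql

    # ...then run the maintenance procedure being tested. Repeat all
    # three steps after every failed attempt until it runs cleanly.
    mysql -h test-db -u root -p app_db1 < maintenance_procedure.sql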

Once you are satisfied that it works, you know what will happen, and you can run the SAME procedure on the real instance with a lot more confidence.

This procedure, while it does require you to build an extra instance for testing, is actually a much safer way to do many tests. It doesn’t help in scenarios where the content of the tables is important, but it does save you a lot of wasted time.

You may still need to estimate how LONG certain tasks will take, and that must be done separately, but it is usually easier to do once you know what you need to measure.
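Once the empty-instance run has shown which statement dominates, one way to get that estimate is to time just that statement against a copy that does contain data; a minimal sketch with hypothetical names:

    # Durations depend on row count, so measure on a data-bearing copy.
    time mysql -h staging-db -u root -p app_db1 \
        -e "ALTER TABLE big_table ADD COLUMN new_col INT"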

It would certainly have saved me a lot of time during various 5.0 to 5.1 upgrades, some of which gave me problems. The same goes for something as simple as a failed ALTER TABLE: it ran against a 50GB table for 18 hours and then failed at the very end due to a foreign key constraint issue. That problem needs to be addressed by MySQL, but to be fair to them, I shouldn’t complain about the 18 hours I wasted, because I did not follow the procedure I suggest above.


One Response to “Using an empty database (learn from your mistakes)”

  1. KB says:

    Hi Simon. I know it’s been a while since you originally posted this, but where I work we call what you have described a reference data set (RDS). The goal of the RDS is to provide just enough structure and data to allow an application to run cleanly. An RDS usually contains primarily DDL. The neat thing with an RDS is that it can be “played” over and over during testing.

    Take that a step further and add data designed specifically to test a feature or known bug, meant to be played after installing the RDS, and now you have a test fixture. This makes it not only possible, but terribly simple, to simulate problematic data without putting your code into production.

    Another huge win you get out of these tools together is the ease of tracking them in source control. Now, your application code and data repository code can all be stored together.

    Finally, this also makes it much simpler to automate application testing and verification because it gives you an easy way to repeatably reset your data store to a known consistent state.
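    A minimal sketch of such a reset step, assuming hypothetical file and database names:

        # Replay the reference data set, then the fixture for the scenario
        # under test, so every run starts from the same known state.
        mysql -u test -p test_db < rds.sql       # structure + minimal data
        mysql -u test -p test_db < fixture.sql   # data for one scenario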

    As you can see, combining reference data sets with test fixtures can help developers proceed with greater confidence that the application interacts well with the data store, no matter what kind of store is used, because developers can quickly get to a known state and easily verify that the application and data store perform as expected after making a change.

    KB
