The innodb_plugin – a pleasant surprise!

I’ve heard about the innodb_plugin but not had time to put it to the test.

Recently though due to some problems I’ve been having with the MySQL Enterprise Monitor (Merlin) I’ve had to try a few changes and had the opportunity to try out the innodb plugin.

I have been using Merlin for some time and like it a lot. It is not perfect but does a good job for me.  However, since upgrading to version 2.1 I have been having some database load problems. I long ago split the merlin server into a front- and back-end server with the backend running a standard MySQL 5.1 Advanced package. That has been working fine.

I have been monitoring more and more mysqld servers and recently the database backend could not cope. Basically the writes of data collected from the agents and the deletes of old date (purging) caused too much I/O and that is on a box with 6 disks in RAID-10 with a battery backed write-cache.

So I upgraded the db server to a new box with lots of memory and a 300 GB Fusion IO card. I expected all problems to go away. Well not quite. In spite of the solid state drive which was not I/O bound, and the CPU which was not CPU bound, mysqld could not keep up with the load. This was running the MySQL 5.1.44 Advanced rpm. Looking more deeply it seems that mysqld itself was the bottleneck and there was too much contention on the PK by the different INSERTing and DELETing threads.

The Merlin team suggested trying the innodb_plugin (1.0.6) and all of a sudden the bottleneck seems to have gone away.

This is the iostat output taken before switching to the innodb plugin:

This is the iostat output taken after switching to the innodb plugin:

Note: the first set of figures was taken using CentOS 4, and the second using CentOS 5. The IO statistics aren’t exactly identical and on CentOS 5 for some reason the driver appears not to be providing all the stats. However the wsec/s value clearly shows a significant performance improvement and the original problem mysqld was having of not being able to purge as fast as it was inserting data seems to have been solved. At least initial signs seem to indicate this. The only configuration change made to the server was the following:

Conclusions:

  • If you are having performance problems on your MySQL server and perhaps the hardware is not the bottleneck then try using the plugin. It may make a big difference.
  • Throwing hardware at a problem does not always solve it.

Also a big thanks to the Merlin team for helping me out with this problem and getting things up and running.

Tags: , , , , , ,

2 Responses to “The innodb_plugin – a pleasant surprise!”

  1. Mark Robson says:

    Hi,

    I’m interested in what the validation process was that you went through to evaluate the plugin. You have lots of servers in production, some of which were overloaded.

    * How did you check for regressions in your app caused by the plugin?
    * How did you confirm that it would benefit performance
    * What non-functional (soak, stress) testing did you do?

    I find it very difficult to upgrade anything because the testing requirements are so onerous (of course the second two above need to be tested on production-grade hardware). Essentially, there is no time to do things like this because the testing will blow the effort up to more than can be spared – Ops simply install more and more hardware instead.

  2. Simon Mudd says:

    Hi Mark,

    To be fair the article I’ve written are preliminary findings. However, that said I had a problem which the current hardware could not solve. It could not solve it because the problem was not in the hardware but was software (mysqld) related.

    Just to re-iterate I’m using a tool (Merlin) which collects performance data from many other database servers pretty much continuously during the day. This information can then be used to see graphs, or to trigger alerts about problematic behaviour. To avoid the disk filling up on the database where this information is stored there is a separate process which removes old data. That application is written by the MySQL developers.

    However, in the environment where I run merlin, and with the number of servers that I’m monitoring, the database server was really coming under load. So much so that the purge process which deleted old data could not delete it quickly enough: too much new data was being inserted.

    On the original db hardware I looked at the I/O statistics and saw that the disk subsystem was sustaining over 300 I/Os per second, and basically was saturated.

    So as I had a spare box available, which was much bigger and had these Fusion IO solid state disks, I thought I could temporarily solve the problem by throwing hardware at it. The server has 48GB of RAM and the Fusion IO card so on this hardware the problem would just go away…

    However, I was completely wrong. The box was much faster but the 2 competing sets of threads were still causing problems: The purge thread was still unable to DELETE data quickly enough.

    Looking on comments such as this http://www.mysqlperformanceblog.com/2010/02/25/index-lock-and-adaptive-search-next-two-biggest-innodb-problems/ gave me the hint. I was hitting the same problem. I knew this as the Fusion IO card was not running flat out and also the CPUs were mainly idle.

    So at the suggestion of the merlin developers I simply tried the plugin. I had nothing to lose as the database as it was was going to fill up the disk, unless I did something drastic like TRUNCATE TABLE. Not helpful for looking at graphs.

    The results as I explained were quite clear. I made the alterations shown and watched to see what would happen. The IO rate increased significantly and the “purge” thread that previously had been having problems started to catch up and was eventually able to delete the old data as fast as new data was being inserted.

    In the meantime use of the merlin front-end has continued to work fine.

    So no this is not 100% scientific, but _I_ am convinced that the plugin works and I have seen the benefit it made to me. It may well be possible to improve performance some more but right now that is not my concern as performance is sufficient and I have other more pressing things to look at.

    You say that it’s hard to test things but I’d certainly suggest, if you can do this in your environment, that you clone your database as a slave. Doing so allows you to put a read load (SELECT only) on to it and if you have tools to do this you can stress the box until it [nearly] breaks. It’s harder to stress the writes, unless you have tools to simulate that, but the SELECT load test gives you a good idea.

    I hope that explains what I did and why I’m convinced that the plugin worked for me.

Leave a Reply