Initial thoughts on space compression using the innodb_plugin

While setting up MySQL Enterprise Monitor 2.2 (Merlin) on a system which had been running version 2.1 I thought I’d try and see what difference the change from using normal innodb tables to using the compressed table format available in the innodb plugin.

I’ve been using a separate db backend for merlin because for me it’s easier to manage and also the database backend has been put on a dedicated server. I’ve also been trying the innodb_plugin on another busier server as I had performance problems with the normal 5.1.42 built-in innodb engine which the plugin managed to solve.

So given that I was using a separate db server I upgraded it to 5.1.47, configured the server to use the plugin (1.0.8) rather than to use the built-in innodb engine and then decided to alter the data tables (dc_p_long, dc_p_string and dc_p_double) to use the new innodb compressed table format. These tables are designed for storing a large number of rows of a specific type but there was no harm in trying.

Here are the results by doing the following:

SET GLOBAL innodb_file_format = "Barracuda";
ALTER TABLE dc_p_xxxx ROW_FORMAT=compressed;

Using the older Antelope storage format:

dc_p_string 178 MB
dc_p_double 514 MB
dc_p_long 15.3 GB

Using Barracuda COMPRESSED:
dc_p_string 35 MB
dc_p_double 223 MB
dc_p_long 6.8 GB

The compressed format is using the standard block size. I need to do further tests to see how much difference using the smaller 1 kb, 2 kb or 4 kb blocks will make.

So from the point of view of Merlin only it does seem to make sense to use this format, assuming performance is not significantly affected. After all it means I can store twice the amount of data on disk, and one of the problems I have had in the past is that I could only keep a week’s worth of data because of storage limitations. Note: the 2.1 to 2.2 update will save a lot of space as the string table will drop in size significantly. However gaining an extra 50% by using innodb compression seems worth doing if it comes for free.

That said I have been told that there are still a few issues with this new table format so for anyone looking to use it in production it may be best to wait for 5.1.48 which should remove a few of these edge cases. If you only want to see how much difference the storage usage is then 5.1.47 should be ok. YMMV.

In the meantime I’m going to leave this server running for a while. Merlin does hammer the db quite heavily, so I’ll be able to see if it survives in a few days.

I also have a few servers which currently use MyISAM tables because of the smaller disk footprint compared to innodb. These servers do suffer from some of MyISAM’s weaknesses such as concurrent reading and writing on the same table is not possible, so now it looks like we might have a good reason to try this compressed table format out and with that gain a lot using innodb. Recovery after a crash has always been a problem on this type of server and innodb recovery should be both quicker and less intrusive.

I’m looking forward to further experimentation but so far this new compressed format does look promising. So thanks to the innodb folk who have made this possible.

After writing this I thought I’d add a few more comments:

  • On the Merlin tables the selectivity of the main value column was:  dc_p_long (0.2), dc_p_string (0.3), dc_p_double (0.002). These values probably significantly affect how compression may work.
  • I’d like to see some sort of writeup by Innodb on how performance is going to be affected by using the compressed format compared to the normal one so if we switch we know what gotchas to be aware of [1].
  • It would be nice to have a writeup by Innodb on how to determine the approximate space savings converting the tables to the compressed format without having to do this. It took me over 2 hours to convert a 15GB table so much larger tables would take a lot longer. Having some idea of whether the saving is worthwhile and how much it might be would make it clearer to people if this change is necessary

[1] Having said that for the change I made I could have used Merlin directly to do this. It’s a shame I didn’t remember. Basically all that is needed is to run the mysql proxy on the database backend and feed the connections into that proxy. Then after altering all tables we use the Query Analysis (QUAN) functionality in merlin to compare the top queries before making the change and then look again after making the change. Any performance change will be easy to measure. Hopefully the Merlin developers can perhaps do this on a test setup they have and report back the results, and also provide the history size and also number of servers monitored.

Tags: , , , , , , , ,

One Response to “Initial thoughts on space compression using the innodb_plugin”

  1. Randomthoughts…

    […]Initial thoughts on space compression using the innodb_plugin | Simon J Mudd's Blog[…]…

Leave a Reply