On warming up a MySQL 5.6 server

In the past…

One of the typical problems you have when restarting mysqld is that the InnoDB buffer pool (buffer pool from now on) is empty and consequently access to the database requires reading directly from disk. Performance suffers dramatically as a consequence.

So the common solution is to artificially warm upthe server by doing queries which will fill the buffer pool. Typical solutions might be to do: SELECT COUNT(*) FROM some_table FORCE INDEX (PRIMARY) LIMIT ... on a number of tables to fill up the pool on startup. Fitting this into the standard mysql init start script is somewhat tricky as no hooks are provided for this sort of post-start action. (It would be nice to have this for other tasks too.)

Of course choosing the right parameters here can be tricky as workload changes over time, and as the ratio of the size of the database to the size of the buffer pool increases, you need to be more selective in what you load in.  This is a tricky problem to solve well and solve generically, I am sure you’ll agree.

Then MySQL 5.6 comes to the rescue (almost)

So I was pleased to hear that MySQL 5.6 had a new feature to take the problem away.  When shutting down mysqld, it can be configured to save the list of pages in the buffer pool, and on startup you can use this list to populate the empty buffer pool, hence putting you in the same warm state you were in before.  See: http://dev.mysql.com/doc/refman/5.6/en/innodb-performance.html.
Good news, but…
As the systems I manage have got busier and technology has advanced, so has the size of the memory in my servers. Some servers I manage now have 192 GB of ram and we have tried set ups with even more memory.  The innodb buffer pool size is likely to be around 90% of that size.  So loading in 160 GB of data from hard disk, even if read 100% sequentially, is going to take some time, so doing this as efficiently as possible is essential.   Yet, perhaps loading in the full pool is not really necessary. Experimentation, depending on the different set ups I use, indicates that mysqld performs sufficiently well even if if buffer pool is only 25% full.  After that natural usage will pull in missing data and gradually performance will improve to expected levels.
The first systems I started using MySQL 5.6 on needed fast disks so were configured with SSDs. Here the absolute need for a warm up script was less important as SSDs are so much faster anyway, so it is only now that I have paid more attention to this new feature and see some potential weaknesses in its implementation. Having said that it’s a good first step, and for many people may be just what they need if they enable it.
Below are some suggestions for improvement, but they will probably have to wait for 5.7 as new features should not really go into the current GA release.
  • Optionally time-limit this warm up process, stopping when the time limit has exceeded
  • Optionally limit the number of pages loaded in, probably to a percentage of the current pool size. The pages loaded in however, should be in MRU first as these are the hotest and most needed pages.  Ensure the load sorts the page list to ensure that I/O is as sequential and fast as possible.
  • Optionally define the number of concurrent threads that can be doing warmup I/O. My experience is that depending on the underlying disk subsystem running several parallel warmup threads may reduce the warmup time considerably. I think the current implementation in 5.6 only uses a single thread to load in pages.
  • Stop warming up once the buffer pool is full. Other normal database activity on the server may trigger data being loaded into the buffer pool so it may fill up before the warmup process finishes. So if buffer pool page eviction starts there’s no point continuing the warmup process.
  • Take into account the fact that copying an on disk mysql image may result in the server being started not being the one that the buffer pool list was saved for and the configuration in terms of memory size and buffer pools may change.  If this makes a difference ensure that the warm up process still does the right thing of warming up the server with the right pages.
I have posted a feature request as Bug#68863 and hope some of these ideas will be considered.
So I still have to do more testing but think that I may well still be doing manual warm ups for while yet.

MySQL 5.6 is out, so what is next?

MySQL 5.6 is out now and that is good news. I have already been using pre-GA versions on some production servers with good success and now that the few wrinkles I have seen have been ironed out, I am sure a lot of people will find the new features in 5.6 well worth waiting for.

However, that does lead to the question of: “what next?”

I still have several things that I would like to see in MySQL in no specific order of preference such as:

  • Session transaction state exposed as a variable to allow you to determine if you have started a transaction or not, and thus allowing you to use BEGIN WORK, ROLLBACK or COMMIT as needed.  This information is available via the C API I believe but not via SQL.  Something like @@in_transaction = 1.  Makes modular programming easier.
  • Table space management. The default behaviour now in 5.6 is to move to innodb_file_per_table = 1, but really on a large system I’d prefer to not have to manage each individual file, or partition, but blocks of files containing one or more tables: that is mange table spaces.  This potentially should help reduce the number of file handles that are needed open and allow the DBA to manage the “database” or “instance” in blocks which are of a manageable size to him/her.
  • Better configuration management. If I want to compare the running configuration of the variables in SHOW GLOBAL VARIABLES with the [mysqld] section in /etc/my.cnf then there are quite a lot of inconsistencies in variables/parameter naming, some parameters are not visible in SHOW GLOBAL VARIABLES and vice versa. After seeing that the default performance_schema settings in MySQL 5.6 did not match my needs, I have had to use quite a bit of magic to setup /etc/my.cnf with the correct parameters. If you add the way you have to configure performance_schema then checking for consistency and / or adjusting the running configuration to match /etc/my.cnf is more of a challenge than really necessary. So clean this up please and make life for us easier.
  • More dynamic configuration. Many configuration settings require restarting MySQL.  Now that mysql is no longer a toy and people depend on it 24×7, any downtime is costly and really unhelpful, especially if it’s not necessary.
    • Many replication configuration settings are still static.
    • Many innodb settings are also static (innodb_{read,write}_io_threads, back_log comes to mind), so having to restart mysqld is really not helpful, just to see if a change will improve performance or not.
  • Move the SHOW ENGINE INNODB STATUS output to a tabular form (probably in multiple tables in I_S) so that the information is easy to parse and thus people can better use the information.  Right now if you want to see which threads are busy doing I/O or want information on the actively running transactions, locks or whatever parsing the current output is a real pain. It’s true that much of this information may well be now in P_S but not everything is visible there, and the information that is missing should move to a place where it is easy to extract and use.
  • Better Memory management.  I have some servers with 96 GB of RAM and a certain size innodb buffer pool, and some of these when busy with heavy queries use up too much memory and start to swap, whilst others use the full buffer pool and have plenty of memory left over. The end result is the wrong amount of memory is being used. Workload changes the memory usage completely and the change can make a server swap or run out of memory.  The memory in the system is there to be used by the database yet MySQL does not manage this in any coherent way, or allow you to control it to prevent overuse.    Most other RDBMSes handle memory quite differently, pinning the configured size into RAM and only using that memory and nothing else. This appears to be more efficient and something similar in MySQL would make life a lot simpler.
  • Better binlog/relay log space management.
    • Several servers I manage write out a lot of binlogs in a day, so space is an issue.  Nevertheless, keeping as many binlogs as space allows is also important.  Also many of the applications are configured to large INSERTS, UPDATES including using ODKU (up to our configured max_packet_size of 16M) and whether using RBR or SBR the resulting output is likely to be compressible.  Given that compressing the whole bin log/ relay log files means you can’t jump to any location, it struck me that just compressing the individual events as they are written, or when they are being read might be possible and might be a way to save 2-3 x the binlog space usage. (See: Bug#46435)
    • Related to this point is the binlog retention period.  expire_logs_days, configured as a number of days sounds quite nice until you realise that you may fill the disk with 1 or 2 day’s worth of binlogs, so it would be nice to be able to configure a decimal value here which would make smaller periods possible if needed. That or have a new setting to configure the retention size rather than the period, a new variable such as: binlog_retention_size = 200G.
    • I have filled up many servers disks after cloning a slave as it catches up and pulls over old relay logs from the master, writing them locally before being able to apply them.  There is a configuration setting to limit this size but again it is static.

So these are just a few of the ideas I personally would like to see. I’ll see if I can update the post later providing some of the feature request bug reports as reference.

However, this is just part of the story. I have my ideas of what I would like to see in MySQL 5.7 and you have yours, but there does not appear to be a way currently with MySQL to voice interests to Oracle in a place where different people can be heard and it is possible to see what is most wanted.

  • http://bugs.mysql.com does allow you to add feature requests and indeed I have added many, but finding them or voting on them is not possible. Oracle’s official feature request mechanism makes the request go into a black hole where any progress is completely invisible to the requestor!
  • Oracle obviously has its own agenda and has the inconvenience of competitors who are in the same arena.  That does mean that Oracle, MPAB and Percona therefore have to do their own thing and try to guess what people want (though they’re obviously influenced by their main customers), but we do not see what appears until the new release comes out.  I think that can lead to rough edges as developers do not always realise the real world use cases and problems that the DBA faces on a day to day basis, and thus may focus on stuff which in the end may not be the most important to us.
  • Some Open Source projects have a bounty mechanism where people can offer to pay to get a feature implemented. The idea sounds quite good and if several people want the same thing, they may be able to push towards a common goal more quickly. I am, however, not sure how that might work in practice with something like MySQL, but the idea sounds interesting, if only to help better gauge where there is interest.

My way of using MySQL may vary significantly to yours. My problems may also be quite different, but I would imagine that there is a lot of common ground. So what would you like to see in MySQL 5.7?

In the meantime I guess we will have to wait and see what will come out in 5.7 and where things are going to go. There’s likely I guess to be a 18 month wait for that, so plenty of time to do lots of new and exciting things.  In the meantime MySQL 5.6 needs deploying on new servers once it has been fully tested and any issues resolved and that is quite a bit of work to keep me busy for a few days …

 

Update:

  • 2013-04-08 Adding some more I would likes:
    • A working I_S (mysql data dictionary) to avoid the workarounds needed on servers with hundreds of thousands of tables.
    • A way to compress/uncompress MyISAM tables from the mysql command line. This avoids having to work around locking, blocking and flushing issues if doing this directly from a shell. I’m using compressed MyISAM tables still as they’re much smaller than compressed InnoDB tables (and easier to move around too).