About InnoDB’s: Shortened warm-up times with a preloaded InnoDB buffer pool

This comment is in regard to this InnoDB blog page. Trying to post a message there says you need to be logged in, but there’s no register/login page that I could see…

Anyway, the page talks about a new feature which allows you to save the InnoDB buffer pool contents on shutdown and load them again on startup, ensuring that once the load is done the database will perform as if it had a “hot” cache.
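
Presumably this will end up being controlled by a couple of new server options. As a sketch of what I would expect it to look like (the option names here are only my guess, so don’t treat them as definitive):

    # my.cnf sketch: dump the buffer pool page list at shutdown, reload it at startup
    [mysqld]
    innodb_buffer_pool_dump_at_shutdown = ON
    innodb_buffer_pool_load_at_startup  = ON

    # and presumably some way to trigger a dump or load by hand on a running server
    $ mysql -e "SET GLOBAL innodb_buffer_pool_dump_now = ON"
    $ mysql -e "SET GLOBAL innodb_buffer_pool_load_now = ON"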

That sounds interesting, as I have seen on numerous occasions that if the buffer pool is not warm then performance can be an order of magnitude worse.

This looks like a very welcome feature. However, a couple of things are not clear to me.

  1. Having some example benchmark times of using this process and comparing it to trying to warm up the buffer pool by hand (the sort of scripted scans sketched after this list) would be useful. While this may be heavily dependent on database content, it would at least give us an idea of how much difference this may make.
  2. On servers I currently have to warm up, we find performance becomes acceptable when perhaps 20% of the buffer pool has been warmed. The implementation behind the new functionality isn’t explained, but one option which might be useful would be to save and load only a configurable percentage of the most recently used pages, rather than 100% of the pages as this patch seems to do. This would save time on both save and load, and would probably be good enough for most people. Many servers I use have 40 GB or 90 GB pools, so loading files of this size into memory is going to take time. Thus having some control over how much of the buffer pool is loaded/saved seems like a good idea, if this is possible.
  3. I assume the buffer pool is loaded and saved sequentially (in large blocks), giving mysqld and the OS the chance to read/write the pages in larger, faster chunks. Is this so? Certainly when I see mysqld shut down at the moment (in 5.1, and I believe 5.5 too) the shutdown times seem rather slow, and from what I can see the data does not appear to be written out quickly. Perhaps the shutdown generates a lot of random I/O, so moving to a faster linear read/write seems likely to be much more efficient.
  4. There is no mention of how the content of the file is checked for “consistency” to make sure that invalid data won’t be loaded. I’m sure there is some checking, but otherwise this feature could be a way to inadvertently corrupt the database if the buffer pool cache file gets corrupted.
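
The “warming up by hand” I mention in points 1 and 2 is basically scripting full scans of the hottest tables and indexes after startup, something like the following (mydb, orders and idx_customer are just placeholders for whatever happens to be hot on your server):

    # crude manual warm-up: drag the hottest data into the buffer pool
    $ mysql mydb -e "SELECT COUNT(*) FROM orders FORCE INDEX (PRIMARY)"       # clustered index / table data
    $ mysql mydb -e "SELECT COUNT(*) FROM orders FORCE INDEX (idx_customer)"  # a hot secondary index

Getting that right for a changing workload is hard, which is why a built-in save/load is so attractive.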


11 Responses to “About InnoDB’s: Shortened warm-up times with a preloaded InnoDB buffer pool”

  1. Ewen says:

    A similar feature was introduced to Percona Server early last year.

    http://www.percona.com/docs/wiki/percona-server:features:innodb_buffer_pool_restore_at_startup

    That documentation can help you understand how it works; I believe the Oracle implementation is similar.

    Vadim blogged about it, including some benchmark figures.

    http://www.mysqlperformanceblog.com/2010/01/20/xtradb-feature-save-restore-buffer-pool/

  2. Mark Callaghan says:

    You might have more luck with the Percona version of this. It only dumps page IDs. Reads are more random during the preloading, but it doesn’t require GBs of disk space. I think they fetch all pages in page order, but the Percona patch could be adjusted, without changing too much code, to fetch only the prefix of the LRU.

  3. Simon J Mudd says:

    Thanks to Ewen and Mark for the comments about the Percona server’s version of this feature. Yes, it does sound good and I guess I really need to try out that version of MySQL. It seems that InnoDB saves and loads the whole buffer pool, at least that’s what I understood from the blog post, whereas the Percona version only saves the page list, which is bound to be much faster. If that’s not the case then they should be a bit clearer.

    As was stated in the post about the Percona patch, actually figuring out how to warm a cache manually is next to impossible unless the application that accesses it is very stable. In my case the applications and schemas on some of the servers change considerably, so it’s almost a lost cause trying to get this right.

  4. Inaam Rana says:

    Simon,

    Good observations. Let me try to clarify some points about the 5.6.3 implementation that you raised:
    * The whole content of the buffer pool is not dumped. Only the <space_id, page_no> pair for each page is dumped, so the text file that we get is usually kilobytes in size. IIRC you can actually view the file as it is in ASCII format, and you can even edit it (though that is not recommended).
    * We do sort the dump file at load time, trying to make the read access pattern as sequential as possible.
    * Because the dump file is just a list of pages in the LRU list, and the actual read at load time happens from the actual datafile, there is no chance of corrupting the database. There is a provision to skip pages which existed at the time of the dump but are no longer valid at load time.
    * Having an option to dump/load a certain percentage of the buffer pool is a good idea, and I think it is certainly doable. We were thinking more along the lines of providing options such as loading a specific index or a particular set of tables, but what you suggested also makes pretty good sense. Thank you for providing this feedback. We’ll very likely look into that in the not so distant future.

  5. James Day says:

    A patch implementing this can be found at http://lists.mysql.com/commits/139851 :

    +We store the space id in the high 32 bits and page no in low 32 bits. */
    +typedef ib_uint64_t buf_dump_t;

    +Perform a buffer pool load from the file specified by
    +innodb_buffer_pool_filename.

    + /* First scan the file to estimate how many entries are in it.
    + This file is tiny (approx 500KB per 1GB buffer pool), reading it
    + two times is fine. */

    So that’s storing just the space id and page number in the dump file. On read the list is sorted by space and page then read asynchronously. There’s code present to handle these possible error cases among others:

    Page missing when there’s an attempt to load it.
    Page already in the buffer pool (because the load is asynchronous).
    Buffer pool smaller than the list of pages to load.
    Buffer pool bigger than the list of pages to load.

    It’ll just load the first part of the list if the new buffer pool size is too small and ignores pages that are missing.
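
    To make the buf_dump_t encoding concrete, here’s a quick sketch with made-up numbers. Sorting the combined 64-bit values numerically automatically orders entries by space id first and page number second, which is what allows the near-sequential reads:

    $ space=5; page=3
    $ key=$(( (space << 32) | page ))    # space id in the high 32 bits, page no in the low 32 bits
    $ echo $key
    21474836483
    $ echo "space=$(( key >> 32 )) page=$(( key & 0xFFFFFFFF ))"
    space=5 page=3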

    I agree that this could be clearer; I’ll check that nothing has changed and arrange something to improve the description, and perhaps add some internals description.

    Loading just a specified percentage of the LRU list seems like a useful approach for larger pools, though hopefully asynchronous loading means it’s not too painful to load everything.

    All subject to change once I’ve confirmed with the developers, this is just from code reading.

    James Day, MySQL Principal Support Engineer, Oracle

  6. James Day says:

    I see (from an internal email) that Inaam has posted a reply that’s awaiting moderation. Perhaps best to delete my reply rather than approving it since he’s a more authoritative source.

  7. James Day says:

    I’ve clarified the blog entry, adding this text in various places:

    The file contains only the space and page IDs and is about 500k per gigabyte of buffer pool.

    Before loading the list is sorted so that pages will be read in as close to sequential order as possible. The load is asynchronous so it should not interfere with normal operations too much. The load is clever enough to handle different buffer pool sizes or pages that are missing during the load. A page will not be loaded if it is already in the buffer pool so there is no need to worry about problems with changed pages.

    The dump file is plain text so it is possible to edit it, though we do not recommend this.
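
    As a rough sanity check of the 500k figure, assuming the default 16 KB page size and roughly 8 bytes of text per entry in the dump file:

    $ echo "$(( 1024 * 1024 / 16 )) pages per GB of buffer pool"
    65536 pages per GB of buffer pool
    $ echo "$(( 1024 * 1024 / 16 * 8 / 1024 )) KB of dump file per GB of buffer pool"
    512 KB of dump file per GB of buffer pool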

    James Day, MySQL Principal Support Engineer, Oracle

  8. Simon J Mudd says:

    Thanks Inaam and James for clarifying the implementation.

    This makes things much clearer. I still think the option of loading only a percentage of the buffer pool is good, as in my environment we often build servers by cloning a similar slave. This means we incur a replication delay while the whole database is copied (usually via a filesystem LVM snapshot) from the source server, so when the new server starts up, replication alone is likely to modify many “hot / common” pages. Loading the whole buffer pool is therefore likely to slow down replication recovery (as the disks are likely to be saturated) if you want to catch up as soon as possible. Of course, if you don’t warm up the cache a bit then replication itself is affected. So getting the right balance, ensuring the slave catches up AND the buffer pool is fully initialised, seems tricky to implement, but I believe this could be a common reason why not loading the whole buffer cache first may be a good idea.

  9. James Day says:

    For replication slaves the page cleaner thread in 5.6 labs releases should also help: http://blogs.innodb.com/wp/2011/04/introducing-page_cleaner-thread-in-innodb/ . For slaves it was often the replication SQL thread that caused the async writes and was blocked until the flushing was done.

    For versions of MySQL without that flushing improvement you can fake it by setting innodb_max_dirty_pages_pct to a value low enough to prevent 75% of the InnoDB log file space from being used. Adaptive flushing, available in the 5.1 plugin and later, also helps, as does setting the total InnoDB log file size larger than you might otherwise set it.
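
    On a running server that is something along these lines; the value 50 is only an illustration, pick whatever keeps log space usage comfortably below the forced-flushing point for your write rate:

    $ mysql -e "SET GLOBAL innodb_max_dirty_pages_pct = 50"
    $ mysql -e "SET GLOBAL innodb_adaptive_flushing = ON"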

  10. Vasil Dimov says:

    Hi Simon,

    Thank you very much for your feedback, it is invaluable!

    Wrt loading only the first 20% of the LRU – there is a quick workaround for this until we have implemented a switch for doing it: the dump file contains the pages in reverse LRU order, so loading only the first (hottest) 20% of the LRU means stripping the first 80% of the file before loading its contents.

    Something like:

    $ total=$(wc -l < ib_buffer_pool); tail -n $(( total / 5 )) ib_buffer_pool > foo
    $ mv foo ib_buffer_pool

    Hope this helps!

  11. Vasil Dimov says:

    Notice that if you are using multiple buffer pools then the contents of the dump file will be (assuming 3 buffer pools):

    oldest page from bp0, …, newest page from bp0,
    oldest page from bp1, …, newest page from bp1,
    oldest page from bp2, …, newest page from bp2

    A rough estimate is that a 60GB buffer pool will take about 10 minutes to load, assuming a 100MB/s sequential read speed (60 * 1024 MB / 100 MB/s ≈ 614 seconds). During this time clients will not be blocked from connecting to MySQL.
