About InnoDB’s “Shortened warm-up times with a preloaded InnoDB buffer pool”

This comment is in regard to this InnoDB blog page. Trying to post a comment there tells you that you need to be logged in, but there’s no register/login page that I could see…

Anyway, the page talks about a new feature which allows you to save the InnoDB buffer pool on shutdown and load it again on startup, thus ensuring that once loaded the database will perform as if it had a “hot” cache.

That sounds interesting, as I have seen on numerous occasions that if the buffer pool is not warm, performance can be an order of magnitude worse.

This looks like a very welcome feature. However, a couple of things are not clear to me.

  1. Having some example benchmark times of using this process, compared against warming up the buffer pool by hand, would be useful.  While the results may depend heavily on database content, it would at least give us an idea of how much difference this may make.
  2. On servers I currently have to warm up, we find performance becomes acceptable when perhaps 20% of the buffer pool has been warmed up.  The implementation behind the new functionality isn’t explained, but one option which might be useful would be to save and load only a configurable percentage of the most recently used pages, rather than 100% of the pages as this patch seems to do. This would save time on both save and load, and would probably be good enough for most people. Many servers I use have 40 GB or 90 GB pools, so loading files of this size into memory is going to take time. Thus having some control over how much of the buffer pool is loaded/saved seems like a good idea, if that is possible.
  3. I assume the buffer pool is loaded and saved sequentially (in large blocks), giving mysqld and the OS the chance to read/write the pages in larger, faster chunks. Is this so?  Certainly when I watch mysqld shut down at the moment (in 5.1, and I believe 5.5 too) the shutdown seems rather slow, and from what I can see the data does not appear to be written out quickly. Perhaps the shutdown generates a lot of random I/O, in which case moving to a faster linear read/write seems likely to be much more efficient.
  4. There is no mention of how the content of the file is checked for “consistency” to make sure that invalid data won’t be loaded. I’m sure there is some checking, but otherwise this feature could become a way to corrupt the database inadvertently if the buffer pool cache file itself gets corrupted.
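To make point 2 concrete, here is a my.cnf sketch of the kind of control I have in mind. The variable names below are the ones MySQL eventually shipped for this feature (in 5.6 and 5.7); as far as this blog post is concerned they should be treated as illustrative, not as the patch’s actual interface:

```ini
# my.cnf sketch: dump the buffer pool's page list at shutdown
# and reload it at startup (variable names illustrative here)
[mysqld]
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
# addressing point 2: only dump the most recently used
# percentage of pages, not the whole 40-90 GB pool
innodb_buffer_pool_dump_pct         = 25
```

Note that what gets saved is a list of page identifiers rather than the page contents themselves, which also goes some way towards the random-I/O concern in point 3.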

CentOS 6 – Blocked by the installer not quite doing things as I want

RHEL 6 has been out for some time and promises some nice new features. Now that CentOS 6 has finally been released, I was looking forward to installing it and upgrading my existing system.

That system is currently running CentOS 5.6 and has quite a few services on it.  The best way to upgrade is usually to do a fresh install and then migrate the existing services over, and that’s what I was planning to do.  I only have a single PC, so I was trying to figure out the best way to go about this while at the same time minimising downtime.

I use LVM quite a lot. It makes worrying about which disk is used, and where the partitions are located, much less of an issue. My current setup is quite simple: 2 disks with 2 partitions each. sda1 holds /boot, sdb1 is free, and sda2/sdb2 are configured in RAID-1 and hold a single volume group.

Plan A

My first plan was to use a free USB disk drive and install CentOS 6 onto that, leave some space on the USB disk and migrate the CentOS 5 volume group to the USB disk. Then I could move the CentOS 6 installation to the hard disks and complete the migration. That should have left me with the option during this process of booting CentOS 5 or CentOS 6 and migrating data from one system to another.

Plan A worked in the sense that I was able to complete the install of CentOS 6 on the USB disk, but I couldn’t get the installer to install grub on /dev/sdc (the MBR): it insisted on installing it on sdc1, which of course the BIOS just ignored.  So I couldn’t get CentOS 6 to boot. I could have spent more time fiddling around with this, but then I came up with what I thought was a better plan B.

Plan B

Plan B was simpler: given my setup I had a free partition I could boot from (/dev/sdb1) and also an existing volume group where I could put the new partitions. There was plenty of free space in the existing volume group, so this should have been much easier.

However, it does not work. The CentOS 6 installer shows the disks correctly, and it is possible to edit /dev/sdb1 to be /boot. It also shows you the previously created volume group. However, it does not show you the existing logical volumes inside that volume group: the list is empty. So I could not tell it to reuse my swap partition, or to mount /home from my existing home partition, and I was also fearful of creating new partitions and having the installer perhaps recreate the VG, wiping out my existing system.  So I have reported the problem to the CentOS team, though this looks like a bug inherited from RHEL 6, and I will play around further to see if I can get this to work.

This makes me wish I had a bigger environment at home.  In a work environment you can afford to install a new server, set it up, migrate the existing functionality from the current one, and then finally switch over. I don’t have the hardware to do that, so upgrading is quite a bit more work, at least if I want to minimise downtime.

So I will continue to work out how best to get the CentOS 6 install running and then migrate everything over.

2011-07-30 Update

After further investigation, Plan A turned out to be a problem of my own making. The installer first shows you a list of disks and then asks which one you want to install the OS on, the others being called “data disks”.  There is an option here to install the boot loader. I missed it, and if you do, you are not later offered the possibility of installing the bootloader on that disk’s MBR. So: human error, though I would prefer the install disk’s MBR to be offered as an option anyway, even if accompanied by a warning along the lines of “You indicated you did not want to install the boot loader on the MBR of this disk”.

MySQL Partitioning and its Confusing Syntax

While looking at partitioning I recently made a mistake which I guess can happen to others. Often this sort of thing is due to not fully reading the documentation, or scanning it too quickly and misunderstanding what’s being said.

So this post is to complain about the MySQL partitioning syntax, and to warn others who may easily make the same mistake without realising it.

First we probably need to ask why we are partitioning a table in the first place. The main reasons, I think, are:

  • to improve query performance
  • to reduce individual .ibd file sizes for large tables (if using innodb_file_per_table)

In my case I wanted to do both. I had several tables which store a large number of rows (batches of data) based on an incremental batch number. One of these tables was around 40 GB and had about 500,000,000 rows in it.  When processing the data in this table, often all the data from a particular batch run would be aggregated, with the resultant data being stored elsewhere.

So I knew I wanted to partition, and the performance of these aggregations would be improved, as the time to table-scan a batch would be reduced from the time to scan the whole table to the time to scan a single partition.

The primary key of these tables was already in the form ( batch_id, other_key ) so I really wanted to just partition by the batch_id key, using in my case 64 partitions. batch_id is defined as int unsigned NOT NULL.

This is where I made the mistake. There are various ways to partition tables in MySQL, named RANGE, LIST, HASH and KEY.  Given that batch_id was a gradually increasing value and I didn’t want to modify the partitioning once it was created, RANGE and LIST seemed inappropriate. Of HASH and KEY, I incorrectly assumed that HASH would apply some complex hash function to my integer batch_id, and that since batch_id was part of the key, KEY would be the right way to partition.

So I incorrectly defined the table like this:
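Reconstructing it, the definition was along these lines (column names other than batch_id are illustrative; the real tables had more columns):

```sql
-- The mistaken definition: KEY-type partitioning runs MySQL's own
-- internal hashing function over batch_id, not a simple MOD.
CREATE TABLE batch_data (
    batch_id  INT UNSIGNED NOT NULL,
    other_key INT UNSIGNED NOT NULL,   -- illustrative second key column
    payload   VARCHAR(255),            -- illustrative data column
    PRIMARY KEY (batch_id, other_key)
) ENGINE=InnoDB
PARTITION BY KEY (batch_id)
PARTITIONS 64;
```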


When you read the documentation you see that for HASH-type partitioning you provide an expression which must evaluate to a numeric value, and the partition to use is then determined simply by computing MOD( your_expression, number_of_partitions ).

KEY-type partitioning works differently: it only allows you to provide column names, and it then uses its own internal hashing function to generate the partition id.

So using PARTITION BY KEY ( numeric_column_name ) seems correct, but its internal hashing is likely to be more expensive to calculate than a plain MOD. For large tables this is likely to cause additional performance issues.  It looks like I’m going to have to rebuild the tables I’ve just partitioned, and that will be another weekend of work.
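What I should have written, sketched against the same illustrative table as above:

```sql
-- The intended definition: HASH-type partitioning on an integer
-- column places each row in partition MOD(batch_id, 64), so all
-- rows for one batch land in a single partition.
CREATE TABLE batch_data (
    batch_id  INT UNSIGNED NOT NULL,
    other_key INT UNSIGNED NOT NULL,
    payload   VARCHAR(255),
    PRIMARY KEY (batch_id, other_key)
) ENGINE=InnoDB
PARTITION BY HASH (batch_id)
PARTITIONS 64;

-- A mistakenly KEY-partitioned table can be repartitioned in place,
-- though this rewrites the whole table (hence the weekend of work):
ALTER TABLE batch_data PARTITION BY HASH (batch_id) PARTITIONS 64;
```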

Some suggestions to those at Oracle:

  • PARTITION BY KEY ( numeric_column_name ) should be modified to behave like PARTITION BY HASH ( numeric_column_name ). However, as implementing this would be likely to cause on-disk incompatibilities during a version change, I’m guessing it just won’t happen, unless there’s some easy way to distinguish the current behaviour from my proposed new behaviour.
  • The documentation should be made a little clearer and should mention this obvious case. What I see with a lot of the MySQL documentation is that it documents technically how things are done, rather than starting from the problem and then showing how to implement the solution. Since many people are likely to partition on one of the key columns, especially if a multi-column primary key is used, this use case should be made clearer.

If I had time I’d look at how partitioning is implemented in Oracle Database, Sybase, Postgres and DB2, and see whether it’s just MySQL which has chosen these unfortunate keywords for its partitioning methods.

However, I’m curious: am I the only one to fall into this trap?