Archive for January, 2012

Better Controlling MySQL Memory Usage

Wednesday, January 25th, 2012

MySQL, like a lot of other software, has many knobs you can tweak. Most of these knobs affect behaviour in some way, but more importantly many of them affect the memory usage of the server, so getting these settings right is very important.

Most of MySQL’s memory is really used just as a cache, in one form or another, for information that otherwise lives on disk, so ensuring you have as large a cache as possible is important. However, making these memory areas too large will push the server into swapping, and can eventually cause it to crash, or cause the kernel to kill the process when it runs out of memory. That is something we want to avoid.

Certain settings affect memory allocation on a per-connection/thread basis, with usage bounded by thread_cache_size and max_connections.  If you configure for the worst case (max_connections) you may end up never actually using all the memory you have set aside, memory which could otherwise be used for other purposes.
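A minimal way to sanity-check this, assuming a reasonably recent 5.x server, is to compare the configured per-connection buffers and connection limits against what the server is actually doing:

    SHOW VARIABLES WHERE Variable_name IN
        ('max_connections', 'thread_cache_size', 'sort_buffer_size',
         'read_buffer_size', 'read_rnd_buffer_size', 'join_buffer_size');
    -- Current and peak connection usage:
    SHOW GLOBAL STATUS WHERE Variable_name IN
        ('Threads_connected', 'Threads_running', 'Max_used_connections');

The exact list of per-connection buffers worth watching depends on your workload; the ones above are just the usual suspects.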

Recently a server I managed was configured incorrectly with a much larger sort_buffer_size (raised from 4M to 256M) and read_buffer_size (raised from 4M to 20M).  At first glance the change looks quite innocent, and since no-one noticed that these are per-connection settings it got rolled out. max_connections on this server was set to 1000, while normally there were ~40 connections, of which only a few were active. The mysqld memory footprint on startup looked fine, and under normal usage it also worked fine. However, a spike in load suddenly changed this nice behaviour: as configured, mysqld ramped up the thread count and hence memory usage, resulting in swapping and finally server death…
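A quick back-of-the-envelope calculation using the figures above shows how badly this can go wrong: 1000 connections × (256M + 20M) is roughly 270GB of potential sort and read buffers, on top of everything else the server allocates. A sketch of the same estimate, taken straight from a running server (it deliberately ignores every other per-thread buffer):

    -- Worst case if every allowed connection allocated its full sort and
    -- read buffers at the same time:
    SELECT @@max_connections *
           (@@global.sort_buffer_size + @@global.read_buffer_size)
           / POW(1024, 3) AS worst_case_session_buffers_gb;

In practice threads only allocate these buffers when a query needs them, so the real figure is normally far lower, but as the incident above shows, the worst case is not purely theoretical.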

The fault of course was mine for not noticing, but this has been an issue with MySQL forever.  Most other RDBMSes manage memory slightly differently: you define how much memory you want to use (a maximum), this is optionally locked into memory, and requests for the different buffers are then taken from this global pool.  That is much safer, avoiding unexpected swapping and memory over-allocation, but it does raise the question of what happens when a memory request can not be honoured.

Initially I would expect two different behaviours: either

  1. kill a thread whose memory can not be allocated, or
  2. wait a certain time for that memory to become available, after which the thread gets killed anyway.

Option (2) is probably saner, and possibly some sort of deadlock detection could kick in if all threads are waiting for memory, perhaps killing the youngest thread, or the thread which has done the least work, first. Possibly there are other, better ways of doing this?

I can imagine that changing MySQL’s current behaviour to do something like this could be quite hard, especially as ideally the engines would also use the same memory management mechanisms, but I see it as a good thing which would make MySQL more robust, especially under load, which is after all what counts.  Of course this will not happen in today’s 5.5 GA version, or tomorrow’s 5.6 version, which is likely to appear some time this year: that is too major a change. It would be nice if Oracle looked at this for 5.7 as a way of ensuring that when resource usage does come under pressure MySQL does not go belly up, but attempts to use the resources it has been allocated as best it can.

In the meantime what would help would be:

  • Better documentation so we can see clearly how all MySQL memory is allocated. There are several web pages suggesting ways to calculate this, but certainly no definitive guide.
  • The InnoDB engine’s documentation talks about memory usage, and most people think that innodb_buffer_pool_size is the main setting, yet read further and there’s talk of an overhead of perhaps 1/8th on top of it (so a 16G buffer pool may really allocate closer to 18G). I have recently been playing with innodb_buffer_pool_instances settings > 1 (using values in the range of 20-40) and am inclined to think that this increases that overhead somewhat more, yet there’s no documentation to tell me whether my guess is right or not. Please, InnoDB developers, improve your documentation, if only to prove me wrong.
  • Ideally some tools to tell you if your server is possibly misconfigured. Coming from a Sybase environment I’d be tempted to suggest a stored procedure in the mysql database which could tell you total memory usage and how it is broken down, as doing this with a single SELECT is going to be tricky (a rough sketch of such a query follows this list).
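For what it’s worth, a very rough first approximation of that last point is possible with a single SELECT over the server variables. This is only a sketch under the assumption that the buffers below are the ones that matter: it ignores InnoDB’s extra overhead, the table and thread caches, in-memory temporary tables, and the fact that per-connection buffers are only allocated on demand.

    SELECT
        @@innodb_buffer_pool_size / POW(1024, 3) AS innodb_buffer_pool_gb,
        @@key_buffer_size         / POW(1024, 3) AS key_buffer_gb,
        @@query_cache_size        / POW(1024, 3) AS query_cache_gb,
        -- Worst case: every permitted connection using all its buffers at once
        @@max_connections *
            ( @@sort_buffer_size + @@read_buffer_size
            + @@read_rnd_buffer_size + @@join_buffer_size
            + @@thread_stack + @@binlog_cache_size )
            / POW(1024, 3)                       AS worst_case_per_connection_gb;

A proper stored procedure could break this down much further, but even this crude version would have flagged the misconfiguration described above.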

Then, once that is done, consider adding an extra variable to enable total memory usage to be controlled. I made a feature request for this at http://bugs.mysql.com/?id=64108. If you think this feature might interest you, please let Oracle know.

Why online disk storage is not the same as using tapes

Tuesday, January 10th, 2012

A couple of years ago I posted about an unlucky encounter with my Thecus N5200 Pro. On the 1st of January at 06:15 I got a New Year’s present: a disk died on me. The Thecus duly notified me by starting to beep.

I still have my RAID-6 setup, so there was no real problem: I have 2 redundant disks. So I went and bought a replacement and fitted it, and the Thecus duly started rebuilding the RAID array with an estimated time to finish of about 1 day (5x1TB disks, so a ~3TB array).

Disk prices have increased significantly since I originally bought my array, but buying a single disk was not a problem.  However, while the array was rebuilding, or shortly afterwards, a second disk failed, just as one had 2 years ago when I ended up losing my array because the RAID-5 setup did not have any further redundancy. This time there was no problem: RAID-6 saved me, along with the photos and backup information I had on my Thecus, so I was happy, and I ended up running out and buying another disk. Fitting this one triggered another array rebuild, which completed successfully.

That led me to think. A lot of people push the move to external disks as a backup alternative. Certainly this gives you online information and generally works well. I am using this device at home to keep my photos and other files and also to make a backup of my main machine in a separate location.  However, one thing strikes me as clear: replacing disks is expensive. Had I been using tapes for backups, throwing away the 2 tapes and buying new ones would have been much cheaper than buying 2 new hard disks for my array.  Of course a disk array does not provide you with an off-site backup, nor a convenient way to store multiple backups over time. For that it is not yet cost effective.

So after this experience I am tempted to look and see if there are any small tape devices which I could attach to my PC to give me these extra facilities. The hardware no doubt exists, but from previous searches it is expensive and out of the price range of the average home user, which is a shame.  With a lot of people now dismissing tape devices as old fashioned, drive failures in a small array like my 5-disk one may turn out to be pretty expensive (in money) or pretty costly (in lost data).  RAID-1 or RAID-5 sound great, but can you afford for a second disk to fail while the array is rebuilding after the first disk has failed? If you can not, then arrays of this type may not be for you, and may lull you into a false sense of security.

The title of my original post was Unlucky Encounter… So have I just been unlucky (again), or is this sort of problem something which has happened to more people?