Initial Reactions to MySQL 5.6

New versions of MySQL are always interesting to try out. Often they include features I have asked for myself, so it is satisfying to see them finally appear in a system I use. Other new features simply make life easier for the DBA. Finally, we hope that overall performance will improve and that managing the server(s) will become easier.

I had a system which needed to handle heavy writes, and performance was a problem even when writing to SSDs. Checkpointing seemed to be the big issue, and the combined ib_logfile size in MySQL 5.5 is limited to 4 GB. That sounds like a lot, but once MySQL starts to fill these files (at roughly 70% of the total, I believe), checkpointing kicks in heavily and slows things down. So the main reason for trying out MySQL 5.6 was to see how things performed with larger ib_logfiles. (Yes, MariaDB 5.5 can do this too.)

Things improved a lot for my specific workload, which was great news, but one thing I noticed was that the initial install/setup time of a system with these large files increased enormously. (I was using 2 x 16000M files.) That was noticeable even on a box with SSDs; on a box with HDs it would be horrible.
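For reference, this is roughly the redo log configuration I was testing with; the sizes come from my setup above and are not a recommendation:

    # my.cnf fragment (MySQL 5.6): two 16000M redo log files, about 32 GB in total.
    # In MySQL 5.5 the combined size of the ib_logfiles is capped at 4 GB.
    [mysqld]
    innodb_log_file_size      = 16000M
    innodb_log_files_in_group = 2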

mysql_install_db now uses InnoDB to set up some of the system tables and restarts the mysqld binary a couple of times. When you also have a large innodb_buffer_pool_size (I was testing on a box with 192 GB of RAM), each of those mysqld startups took quite a while.

So my initial thought is that the bootstrapping probably does not need to use the normal innodb_buffer_pool_size setting, but should perhaps override it with the minimal size needed.
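In the meantime, something along these lines is a possible workaround. This is only a sketch: the file names are examples and it assumes mysql_install_db will accept a --defaults-file pointing at a stripped-down config.

    # /tmp/bootstrap.cnf - hypothetical config used only while bootstrapping:
    # the same redo log settings as production, but a tiny buffer pool so the
    # repeated mysqld restarts during the install are quick.
    [mysqld]
    datadir                   = /var/lib/mysql
    innodb_buffer_pool_size   = 128M
    innodb_log_file_size      = 16000M
    innodb_log_files_in_group = 2

    # Shell: bootstrap against the small config, then start mysqld with the
    # normal my.cnf (large buffer pool) afterwards.
    mysql_install_db --defaults-file=/tmp/bootstrap.cnf --user=mysql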

Managing the size of the ib_logfiles is also something that requires restarting mysqld. Given the sort of experimenting I was doing, it would be nice to be able to configure this dynamically. I can see many issues with changing the sizes of the existing files, but what seems easier to implement would be the ability to increase or decrease the number of files in use (removing the older, higher-numbered files once they have been checkpointed past), and that would give the DBA a lot more freedom.

I clone servers quite a bit, and the extra 32 GB of redo log files is a pain to copy and time consuming. Being able to reduce the number of files and checkpoint before shutting the server down for cloning, then restoring the normal value afterwards, would save a lot of cloning time, so it would be a handy feature.

I also tried playing with the new GTID features. They look great, except for one thing: the implementation expects masters and slaves to have binlogs enabled, so that the GTID state is kept locally and is available on startup. I manage several boxes where, for performance reasons, the slaves do not have binlogs enabled; if a slave dies it simply gets recloned. Adding back the requirement for binlogs (and log_slave_updates) on a slave just to enable this feature seems wrong. I can see the need for the various GTID values to be kept somewhere, but I don't see the need to keep all the binlog information, at least on a slave. Given that the ratio of slaves to masters can be quite high, that can be an issue. If this information has to be written somewhere, why not in an InnoDB table, so that on server recovery you have information consistent with the rest of the database, something that might not happen with the binlogs?
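For the record, this is roughly the combination of settings the current implementation pushes you towards on every server, slaves included; option names and requirements may still change before GA:

    # my.cnf fragment: what GTIDs currently require, even on a slave that
    # would otherwise not need a binlog at all.
    [mysqld]
    gtid_mode         = ON
    log_bin           = mysql-bin
    log_slave_updates = 1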

Talking of binlogs, one new feature I'm using which I think is good is binlog_row_image = minimal. This reduces the size of the binlogs and thus I/O. What I'm missing is the next step, which would be to compress the binlog events and reduce binlog size further. That may not be an issue on many systems, but several servers I manage write over 100 GB of binlogs a day. Being able to compress the binlogs would be nice, and better control of expire_logs_days (the daily granularity is too coarse for me in many cases) would help too.
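For anyone wanting to try it, the relevant settings look like this; the retention value is just an example:

    # my.cnf fragment (MySQL 5.6)
    [mysqld]
    binlog_format    = ROW
    binlog_row_image = minimal   # log only the columns needed to identify and apply each row change
    expire_logs_days = 7         # daily granularity only, hence the comment above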

NOTE: In MySQL 5.6.5 with gtid_mode = ON, mysql_install_db crashes! The bug has been reported to Oracle, and I guess it will get fixed soon.

I have seen a few other crashes in 5.6.4 and 5.6.5; unfortunately I can't provide links as the reports are not public, and they are still not resolved. I'd like to make the details of these crashes public, as others may come across them (or may have come across them before me), but these issues are not visible on http://bugs.mysql.com. It would be nice if they were, but that requires me to duplicate the information, which is a pain. Oracle, please make this easier. It is good for us all.

So with the little testing I’ve done so far MySQL 5.6 looks good. I need to spend more time with it and investigate the new features, many of which will make life easier.

When it finally goes GA I will have to spend a lot more time seeing how it behaves, but I believe there are still a few issues which need resolving before that happens.


11 Responses to “Initial Reactions to MySQL 5.6”

  1. You forgot to mention that Percona Server has supported large log files for a long time — we added that feature first :-)

  2. James Day says:

    For cloning, consider what is done to resize the InnoDB log files: ensure that you have a clean shutdown, rename the existing files, change the size in my.cnf/ini, start up the server so it creates new files, delete the original files once you’re sure that all is well. So considering this, why are you copying the InnoDB log files if you can have a clean shutdown and let the destination server create them anew at the same size?
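    In outline, the resize procedure is something like this (a sketch only; the paths are examples):

        # 1. Make sure the shutdown is clean so the redo logs contain nothing
        #    that still needs to be applied.
        mysql -e "SET GLOBAL innodb_fast_shutdown = 1"
        mysqladmin shutdown

        # 2. Move the old log files aside rather than deleting them immediately.
        mv /var/lib/mysql/ib_logfile* /var/lib/mysql-old-logs/

        # 3. Change innodb_log_file_size (and/or innodb_log_files_in_group) in
        #    my.cnf, then start the server; it creates new files at the new size.
        service mysql start

        # 4. Check the error log, confirm all is well, then remove the old files.
        rm /var/lib/mysql-old-logs/ib_logfile*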

    What type of flushing do you use for InnoDB? Fixed or adaptive? Why?

    Async checkpointing starts at 75% and sync at 85%; with those thresholds I normally suggest a 60% target for flushing, though I do wish the thresholds could be raised significantly. Async is usually so effective at stopping log file space use from increasing that it's very rare to see a server get anywhere near sync flushing. None of this considers the changes to flushing in 5.6 and what they do to things.

    Consider what happens when there is a GTID position mismatch and a need to synchronise multiple slaves to the same position. It’s necessary to have a record of the changes that came in from replication so that the information from that can be used to bring all slaves to the same position. The question is where to store it. Binary logs are one place and the place chosen by this implementation. Preserving relay logs for a while would have been another possible choice, with the potential advantage that they already have to be written anyway. The binary log might then need just the GTID and some reference to a relay log position.

    More than just the GTID has to be retained for a while but there are some interesting choices to be made about where the required information is stored.

    Today, binlog off is our normal recommendation for slaves that aren’t being used to make backups. For performance reasons, it’s just redundant work to keep them. That may well change with GTIDs but it’s not without cost.

    Views are my own, not necessarily those of anyone involved in any of this feature development or of Oracle. For an official Oracle view, consult a PR person.

    James Day, Senior Principal Support Engineer, Oracle

  3. Simon J Mudd says:

    Hi James.

    Your point about copying the ib_logfiles being unnecessary if the shutdown is clean is valid. It hadn't been an issue until now, but it looks like an improvement I can make to my cloning procedures. Thanks.

    I'm using adaptive flushing in 5.5 and 5.6. The point is that for this particular setup I am hitting the checkpointing. Merlin shows this very clearly, and what seems unfortunate is that the ib_logfiles are never filled, so the free space effectively ends up being wasted space. Perhaps other workloads manage to fill the ib_logfiles to more than 85%, but if they cannot, why have the threshold there? This is what I see: http://merlin.wl0.org/201204/29/2101736855.png and you can see the threshold being reached. 9 GB of unused disk space, while cheap, is wasteful.

    As to the GTID info, you say yourself that you don’t recommend binlogging on a slave which won’t be used for backups:
    * Well the current implementation goes against that very recommendation, so you’ll have resistance from quite a few people to use this new feature if they have to change settings and plan for the extra space on servers where this is not needed. The extra I/O this generates is likely also to be a concern.
    * You are right, the information does need to be stored somewhere, but I’m just questioning the validity of the implementation, especially since this is _outside_ of the database files.
    * Cloning: you now _have to_ copy over the binlog files to ensure that things stay consistent. If you do not, you lose the GTID information, which is likely to cause a problem but unlikely to generate any warning signals. Again, not good, as it can potentially lead to silent database corruption (re-execution of statements).

    Many of these details are not currently documented as clearly as they could be. That’s fair enough as 5.6 is not GA, but consider improving this so that potential “gotchas” and pitfalls are made clearer and thus avoided. If you can suggest best practice procedures for doing this then that information will be useful too.

  4. Ernest G. Wilson II says:

    You stated “If it dies the slave gets recloned.” Can you elaborate on how easily and how exactly you do this? Is the master locked while you dump out data to clone? Maybe you could do an article on steps you take to clone a new slave in an existing environment? I was impressed by how casually you stated: “If it dies the slave gets recloned.” Thank you!

  5. James Day says:

    Thanks for sharing that. Looking at the flushing patterns I’d be inclined to:

    1. Increase innodb_io_capacity during the normal and busy load times to decrease the rate of growth of the checkpoint age as the server becomes busy.
    2. Use a cron job to increase innodb_io_capacity off peak to reduce the checkpoint age greatly during the off-peak time, giving the greatest possible margin as the server becomes busy (a rough sketch follows this list), and getting down well below the 10,000MB checkpoint age if your write working set makes that sensible. At the end of the usual lower-load time, cut back to the normal value. Or use innodb_max_dirty_pages_pct to achieve the same effect, though increasing innodb_io_capacity does the extra writing at a more gentle rate, so it's less likely to cause trouble with the remaining foreground load. innodb_max_dirty_pages_pct still has a gentle sawtooth bursting of writes in 5.5, so it's not smooth, just more smooth, enough so that people seldom notice it.
    3. Set innodb_max_dirty_pages_pct to a low enough value that it causes background flushing before async flushing is reached. It tends to be somewhat more gentle in its effects than an async flush. This is something of a hack, it’s not what this variable is for, but it’s a useful hack in 5.5 with limited log file space.
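    A rough sketch of the cron approach in point 2; the times, values and the way credentials are handled are only placeholders:

        # /etc/cron.d/innodb-io-capacity  (hypothetical schedule and values)
        # 01:00 - off peak: flush harder to push the checkpoint age right down.
        0 1 * * *  root  mysql -e "SET GLOBAL innodb_io_capacity = 4000"
        # 07:00 - back to the normal value before the busy period starts.
        0 7 * * *  root  mysql -e "SET GLOBAL innodb_io_capacity = 400"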

    It’s not just disk space that is affected by having to leave 25% of the redo log space unused. The operating system cache tends to cache the InnoDB log files and for highest performance needs to, so it also affects the RAM that’s available for other uses. There’s still some conceptual catching up to do to realise that we are no longer in a world where 5MB log files are the norm and that async flushing is extremely effective at preventing a server from going far over the limit.

    There is an exception to this – servers that just don’t have the i/o capacity to do the writing. For those it’s necessary to limit the foreground write rates because they can never catch up. Fortunately DBAs seldom configure their systems to fail, but it does occasionally happen. This is not a feature that is in the server, it’s just an observation of what it takes to handle a rare contingency case.

    I don’t suppose you have a graph you’d care to share about how foreground transaction rates vary over the week?

    You seem to be nicely illustrating how there are opportunities to exploit by observing how load varies during the day and week and seeking to move more work to those times. This sort of thing is sometimes an innocent victim of benchmarks that simulate capacity planning failures or only peak load without the normal variations – you can’t get lower load times to exploit if you never provide any in the testing.

    Looking to 5.6, if you use adaptive flushing there, it appears that a smoothing time of between six and twelve hours might be suitable for the normal background flushing, the kind that happens when the log space is not getting close to the limits. That would help to shift more of the work away from the peak times. The urgent log-filling flushing can't be smoothed over such a long period, though; that has to respond within minutes, although some smoothing is highly desirable for it as well.

    Smoothing is probably the most significant improvement to adaptive flushing for normal loads in 5.6. Normal meaning not the log filling parts – there have been other substantial improvements in that area, though those are also a form of smoothing. At the moment it isn’t possible to set smoothing periods differently for the normal background flushing and the urgent flushing.

    With quite long smoothing periods for normal non-urgent flushing and much shorter periods for urgent flushing we might end up with a server that can do an even better job of adjusting the primary flushing rate to what the server needs, while still handling the exceptional situations that sometimes happen. That’d be truly thorough adaptive flushing if we can find a way to get there.

    Not quite complete because we can also observe that even at busy times there are drops in load when it’s nice to try to sneak in some flushing and that there are pages that can be given priority for flushing because we know they aren’t likely to be changed again – say a page almost fully filled during an insert by autoincrement primary key.

    There’s a lot of interesting potential for more clever adaptive flushing when you look into how normal server loads vary.

    Much of this is just theoretical, not necessarily anything that is or will be in the server. I'm just discussing it so you know that there are people in Oracle's MySQL team who do think about such things, not limited to just me.

    What sort of changes to flushing would you like to see, beyond what we already have released?

    Views are my own, not those of Oracle. For an official Oracle view consult a PR person.

    James Day, Senior Principal Support Engineer, Oracle

  6. Simon J Mudd says:

    Ernest, hi. Cloning slaves is a reasonably straightforward process if you have enough hardware. I usually dedicate a spare instance (a slave) for this purpose; a server dedicated to taking backups can also be used. If a server dies due to an unrecoverable crash [1] or to hardware failure, the spare slave can be stopped and its data copied over to the failed server. Once the copy is complete you can restart the failed server and it will continue as before. Some time is needed to allow the recloned slave to "catch up", and also to "warm up" after being stopped, but usually you have a working slave again fairly quickly.

    I use Linux with LVM storage for the servers, so to avoid having the slave I am cloning from down for too long I take an LVM snapshot of its filesystem (which only contains the MySQL database), restart the server, and mount the snapshot somewhere else so that it can be copied from.
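    Roughly, the steps look like this; the volume, mount point and host names are just examples:

        # Stop MySQL briefly on the donor slave so the datadir is consistent.
        mysqladmin shutdown

        # Snapshot the logical volume holding the MySQL data.
        lvcreate --snapshot --size 20G --name mysql_snap /dev/vg0/mysql_data

        # Restart MySQL on the donor; it carries on replicating while we copy.
        service mysql start

        # Mount the snapshot read-only and copy it to the server being recloned.
        mount -o ro /dev/vg0/mysql_snap /mnt/mysql_snap
        rsync -a /mnt/mysql_snap/ failed-server:/var/lib/mysql/

        # Clean up once the copy has finished.
        umount /mnt/mysql_snap
        lvremove -f /dev/vg0/mysql_snap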

    [1] In MySQL 5.1 and earlier it was very easy to have an issue due to the master.info file not being in sync with the recovered database position. Switching a server back on after a sudden power-off would leave you with a recovered (InnoDB) database but replication log file positions that were "out of sync", usually up to 30 seconds behind. So you would often get duplicate key errors, or perhaps replayed transactions causing "silent corruption" in the database.

  7. Simon J Mudd says:

    James, you spent a long time responding to me and I’ve not responded back until now.

    You make this comment: “The operating system cache tends to cache the InnoDB log files and for highest performance needs to, so it also affects the RAM that’s available for other uses.”

    I don't really understand that. First, why is there no option to use something like O_DIRECT on the ib_logfiles, thus avoiding the filesystem buffer cache? Second, generally all MySQL does is write sequentially to the log files, so why does the OS need to cache this information? Also, the servers I manage are dedicated to MySQL, so I really want to use all the RAM I can. Yes, there are other processes on the server (ntp, puppet, the merlin agent, cron, and so on), but they generally don't take up much space. [The merlin agent's footprint is rather large for my taste sometimes; I've seen sizes of 400 MB+, which seems a lot, though on a server with 48 GB+ it's not so much.]

    You asked for some more graphs: last week was a bit quieter but you can see some here.

    As to what further changes I would like to see: that's not clear to me yet. I've not spent a lot of time looking at this 5.6 instance I'm playing with. I need to spend more time on it, and also on servers with hard disks, as this particular instance is SSD based.

    You're probably aware that these graphs come from Merlin. One thing I notice is that the different I/O rates are spread over several graphs, which is not helpful as we need to track these metrics together. Also, the values shown are in MB/s, which is one metric, but as I have seen on another system, sometimes the raw I/Os per second are more important due to the differences between sequential and random I/O. Those numbers are not shown in Merlin and would also be welcome.

  8. You may be interested in reading https://www.facebook.com/note.php?note_id=10150219759210933 and http://ralpartha.blogspot.com/2011/06/effects-of-allodirect-on-tpcc-workload.html

    Strangely, I don’t think we have blogged about what the ALL_O_DIRECT feature is and why it is helpful. The short version is you can’t use O_DIRECT on the log files as currently designed because they are written in 512-byte increments. The increment has to be increased. At the same time, the 512-byte increment results in “read-around writes” — see page 353 of High Performance MySQL 3rd edition (in a MyISAM context, but it’s the same thing). That’s why having the log files cached in RAM is important for performance. Domas has also blogged about this; see http://dom.as/2010/11/18/logs-memory-pressure/ . I seem to recall that maybe Eric Bergen also wrote about it.

    By performing larger writes to the logs, we can a) avoid read-around writes so we are doing only writes to the logs, not reads followed by writes; and b) use O_DIRECT so they don’t put pressure on the page cache.

  9. James Day says:

    The OS has to cache it for best results because the write sizes require read/modify/write operations. Those are slow if the part of the log file that it is writing to has to be read from the disk first. One possible remedy is pre-reading to load it into the OS cache before the log writing reaches that point.

    This effect is part of why it’s still painful to have say 25% of the log file size unavailable for use even when the log file size is no longer limited to 4G: that 25% still has to be cached even though it’s impossible to use it because flushing is so aggressive that it’s impractical to do enough writing to get beyond that point. So it’s effectively wasting 25% of the log file space in RAM, not just on disk, though on disk is unfortunate enough.

    O_DIRECT for log files isn’t a pure win, sadly. The bigger write sizes can make it slower and there’s real reason to wonder whether it will actually improve things. There’s no doubt that people can construct benchmark tests that will show it’s better for some cases, the problem is whether it’s actually better more generally.

    The easy win is reducing the wasted 25% by letting the DBA set something sensible for their setup. The larger of 1% or 200 megabytes would be ample for almost all practical production systems where the DBA has done even a half-decent job of matching IO capability with server load. Not quite all because there are some extremes out there.

    The next easiest is less clear and I can’t say what we might do about that at the moment, if anything. Doing nothing or slightly changing our recommended settings might really be the best choice, even though it’s not exciting.

    None of this should be taken to suggest that we haven’t tried things. We have. If we do nothing obvious it’ll be because we couldn’t find anything that seemed worth doing in enough cases for it to be worthwhile.

    Thanks for publishing that data. It makes me wonder again about doing more flushing off peak if you're having any issues with performance at peak times. Peak log writing/flushing load at peak select time isn't the sort of thing that tends to be optimal for select performance, though it is what adaptive flushing, without smoothing, does. Sometimes there's no choice about it, if the buffer pool or log files get too close to full; other times there's plenty of both to use to defer things or do some extra flushing in advance.

  10. Simon J Mudd says:

    In your last comment you talk about flushing more off-peak. In this particular case it doesn't make much sense: there is no peak. However, I agree it would be nice to be able to _allow_ mysqld to flush more aggressively IFF it is not that busy, with that behaviour stopping if activity "picks up". Right now you have a purge thread, and in 5.6 you can have multiple purge threads running, but there's no control over how aggressively they do their work, at least I don't think so. I've sometimes been frustrated by an "idle slave" (not idle of course, as it is replicating) that takes several minutes to shut down. I/O load was low, so the server could easily have spent "more time" doing more flushing than it was. Currently there seems to be little direct control over this process, yet flushing earlier where possible means a sudden crash should require less recovery time, so it would be good.

    I'm also looking forward to diving into the innodb_metrics table, which hopefully means that all the horrible parsing of SHOW ENGINE INNODB STATUS should go away and it will be much easier to see the different metrics and therefore monitor them over time.
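    For example, something along these lines; the subsystem names I filter on here are from memory and may differ slightly:

        -- Enable all the InnoDB metrics counters (some are off by default).
        SET GLOBAL innodb_monitor_enable = 'all';

        -- Then read them with plain SQL instead of parsing SHOW ENGINE INNODB STATUS.
        SELECT NAME, SUBSYSTEM, COUNT
        FROM   INFORMATION_SCHEMA.INNODB_METRICS
        WHERE  SUBSYSTEM IN ('buffer', 'log')
        ORDER  BY SUBSYSTEM, NAME;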

  11. James Day says:

    I’ve been dreaming aloud to developers about InnoDB monitoring total I/O and slipping in some extra flushing when it drops more than a certain amount below the recent average. Sneaking in a bit of extra flushing during the transient drops even when the server is busy.

    At the moment it’s possible to adjust innodb_io_capacity or innodb_max_dirty_pages_pct to adjust flushing up and down with either cron or events. Not as convenient as it could be but doable.
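    With events, that would be something like the following; the schedule and values are purely illustrative, and it assumes event_scheduler is enabled and the definer has the SUPER privilege:

        -- Off peak: flush harder to push the checkpoint age down.
        CREATE EVENT raise_io_capacity_off_peak
          ON SCHEDULE EVERY 1 DAY STARTS '2012-06-01 01:00:00'
          DO SET GLOBAL innodb_io_capacity = 4000;

        -- Before the busy period: back to the normal value.
        CREATE EVENT restore_io_capacity
          ON SCHEDULE EVERY 1 DAY STARTS '2012-06-01 07:00:00'
          DO SET GLOBAL innodb_io_capacity = 400;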

    You blogging about what you want to see for typical servers with the usual sort of hilly workload profile – and whatever other profiles you have – would be useful. It’s fine for me to tell developers what we see but it’s still of great value for customers to provide pretty pictures of production servers and write about what they want.

    Personally I have quite a bit of focus on reducing i/o at the busiest time, because that's when the servers are most likely to be i/o-limited and hence when the i/o operations are most expensive, because they can force buying more hardware.
