I have been upgrading more MySQL database instances recently and have come across a few more potential gotchas which, if you are not careful, can be rather nasty. These are not documented explicitly by MySQL, so they may be handy to know if you have not come across this type of thing before.
Most of the issues relate to upgrading replicated MySQL instances, either the master servers or the slaves. Some seem specific to the rpm packages I am using (MySQL enterprise or MySQL advanced rpms); others are not.
Take care upgrading a 5.0 master when you have 5.1 slaves
It is not a good idea to run mixed major versions of MySQL in a replicated environment, so why would I be doing this? If you work in a replicated environment and have several slaves, the recommendation is to upgrade the slaves first. I work with quite a few slaves, so upgrading them all takes longer than you would think: quite a long time, in fact, while the different systems are tested and upgraded. Along came a newer version of 5.0 and I thought of upgrading the master, which had been giving a few issues, one of which was resolved in the latest 5.0.89. At least when using rpm packages, the package upgrade runs /usr/bin/mysql_install_db --rpm --user=mysql as part of the upgrade procedure. This ensures that the mysql database is up to date, but it also writes to the binlog if one is configured, which on a master is normally the case. These are 5.0 install statements and they are not properly understood by the 5.1 slaves, which try to interpret them too. End result: broken replication, and you need to skip several transactions on each slave. If you have several slaves this can be rather painful.
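Recovering a broken slave in this situation means skipping the offending events one at a time; a minimal sketch (the exact number of events you have to skip depends on what the upgrade wrote to the binlog, so check the error after each attempt):

```sql
-- On each broken 5.1 slave, after inspecting the error in SHOW SLAVE STATUS\G
STOP SLAVE;
-- Skip the next event from the master's binlog; repeat until replication resumes
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
-- Verify: Slave_IO_Running and Slave_SQL_Running should both say Yes
SHOW SLAVE STATUS\G
```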
Note: the root cause here is that I replicate the mysql database, which is a good, quick and clean way to distribute GRANT information to all slaves. As far as I know MySQL does not discourage this, though perhaps many sites do not replicate their mysql database.
If you upgrade a master server from 5.0 to 5.1 by running mysql_upgrade after upgrading the binaries, you may hit a similar issue: the mysql_upgrade script determines that some new columns or tables are needed on the master and adds them. Again this gets written to the binlog, which is good for point-in-time recovery on the server itself, but bad from a replication point of view if the slaves do not run the same version of MySQL.
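One way to keep such changes out of the binlog is to check whether your mysql_upgrade can disable binary logging of its own changes; the option and its availability vary by version, so verify with --help before relying on it:

```shell
# Check whether this mysql_upgrade knows about binlog writing (version-dependent)
mysql_upgrade --help | grep -i binlog

# If the option is supported, run the upgrade without writing its changes to
# the binlog, then perform the same binary upgrade + mysql_upgrade on each
# slave separately instead of letting it replicate
mysql_upgrade --skip-write-binlog
```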
In fact the binlog has two uses: (a) recording changes for point-in-time recovery, and (b) recording changes for replication to slaves. One could argue that for (a) the upgrade changes should not be written to the binlog at all, but I think that is wrong if you have to recover just after an upgrade. So I would suggest that the upgrade changes should probably be flagged specially in the binlog, allowing a slave to ignore them in the normal situation but also to recognise them, so that you could stop the slave, perform the same binary upgrade, and then continue with the new binaries exactly as had been done on the master. This behaviour should be controlled by a runtime flag which can be configured dynamically.
RPM Installs/Upgrades always start the server
rpm(8) is good, but some of the design decisions in the MySQL rpms are questionable. One of these is that the current MySQL rpms always start the server, whether doing a fresh install or an upgrade. It is not good practice to do rpm -Uvh MySQL-server-advanced-gpl…rpm (the 5.1 rpm) if you are running MySQL-server-enterprise-gpl…rpm (the 5.0 rpm), so normally I stop MySQL, remove the enterprise rpm and install the advanced rpm. That starts the server, and with it the slave threads, on a box which is not yet completely "stable". The workaround is to add skip-slave-start to /etc/my.cnf, but that should not be necessary, as immediately after running mysql_upgrade you have to remove the option again.
Also during the upgrade process it is a good idea to avoid external client access, so sometimes I also set bind-address to 127.0.0.1 for the duration of the upgrade. That will not keep away clients running on the server itself, but it helps in many cases.
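Put together, a temporary /etc/my.cnf fragment for the duration of the upgrade might look like this (remove both settings again once the upgraded slave has been checked):

```ini
# Temporary settings while upgrading -- remove after mysql_upgrade has run
[mysqld]
# Do not start the replication threads automatically when the server starts
skip-slave-start
# Only accept connections from the local host during the upgrade
bind-address = 127.0.0.1
```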
All this leads me to a simple conclusion: if you can, upgrade one box the slow way and then clone the other slaves from it. Cloning is simple and requires little thought, so it is a good approach.
The other conclusion, building on the first, is that if you can, build a new master: that is, clone a slave to make a new master. Configure the clone to write its own binlogs and then move the existing slaves underneath it. Once all the slaves are underneath the new master (which is left as the only slave of the original master), you can simply point clients at the new master instead of the old one. That keeps downtime to a minimum and avoids many problems.
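Repointing a slave under the new master is a CHANGE MASTER TO statement; a minimal sketch, assuming hypothetical host and user names and that you have noted the matching binlog coordinates on the new master (e.g. from SHOW MASTER STATUS once the slave is consistent with it):

```sql
-- On each existing slave, once it is consistent with the new master
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',  -- hypothetical new master host
  MASTER_USER = 'repl',                    -- hypothetical replication user
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001',    -- from SHOW MASTER STATUS on the new master
  MASTER_LOG_POS = 106;
START SLAVE;
```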
As far as I know none of the points above appear in the current 5.1 upgrade documentation. I have opened quite a few tickets requesting that the documentation be improved, and I guess that will happen slowly. For many people these points may seem irrelevant to their situation, but for me they have caused a few problems. If they do not affect you, you can skip the documentation; if they do, having this documented would save you a few tears when you least expect problems.