MySQL Partitioning and its Confusing Syntax

While looking at partitioning I recently made a mistake which I guess others can make too. Often this comes from not reading the documentation fully, or from scanning it too quickly and misunderstanding what it says.

So this post is to complain about the MySQL partitioning syntax and to warn others who may easily make the same mistake without realising it.

First we probably need to ask why we are partitioning a table in the first place. The main reasons, I think, are:

  • to improve query performance
  • to reduce individual .ibd file sizes for large tables (if using innodb_file_per_table)

In my case I wanted to do both. I had several tables which store a large number of rows (batches of data) based on an incremental batch number. One of these tables was around 40 GB and had about 500,000,000 rows in it. When processing data in this table, often all the data from a particular batch run would be aggregated, with the resulting data being stored elsewhere.

So I knew I wanted to partition, and the performance of these aggregations would improve because the time to table scan a batch would be reduced to the time to scan a single partition rather than the whole table.

The primary key of these tables was already in the form ( batch_id, other_key ) so I really wanted to just partition by the batch_id key, using in my case 64 partitions. batch_id is defined as int unsigned NOT NULL.
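
To make the use case concrete, here is a minimal sketch of the kind of table and single-batch aggregation I mean. The table name, the other_key column and the value column are invented for illustration; only the batch_id column and the 64-partition target reflect my real setup.

CREATE TABLE batch_data (
  batch_id  int unsigned NOT NULL,   -- incremental batch number
  other_key int unsigned NOT NULL,
  value     bigint NOT NULL,         -- hypothetical measure to aggregate
  PRIMARY KEY ( batch_id, other_key )
) ENGINE=InnoDB;

-- Typical aggregation over one batch. Once the table is partitioned on
-- batch_id, this only has to scan the partition holding batch 12345
-- instead of the whole table.
SELECT SUM(value), COUNT(*)
  FROM batch_data
 WHERE batch_id = 12345;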

This is where I made the mistake. There are various ways to partition tables in MySQL and they are named: RANGE, LIST, HASH and KEY. Given that batch_id was a gradually increasing value and I didn’t want to modify the partitioning once it was created, RANGE and LIST seemed inappropriate. Of HASH and KEY, I incorrectly assumed that HASH would apply some complex hash function to my integer batch_id, and that, since batch_id was part of the primary key, KEY would be the right way to partition.

So I incorrectly defined the table like this:

ALTER TABLE xxxxx PARTITION BY KEY ( batch_id ) PARTITIONS 64;

When you read the documentation you see that for HASH-type partitioning you provide an expression which must evaluate to an integer, and the partition to use is determined simply by MOD( your_expression_value, number_of_partitions ).
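
So, to give a purely illustrative example, with 64 partitions a row whose batch_id is 12345 ends up in partition number MOD(12345, 64), i.e. partition 57:

-- Illustration only: how HASH partitioning picks the partition for a row
-- with batch_id = 12345 when there are 64 partitions.
SELECT MOD(12345, 64);   -- returns 57, so the row is stored in partition p57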

KEY-type partitioning works differently and only allows you to provide column names; it then uses its own internal hashing function to generate the partition id.

So using PARTITION BY KEY ( numeric_column_name ) gives correct results but is likely to be more expensive to calculate. For large tables this is likely to cause additional performance issues. It looks like I’m going to have to rebuild the tables I’ve just partitioned, and that’ll be another weekend of work.
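
For reference, the rebuild is just another ALTER, shown here against the same placeholder table name as above, and afterwards EXPLAIN PARTITIONS (available from MySQL 5.1) should confirm that a single-batch query touches only one partition:

ALTER TABLE xxxxx PARTITION BY HASH ( batch_id ) PARTITIONS 64;

-- The "partitions" column of the EXPLAIN output should now list a single
-- partition (e.g. p57) for a query restricted to one batch.
EXPLAIN PARTITIONS SELECT COUNT(*) FROM xxxxx WHERE batch_id = 12345;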

A couple of suggestions to those at Oracle:

  • PARTITION BY KEY ( numeric_column_name ) should be modified to behave like PARTITION BY HASH ( numeric_column_name ). However, as this would be likely to cause on-disk incompatibilities during a version change if it were implemented, I’m guessing it just won’t happen, unless there’s some easy way to distinguish between the current behaviour and my proposed new behaviour.
  • The documentation should be made a little clearer and mention this obvious case. What I see with a lot of the MySQL documentation is that it documents technically how things are done rather than documenting the problem and then how to implement the solution. Since it’s likely that many people are going to partition on one of the columns, especially if a multi-column primary key is used, this use case should be made clearer.

If I had time I’d look at how partitioning is implemented in Oracle database, Sybase, Postgres or DB2 and see whether it’s just MySQL which has chosen these unfortunate keywords for defining its partitioning methods.

However, I’m curious: am I the only one to fall into this trap?


5 Responses to “MySQL Partitioning and its Confusing Syntax”

  1. Shlomi Noach says:

    Hi,
    I’m not sure I’ve followed correctly, but your mistake, as you describe it, is that you’ve been using KEY partitioning rather than HASH, and the problem is that calculating the partition based on your key is expensive?

    I’m assuming you’re experiencing performance issues, otherwise you would not be complaining. But I find it hard to believe that your performance issues are caused by the fact that you’re using KEY over HASH. Can you support your diagnosis that the KEY calculation is what is causing the performance drop?

    With respect, and I don’t intend to pick on you: you confess you did not read the manual properly initially, but once you did, you realized your mistake.
    As I recall, the first time I read the partitioning docs I got a good picture of the differences between KEY and HASH. I don’t think the syntax is misleading, either. HASH behaves just like most hashes do.

    Regards

  2. Sheeri says:

    It’s not a trap. You said yourself it’s documented. Most people who don’t know about something will at least read the manual page on it. So yeah, I’d say very few people would make that mistake.

  3. Simon J Mudd says:

    Of course reading the manual is a good thing to do, but invariably people do not read everything, especially if a command option seems intuitive. My real gripe here is that the KEY and HASH keywords don’t seem to reflect the usage you might expect. Also, KEY on a numeric column might well be expected to run as MOD(key, nr_partitions), as that’s probably a more efficient implementation.

    As to the performance differences I’ll see if I can rebuild one of these tables using both KEY and HASH and see whether in fact the hash function’s extra “complexity” does indeed make much or any difference.

  4. Simon J Mudd says:

    I’d say it does make a difference. Here is the result of a batch run I did:

    ————–
    ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 PARTITION BY KEY ( SendID ) PARTITIONS 64
    ————–

    Query OK, 126778361 rows affected (41 min 42.08 sec)
    Records: 126778361 Duplicates: 0 Warnings: 0

    ————–
    ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 PARTITION BY HASH ( SendID ) PARTITIONS 64
    ————–

    Query OK, 126778361 rows affected (39 min 41.32 sec)
    Records: 126778361 Duplicates: 0 Warnings: 0

    ————–
    ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 PARTITION BY KEY ( SendID ) PARTITIONS 64
    ————–

    Query OK, 126778361 rows affected (42 min 21.50 sec)
    Records: 126778361 Duplicates: 0 Warnings: 0

    ————–
    ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 PARTITION BY HASH ( SendID ) PARTITIONS 64
    ————–

    Query OK, 126778361 rows affected (33 min 2.86 sec)
    Records: 126778361 Duplicates: 0 Warnings: 0

    Bye

    I did this twice to see if it would make a difference if the table was “hot” or not.
    Both times the PARTITION BY HASH is faster.

    This table is defined as:

    CREATE TABLE my_table (
    EventDate timestamp NOT NULL DEFAULT '1970-01-01 02:00:00',
    SendID smallint(5) unsigned NOT NULL,
    OtherID int(10) unsigned NOT NULL,
    PRIMARY KEY (SendID,OtherID,EventDate)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
    /*!50100 PARTITION BY HASH ( SendID)
    PARTITIONS 64 */

    It’s not the largest table I was working with but a smaller one, which uses 8.5 GB of disk space uncompressed and 2.8 GB compressed.

    The larger tables I was working with are 4 times that and growing, and while they are not that large, it does seem to me that there is a difference in using PARTITION BY HASH vs PARTITION BY KEY.

    So with these numbers I stand by my claim. The internal hash algorithm used when doing PARTITION BY KEY on integer columns is less efficient than the MOD() mechanism used when doing PARTITION BY HASH. I’d therefore say it makes sense to implement PARTITION BY KEY _for numeric columns_ as if a PARTITION BY HASH ( column_name ) had been requested.

  5. Shlomi Noach says:

    Hi,

    With regard to your test, I do not think it necessarily reflects the actual behavior on a running server. By way of example, consider the InnoDB Plugin & compression.
    Altering a table into InnoDB is known to be somewhat slower than into MyISAM (does that mean InnoDB is slower than MyISAM?)
    However, altering into a compressed InnoDB table takes more time, by orders of magnitude. I’ve seen x2, x5 and even x10 (!) the time to create a compressed table as compared to an uncompressed one (this depends on compression level and version of the InnoDB plugin).
    However, it was very clear that the InnoDB plugin provided superior performance on real workloads. You are not ALTERing tables in real life; you’re doing INSERTs, SELECTs, …, so this is what you should be testing.
    Moreover, I’ve noticed the InnoDB plugin took more time to “warm up”. My clients were complaining about the poor performance of the InnoDB plugin just after migration and restarting of the server. A couple of hours later, everyone was happy.

    The above is only by way of illustration. Sure, the KEY computation is based on an encryption algorithm – it IS more expensive than doing a modulus.
    But will you ever notice that?
    I understand from your post that your data is very large. I guess that you are I/O bound, not CPU bound. Even with partitioning, you’ll still be I/O bound. I’m *guessing* (for I haven’t examined your system) that the difference in CPU time to compute the partitioning key is negligible in comparison to your I/O operations.

    Again this is my guess; you’ll have to test INSERTs and SELECTs or whatever it is you’re usually issuing, in order to reach a real conclusion. An ALTER just isn’t a good enough metric.

    With regard to the docs: you are right that not everyone has the time to fully read all the docs. Like you, I’ve failed to fully read manuals, only to find myself doing things wrong. But I realize this was because I hadn’t read the manual properly. Though I share your desire that things should be put into words quickly, I realize that other people expect *other* things to be put into words quickly before that…

    But, really, HASH is just what hashes do. I don’t see why you think this is misleading.

    And, please reconsider the KEY scheme. It’s really more viable to use it if your data is not known to be distributed evenly. For example, I know some cases where values have a higher probability of being even than odd. This means a HASH-based partitioning scheme will create uneven partitions. KEY should distribute evenly.
