Requirements background
Part of the business table tb_image is shown below. In it, id is unique but image_no is not: image_no is the number of each image, the business system generates several files for each image, and the unique ID of each file is the field id.
The business usage of tb_image has the following characteristics:
- Queries go by image_no and by id;
- Existing data: about 20 million rows;
- Daily growth: around 40,000 rows;
- Daily query volume: around 200,000 queries (see the back-of-the-envelope right after this list);
- Not a consumer-facing (ToC) system, so the concurrency ceiling is predictable;
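To put these figures in perspective, here is a quick back-of-the-envelope using only the numbers above (averages, ignoring peaks):

$$
\frac{200{,}000\ \text{queries/day}}{86{,}400\ \text{s/day}} \approx 2.3\ \text{queries/s},
\qquad
\frac{40{,}000\ \text{inserts/day}}{86{,}400\ \text{s/day}} \approx 0.5\ \text{inserts/s}
$$

Even allowing for a generous peak factor, this is a very light load.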
Options
Based on the analysis above, sharding the table across multiple databases is completely unnecessary. If we split the table within a single database, then, since queries must go by both image_no and id, one option is redundant sharding (one copy of the data sharded with image_no as the shard key, and a second copy sharded with id as the shard key); another option is to shard only by image_no and, for the queries by id, either merge results at the business layer or introduce third-party middleware.
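To make the first option concrete, here is a minimal sketch of the redundant-write idea (the article does not show this; the table names, column types, and the dual-write transaction are my own illustration):

```sql
-- Sketch of the "redundant sharding" option: the same row is written to two
-- tables, one organized for lookups by image_no and one for lookups by id.
-- All definitions here are illustrative assumptions.
CREATE TABLE tb_image_by_image_no (
  id       VARCHAR(36) NOT NULL,
  image_no VARCHAR(64) NOT NULL,
  PRIMARY KEY (image_no, id)      -- serves queries by image_no
);

CREATE TABLE tb_image_by_id (
  id       VARCHAR(36) NOT NULL PRIMARY KEY,  -- serves queries by id
  image_no VARCHAR(64) NOT NULL
);

-- Every write goes to both copies.
START TRANSACTION;
INSERT INTO tb_image_by_image_no (id, image_no) VALUES ('file-001', 'img-001');
INSERT INTO tb_image_by_id       (id, image_no) VALUES ('file-001', 'img-001');
COMMIT;
```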
Considering the complexity of splitting tables within a single database, we decided to use MySQL's native partitioning instead. Capacity planning put the partitioned table at 128 partitions, each holding on the order of 10 million rows, i.e. roughly 1.28 billion rows of total capacity, enough for at least 15 years of stable operation (the orange curve in the figure matches the actual growth of our business most closely):
In addition, because RANGE, LIST, and HASH partitioning do not support VARCHAR columns, we decided on KEY partitioning. The official description of its principle: hash the partition key with MySQL's built-in hash algorithm, then take the result modulo the number of partitions.
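As a concrete sketch, the DDL would look roughly like the following. Only id and image_no are named in the article, and the table name image_subpart is taken from the verification query later on; the column types and the composite primary key are my assumptions (MySQL requires the partition column to be part of every unique key of a partitioned table):

```sql
-- Minimal sketch of the partitioned table; column types are assumptions.
CREATE TABLE image_subpart (
  id       VARCHAR(36) NOT NULL,   -- unique ID of each file
  image_no VARCHAR(64) NOT NULL,   -- image number, not unique
  PRIMARY KEY (id, image_no)       -- partition column must appear in every unique key
) ENGINE = InnoDB
PARTITION BY KEY (image_no)
PARTITIONS 128;
```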
Performance testing
After selecting image_no as the partition key and settling on 128 partitions, it was time to pour in data for feasibility and performance testing. The reason for choosing 128: 1.1 billion / 10 million = 110 ≈ 128, and programmers like powers of two, you know. However, this partition count of 128 was the beginning of the whole nightmare.
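The article does not show its loader, so here is a minimal sketch of how such test data could be generated (the procedure name and the use of UUID() as a stand-in for real id and image_no values are assumptions):

```sql
-- Hypothetical loader: insert n rows with pseudo-random keys so the hash
-- distribution across partitions can be observed.
DELIMITER $$
CREATE PROCEDURE load_test_rows(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    INSERT INTO image_subpart (id, image_no) VALUES (UUID(), UUID());
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL load_test_rows(100000);  -- the 100,000-row test described below
```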
I tried inserting 100,000 rows into the 128 partitions. After the insert, to my surprise, all odd-numbered partitions (p1, p3, p5, ..., p2n-1) contained not a single row, while every even-numbered partition held a lot of data, and even those were not very uniform. This is shown in the figure below:
Note: the ibd file size of the odd-numbered partitions is 112K, which is just the initial size when the partitioned table is created; they hold no actual data. We can verify this with SQL:

```sql
select partition_name, partition_expression, table_rows
from information_schema.partitions
where table_schema = schema() and table_name = 'image_subpart';
```

Part of the result is shown below:
Aren't 100,000 rows enough to show the problem? That is an average of nearly 800 rows per partition! Okay, here comes the kicker: I inserted another 9.9 million rows, for a total of 10 million. The result was exactly the same: the odd-numbered partitions had no data, and everything sat in the even-numbered ones.
Reflections on the issue
Let's recall the principle of KEY partitioning: MySQL's built-in hash algorithm computes the hash value of the partition key and then takes it modulo the number of partitions. This principle is also documented in the MySQL manual, section 22.2.5 KEY Partitioning (https://dev.mysql.com/doc/refman/5.7/en/partitioning-key.html); the original text is reproduced below:
Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. NDB Cluster uses MD5() for this purpose; for tables using other storage engines, the server employs its own internal hashing function which is based on the same algorithm as PASSWORD().
**There can't be such a crummy hash algorithm in this world, can there?** Even a randomly scribbled algorithm wouldn't be this uneven, would it? At this point I wondered whether something in the configuration was causing it, but show variables turned up no partition-related variables at all.
At this point, ten thousand horses galloped through my mind. Could the documentation and the source code be out of sync? Well, time to read the MySQL source; after all, the source code is the closest thing to the truth. The KEY-partition-related code lives in sql_partition.cc. I excerpted the key part below; at first glance there is nothing wrong: it computes the hash value of the partition field first and then takes it modulo the number of partitions:
```cpp
/**
  Calculate part_id for (SUB)PARTITION BY KEY

  @param file             Handler to storage engine
  @param field_array      Array of fields for PARTITION KEY
  @param num_parts        Number of KEY partitions
  @param func_value[out]  Returns calculated hash value

  @return Calculated partition id
*/
inline static uint32 get_part_id_key(handler *file,
                                     Field **field_array,
                                     uint num_parts,
                                     longlong *func_value)
{
  DBUG_ENTER("get_part_id_key");
  // Calculate the hash value of the partition fields
  *func_value= file->calculate_key_hash_value(field_array);
  // Take the hash value modulo the number of partitions
  DBUG_RETURN((uint32) (*func_value % num_parts));
}
```
In a mood of despair, I turned to a search engine and looked up "KEY partition data is not uniform". Among the results, in a CSDN forum thread (/topics/390857704), a folk master going by "Huaxia pawn" gave the following answer:
A colleague of mine analyzed and tested this based on the PASSWORD function: with KEY partitioning, only a prime number of partitions guarantees that every partition gets data. I tested it myself, from 11 partitions up to 17. Only 11, 13, and 17 partitions gave a roughly even distribution of the data.
At this point another ten thousand horses galloped past. But alongside the WHAT THE F**K, I also felt a little excited, because a solution might be at hand (even though I still had no idea why MySQL's built-in hash algorithm would behave this way). I tested KEY partitioning once more and concluded the following:
- With an even number of partitions, such as 40, 64, or 128 (e.g. PARTITIONS 64), the odd-numbered partitions (p1, p3, p5, p7, ..., p2n-1) receive no data at all;
- With an odd but non-prime number of partitions, such as 63 or 121 (e.g. PARTITIONS 63), every partition receives data, but unevenly;
- With a prime number of partitions, such as 31 or 137 (e.g. PARTITIONS 137), every partition receives data and the distribution is very uniform (a repro sketch follows this list);
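One hypothesis that would account for all three observations (my own reasoning; the article does not confirm the hash internals): if the internal hash function only ever returned even values, then func_value % num_parts would stay even whenever num_parts is even, so the odd-numbered partitions could never receive a row, while a prime num_parts would scatter even hash values across all residues. Whatever the cause, the behavior is easy to reproduce with two throwaway tables (names are illustrative):

```sql
-- Identical tables that differ only in partition count: 128 (even) vs 127 (prime).
CREATE TABLE t_even (
  id       VARCHAR(36) NOT NULL,
  image_no VARCHAR(64) NOT NULL,
  PRIMARY KEY (id, image_no)
) PARTITION BY KEY (image_no) PARTITIONS 128;

CREATE TABLE t_prime (
  id       VARCHAR(36) NOT NULL,
  image_no VARCHAR(64) NOT NULL,
  PRIMARY KEY (id, image_no)
) PARTITION BY KEY (image_no) PARTITIONS 127;

-- Load both with the same kind of random data (e.g. with a loader like the
-- one sketched earlier), then compare the per-partition row counts:
SELECT table_name, partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = schema()
  AND table_name IN ('t_even', 't_prime')
ORDER BY table_name, partition_ordinal_position;
```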
The figure below shows the result after I changed the number of partitions to 127 and inserted 1,000,000 rows; the SQL output proves that the row counts of all partitions are almost identical:
Summary and review
I am shocked that such a big trap in MySQL's KEY partitioning carries no official warning. There is even a MySQL bug report about it: Bug #72428, "Partition by KEY() results in uneven data distribution".
Readers with a strong interest in this issue are encouraged to dig deeper. I will also find time to dig into the MySQL source and work out why its hash algorithm is so sensitive to the number of partitions.