SoFunction
Updated on 2024-11-17

MySQL Advanced Features - Concepts and Mechanisms of Data Table Partitioning in Detail

MySQL's partitioning is implemented as a layer of wrapping around the data table, which means that indexes are actually defined on a per-partition basis, rather than for the entire table. This feature is different from Oracle, where indexes and data tables can be partitioned in more flexible and complex ways.

MySQL's partitioning determines which partitions a row of data belongs to by defining the conditions of the PATITION BY clause. When executing a query, the query optimizer distinguishes between partitions, which means that the query does not examine all partitions, but only those that contain the data for which the query is being asked.

The main purpose of partitioning is to perform a rough form of indexing and aggregation of data tables. This reduces overly extensive access to the data table and allows related rows of data to be stored close together. The benefits of partitioning are significant, especially for the following scenario:

  • When a data table is too large for the memory space to carry, or a data table has a lot of historical data as well as hot zone rows.
  • Partitioned data is easier to maintain than partitioned data. For example, old data can be easily removed by deleting an entire partition, while optimization, inspection and repair operations can be easily performed on individual partitions.
  • Partitioned data can be physically distributed for storage, which allows servers to use multiple hard drives more efficiently.
  • Partitions can be used to avoid bottlenecks in certain workloads.
  • For data backups, individual partitions can be backed up or restored individually, which is beneficial for large data sets.

The details of MySQL's partitioning implementation are so complex that figuring it out is difficult; we just need to focus on its performance. If you want to know more about it, you can read the section on partitioning in the MySQL manual. With partitioning comes other problems and limitations:

  • The commands to create and change tables are more complex.
  • There is a maximum of 1024 partitions per table.
  • In MySQL version 5.1, a partition expression must be an integer or return an integer; after MySQL 5.5, it is possible to use columns for partitioning in some cases.
  • Any primary key or unique index must contain all the columns in the partition expression.
  • Foreign key constraints cannot be used.

Zoning mechanism

As mentioned earlier, partitioned tables actually have multiple hidden physical storage tables, which are presented through handle objects. We cannot access the partitions directly. Typically, each partition is managed through a storage engine (hence the requirement that all partitions have the same storage engine), and the indexes in the data table are actually indexes to the hidden physical storage tables. From the storage engine's perspective, partitions are also data tables. The storage engine does not actually know whether the data table is standalone or a partition of a larger data table. Operations on partitioned tables are realized through the following logical operations:

SELECT Query

When a query is performed on a partitioned table, the partitioning layer opens and locks all hidden partitions, the query optimizer decides which hidden partitions can be ignored, and then the partitioning layer gets the query results by calling the storage engine that manages the partitions via the handle API.

INSERT operations

When a row of data is inserted, the partitioning layer opens and locks all partitions, then decides which partition stores the current data row and stores that data row in the corresponding partition.

DELETE operations

When deleting a row of data, the partitioning layer opens and locks all partitions, checks which partition contains the row of data, and then sends the delete request to that partition.

UPDATE operation

When modifying a row of data, the partitioning layer opens and locks all partitions, checks which partition contains the row of data and fetches that row for modification, and then determines which partition should contain the new row of data and sends the insert request to that partition and the delete request to the old partition.
Some of the operations above support partition filtering (i.e., ignoring irrelevant partitions). For example, when deleting a row, the server needs to first locate the data row position. If a matching partition expression condition is specified in the WHERE condition, the server can ignore partitions that do not contain the row. Similarly for UPDATE operations, and for INSERT operations themselves, the server looks for only the one partition that needs to be inserted, not all of them.

Just because the partitioning layer opens and locks all partitions does not mean that the partitions will remain locked. A storage engine like InnoDB, which can support row-level locking, will simply unlock the partition at the partitioning level. This locking and unlocking process is similar to the locking process of a normal InnoDB data table.

Types of partitions

MySQL supports several types of partitioning, the most common type being range partitioning-that is, partitioning by different ranges for certain column values or expressions. For example, the following statement groups sales data into partitions based on year:

CREATE TABLE sales (
  order_date DATETIME NOT NULL
  --Other column definitions
) ENGINE=InnoDB PARTITION BY RANGE(YEAR(order_date)) (
  PARTITION p_2018 VALUES LESS THAN (2018),
  PARTITION p_2019 VALUES LESS THAN (2019),
  PARTITION p_2020 VALUES LESS THAN (2020),
  PARTITION p_other VALUES LESS THAN MAXVALUE);

A variety of functions can be used in the partition clause. The main requirement is that a non-constant, deterministic integer must be returned. In the above example the YEAR function is used, but other functions such as TO_DAYS() can also be used. Partitioning using time intervals is a common way to partition date-based data.

MySQL also supports key, hash, and list partitioning methods, some of which support sub-partitioning (which is rarely used). After MySQL 5.5, it is possible to use the RANGE COLUMNS partition type to partition directly by date-based columns without using a function to convert dates to integers. Other common partitioning tricks include:

  • Partitioning with keys to minimize competition for InnoDB's mutexes;
  • Partitioning of ranges can be constructed cyclically using a remainder calculation, e.g. if only the last few days of data need to be maintained, either by taking the remainder of the date pair 7, or by partitioning using the number of days of the week in which it is located.
  • Suppose the data table does not have a self-incrementing primary key, but also wants to partition the aggregated hot zone data. Since the timestamp is not in the primary key, it is also not possible to use timestamp partitioning. This can be done using HASH(id DIV 1000000), which will partition the data every 1,000,000 rows. This makes it possible to accomplish what we want without changing the primary key. This also has an additional effect. We don't need to create partitioned constants to hold the new data.

This is the detailed content of MySQL Advanced Features - Concepts and Mechanisms of Data Table Partitioning, for more information about MySQL Advanced Features Data Table Partitioning, please pay attention to my other related articles!