SoFunction
Updated on 2025-04-27

Creation and data cleaning of MySQL time partition tables

introduction

In the data-driven era, the management of time series data has become an important topic in database operation and maintenance. As one of the most popular relational databases, MySQL has a partitioning function that provides an effective solution for processing large-scale time series data. This article will explore in-depth the principles, creation methods of MySQL time partition tables, and how to efficiently clean out expired partition data, helping you build an automated data lifecycle management system.

1. Basic concepts of MySQL partition tables

1.1 What is a partition table

A partitioned table is a technique that physically divides a large table into multiple small tables (partitions) and still logically manifests as a complete table. Each partition can be stored independently in a different physical location, but querying is still as simple as operating a single table.

1.2 Advantages of time partitioning

Query performance improvement: Just scan relevant partitions instead of full tables

Convenient maintenance: can be individually backed up, restored or cleaned up data in a specific time period

IO distribution: Different partitions can be placed on different disks

Efficient deletion: Deleting the entire partition directly is more efficient than DELETE statement

1.3 Partition type comparison

MySQL supports multiple partition types, and the main ones that are suitable for time series data are:

  • RANGE partition: Assign rows to partition based on the range of column values
  • RANGE COLUMNS partition: similar to RANGE but supports multiple columns
  • LIST partition: based on discrete value list
  • HASH partition: based on user-defined expressions

2. Create a time partition table

2.1 Basic partition table creation

CREATE TABLE sales (
    id INT AUTO_INCREMENT,
    sale_date DATETIME NOT NULL,
    product_id INT,
    amount DECIMAL(10,2),
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (TO_DAYS(sale_date)) (
    PARTITION p202301 VALUES LESS THAN (TO_DAYS('2023-02-01')),
    PARTITION p202302 VALUES LESS THAN (TO_DAYS('2023-03-01')),
    PARTITION p202303 VALUES LESS THAN (TO_DAYS('2023-04-01')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

2.2 Automatically partition by month

MySQL 5.7+ supports more flexible partition expressions:

CREATE TABLE log_data (
    log_id BIGINT NOT NULL AUTO_INCREMENT,
    log_time DATETIME NOT NULL,
    user_id INT,
    action VARCHAR(50),
    PRIMARY KEY (log_id, log_time)
) PARTITION BY RANGE (YEAR(log_time)*100 + MONTH(log_time)) (
    PARTITION p202301 VALUES LESS THAN (202302),
    PARTITION p202302 VALUES LESS THAN (202303),
    PARTITION p202303 VALUES LESS THAN (202304),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

2.3 View partition information

-- View the partition structure of the table
SELECT partition_name, partition_expression, partition_description 
FROM information_schema.partitions 
WHERE table_name = 'sales';

-- View partition data volume
SELECT partition_name, table_rows 
FROM information_schema.partitions 
WHERE table_name = 'sales';

3. Partition maintenance operation

3.1 Add a new partition

-- existMAXVALUEAdd a new partition before partition
ALTER TABLE sales REORGANIZE PARTITION pmax INTO (
    PARTITION p202304 VALUES LESS THAN (TO_DAYS('2023-05-01')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- Or a simpler way(MySQL 5.7+)
ALTER TABLE sales ADD PARTITION (
    PARTITION p202304 VALUES LESS THAN (TO_DAYS('2023-05-01'))
);

3.2 Merge partitions

-- Merge adjacent partitions
ALTER TABLE sales REORGANIZE PARTITION p202301, p202302 INTO (
    PARTITION p2023_q1 VALUES LESS THAN (TO_DAYS('2023-04-01'))
);

3.3 Rebuild the partition

-- Rebuild partition optimization storage
ALTER TABLE sales REBUILD PARTITION p202301, p202302;

4. Delete expired partition data

4.1 Delete the partition directly

-- Delete a single partition
ALTER TABLE sales DROP PARTITION p202201;

-- Delete multiple partitions
ALTER TABLE sales DROP PARTITION p202201, p202202, p202203;

4.2 Clear partition data

-- Clear partition data but preserve partition structure
ALTER TABLE sales TRUNCATE PARTITION p202301;

4.3 Automatically delete scripts

DELIMITER //
CREATE PROCEDURE clean_time_partitions(
    IN p_table_name VARCHAR(64),
    IN p_retain_months INT
)
BEGIN
    DECLARE v_done INT DEFAULT FALSE;
    DECLARE v_part_name VARCHAR(64);
    DECLARE v_part_date DATE;
    DECLARE v_cutoff_date DATE;
    
    DECLARE v_cur CURSOR FOR 
        SELECT partition_name, 
               STR_TO_DATE(SUBSTRING(partition_name, 2), '%Y%m%d')
        FROM information_schema.partitions 
        WHERE table_schema = DATABASE() 
          AND table_name = p_table_name
          AND partition_name != 'pmax';
    
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = TRUE;
    
    SET v_cutoff_date = DATE_SUB(CURRENT_DATE(), INTERVAL p_retain_months MONTH);
    
    OPEN v_cur;
    
    read_loop: LOOP
        FETCH v_cur INTO v_part_name, v_part_date;
        IF v_done THEN
            LEAVE read_loop;
        END IF;
        
        IF v_part_date < v_cutoff_date THEN
            SET @sql = CONCAT('ALTER TABLE ', p_table_name, ' DROP PARTITION ', v_part_name);
            PREPARE stmt FROM @sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;
            
            SELECT CONCAT('Dropped partition: ', v_part_name) AS message;
        END IF;
    END LOOP;
    
    CLOSE v_cur;
END //
DELIMITER ;

4.4 Creating a timed event

-- Enable event scheduler
SET GLOBAL event_scheduler = ON;

-- Create monthly execution events
CREATE EVENT event_clean_old_partitions
ON SCHEDULE EVERY 1 MONTH STARTS '2023-05-01 02:00:00'
DO
BEGIN
    -- Keep the latest12Monthly data
    CALL clean_time_partitions('sales', 12);
    CALL clean_time_partitions('log_data', 12);
END;

5. Advanced partition management skills

5.1 Partition and Index Optimization

-- Add a local index to the partition table
ALTER TABLE sales ADD INDEX idx_product (product_id);

-- Check partition index usage
EXPLAIN PARTITIONS SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31';

5.2 Partition table backup strategy

-- Backup a specific partition separately
mysqldump -u username -p dbname sales --where="TO_DAYS(sale_date) &lt; TO_DAYS('2023-02-01')" &gt; sales_partition_q1.sql

-- Physical backup of partition files(needInnoDBFile per tablespace)
cp /var/lib/mysql/dbname/sales#P# /backup/

5.3 Partition table monitoring

-- Monitor partition tablespace usage
SELECT partition_name, table_rows, 
       ROUND(data_length/(1024*1024),2) AS data_mb,
       ROUND(index_length/(1024*1024),2) AS index_mb
FROM information_schema.partitions
WHERE table_name = 'sales';

-- Monitor partition query hit rate
SELECT * FROM sys.schema_table_statistics 
WHERE table_name = 'sales';

6. Frequently Asked Questions and Solutions

6.1 Partition selection failure problem

Problem phenomenon: The query does not use partition clipping correctly

Solution:

-- make sureWHEREConditions use partition keys
EXPLAIN PARTITIONS SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31';

-- Avoid using functions on partition keys
-- Bad writing: WHERE YEAR(sale_date) = 2023
-- Good writing: WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'

6.2 Partition number limit

Problem phenomenon: MySQL default limits the number of partitions to 8192

Solution:

-- Check the current number of partitions
SELECT COUNT(*) FROM information_schema.partitions WHERE table_name = 'sales';

​​​​​​​-- Merge historical partitions if necessary
ALTER TABLE sales REORGANIZE PARTITION p202201, p202202, p202203 INTO (
    PARTITION p2022_q1 VALUES LESS THAN (TO_DAYS('2022-04-01'))
);

6.3 Cross-partition query performance

Problem phenomenon: Performance degradation when query spans multiple partitions

Solution:

-- Consider adjusting partition granularity(If changed from monthly to quarterly)
ALTER TABLE sales PARTITION BY RANGE (QUARTER(sale_date)) (
    PARTITION p2022_q1 VALUES LESS THAN (2),
    PARTITION p2022_q2 VALUES LESS THAN (3),
    -- ...
);

-- Or add a summary table
CREATE TABLE sales_monthly_summary (
    month DATE PRIMARY KEY,
    total_amount DECIMAL(15,2),
    total_orders INT
);

-- Refresh summary data regularly using events

7. Actual case analysis

7.1 Partitioning practice of e-commerce order system

Scenario: Daily order volume is 100,000+, hot data must be retained for 3 years, and earlier data must be archived

Solution:

-- Create quarterly partition table
CREATE TABLE orders (
    order_id BIGINT AUTO_INCREMENT,
    order_date DATETIME NOT NULL,
    customer_id INT,
    amount DECIMAL(12,2),
    PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (QUARTER(order_date)) (
    PARTITION p2022_q1 VALUES LESS THAN (2),
    PARTITION p2022_q2 VALUES LESS THAN (3),
    PARTITION p2022_q3 VALUES LESS THAN (4),
    PARTITION p2022_q4 VALUES LESS THAN (5),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

​​​​​​​-- Creating an archive process
DELIMITER //
CREATE PROCEDURE archive_old_orders(IN retain_years INT)
BEGIN
    DECLARE cutoff_quarter INT;
    DECLARE cutoff_year INT;
    
    SET cutoff_year = YEAR(DATE_SUB(CURRENT_DATE(), INTERVAL retain_years YEAR));
    SET cutoff_quarter = QUARTER(DATE_SUB(CURRENT_DATE(), INTERVAL retain_years YEAR));
    
    -- Archive old data to historical tables
    INSERT INTO orders_archive
    SELECT * FROM orders PARTITION (p2022_q1, p2022_q2)
    WHERE YEAR(order_date) &lt; cutoff_year
       OR (YEAR(order_date) = cutoff_year AND QUARTER(order_date) &lt; cutoff_quarter);
    
    -- Delete archived partitions
    ALTER TABLE orders DROP PARTITION p2022_q1, p2022_q2;
    
    -- Add a new partition
    ALTER TABLE orders ADD PARTITION (
        PARTITION p2023_q1 VALUES LESS THAN (2)
    );
END //
DELIMITER ;

Conclusion

MySQL time partition tables are a powerful tool for managing large-scale time series data. By rationally designing partitioning strategies and automated maintenance scripts, query performance can be significantly improved, data maintenance work can be simplified, and storage costs can be reduced. The technologies and methods introduced in this article have been verified in multiple production environments. We hope that readers can flexibly apply them according to their own business characteristics and build an efficient data life cycle management system.

This is the end of this article about the creation and data cleaning of MySQL time partition tables. For more related contents of MySQL partition tables, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!