introduction
In the data-driven era, the management of time series data has become an important topic in database operation and maintenance. As one of the most popular relational databases, MySQL has a partitioning function that provides an effective solution for processing large-scale time series data. This article will explore in-depth the principles, creation methods of MySQL time partition tables, and how to efficiently clean out expired partition data, helping you build an automated data lifecycle management system.
1. Basic concepts of MySQL partition tables
1.1 What is a partition table
A partitioned table is a technique that physically divides a large table into multiple small tables (partitions) and still logically manifests as a complete table. Each partition can be stored independently in a different physical location, but querying is still as simple as operating a single table.
1.2 Advantages of time partitioning
Query performance improvement: Just scan relevant partitions instead of full tables
Convenient maintenance: can be individually backed up, restored or cleaned up data in a specific time period
IO distribution: Different partitions can be placed on different disks
Efficient deletion: Deleting the entire partition directly is more efficient than DELETE statement
1.3 Partition type comparison
MySQL supports multiple partition types, and the main ones that are suitable for time series data are:
- RANGE partition: Assign rows to partition based on the range of column values
- RANGE COLUMNS partition: similar to RANGE but supports multiple columns
- LIST partition: based on discrete value list
- HASH partition: based on user-defined expressions
2. Create a time partition table
2.1 Basic partition table creation
CREATE TABLE sales ( id INT AUTO_INCREMENT, sale_date DATETIME NOT NULL, product_id INT, amount DECIMAL(10,2), PRIMARY KEY (id, sale_date) ) PARTITION BY RANGE (TO_DAYS(sale_date)) ( PARTITION p202301 VALUES LESS THAN (TO_DAYS('2023-02-01')), PARTITION p202302 VALUES LESS THAN (TO_DAYS('2023-03-01')), PARTITION p202303 VALUES LESS THAN (TO_DAYS('2023-04-01')), PARTITION pmax VALUES LESS THAN MAXVALUE );
2.2 Automatically partition by month
MySQL 5.7+ supports more flexible partition expressions:
CREATE TABLE log_data ( log_id BIGINT NOT NULL AUTO_INCREMENT, log_time DATETIME NOT NULL, user_id INT, action VARCHAR(50), PRIMARY KEY (log_id, log_time) ) PARTITION BY RANGE (YEAR(log_time)*100 + MONTH(log_time)) ( PARTITION p202301 VALUES LESS THAN (202302), PARTITION p202302 VALUES LESS THAN (202303), PARTITION p202303 VALUES LESS THAN (202304), PARTITION pmax VALUES LESS THAN MAXVALUE );
2.3 View partition information
-- View the partition structure of the table SELECT partition_name, partition_expression, partition_description FROM information_schema.partitions WHERE table_name = 'sales'; -- View partition data volume SELECT partition_name, table_rows FROM information_schema.partitions WHERE table_name = 'sales';
3. Partition maintenance operation
3.1 Add a new partition
-- existMAXVALUEAdd a new partition before partition ALTER TABLE sales REORGANIZE PARTITION pmax INTO ( PARTITION p202304 VALUES LESS THAN (TO_DAYS('2023-05-01')), PARTITION pmax VALUES LESS THAN MAXVALUE ); -- Or a simpler way(MySQL 5.7+) ALTER TABLE sales ADD PARTITION ( PARTITION p202304 VALUES LESS THAN (TO_DAYS('2023-05-01')) );
3.2 Merge partitions
-- Merge adjacent partitions ALTER TABLE sales REORGANIZE PARTITION p202301, p202302 INTO ( PARTITION p2023_q1 VALUES LESS THAN (TO_DAYS('2023-04-01')) );
3.3 Rebuild the partition
-- Rebuild partition optimization storage ALTER TABLE sales REBUILD PARTITION p202301, p202302;
4. Delete expired partition data
4.1 Delete the partition directly
-- Delete a single partition ALTER TABLE sales DROP PARTITION p202201; -- Delete multiple partitions ALTER TABLE sales DROP PARTITION p202201, p202202, p202203;
4.2 Clear partition data
-- Clear partition data but preserve partition structure ALTER TABLE sales TRUNCATE PARTITION p202301;
4.3 Automatically delete scripts
DELIMITER // CREATE PROCEDURE clean_time_partitions( IN p_table_name VARCHAR(64), IN p_retain_months INT ) BEGIN DECLARE v_done INT DEFAULT FALSE; DECLARE v_part_name VARCHAR(64); DECLARE v_part_date DATE; DECLARE v_cutoff_date DATE; DECLARE v_cur CURSOR FOR SELECT partition_name, STR_TO_DATE(SUBSTRING(partition_name, 2), '%Y%m%d') FROM information_schema.partitions WHERE table_schema = DATABASE() AND table_name = p_table_name AND partition_name != 'pmax'; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = TRUE; SET v_cutoff_date = DATE_SUB(CURRENT_DATE(), INTERVAL p_retain_months MONTH); OPEN v_cur; read_loop: LOOP FETCH v_cur INTO v_part_name, v_part_date; IF v_done THEN LEAVE read_loop; END IF; IF v_part_date < v_cutoff_date THEN SET @sql = CONCAT('ALTER TABLE ', p_table_name, ' DROP PARTITION ', v_part_name); PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt; SELECT CONCAT('Dropped partition: ', v_part_name) AS message; END IF; END LOOP; CLOSE v_cur; END // DELIMITER ;
4.4 Creating a timed event
-- Enable event scheduler SET GLOBAL event_scheduler = ON; -- Create monthly execution events CREATE EVENT event_clean_old_partitions ON SCHEDULE EVERY 1 MONTH STARTS '2023-05-01 02:00:00' DO BEGIN -- Keep the latest12Monthly data CALL clean_time_partitions('sales', 12); CALL clean_time_partitions('log_data', 12); END;
5. Advanced partition management skills
5.1 Partition and Index Optimization
-- Add a local index to the partition table ALTER TABLE sales ADD INDEX idx_product (product_id); -- Check partition index usage EXPLAIN PARTITIONS SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31';
5.2 Partition table backup strategy
-- Backup a specific partition separately mysqldump -u username -p dbname sales --where="TO_DAYS(sale_date) < TO_DAYS('2023-02-01')" > sales_partition_q1.sql -- Physical backup of partition files(needInnoDBFile per tablespace) cp /var/lib/mysql/dbname/sales#P# /backup/
5.3 Partition table monitoring
-- Monitor partition tablespace usage SELECT partition_name, table_rows, ROUND(data_length/(1024*1024),2) AS data_mb, ROUND(index_length/(1024*1024),2) AS index_mb FROM information_schema.partitions WHERE table_name = 'sales'; -- Monitor partition query hit rate SELECT * FROM sys.schema_table_statistics WHERE table_name = 'sales';
6. Frequently Asked Questions and Solutions
6.1 Partition selection failure problem
Problem phenomenon: The query does not use partition clipping correctly
Solution:
-- make sureWHEREConditions use partition keys EXPLAIN PARTITIONS SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'; -- Avoid using functions on partition keys -- Bad writing: WHERE YEAR(sale_date) = 2023 -- Good writing: WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
6.2 Partition number limit
Problem phenomenon: MySQL default limits the number of partitions to 8192
Solution:
-- Check the current number of partitions SELECT COUNT(*) FROM information_schema.partitions WHERE table_name = 'sales'; -- Merge historical partitions if necessary ALTER TABLE sales REORGANIZE PARTITION p202201, p202202, p202203 INTO ( PARTITION p2022_q1 VALUES LESS THAN (TO_DAYS('2022-04-01')) );
6.3 Cross-partition query performance
Problem phenomenon: Performance degradation when query spans multiple partitions
Solution:
-- Consider adjusting partition granularity(If changed from monthly to quarterly) ALTER TABLE sales PARTITION BY RANGE (QUARTER(sale_date)) ( PARTITION p2022_q1 VALUES LESS THAN (2), PARTITION p2022_q2 VALUES LESS THAN (3), -- ... ); -- Or add a summary table CREATE TABLE sales_monthly_summary ( month DATE PRIMARY KEY, total_amount DECIMAL(15,2), total_orders INT ); -- Refresh summary data regularly using events
7. Actual case analysis
7.1 Partitioning practice of e-commerce order system
Scenario: Daily order volume is 100,000+, hot data must be retained for 3 years, and earlier data must be archived
Solution:
-- Create quarterly partition table CREATE TABLE orders ( order_id BIGINT AUTO_INCREMENT, order_date DATETIME NOT NULL, customer_id INT, amount DECIMAL(12,2), PRIMARY KEY (order_id, order_date) ) PARTITION BY RANGE (QUARTER(order_date)) ( PARTITION p2022_q1 VALUES LESS THAN (2), PARTITION p2022_q2 VALUES LESS THAN (3), PARTITION p2022_q3 VALUES LESS THAN (4), PARTITION p2022_q4 VALUES LESS THAN (5), PARTITION pmax VALUES LESS THAN MAXVALUE ); -- Creating an archive process DELIMITER // CREATE PROCEDURE archive_old_orders(IN retain_years INT) BEGIN DECLARE cutoff_quarter INT; DECLARE cutoff_year INT; SET cutoff_year = YEAR(DATE_SUB(CURRENT_DATE(), INTERVAL retain_years YEAR)); SET cutoff_quarter = QUARTER(DATE_SUB(CURRENT_DATE(), INTERVAL retain_years YEAR)); -- Archive old data to historical tables INSERT INTO orders_archive SELECT * FROM orders PARTITION (p2022_q1, p2022_q2) WHERE YEAR(order_date) < cutoff_year OR (YEAR(order_date) = cutoff_year AND QUARTER(order_date) < cutoff_quarter); -- Delete archived partitions ALTER TABLE orders DROP PARTITION p2022_q1, p2022_q2; -- Add a new partition ALTER TABLE orders ADD PARTITION ( PARTITION p2023_q1 VALUES LESS THAN (2) ); END // DELIMITER ;
Conclusion
MySQL time partition tables are a powerful tool for managing large-scale time series data. By rationally designing partitioning strategies and automated maintenance scripts, query performance can be significantly improved, data maintenance work can be simplified, and storage costs can be reduced. The technologies and methods introduced in this article have been verified in multiple production environments. We hope that readers can flexibly apply them according to their own business characteristics and build an efficient data life cycle management system.
This is the end of this article about the creation and data cleaning of MySQL time partition tables. For more related contents of MySQL partition tables, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!