1. SQL statement optimization
1. Batch insert instead of single insert
Inserting rows one at a time triggers a transaction commit and log write for every statement, which is very inefficient.
Batch insertion merges many rows into a single SQL statement, reducing network round trips and SQL parsing overhead.
-- Inefficient: insert row by row
INSERT INTO table (col1, col2) VALUES (1, 'a');
INSERT INTO table (col1, col2) VALUES (2, 'b');

-- Efficient: batch insert
INSERT INTO table (col1, col2) VALUES (1, 'a'), (2, 'b'), (3, 'c'), ...;
Recommended batch size per statement: 500~2000 rows (keep the statement size below max_allowed_packet).
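To pick a safe batch size, it helps to check the server's current packet limit first; a minimal check of the relevant system variable:

```sql
-- Show the current limit on a single statement/packet
SHOW VARIABLES LIKE 'max_allowed_packet';
```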
2. Disable autocommit
By default, every INSERT commits its own transaction, causing frequent disk I/O.
Control transactions manually and group multiple inserts into a single commit:
START TRANSACTION;
INSERT INTO table ...;
INSERT INTO table ...;
...
COMMIT;
Note: overly large transactions can cause the undo log to bloat; size the batches according to available memory (e.g., commit once every 10,000 to 100,000 rows).
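A minimal sketch of the chunked-commit pattern, with autocommit switched off for the session (the table and values are placeholders carried over from the examples above):

```sql
SET autocommit = 0;
-- first chunk
INSERT INTO table (col1, col2) VALUES (1, 'a'), (2, 'b');
INSERT INTO table (col1, col2) VALUES (3, 'c'), (4, 'd');
COMMIT;   -- one log flush for the whole chunk
-- ... next chunk of inserts, then COMMIT again ...
SET autocommit = 1;
```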
3. Use LOAD DATA INFILE
Importing data directly from a file can be up to roughly 20 times faster than equivalent INSERT statements, since it bypasses most SQL parsing and per-statement overhead.
LOAD DATA LOCAL INFILE '/path/'
INTO TABLE table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Applicable scenarios: Import data from CSV or text files.
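Note that LOCAL INFILE must be allowed on both the server and the client; a quick way to check and enable the server side (assuming you have the privilege to change global variables):

```sql
SHOW VARIABLES LIKE 'local_infile';
SET GLOBAL local_infile = 1;   -- the client must also enable local-infile support
```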
4. Disable indexes and constraints
Secondary indexes (especially unique and full-text indexes) slow down bulk loads. Disable index maintenance before the insert and rebuild afterwards; note that ALTER TABLE ... DISABLE KEYS only affects non-unique indexes on MyISAM tables, so for InnoDB the usual approach is to drop the secondary indexes and recreate them after the load:
-- Disable index maintenance
ALTER TABLE table DISABLE KEYS;
-- Insert data ...
-- Rebuild the indexes
ALTER TABLE table ENABLE KEYS;
Disable foreign key checking:
SET FOREIGN_KEY_CHECKS = 0;
-- Insert data ...
SET FOREIGN_KEY_CHECKS = 1;
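It is also common to relax uniqueness checking for the loading session; a small sketch combining the two switches (re-enable both once the data has been verified):

```sql
SET unique_checks = 0;
SET foreign_key_checks = 0;
-- bulk insert / LOAD DATA here ...
SET unique_checks = 1;
SET foreign_key_checks = 1;
```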
2. Parameter configuration optimization
1. InnoDB engine parameter adjustment
**innodb_flush_log_at_trx_commit**:
- The default is 1 (the redo log is flushed to disk on every commit); changing it to 0 or 2 reduces disk I/O.
- 0: the log is flushed roughly once per second (up to 1 second of data may be lost on a crash).
- 2: the log is written to the OS cache on commit without forcing a flush (data is at risk only if the OS or host crashes).
**innodb_buffer_pool_size**: increase the buffer pool (commonly 70%~80% of physical memory on a dedicated server) to improve the data cache hit rate.
**innodb_autoinc_lock_mode**: set to 2 (interleaved mode) to reduce contention on the auto-increment lock; this is the default in MySQL 8.0. See the sketch below.
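A sketch of relaxing the dynamic parameters for the duration of a load (the buffer pool size is illustrative; innodb_autoinc_lock_mode is not dynamic and has to go into my.cnf before a restart):

```sql
SET GLOBAL innodb_flush_log_at_trx_commit = 2;    -- flush redo log to the OS cache only
SET GLOBAL innodb_buffer_pool_size = 8589934592;  -- e.g. 8GB; resizable online since MySQL 5.7
-- my.cnf (requires a restart):
-- [mysqld]
-- innodb_autoinc_lock_mode = 2
```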
2. Network and packet size tuning
**max_allowed_packet**: increase the maximum packet size (4MB by default in MySQL 5.7 and earlier, 64MB in 8.0) so that large batched INSERT statements are not rejected as oversized packets.
**bulk_insert_buffer_size**: increase the bulk insert buffer (default 8MB; it mainly affects bulk inserts into MyISAM tables), as shown below.
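A minimal sketch of raising both limits at runtime (values are illustrative; existing connections must reconnect to pick up the new max_allowed_packet):

```sql
SET GLOBAL max_allowed_packet = 67108864;        -- 64MB
SET GLOBAL bulk_insert_buffer_size = 67108864;   -- 64MB
```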
3. Other parameters
**back_log**: increase the length of the incoming connection queue to absorb bursts of concurrent connections.
**innodb_doublewrite**: disable the doublewrite buffer (trades crash safety for write performance).
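Both of these generally cannot be changed at runtime, so they are set in my.cnf and take effect after a restart; a sketch with illustrative values:

```sql
-- my.cnf (requires a server restart):
-- [mysqld]
-- back_log = 512
-- innodb_doublewrite = 0
SHOW VARIABLES WHERE Variable_name IN ('back_log', 'innodb_doublewrite');
```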
3. Storage engine selection
1. MyISAM engine
Advantages: inserts are faster than with InnoDB (no transaction or row-level locking overhead).
Disadvantages: no transactions and no crash recovery; only suitable for read-mostly data or cases where some data loss is acceptable.
2. InnoDB Engine
Advantages: supports transactions and row-level locking, suitable for highly concurrent writes.
Optimization tips:
- Use innodb_file_per_table so each table gets its own tablespace, avoiding shared tablespace bloat and fragmentation.
- Use an auto-increment integer primary key; random primary key values (such as UUIDs) cause page splits on insert. See the sketch below.
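A hypothetical table definition following these tips (name and columns are illustrative): the monotonically increasing primary key keeps inserts appended in order, and with innodb_file_per_table enabled the table lives in its own .ibd file.

```sql
CREATE TABLE orders (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- sequential inserts, no page splits
    user_id    INT NOT NULL,
    amount     DECIMAL(10, 2) NOT NULL,
    created_at DATETIME NOT NULL
) ENGINE = InnoDB;
```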
4. Hardware and architecture optimization
1. Use SSD storage
Replace mechanical disks with SSDs to increase I/O throughput.
2. Sharding (splitting databases and tables)
- Split a large table into multiple sub-tables (e.g., by time or ID range) to reduce the load on any single table.
- Use sharding middleware (such as ShardingSphere) or native partitioned tables (PARTITION BY), as sketched below.
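A sketch of a natively partitioned table (table name, columns, and ranges are hypothetical); the partitioning column must appear in every unique key, which is why it is part of the primary key here:

```sql
CREATE TABLE access_log (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    created_at DATETIME NOT NULL,
    payload    VARCHAR(255),
    PRIMARY KEY (id, created_at)
) ENGINE = InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```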
3. Read/write splitting
The primary handles writes and the replicas handle reads, reducing the load on the primary.
4. Asynchronous writes
Write data to a message queue (such as Kafka) first, and let consumers insert it into the database in batches.
5. Code-level optimization
1. Multithreaded parallel insertion
Partition the data and insert the partitions concurrently from multiple threads or connections.
Note: make sure the threads cannot produce primary key conflicts.
2. Prepared Statements
Reuse SQL templates to reduce parsing overhead:
// Java example
String sql = "INSERT INTO table (col1, col2) VALUES (?, ?)";
PreparedStatement ps = conn.prepareStatement(sql);
for (Data data : list) {
    ps.setInt(1, data.getCol1());
    ps.setString(2, data.getCol2());
    ps.addBatch();
}
ps.executeBatch();
// Tip: with MySQL Connector/J, add rewriteBatchedStatements=true to the JDBC URL
// so the driver rewrites the batch into multi-row INSERT statements.
6. Performance comparison examples
| Optimization method | Time to insert 100,000 rows (seconds) |
|---|---|
| Row-by-row insert (default) | 120 |
| Batch insert (1,000 rows per statement) | 5 |
| LOAD DATA INFILE | 1.5 |
Summary
Core ideas: reduce disk I/O, reduce lock contention, and merge operations.
Recommended steps:
- Prefer LOAD DATA INFILE or batch inserts.
- Adjust the transaction commit policy and InnoDB parameters.
- Optimize the table structure (disable or drop non-essential indexes during the load).
- Choose the storage engine based on the hardware and the scenario.
- At the architecture level, use sharding or asynchronous writes.
With these techniques, MySQL can reach insert rates of tens of thousands to hundreds of thousands of rows per second.