A comprehensive solution to the problem of MySQL master-slave synchronization delay

1. In-depth analysis of the causes of synchronization delay

1.1 Review of the principle of master-slave replication

MySQL master-slave replication process:

Main libraryBinlog → Main libraryDumpThread → From the libraryIOThread → From the libraryRelay Log → From the librarySQLThread → From the library数据

1.2 Key links of delay generation

Section	Possible bottleneck	Typical performance
Main library Binlog generation	Big business, DDL operation	Main library CPU/IO high
Network transmission	Cross-computer room synchronization, insufficient bandwidth	Network monitoring indicator abnormality
From library IO thread	Poor disk IO performance	Relay Log Stacking
From library SQL thread	Single-threaded playback, lock conflict	Seconds_Behind_Master Continuous Growth

2. Real-time monitoring and diagnosis plan

2.1 Key monitoring indicators

-- View Delay from Library(Second)
SHOW SLAVE STATUS\G
-- focus on:
-- Seconds_Behind_Master 
-- Slave_SQL_Running_State

-- Check thread status
SHOW PROCESSLIST;

-- CheckBinlogLocation
SHOW MASTER STATUS;
SHOW SLAVE STATUS\G

2.2 Performance diagnostic tools

pt-heartbeat(Percona toolkit)

# Main library installation heartbeatpt-heartbeat --user=monitor --password=xxx --host=master \
             --create-table --database=test --interval=1 --update

# Detect delay from the librarypt-heartbeat --user=monitor --password=xxx --host=slave \
             --database=test --monitor --master-server-id=1

Prometheus+Granfa monitoring system
- Collection indicators:mysql_slave_status_sql_delay
- Alarm threshold: >30 seconds trigger warning

3. System architecture optimization solution

3.1 Copy topology optimization

Solution comparison：

Topology Type	advantage	shortcoming	Applicable scenarios
Traditional masters and slaves	Simple and reliable	Single point delay	Small and medium-sized scale
Cascade copy	Reduce pressure on the main warehouse	Delay accumulation	Read more and write less
Multi-source replication	Multi-main library summary	Complex configuration	Data aggregation
GTID Copy	Easy to failover	High version requirements	High availability environment

Configuration example (GTID mode)：

#Configuration[mysqld]
server-id = 2
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON

3.2 Optimization of read and write separation strategy

Intelligent routing solution：

// Spring Boot + HikariCP implements latency-aware routingpublic class DelayAwareRoutingDataSource extends AbstractRoutingDataSource {

    private long maxAcceptableDelay = 1000; // 1 second    
    @Override
    protected Object determineCurrentLookupKey() {
        if(isWriteOperation()) {
            return "master";
        }
        
        // Get delay from library        long delay = getSlaveDelay();
        
        return delay &lt;= maxAcceptableDelay ? "slave" : "master";
    }
    
    private long getSlaveDelay() {
        // Get real-time delay from monitoring system        return ();
    }
}

4. Parameter tuning plan

4.1 Key parameters of main library

# Control Binlog generationsync_binlog = 1              # Each transaction is submitted to swipe the diskbinlog_group_commit_sync_delay = 0 
binlog_group_commit_sync_no_delay_count = 0

# Big transaction processingbinlog_cache_size = 4M
max_binlog_size = 512M
binlog_rows_query_log_events = ON  # Record the complete SQL

4.2 Key parameters from the library

# Parallel replication configuration (MySQL 5.7+)slave_parallel_workers = 8      # 50-75% of CPU coresslave_parallel_type = LOGICAL_CLOCK
slave_preserve_commit_order = 1 # Ensure transaction order
# Network and IO optimizationslave_net_timeout = 60          # Network timeout (seconds)slave_compressed_protocol = 1   # Enable compressionslave_pending_jobs_size_max = 2G # Memory queue size
# Hardware relatedinnodb_flush_log_at_trx_commit = 2  # From the library can be relaxedsync_relay_log = 10000           # Regularly brush the disk

V. Advanced solutions

5.1 Semi-synchronous replication

Configuration method：

-- Main library installation plug-in
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';

-- Configuration parameters
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 10000; # 10 seconds timeout
-- Configuration from library
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

Effect：

The master library transaction has at least one slave library before it returns to success
Balancing performance and data security

5.2 MGR(MySQL Group Replication)

Architectural Advantages：

Multiple master write
Automatic fault detection
Strong data consistency

Deployment steps：

#Configuration[mysqld]
plugin_load_add = 'group_replication.so'
transaction_write_set_extraction = XXHASH64
loose-group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot = off
loose-group_replication_local_address = "node1:33061"
loose-group_replication_group_seeds = "node1:33061,node2:33061,node3:33061"
loose-group_replication_bootstrap_group = off

6. Business layer solution

6.1 Read and write separation strategy

Scenario adaptation solution：

Business Type	Reading policy	Implementation method
Financial transactions	Main library reading	@Transactional(readOnly=false)
Product Browsing	Read from the library	@Transactional(readOnly=true)
User Comments	Delay tolerance	After writing, jump to the main library to read
Report statistics	Dedicated slave library	Specify the data source route

6.2 Cache Compensation Policy

public class CacheAspect {
    
    @AfterReturning("@annotation(cacheUpdate)")
    public void afterUpdate(JoinPoint jp) {
        // 1. Update the cache immediately after updating the main library        updateCache();
        
        // 2. Start the delay task check slave library        (() -&gt; {
            if(checkSlaveSync()) {
                refreshCacheFromSlave();
            }
        }, 1, );
    }
    
    private boolean checkSlaveSync() {
        // Check whether the master and slave positions are consistent        return ();
    }
}

7. Emergency treatment plan

7.1 Delay burst processing process

Positioning bottlenecks：

# Check the status of the thread from the librarySHOW PROCESSLIST;

# View the currently executed SQLSELECT * FROM performance_schema.events_statements_current 
WHERE thread_id = (SELECT THREAD_ID FROM performance_schema.threads 
                  WHERE PROCESSLIST_ID = &lt;SQLThreadID&gt;);

Temporary solution：

Skip the error (use with caution):

STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

Rebuild Copy:

mysqldump --master-data=2 --single-transaction -uroot -p dbname >

7.2 Master-slave switch decision tree

Does delay affect business？
├─ yes → yesno有紧急修复方案？
│   ├─ yes → Implement repair(If you skip transaction)
│   └─ no → Trigger failover
└─ no → Monitoring and Observation + Record events

8. Preventive maintenance strategies

Regular checklist：
- Master-slave network delay (<1ms)
- Load from the library server (CPU < 70%)
- Disk IOPS margin (>30%)
- Copy thread status (Running)
Stress testing scheme：

# Use sysbench to generate loadsysbench --db-driver=mysql --mysql-host=master \
         --mysql-user=test --mysql-password=test \
         /usr/share/sysbench/oltp_write_only.lua \
         --tables=10 --table-size=1000000 prepare

# Monitor delay changeswatch -n 1 "mysql -e 'SHOW SLAVE STATUS\G' | grep Seconds_Behind"

Architecture evolution path：

Master-slave copy → Semi-synchronous copy → MGR → Distributed database(likeTiDB)

Through the above multi-level solutions, the appropriate master-slave synchronization delay processing strategy can be selected according to the specific business scenario and technology stack. It is recommended to start with monitoring, first locate the bottleneck points, then implement targeted optimization measures, and at the same time establish a complete emergency plan.

The above is the detailed content of the comprehensive solution to the MySQL master-slave synchronization delay problem. For more information about the MySQL master-slave synchronization delay problem, please pay attention to my other related articles!