SoFunction
Updated on 2025-05-04

A comprehensive solution to the problem of MySQL master-slave synchronization delay

1. In-depth analysis of the causes of synchronization delay

1.1 Review of the principle of master-slave replication

MySQL master-slave replication process:

Main libraryBinlog → Main libraryDumpThread → From the libraryIOThread → From the libraryRelay Log → From the librarySQLThread → From the library数据

1.2 Key links of delay generation

Section Possible bottleneck Typical performance
Main library Binlog generation Big business, DDL operation Main library CPU/IO high
Network transmission Cross-computer room synchronization, insufficient bandwidth Network monitoring indicator abnormality
From library IO thread Poor disk IO performance Relay Log Stacking
From library SQL thread Single-threaded playback, lock conflict Seconds_Behind_Master Continuous Growth

2. Real-time monitoring and diagnosis plan

2.1 Key monitoring indicators

-- View Delay from Library(Second)
SHOW SLAVE STATUS\G
-- focus on:
-- Seconds_Behind_Master 
-- Slave_SQL_Running_State

-- Check thread status
SHOW PROCESSLIST;

-- CheckBinlogLocation
SHOW MASTER STATUS;
SHOW SLAVE STATUS\G

2.2 Performance diagnostic tools

pt-heartbeat(Percona toolkit)

# Main library installation heartbeatpt-heartbeat --user=monitor --password=xxx --host=master \
             --create-table --database=test --interval=1 --update

# Detect delay from the librarypt-heartbeat --user=monitor --password=xxx --host=slave \
             --database=test --monitor --master-server-id=1
  1. Prometheus+Granfa monitoring system

    • Collection indicators:mysql_slave_status_sql_delay
    • Alarm threshold: >30 seconds trigger warning

3. System architecture optimization solution

3.1 Copy topology optimization

Solution comparison

Topology Type advantage shortcoming Applicable scenarios
Traditional masters and slaves Simple and reliable Single point delay Small and medium-sized scale
Cascade copy Reduce pressure on the main warehouse Delay accumulation Read more and write less
Multi-source replication Multi-main library summary Complex configuration Data aggregation
GTID Copy Easy to failover High version requirements High availability environment

Configuration example (GTID mode)

#Configuration[mysqld]
server-id = 2
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON

3.2 Optimization of read and write separation strategy

Intelligent routing solution

// Spring Boot + HikariCP implements latency-aware routingpublic class DelayAwareRoutingDataSource extends AbstractRoutingDataSource {

    private long maxAcceptableDelay = 1000; // 1 second    
    @Override
    protected Object determineCurrentLookupKey() {
        if(isWriteOperation()) {
            return "master";
        }
        
        // Get delay from library        long delay = getSlaveDelay();
        
        return delay <= maxAcceptableDelay ? "slave" : "master";
    }
    
    private long getSlaveDelay() {
        // Get real-time delay from monitoring system        return ();
    }
}

4. Parameter tuning plan

4.1 Key parameters of main library

# Control Binlog generationsync_binlog = 1              # Each transaction is submitted to swipe the diskbinlog_group_commit_sync_delay = 0 
binlog_group_commit_sync_no_delay_count = 0

# Big transaction processingbinlog_cache_size = 4M
max_binlog_size = 512M
binlog_rows_query_log_events = ON  # Record the complete SQL

4.2 Key parameters from the library

# Parallel replication configuration (MySQL 5.7+)slave_parallel_workers = 8      # 50-75% of CPU coresslave_parallel_type = LOGICAL_CLOCK
slave_preserve_commit_order = 1 # Ensure transaction order
# Network and IO optimizationslave_net_timeout = 60          # Network timeout (seconds)slave_compressed_protocol = 1   # Enable compressionslave_pending_jobs_size_max = 2G # Memory queue size
# Hardware relatedinnodb_flush_log_at_trx_commit = 2  # From the library can be relaxedsync_relay_log = 10000           # Regularly brush the disk

V. Advanced solutions

5.1 Semi-synchronous replication

Configuration method

-- Main library installation plug-in
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';

-- Configuration parameters
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 10000; # 10 seconds timeout
-- Configuration from library
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

Effect

  • The master library transaction has at least one slave library before it returns to success
  • Balancing performance and data security

5.2 MGR(MySQL Group Replication)

Architectural Advantages

  • Multiple master write
  • Automatic fault detection
  • Strong data consistency

Deployment steps

#Configuration[mysqld]
plugin_load_add = 'group_replication.so'
transaction_write_set_extraction = XXHASH64
loose-group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot = off
loose-group_replication_local_address = "node1:33061"
loose-group_replication_group_seeds = "node1:33061,node2:33061,node3:33061"
loose-group_replication_bootstrap_group = off

6. Business layer solution

6.1 Read and write separation strategy

Scenario adaptation solution

Business Type Reading policy Implementation method
Financial transactions Main library reading @Transactional(readOnly=false)
Product Browsing Read from the library @Transactional(readOnly=true)
User Comments Delay tolerance After writing, jump to the main library to read
Report statistics Dedicated slave library Specify the data source route

6.2 Cache Compensation Policy

public class CacheAspect {
    
    @AfterReturning("@annotation(cacheUpdate)")
    public void afterUpdate(JoinPoint jp) {
        // 1. Update the cache immediately after updating the main library        updateCache();
        
        // 2. Start the delay task check slave library        (() -> {
            if(checkSlaveSync()) {
                refreshCacheFromSlave();
            }
        }, 1, );
    }
    
    private boolean checkSlaveSync() {
        // Check whether the master and slave positions are consistent        return ();
    }
}

7. Emergency treatment plan

7.1 Delay burst processing process

Positioning bottlenecks

# Check the status of the thread from the librarySHOW PROCESSLIST;

# View the currently executed SQLSELECT * FROM performance_schema.events_statements_current 
WHERE thread_id = (SELECT THREAD_ID FROM performance_schema.threads 
                  WHERE PROCESSLIST_ID = <SQLThreadID>);

Temporary solution

  • Skip the error (use with caution):
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
  • Rebuild Copy:
mysqldump --master-data=2 --single-transaction -uroot -p dbname > 

7.2 Master-slave switch decision tree

Does delay affect business?
├─ yes → yesno有紧急修复方案?
│   ├─ yes → Implement repair(If you skip transaction)
│   └─ no → Trigger failover
└─ no → Monitoring and Observation + Record events

8. Preventive maintenance strategies

  1. Regular checklist

    • Master-slave network delay (<1ms)
    • Load from the library server (CPU < 70%)
    • Disk IOPS margin (>30%)
    • Copy thread status (Running)
  2. Stress testing scheme

# Use sysbench to generate loadsysbench --db-driver=mysql --mysql-host=master \
         --mysql-user=test --mysql-password=test \
         /usr/share/sysbench/oltp_write_only.lua \
         --tables=10 --table-size=1000000 prepare

# Monitor delay changeswatch -n 1 "mysql -e 'SHOW SLAVE STATUS\G' | grep Seconds_Behind"
  • Architecture evolution path
Master-slave copy → Semi-synchronous copy → MGR → Distributed database(likeTiDB)

Through the above multi-level solutions, the appropriate master-slave synchronization delay processing strategy can be selected according to the specific business scenario and technology stack. It is recommended to start with monitoring, first locate the bottleneck points, then implement targeted optimization measures, and at the same time establish a complete emergency plan.

The above is the detailed content of the comprehensive solution to the MySQL master-slave synchronization delay problem. For more information about the MySQL master-slave synchronization delay problem, please pay attention to my other related articles!