1. In-depth analysis of the causes of synchronization delay
1.1 Review of the principle of master-slave replication
MySQL master-slave replication process:
Main libraryBinlog → Main libraryDumpThread → From the libraryIOThread → From the libraryRelay Log → From the librarySQLThread → From the library数据
1.2 Key links of delay generation
Section | Possible bottleneck | Typical performance |
---|---|---|
Main library Binlog generation | Big business, DDL operation | Main library CPU/IO high |
Network transmission | Cross-computer room synchronization, insufficient bandwidth | Network monitoring indicator abnormality |
From library IO thread | Poor disk IO performance | Relay Log Stacking |
From library SQL thread | Single-threaded playback, lock conflict | Seconds_Behind_Master Continuous Growth |
2. Real-time monitoring and diagnosis plan
2.1 Key monitoring indicators
-- View Delay from Library(Second) SHOW SLAVE STATUS\G -- focus on: -- Seconds_Behind_Master -- Slave_SQL_Running_State -- Check thread status SHOW PROCESSLIST; -- CheckBinlogLocation SHOW MASTER STATUS; SHOW SLAVE STATUS\G
2.2 Performance diagnostic tools
pt-heartbeat(Percona toolkit)
# Main library installation heartbeatpt-heartbeat --user=monitor --password=xxx --host=master \ --create-table --database=test --interval=1 --update # Detect delay from the librarypt-heartbeat --user=monitor --password=xxx --host=slave \ --database=test --monitor --master-server-id=1
-
Prometheus+Granfa monitoring system
- Collection indicators:
mysql_slave_status_sql_delay
- Alarm threshold: >30 seconds trigger warning
- Collection indicators:
3. System architecture optimization solution
3.1 Copy topology optimization
Solution comparison:
Topology Type | advantage | shortcoming | Applicable scenarios |
---|---|---|---|
Traditional masters and slaves | Simple and reliable | Single point delay | Small and medium-sized scale |
Cascade copy | Reduce pressure on the main warehouse | Delay accumulation | Read more and write less |
Multi-source replication | Multi-main library summary | Complex configuration | Data aggregation |
GTID Copy | Easy to failover | High version requirements | High availability environment |
Configuration example (GTID mode):
#Configuration[mysqld] server-id = 2 log_bin = mysql-bin binlog_format = ROW binlog_row_image = FULL gtid_mode = ON enforce_gtid_consistency = ON log_slave_updates = ON
3.2 Optimization of read and write separation strategy
Intelligent routing solution:
// Spring Boot + HikariCP implements latency-aware routingpublic class DelayAwareRoutingDataSource extends AbstractRoutingDataSource { private long maxAcceptableDelay = 1000; // 1 second @Override protected Object determineCurrentLookupKey() { if(isWriteOperation()) { return "master"; } // Get delay from library long delay = getSlaveDelay(); return delay <= maxAcceptableDelay ? "slave" : "master"; } private long getSlaveDelay() { // Get real-time delay from monitoring system return (); } }
4. Parameter tuning plan
4.1 Key parameters of main library
# Control Binlog generationsync_binlog = 1 # Each transaction is submitted to swipe the diskbinlog_group_commit_sync_delay = 0 binlog_group_commit_sync_no_delay_count = 0 # Big transaction processingbinlog_cache_size = 4M max_binlog_size = 512M binlog_rows_query_log_events = ON # Record the complete SQL
4.2 Key parameters from the library
# Parallel replication configuration (MySQL 5.7+)slave_parallel_workers = 8 # 50-75% of CPU coresslave_parallel_type = LOGICAL_CLOCK slave_preserve_commit_order = 1 # Ensure transaction order # Network and IO optimizationslave_net_timeout = 60 # Network timeout (seconds)slave_compressed_protocol = 1 # Enable compressionslave_pending_jobs_size_max = 2G # Memory queue size # Hardware relatedinnodb_flush_log_at_trx_commit = 2 # From the library can be relaxedsync_relay_log = 10000 # Regularly brush the disk
V. Advanced solutions
5.1 Semi-synchronous replication
Configuration method:
-- Main library installation plug-in INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; -- Configuration parameters SET GLOBAL rpl_semi_sync_master_enabled = 1; SET GLOBAL rpl_semi_sync_master_timeout = 10000; # 10 seconds timeout -- Configuration from library INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Effect:
- The master library transaction has at least one slave library before it returns to success
- Balancing performance and data security
5.2 MGR(MySQL Group Replication)
Architectural Advantages:
- Multiple master write
- Automatic fault detection
- Strong data consistency
Deployment steps:
#Configuration[mysqld] plugin_load_add = 'group_replication.so' transaction_write_set_extraction = XXHASH64 loose-group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" loose-group_replication_start_on_boot = off loose-group_replication_local_address = "node1:33061" loose-group_replication_group_seeds = "node1:33061,node2:33061,node3:33061" loose-group_replication_bootstrap_group = off
6. Business layer solution
6.1 Read and write separation strategy
Scenario adaptation solution:
Business Type | Reading policy | Implementation method |
---|---|---|
Financial transactions | Main library reading | @Transactional(readOnly=false) |
Product Browsing | Read from the library | @Transactional(readOnly=true) |
User Comments | Delay tolerance | After writing, jump to the main library to read |
Report statistics | Dedicated slave library | Specify the data source route |
6.2 Cache Compensation Policy
public class CacheAspect { @AfterReturning("@annotation(cacheUpdate)") public void afterUpdate(JoinPoint jp) { // 1. Update the cache immediately after updating the main library updateCache(); // 2. Start the delay task check slave library (() -> { if(checkSlaveSync()) { refreshCacheFromSlave(); } }, 1, ); } private boolean checkSlaveSync() { // Check whether the master and slave positions are consistent return (); } }
7. Emergency treatment plan
7.1 Delay burst processing process
Positioning bottlenecks:
# Check the status of the thread from the librarySHOW PROCESSLIST; # View the currently executed SQLSELECT * FROM performance_schema.events_statements_current WHERE thread_id = (SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID = <SQLThreadID>);
Temporary solution:
- Skip the error (use with caution):
STOP SLAVE; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE;
- Rebuild Copy:
mysqldump --master-data=2 --single-transaction -uroot -p dbname >
7.2 Master-slave switch decision tree
Does delay affect business? ├─ yes → yesno有紧急修复方案? │ ├─ yes → Implement repair(If you skip transaction) │ └─ no → Trigger failover └─ no → Monitoring and Observation + Record events
8. Preventive maintenance strategies
-
Regular checklist:
- Master-slave network delay (<1ms)
- Load from the library server (CPU < 70%)
- Disk IOPS margin (>30%)
- Copy thread status (Running)
Stress testing scheme:
# Use sysbench to generate loadsysbench --db-driver=mysql --mysql-host=master \ --mysql-user=test --mysql-password=test \ /usr/share/sysbench/oltp_write_only.lua \ --tables=10 --table-size=1000000 prepare # Monitor delay changeswatch -n 1 "mysql -e 'SHOW SLAVE STATUS\G' | grep Seconds_Behind"
- Architecture evolution path:
Master-slave copy → Semi-synchronous copy → MGR → Distributed database(likeTiDB)
Through the above multi-level solutions, the appropriate master-slave synchronization delay processing strategy can be selected according to the specific business scenario and technology stack. It is recommended to start with monitoring, first locate the bottleneck points, then implement targeted optimization measures, and at the same time establish a complete emergency plan.
The above is the detailed content of the comprehensive solution to the MySQL master-slave synchronization delay problem. For more information about the MySQL master-slave synchronization delay problem, please pay attention to my other related articles!