生产环境一条简单的delete语句,导致slave hang住,复制延迟不断增加!

这个问题很怪,整个系统CPU,内存,磁盘IO都没有超载,就是一个SQL线程运行,超级慢!
查看事务状态复制SQL线程一直处于fetching rows.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
mysql> select * from information_schema.innodb_trx\G
*************************** 1. row ***************************
trx_id: 9284975407
trx_state: RUNNING
trx_started: 2019-03-21 17:28:11
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 96586
trx_mysql_thread_id: 2
trx_query: NULL
trx_operation_state: fetching rows
trx_tables_in_use: 1
trx_tables_locked: 3
trx_lock_structs: 95702
trx_lock_memory_bytes: 8528080
trx_rows_locked: 3773056
trx_rows_modified: 4615
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 0
trx_is_read_only: 0
trx_autocommit_non_locking: 0

查看Innodb 状态变量发现一个异常:Innodb_rows_read达170多亿,远超表中数量。这使我猛然醒悟,这个删除语句共删除4万多条数据,怎么会有那么多的Innodb_rows_read? 表中肯定缺少主键,导致每条数据的删除都要造全表找。查看表结构果然没有主键!

Every replication event against a table without a primary or unique key whose value is accessible through the SQL interface has the potential to require a full table scan for each row replicated.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> show global status like 'INNODB_ROWS_DELETED';
+---------------------+-------+
| Variable_name | Value |
+---------------------+-------+
| Innodb_rows_deleted | 4615 |
+---------------------+-------+
1 row in set (0.00 sec)

mysql> show global status like 'INNODB_ROWS_READ';
+------------------+-------------+
| Variable_name | Value |
+------------------+-------------+
| Innodb_rows_read | 17395762976 |
+------------------+-------------+
1 row in set (0.00 sec)

那么为什么这个语句在主库执行没有这么慢呢?

因为复制采用RBR模式,一条delete语句在主库执行后,可能会产生大量binlog event,这与delete语句的影响行数相关。这些binlog传到从库后,在从库上是按照binlog event个数逐一执行的,这就造成了主库执行一条delete语句,从库需要执行4万(针对本例)多条语句。由于没有主键,每一条语句都要全表扫描,导致从库非常慢!在主库上执行show master status看到的gtid_executed是不变的,而且slave的binlog也不写入,因为这是一个大事务,只有在执行完成后commit ,才会记入binlog.

解决办法

  1. 配置skip-slave-start,重启slave
  2. 登陆slave,关掉binlog(set sql_log_bin=0),给相关表添加主键
  3. start slave
  4. 复制完成好后,drop 掉所添加的主键
  5. 最后在主库上为没有主键的表添加主键,这样主从表结构严格一致

参考:https://www.fromdual.com/node/1313