fix: Reconstruct slave sync thread model #2638

Merged

@cheniujh cheniujh commented May 7, 2024

What does this PR do:

1. Refactored the Slave-side master-slave sync thread model (fix #2637):

  • 1.1 The WriteBinlogWorkers and WriteDBWorkers on the Slave side are now stored in two separate vectors, so the number of write_binlog_worker and write_db_worker threads can be configured independently. Specifically, the configuration item "sync-thread-num" directly controls the number of write_db_worker threads used for WriteDB when the slave node consumes binlog, while "sync-binlog-thread-num" determines the number of write_binlog_worker threads.
  • 1.2 It is recommended that each DB have its own write_binlog_worker, but the user may configure fewer write_binlog_workers than DBs; in that case each DB picks its binlog worker by taking its index modulo the worker count. If the configured number of write_binlog_workers exceeds the number of DBs, Pika uses db_num as the final write_binlog_worker count.
  • 1.3 All DBs share the same WriteDBWorker pool for WriteDB (the worker is still selected by hashing the key). The selection rules in 1.2 and 1.3 are sketched right after this list.
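
A minimal C++ sketch of the worker-count and worker-selection rules described in 1.1–1.3. This is illustrative only, not Pika's actual code; the struct and function names (SyncWorkerConfig, EffectiveBinlogWorkerNum, SelectBinlogWorker, SelectWriteDBWorker) are assumptions made for the example:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical mapping of the two config items to worker counts (1.1).
struct SyncWorkerConfig {
  size_t sync_binlog_thread_num;  // "sync-binlog-thread-num": write_binlog_worker count
  size_t sync_thread_num;         // "sync-thread-num": write_db_worker count
};

// 1.2: if more binlog workers are configured than DBs, cap the count at db_num.
size_t EffectiveBinlogWorkerNum(size_t configured, size_t db_num) {
  return configured > db_num ? db_num : configured;
}

// 1.2: a DB picks its binlog worker by taking its index modulo the worker count.
size_t SelectBinlogWorker(size_t db_index, size_t binlog_worker_num) {
  return db_index % binlog_worker_num;
}

// 1.3: all DBs share one WriteDB worker pool; the worker is chosen by hashing the key.
size_t SelectWriteDBWorker(const std::string& key, size_t db_worker_num) {
  return std::hash<std::string>{}(key) % db_worker_num;
}
```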

2. Fixed a Sync Window crash on the Master caused by the Slave sending two consecutive TrySync requests during a master-slave timeout reconnection (fix #2655):

  1. Direct cause: when the Slave times out and reconnects, it sends two identical TrySync requests in a short period (both carrying the same binlog offset). The Master handles the two requests the same way (each time it clears the WriteQueue and the SyncWin, then resends Binlog starting from the offset carried in the request), so some Binlogs around that offset are sent twice. The Slave consumes those Binlogs twice, and the BinlogACKs it then returns are considered invalid by the Master.

  2. Why the Slave sends two consecutive TrySync requests: when the first TrySync is sent, the task queue of the Slave's Binlog-consuming worker thread still holds write-Binlog tasks accumulated during the previous master-slave connection. (When the Slave times out, disconnects, and enters the TryConnect state, it discards those accumulated tasks one by one; the problem is that the discarding is too slow, or equivalently the next TrySync is sent too soon.) When the Slave receives the response to the first TrySync request, it enters the Connected state and starts consuming the leftover write-Binlog tasks from the previous connection. Since the SessionIDs of these tasks no longer match, the error-handling branch is triggered (sketched below), moving the Slave back to the TrySync state, so it immediately sends the second TrySync request.
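The error branch described in 2.2 can be pictured with a short sketch. This is illustrative only; the type and function names (ReplState, WriteBinlogTask, ConsumeBinlogTask) are assumptions, not Pika's actual code:

```cpp
#include <cstdint>

// Hypothetical replication states mentioned in this PR's description.
enum class ReplState { kTryConnect, kTrySync, kWaitReply, kConnected };

// A queued write-binlog task remembers the session it was queued under.
struct WriteBinlogTask {
  uint32_t session_id;
  // ... binlog payload omitted ...
};

// Decide the next replication state after examining one queued binlog task.
ReplState ConsumeBinlogTask(const WriteBinlogTask& task, uint32_t current_session_id) {
  if (task.session_id != current_session_id) {
    // Stale task left over from the previous master-slave connection: take the
    // error branch and re-enter TrySync, which is what produced the second
    // TrySync request before this PR.
    return ReplState::kTrySync;
  }
  // Session matches: apply the binlog and stay connected.
  return ReplState::kConnected;
}
```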

3. Fixed the issue where a single Binlog task could block the Slave for a long time, causing the Master to resume Binlog transmission from the wrong starting position after a timeout reconnection (fix #2659).

4. After this PR is merged, the Slave handles TrySync responses synchronously with respect to Binlog consumption (the response is handled after the queued binlog tasks). In some extreme cases (a severely blocked Slave), establishing the master-slave connection may therefore be delayed and the Slave may stay in the WaitReply state longer, during which master_link_status is also reported as down. PR #2656 was therefore opened to add a more fine-grained monitoring metric, repl_connect_status, so that operators can assess the situation further when master_link_status is down.
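As a rough illustration of why the extra metric helps (a hypothetical sketch, not the implementation in PR #2656): master_link_status only distinguishes "up" from "down", while repl_connect_status could expose the intermediate states mentioned above, letting an operator tell a Slave stuck in WaitReply apart from one that has not connected at all.

```cpp
#include <string>

// Hypothetical sketch: deriving the coarse master_link_status from a
// finer-grained replication state like the one repl_connect_status exposes.
enum class ReplConnectStatus { kNoConnect, kTryConnect, kTrySync, kWaitReply, kConnected };

std::string MasterLinkStatus(ReplConnectStatus s) {
  // Only a fully established connection counts as "up"; every intermediate
  // state (including a Slave stuck in WaitReply) reports "down", which is why
  // the intermediate states are worth surfacing separately.
  return s == ReplConnectStatus::kConnected ? "up" : "down";
}
```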

@AlexStocks AlexStocks merged commit 32c423a into OpenAtomFoundation:unstable May 22, 2024
14 checks passed
QlQlqiqi pushed a commit to QlQlqiqi/pika that referenced this pull request May 22, 2024
* reconstruct slave consuming thread model, new model:
1 each db has one exclusive thread to write binlog
2 every db shares the same thread pool to write db

* 1 make write_binlog_thread_num configurable
2 ensure TrySync resp is handled after binlog tasks

* 1 add extra 10s sleep in randomSpopstore test to avoid the sporadic failure of this test case
2 revised some comments about write-binlog-worker-num in pika.conf

* 1 use global constexpr to replace fixed num in terms of max_db_num
2 done some format work

---------

Co-authored-by: cjh <1271435567@qq.com>