Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache the mapping of sequence to log block index in transaction log iterator #12538

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

HypenZou
Copy link

@HypenZou HypenZou commented Apr 15, 2024

Context/Summary:

The official documentation recommends using the GetUpdatesSince method for master-slave synchronization. Every time this method called, it will search for the start sequence from the beginning of the target WAL log. It may costs too much to seek to start sequence when wal file size is large. If high requirements are placed on master-slave delay, SeekToStartSequence() may be called very frequently and cause a large amount of read IO (although in most cases read page cache).
There are some issues discussing about it:
#12516
#889

In most scenarios, the start sequences of the GetUpdatesSince method are continuous. Caching the previously seek position can avoid most repeated readings of the wal file.

This commit will cache the mapping of sequence to WAL file block index. It does this both after seeking to the start sequence of the iterator and after iterating to the end of the iterator. When seeking to start sequence, the log reader will lookup the first batch sequence that is not larger than the target sequence and then skip wal file read pointer to the start of the block, instead of seeking from the start.

The optimization effect on the getupdatesSince interface is quite obvious
image

Test:
covered by ./db_log_iter_test

@HypenZou HypenZou changed the title Cache sequence to log block index in transaction log iterator Cache the mapping of sequence to log block index in transaction log iterator Apr 15, 2024
@HypenZou HypenZou force-pushed the get-update-since branch 2 times, most recently from f26e11d to de39db5 Compare April 16, 2024 05:40
@HypenZou
Copy link
Author

HypenZou commented May 7, 2024

@ajkr @jowlyzhang hi, could you please review my code when you have time? Any comments or suggestions are welcome ~

reusing the iterator may be a workaround, but it may be weird and undefined to use a iterator after it is invalid. Additionally, the comments for the Next() method state that it requires Valid() to be true.

@jowlyzhang jowlyzhang self-requested a review May 13, 2024 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants