Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segment/findEntry: do not consider offset 0 as an invalid read value #142

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jbonachera
Copy link

When looking for an entry in a segment based on its offset, findEntry seems to consider that the entry does not exist if the read offset is 0.

However, as the entry used in the sort.Search test is reused across all search iterations (it's a pointer), the entry offset may be greater than 0 even if the found entry does not exist.

I think we should not rely on entry's offset value if ReadEntryAtFileOffset returned an error.

(I might be of course wrong, so i'm open to suggestions or explanations.)

Details

Here's an example of the scenario addressed by this PR (I've implemented it in a test):

We have 12 messages in the log, and we are looking for all messages from the offset n° 11. We should get 1 message (offset 11)

sort.Search will start by looking for the first value in the index that do not satisfy the test, starting with n, then dividing it by two until the test returns false; so we have (I got this output by adding some fmt.Printf in the code) :

checking offset 655360:   -> test returned true                                                                                              
checking offset 327680:   -> test returned true
checking offset 163840:   -> test returned true
checking offset 81920:    -> test returned true
checking offset 40960:    -> test returned true
checking offset 20480:    -> test returned true
checking offset 10240:    -> test returned true
checking offset 5120:     -> test returned true
checking offset 2560:     -> test returned true
checking offset 1280:     -> test returned true
checking offset 640:      -> test returned true
checking offset 320:      -> test returned true
checking offset 160:      -> test returned true
checking offset 80:       -> test returned true
checking offset 40:       -> test returned true
checking offset 20:       -> test returned true
checking offset 10:       -> test returned false

sort.Search found that offset 10 is the first offset not satisfing the test.
We now known that the wanted value (which is the first value to satisfy the test) is in the interval [10, 20) (because 20 satisfied the test).
sort.Search will now try to find this value, by reducing this interval:

checking offset 15: -> test returned false
checking offset 18: -> test returned false
checking offset 19: -> test returned false

Because the Entry inside the sort.Search had been filled with entry from offset 10, and not re-initialized since, it does not passed the test used by sort.Search.
sort.Search thus returns offset 20 as result, instead of offset 11.

The last ReadEntryAtFileOffset then silently fails, and the returned offset is 10.

@codecov
Copy link

codecov bot commented Jul 9, 2018

Codecov Report

Merging #142 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #142      +/-   ##
==========================================
+ Coverage    41.4%   41.44%   +0.04%     
==========================================
  Files          84       84              
  Lines        5936     5935       -1     
==========================================
+ Hits         2458     2460       +2     
+ Misses       3022     3020       -2     
+ Partials      456      455       -1
Impacted Files Coverage Δ
commitlog/segment.go 68.38% <100%> (-0.21%) ⬇️
jocko/broker.go 53.62% <0%> (ø) ⬆️
commitlog/reader.go 87.87% <0%> (+9.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3497f59...7b5be6a. Read the comment docs.

@travisjeffery
Copy link
Owner

@jbonachera thanks for this. I was also thinking about just initializing the segment with -1 to know for sure if the segment is empty or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants