commitlog recovery time #129

tylertreat · 2018-06-05T19:57:46Z

The recovery time of a commitlog is currently linear with the number of entries in the log. As a result, a log with many messages (e.g. millions) can take several seconds or more to recover. The reason is the index is rebuilt by reading back the log:

jocko/commitlog/segment.go

Lines 60 to 65 in 1c69c5c

    
           // SetupIndex creates and initializes an Index. 
        
           // Initialization is: 
        
           // - Sanity check of the loaded Index 
        
           // - Truncates the Index (clears it) 
        
           // - Reads the log file from the beginning and re-initializes the Index 
        
           func (s *Segment) SetupIndex() (err error) {

It's unclear to me what the purpose of truncating the index and rebuilding it is. AFAICT, it just initializes the segment's nextOffset and position. However, the index sanity check is already reading the last entry which we can recover nextOffset from. Is there another purpose for rebuilding the index?

The text was updated successfully, but these errors were encountered:

tylertreat · 2018-06-06T01:18:48Z

So it appears one issue is this does not actually seem to do what it says:

jocko/commitlog/index.go

Lines 200 to 208 in 1c69c5c

    
           //read last entry 
        
           entry := new(Entry) 
        
           if err := idx.ReadEntryAtFileOffset(entry, idx.position-entryWidth); err != nil { 
        
           	return err 
        
           } 
        
           if entry.Offset < idx.baseOffset { 
        
           	return ErrIndexCorrupt 
        
           } 
        
           return nil

The issue is that the memory-mapped index file is 10MB, so reading at idx.position-entryWidth actually reads an empty entry (since relEntry adds the base offset, the offset comparison passes). It seems this precludes the optimization I was thinking about. I wonder if we could just do a binary search to find the real last entry though...

travisjeffery · 2018-06-10T09:33:34Z

Yeah we shouldn't have to rebuild every time. Initially, I didn't and then someone hit a bug where the index wasn't setup properly and submitted a PR doing this to hack a fix and I left it to optimize later.

I can't quite remember what Kafka does here but binary searching for the last entry sounds good to me.

tylertreat · 2018-06-10T20:21:24Z

I have it working with a binary search. Will submit a PR shortly.

travisjeffery · 2018-06-11T10:45:50Z

Awesome - you da man.

tylertreat · 2018-06-11T21:12:47Z

@travisjeffery One snag I've run into: I'm using a fork of the commitlog package which I've modified index entries to also contain a Size field which is the size of the log message that I'm using for another project. This allowed me to find the first empty entry doing the following:

i := sort.Search(n, func(i int) bool {
	if err := idx.ReadEntryAtFileOffset(entry, int64(i*entryWidth)); err != nil {
		panic(err)
	}
	return entry.Position == 0 && entry.Size == 0
})

Of course, Jocko doesn't have the Size field. We could do something like the following:

i := sort.Search(n, func(i int) bool {
	if err := idx.ReadEntryAtFileOffset(entry, int64(i*entryWidth)); err != nil {
		panic(err)
	}
	return entry.Position == 0 && i > 0
})

However, the problem is handling the special case where the index has a single entry. We have no way of distinguishing between an empty index and an index with one entry in it since in both cases the first entry will have a Position of 0. In my case, the Size field is always non-zero for valid entries and 0 for "empty" entries, but we don't have that luxury here.

Any ideas?

travisjeffery · 2018-06-11T22:00:42Z

I think in NewIndex we can check if the index file exists and if it does assume the index is legit and that the length of the file marks the position of the next entry. And we can tell the number of entries based on this, cause if there is no entry the file will be 0 bytes, with 1 entry it'll be 8 bytes.

We gotta change the code in here: https://github.com/travisjeffery/jocko/blob/master/commitlog/index.go#L86-L89. Cause right now the index file itself is initialized to the full size, when we should leave index file as is and only initialize the mmap's length by passing the length into the MapRegion or MapAt apis.

travisjeffery · 2018-06-11T22:34:28Z

Not sure if the gommap lib which is pretty jank will support that though, which is what Kafka is able to do I think. Alternatively when the index is created we could write -1 as the first offset and overwrite that on the first entry.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

commitlog recovery time #129

commitlog recovery time #129

tylertreat commented Jun 5, 2018

tylertreat commented Jun 6, 2018 •

edited

travisjeffery commented Jun 10, 2018

tylertreat commented Jun 10, 2018

travisjeffery commented Jun 11, 2018

tylertreat commented Jun 11, 2018

travisjeffery commented Jun 11, 2018 •

edited

travisjeffery commented Jun 11, 2018

commitlog recovery time #129

commitlog recovery time #129

Comments

tylertreat commented Jun 5, 2018

tylertreat commented Jun 6, 2018 • edited

travisjeffery commented Jun 10, 2018

tylertreat commented Jun 10, 2018

travisjeffery commented Jun 11, 2018

tylertreat commented Jun 11, 2018

travisjeffery commented Jun 11, 2018 • edited

travisjeffery commented Jun 11, 2018

tylertreat commented Jun 6, 2018 •

edited

travisjeffery commented Jun 11, 2018 •

edited