Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make index rebuilds slightly more efficient by writing prepares from the same stream in a batch (DB-369) #3942

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shaan1337
Copy link
Member

@shaan1337 shaan1337 commented Aug 30, 2023

Fixed: Make index rebuilds slightly more efficient by writing prepares from the same stream in a batch

Flags.HasAnyOf() bug

During index rebuilds at startup, this snippet of code had a bug but it turns out that it did not have any noticeable negative impact:

case LogRecordType.Prepare: {
        var prepare = (IPrepareLogRecord<TStreamId>)result.LogRecord;
        if (prepare.Flags.HasAnyOf(PrepareFlags.IsCommitted)) {
            if (prepare.Flags.HasAnyOf(PrepareFlags.SingleWrite)) {
                Commit(commitedPrepares, false, false);
                commitedPrepares.Clear();
                Commit(new[] { prepare }, result.Eof, false);
            } else {
                if (prepare.Flags.HasAnyOf(PrepareFlags.Data | PrepareFlags.StreamDelete))
                    commitedPrepares.Add(prepare);
                if (prepare.Flags.HasAnyOf(PrepareFlags.TransactionEnd)) {
                    Commit(commitedPrepares, result.Eof, false);
                    commitedPrepares.Clear();
                }
            }
        }

        break;
    }

The bug is that prepare.Flags.HasAnyOf(PrepareFlags.SingleWrite) should have been prepare.Flags.HasAllOf(PrepareFlags.SingleWrite) as SingleWrite is a composite flag: SingleWrite = Data | TransactionBegin | TransactionEnd

HasAnyOf returns true if any of the Data, TransactionBegin or TransactionEnd flags are present on the prepare log record.

Due to the bug, the else part in the if-else condition was never executed as described below:

  • there are 3 cases to consider when IsCommitted flag is set: a normal event write, an empty write and a stream delete (tombstone)
  • a normal event write has the Data flag, so the if block is entered
  • an empty event write always has the TransactionBegin/TransactionEnd flags set, so the if block is entered
  • a stream delete (tombstone) also always has the TransactionBegin/TransactionEnd flags set (together with the StreamDelete flag), so the if block is entered

Impact

The net effect is that:

This PR

The current code works fine but it is misleading due to the above bug. This PR rewrites this part to remove any ambiguity. It also takes the opportunity to add the events from the same stream in a batch to the index (it's not a very significant optimization but it's still better than committing to the index one at a time)

There should be no change in behaviour, except that:

  • we now commit prepares of the same stream in batches of size up to 256
  • Commit() is not called for empty writes (they are ignored in the Commit() method anyway)

Storage chaser initialization

Note that the index committer rebuilds the index from the index checkpoint up to the chaser checkpoint at startup - not the writer checkpoint. The remaining index rebuild (from the chaser to the writer checkpoint) is done by the storage chaser at startup. So if the chaser = writer at startup, then we can be sure that all events will be added to the index. but if chaser < writer and there is a torn transaction at the end of the log at startup, it will not be added by the storage chaser to the index (this can have unintended consequences such as duplicate event numbers). This is no longer a problem due to the 3 PRs (#3808, #3918 and #3896) that prevent torn transactions in the log. Note that there can still be a torn transaction at startup in the log when upgrading to the version with these fixes but the problem will no longer occur as from the next startup.

@shaan1337 shaan1337 changed the title Make index rebuilds more efficient by writing prepares from the same stream in a batch Make index rebuilds slightly more efficient by writing prepares from the same stream in a batch Sep 8, 2023
@shaan1337 shaan1337 force-pushed the efficient-index-rebuild branch 2 times, most recently from 3302f54 to 436c0de Compare September 8, 2023 11:01
@shaan1337 shaan1337 marked this pull request as ready for review September 8, 2023 11:40
@shaan1337 shaan1337 marked this pull request as draft September 8, 2023 12:52
@shaan1337 shaan1337 marked this pull request as ready for review September 11, 2023 09:41
@shaan1337 shaan1337 changed the title Make index rebuilds slightly more efficient by writing prepares from the same stream in a batch Make index rebuilds slightly more efficient by writing prepares from the same stream in a batch (DB-369) Sep 11, 2023
@linear
Copy link

linear bot commented Sep 11, 2023

@timothycoleman timothycoleman self-requested a review January 8, 2024 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant