Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: MVCC design on replica and replica stream isolation #2018

Merged
merged 20 commits into from May 16, 2024

Conversation

as51340
Copy link
Contributor

@as51340 as51340 commented May 9, 2024

Description

Jepsen found a few bugs in replication, this PR fixes two of them.

When the replica recovered completely, on the next txn replication, it expected replica_stream_ to be already destroyed. There MG_ASSERT failed. Now, the replica stream is created as a local variable for transactions.

Another bug is related to the MVCC of the replica. Because REPLICA’s transaction timestamps are contingent on MAIN’s timestamps, resulting in the transaction timestamp counter (used for obtaining the start transaction ID, and commit timestamp ID) staying the same when the READ transaction started, MVCC guarantees were broken. Any time the WRITE transaction would commit before the READ transaction would finish, GC could incorrectly unlink deltas as the WRITE transaction would mark its start timestamp as finished, not knowing that the READ transaction is still in the system. This would result in the replica’s READ query reading two different data snapshots during a single transaction.

[master < Task] PR

  • Provide the full content or a guide for the final git message
    • Fix: MVCC design on replica and replica stream isolation

CI Testing Labels

Please select the appropriate CI test labels (CI -build=build-name -test=test-suite)

Documentation checklist

  • Add the documentation label tag
  • Add the bug / feature label tag
  • Add the milestone for which this feature is intended
    • If not known, set for a later milestone
  • Write a release note, including added/changed clauses
    • [Release note text]
  • Link the documentation PR here
    • [Documentation PR link]
  • Tag someone from docs team in the comments

@as51340 as51340 added bug bug Docs - changelog only Docs - changelog only CI -build=jepsen -test=core Run jepsen build and core tests on push CI -build=release -test=e2e Run release build and e2e tests on push labels May 9, 2024
@as51340 as51340 added this to the mg-v2.17.0 milestone May 9, 2024
@as51340 as51340 requested a review from Ignition May 9, 2024 09:11
@as51340 as51340 changed the title Reset Reset ReplicaStream upon RPC failure May 9, 2024
@antoniofilipovic antoniofilipovic added CI -build=debug -test=core Run debug build and core tests on push CI -build=release -test=core Run release build and core tests on push labels May 10, 2024
@antoniofilipovic antoniofilipovic changed the title Reset ReplicaStream upon RPC failure Fix: MVCC design on replica and replica stream isolation May 14, 2024
@antoniofilipovic antoniofilipovic added the CI -build=coverage -test=core Run coverage build and core tests on push label May 15, 2024
Copy link
Contributor

@Ignition Ignition left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments, otherwise very good

src/dbms/inmemory/replication_handlers.cpp Outdated Show resolved Hide resolved
src/storage/v2/replication/replication_client.cpp Outdated Show resolved Hide resolved
src/storage/v2/replication/replication_client.cpp Outdated Show resolved Hide resolved
src/storage/v2/replication/replication_storage_state.cpp Outdated Show resolved Hide resolved
src/storage/v2/replication/replication_storage_state.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@andrejtonev andrejtonev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

src/storage/v2/replication/replication_client.cpp Outdated Show resolved Hide resolved
Copy link

sonarcloud bot commented May 15, 2024

Quality Gate Passed Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.8% Duplication on New Code

See analysis details on SonarCloud

@antoniofilipovic antoniofilipovic removed CI -build=coverage -test=core Run coverage build and core tests on push CI -build=jepsen -test=core Run jepsen build and core tests on push CI -build=debug -test=core Run debug build and core tests on push CI -build=release -test=core Run release build and core tests on push CI -build=release -test=e2e Run release build and e2e tests on push labels May 16, 2024
@antoniofilipovic antoniofilipovic added this pull request to the merge queue May 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 16, 2024
@antoniofilipovic antoniofilipovic added this pull request to the merge queue May 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 16, 2024
@as51340 as51340 added this pull request to the merge queue May 16, 2024
Merged via the queue into master with commit 70994a9 May 16, 2024
11 checks passed
@as51340 as51340 deleted the fix-replica-stream-reset branch May 16, 2024 11:00
@kgolubic
Copy link
Contributor

@as51340 I will use this as RN:

The replication stream is now handled as a local variable for transactions, preventing assertion failures during recovery.

MarkoBarisic pushed a commit that referenced this pull request May 17, 2024
Co-authored-by: antoniofilipovic <filipovicantonio1998@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug bug Docs - changelog only Docs - changelog only
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants