Quorum Reader - Include Primary to Meet Quorum when One Secondary Replica is Non-Responsive #4440

Open
kundadebdatta opened this issue Apr 18, 2024 · 0 comments
Assignees
Labels
bug Something isn't working

kundadebdatta commented Apr 18, 2024

Problem:

Under our Bounded Staleness consistency settings, the system normally reads data from two secondary replicas and chooses the most recent version. It then verifies consistency by sending "head requests" to all secondaries (and to the primary if the replica set has fewer than 4 replicas).
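
To make the normal read path concrete, here is a minimal sketch of the two rules described above: choosing the most recent of two secondary responses, and deciding which replicas receive head requests. This is not the SDK's code. The `ReplicaResponse` type, the `lsn` field used as a version marker, and both function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicaResponse:
    replica: str  # hypothetical replica name
    lsn: int      # hypothetical version marker; higher means more recent
    value: str


def pick_most_recent(responses: List[ReplicaResponse]) -> ReplicaResponse:
    """Return the response carrying the highest version marker."""
    return max(responses, key=lambda r: r.lsn)


def head_request_targets(secondaries: List[str], primary: str,
                         replica_set_size: int) -> List[str]:
    """Secondaries always; the primary only when the set has fewer than 4 replicas."""
    targets = list(secondaries)
    if replica_set_size < 4:
        targets.append(primary)
    return targets
```

With a replica set of 3, `head_request_targets` would include the primary; with 4 or more it would not.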

Recently, during a deployment, one secondary became unavailable due to the update, and another crashed. This left us with a replica set of only three (two secondaries and the primary).

The system attempted to read from the two remaining secondaries, but one was unreachable because of the crash. This triggered a validation check, which would normally fall back to reading from the primary if no data was retrieved from the secondaries. In this case, however, the validation logic blocked the read from the primary because the replica set size was 3 (it expected at least 2 responses for a quorum).

The validation failure raised an exception and triggered retries, but the retries did not resolve the issue.

Proposed Solution:

To avoid this issue, we propose changing the system's behavior when the replica set size is reduced and one secondary is unavailable. Instead of requiring a quorum from the remaining secondaries alone, the system would include the primary in the replica selection. It could then read from all available replicas, establish consistency, and remain operational during similar failures.
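
The proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual fix: `select_replicas`, the `include_primary_fallback` flag, and the `READ_QUORUM` constant are hypothetical names, and the quorum size of 2 is taken from the description above.

```python
from typing import List, Set

READ_QUORUM = 2  # assumed from the issue text: at least 2 responses are expected


def select_replicas(secondaries: List[str], primary: str, responsive: Set[str],
                    include_primary_fallback: bool) -> List[str]:
    """Return the replicas to read from, or raise if the quorum cannot be met."""
    # Prefer responsive secondaries, as in the normal read path.
    selected = [r for r in secondaries if r in responsive]
    # Proposed change: top up with the primary when secondaries alone fall short.
    if (len(selected) < READ_QUORUM and include_primary_fallback
            and primary in responsive):
        selected.append(primary)
    if len(selected) < READ_QUORUM:
        raise RuntimeError("read quorum not met")
    return selected[:READ_QUORUM]
```

In the scenario from this issue (two secondaries, one of them unresponsive), calling the function with `include_primary_fallback=False` raises, which mirrors the reported failure, while `include_primary_fallback=True` returns the healthy secondary together with the primary.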

@kundadebdatta kundadebdatta added the bug Something isn't working label Apr 18, 2024
@kundadebdatta kundadebdatta self-assigned this Apr 18, 2024
Projects
Status: Approved
Development

No branches or pull requests

1 participant