index not updated correctly when removing nodes from placement #4196

Open
BertHartm opened this issue Mar 6, 2023 · 3 comments
Comments

@BertHartm
Contributor

When scaling down a cluster, we're noticing that some data becomes unavailable for query. It appears in the form of partial results when querying for older data (from before the scale down).

We're also noticing that database_tick_index_num_docs remains flat for each node through the scale down, and then jumps up once the node is restarted. The effect of summing across all nodes in the cluster is that the metric drops (when the old node is removed) and recovers to its prior level when the remaining nodes restart.
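To make the metric behavior concrete, here is a toy model of the cluster-wide sum. The node names and document counts are made up for illustration and are not taken from a real M3DB cluster; only the metric name comes from the report above.

```python
# Toy model with hypothetical numbers; not output from a real M3DB cluster.

def cluster_total(per_node_docs):
    """Sum a per-node gauge across the nodes currently in the placement."""
    return sum(per_node_docs.values())

# Hypothetical per-node values of database_tick_index_num_docs.
before = {"node-a": 300, "node-b": 300, "node-c": 300, "node-d": 300}

# node-d is removed. The remaining nodes' gauges stay flat even though
# they now own the shards that node-d used to hold.
after_removal = {name: docs for name, docs in before.items() if name != "node-d"}

# Once the remaining nodes restart, each gauge jumps to include the
# documents for the shards it took over (assumed evenly split here).
after_restart = {name: 400 for name in after_removal}

print(cluster_total(before))         # 1200
print(cluster_total(after_removal))  # 900
print(cluster_total(after_restart))  # 1200
```

The dip from 1200 to 900 and the recovery to 1200 mirror the pattern described: the cluster-wide total only returns to its earlier level after the restarts.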

General Issues

What service is experiencing the issue? (M3Coordinator, M3DB, M3Aggregator, etc)

m3db

What is the configuration of the service? Please include any YAML files, as well as namespace / placement configuration (with any sensitive information anonymized if necessary).

can provide if required, but I think this might be general
RF=3

How are you using the service? For example, are you performing read/writes to the service via Prometheus, or are you using a custom script?

issue relates to reads happening via remote read

Is there a reliable way to reproduce the behavior? If so, please provide detailed instructions.

It appears to be consistent when removing nodes from placements. It's more obvious when the clusters are small as more of the index is affected.
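A rough way to see why small clusters make this more visible, assuming a perfectly balanced placement in which each node holds at most one replica of any shard (an assumption for illustration, not something confirmed for this cluster): the share of shards with a replica on any single node is RF divided by the node count.

```python
from fractions import Fraction

def shards_touched(num_nodes, replication_factor=3):
    """Fraction of shards that have a replica on one given node.

    Assumes a balanced placement where a node holds at most one
    replica of each shard (hypothetical simplification).
    """
    if num_nodes < replication_factor:
        raise ValueError("need at least as many nodes as replicas")
    return Fraction(replication_factor, num_nodes)

# With RF=3, removing one node touches a smaller share of shards
# as the cluster grows.
for n in (4, 6, 12):
    print(n, float(shards_touched(n)))
# 4 0.75
# 6 0.5
# 12 0.25
```

Under these assumptions, removing one node from a 4-node cluster moves a replica for three quarters of the shards, versus one quarter in a 12-node cluster.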

@robskillington
Collaborator

Hey @BertHartm - there have been some fixes and tests added to cover this category of bugs. As far as we know there are no outstanding bugs in this space, so perhaps I can take our tests and run them against the version you’re running.

Is this 1.3 or 1.5? The exact SHA would be helpful as we investigate this. Thanks for reporting!

@robskillington
Collaborator

I believe 1.5 is the version in question. We’ll test whether this recent patch (post 1.5 release) fixes what you’ve observed:
#4193

@BertHartm
Contributor Author

Sorry, yes, this is 1.5.0 as released. The dbnode SHA is e7df2b9.
