Deflake unit test DBCompactionTest.CompactionLimiter #12596

Closed
wants to merge 1 commit into from

cbi42
Member

@cbi42 cbi42 commented Apr 29, 2024

The test has been flaky for a long time. One possible cause is that the thread pool's queue_len_ is accessed with relaxed memory order.
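
To illustrate the hypothesis about memory ordering, here is a minimal sketch. It is not RocksDB's thread pool code: TinyQueue, Submit(), RunOne(), and QueueLen() are hypothetical names made up for this example, and only the counter name queue_len_ is borrowed from the description. Relaxed atomic operations are atomic but impose no ordering relative to other memory accesses, so a reader that has no other synchronization with the writer may observe the counter out of step with related state. The sketch pairs release stores with an acquire load instead.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a thread pool's job queue (illustration only).
class TinyQueue {
 public:
  // Enqueue a job and publish the new length.
  void Submit(std::function<void()> job) {
    std::lock_guard<std::mutex> guard(mu_);
    jobs_.push_back(std::move(job));
    // Release store: pairs with the acquire load in QueueLen().
    queue_len_.store(jobs_.size(), std::memory_order_release);
  }

  // Run one queued job, if any. Returns false when the queue is empty.
  bool RunOne() {
    std::function<void()> job;
    {
      std::lock_guard<std::mutex> guard(mu_);
      if (jobs_.empty()) return false;
      job = std::move(jobs_.front());
      jobs_.pop_front();
      queue_len_.store(jobs_.size(), std::memory_order_release);
    }
    job();  // Run outside the lock so other threads can submit meanwhile.
    return true;
  }

  // Lock-free read of the published length. With memory_order_relaxed here
  // (and in the stores), the value would still be read atomically, but it
  // would carry no ordering guarantee relative to other memory operations.
  std::size_t QueueLen() const {
    return queue_len_.load(std::memory_order_acquire);
  }

 private:
  std::mutex mu_;
  std::deque<std::function<void()>> jobs_;
  std::atomic<std::size_t> queue_len_{0};
};
```

A test that polls QueueLen() from another thread to decide whether work is pending is the kind of reader for which the ordering choice could matter, assuming it does not also take the queue's mutex.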

Test plan: I could not reproduce the failure locally with the following command: gtest-parallel --repeat=8000 --workers=100 ./db_compaction_test --gtest_filter="*CompactionLimiter*"

@facebook-github-bot
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cbi42 cbi42 requested a review from ajkr April 29, 2024 20:01
Contributor

@ajkr ajkr left a comment

I don't know how TEST_WaitForFlushMemTable() returns before compaction is scheduled. It looks like InstallSuperVersionAndScheduleWork() does both its things in one lock hold. Do you know how it happens?
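
The reasoning about a single lock hold can be sketched as follows. This is a simplified model under stated assumptions, not RocksDB code: MiniDb, InstallAndSchedule(), and WaitForFlush() are hypothetical names standing in for the install-and-schedule step and the test-side wait. Because both state changes happen while the same mutex is held, a waiter that wakes up and sees the flush installed must also see the scheduled compaction.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical model of "install the result and schedule follow-up work
// under one lock hold" (illustration only).
struct MiniDb {
  std::mutex mu;
  std::condition_variable cv;
  bool flush_installed = false;   // Guarded by mu.
  int compactions_scheduled = 0;  // Guarded by mu.

  // Background side: both updates happen inside one critical section.
  void InstallAndSchedule() {
    {
      std::lock_guard<std::mutex> guard(mu);
      flush_installed = true;
      ++compactions_scheduled;
    }
    cv.notify_all();
  }

  // Test side: blocks until the flush is installed, then returns the number
  // of compactions scheduled as observed under the same mutex.
  int WaitForFlush() {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [this] { return flush_installed; });
    return compactions_scheduled;
  }
};
```

In this model WaitForFlush() cannot return 0 once InstallAndSchedule() has run, which is why a waiter returning before compaction is scheduled would be surprising if the real code follows the same pattern.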

@cbi42
Member Author

cbi42 commented May 2, 2024

> I don't know how TEST_WaitForFlushMemTable() returns before compaction is scheduled. It looks like InstallSuperVersionAndScheduleWork() does both its things in one lock hold. Do you know how it happens?

I was looking at the call to MaybeScheduleFlushOrCompaction() in BackgroundCallFlush(), but I missed that InstallSuperVersionAndScheduleWork() is called right after the flush result is installed. You are right: it looks like compaction should have been scheduled in InstallSuperVersionAndScheduleWork(). I'll look into this more.

@cbi42
Member Author

cbi42 commented May 2, 2024

Updated the PR summary: the flakiness may be because the thread pool's queue_len_ is accessed with relaxed memory order.

@facebook-github-bot
Contributor

@cbi42 merged this pull request in e2ef349.


3 participants