Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce sqlite tests flakiness #8651

Merged
merged 24 commits into from Apr 29, 2024
Merged

reduce sqlite tests flakiness #8651

merged 24 commits into from Apr 29, 2024

Conversation

abyesilyurt
Copy link
Contributor

Description

There was a race condition in initialising backing stores. Using cached property reduces flakiness.

Affected Dependencies

List any dependencies that are required for this change.

How has this been tested?

  • Describe the tests that you ran to verify your changes.
  • Provide instructions so we can reproduce.
  • List any relevant details for your test configuration.

Checklist

@yashgorana
Copy link
Contributor

Won't cached property introduce a bunch of mem leaks? because we run a new ephemeral node for function call and these valuesa are cached in the python lib level.

@yashgorana
Copy link
Contributor

What's interesting is that WAL model should have fixed this race - but seems like we haven't fixed the whole thing. @koenvanderveen any idea how we can fix this?

@koenvanderveen
Copy link
Collaborator

Won't cached property introduce a bunch of mem leaks? because we run a new ephemeral node for function call and these valuesa are cached in the python lib level.

I think it will just be garbage collected with the object (node) itself. Am I misunderstanding something?

@abyesilyurt
Copy link
Contributor Author

abyesilyurt commented Apr 23, 2024

Thanks for the comments @yashgorana . We went through the flaky test with Koen and realised that the problem actually lies in 3 places.

  1. In some of the tests, we use a fixture called create_queue_cbk which is used to construct QueueStash object. We use that function within threads during testing with the same store config! That means the threads are racing to init the same sqlite database under the hood, which results in database locked or database busy errors. The fix is to make the calls to create_queue_cbk thread-safe using threading.Lock.
  2. We were not checking the result of KeyValueStorePartition.init_store() during construction. We raise an exception if something goes wrong during ``KeyValueStorePartition.init()` .
  3. There is still inherent race conditions within SQLite. For example, in the snippet below, there is a chance of race condition in the last 2 lines. There is a chance that thread-2 might come in and make the assignment, while we are about to execute line self.unique_keys[pk_key] = {} in thread-1 self.unique_keys[pk_key] = {} . I didn’t make any fixes to this problem since that would require much more work. It could be part of the tech debt cycle.
# taken from KeyValueStorePartition.init_store
for partition_key in self.unique_cks:
    pk_key = partition_key.key
    if pk_key not in self.unique_keys:
        self.unique_keys[pk_key] = {}

@abyesilyurt abyesilyurt merged commit b1c7bae into dev Apr 29, 2024
35 checks passed
@abyesilyurt abyesilyurt deleted the aziz/sqlitest branch April 29, 2024 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants