
Investigate data races during concurrent chunk inserts #2454

Open · mweisgut opened this issue Apr 25, 2022 · 3 comments

@mweisgut (Collaborator)
While working on #2453, I added a stress test that concurrently appends chunks to a table. Executing the test with ThreadSanitizer enabled reveals data races.
The data races occurred with both the tbb::zero_allocator and the ZeroAllocator, so the issue was not introduced in #2453.

Stress test:

```cpp
TEST_F(StressTest, ChunkInserts) {
  auto column_definitions =
      TableColumnDefinitions({{"column_1", DataType::Int, false}, {"column_2", DataType::String, true}});
  auto table = std::make_shared<Table>(column_definitions, TableType::Data, ChunkOffset{2});

  const auto iterations_per_thread = 1000;
  const auto run = [&]() {
    for (auto iteration = 0; iteration < iterations_per_thread; ++iteration) {
      auto vs_int = std::make_shared<ValueSegment<int>>();
      auto vs_str = std::make_shared<ValueSegment<pmr_string>>();
      vs_int->append(5);
      vs_str->append("five");
      table->append_chunk({vs_int, vs_str});
    }
  };

  // Create the async objects and spawn them asynchronously (i.e., as their own threads).
  const auto thread_count = 50u;
  std::vector<std::future<void>> thread_futures;
  thread_futures.reserve(thread_count);
  for (auto thread_num = 0u; thread_num < thread_count; ++thread_num) {
    // We want a future to the running thread so that we can stop it after future.wait(timeout);
    // otherwise, the test would freeze.
    thread_futures.emplace_back(std::async(std::launch::async, run));
  }

  // Wait for completion or timeout (should not occur).
  for (auto& thread_future : thread_futures) {
    // We give this a lot of time, not because we usually need that long for 50 threads to finish, but because
    // sanitizers and other tools like valgrind sometimes bring a high overhead.
    if (thread_future.wait_for(std::chrono::seconds(600)) == std::future_status::timeout) {
      ASSERT_TRUE(false) << "At least one thread got stuck and did not commit.";
    }
    // Retrieve the future so that exceptions stored in its state are thrown.
    thread_future.get();
  }

  const auto chunk_count = table->chunk_count();
  EXPECT_EQ(chunk_count, iterations_per_thread * thread_count);
  for (auto chunk_id = ChunkID{0}; chunk_id < chunk_count; ++chunk_id) {
    const auto chunk = table->get_chunk(chunk_id);
    EXPECT_NE(chunk, nullptr);
    EXPECT_EQ(chunk->size(), 1);
  }
}
```

@julianmenzler (Member)
We have the CI stage clangRelWithDebInfoThreadSanitizer, which runs hyriseSystemTest, and thus StressTest.*.

Why does/did the CI not bring up this issue?

@mweisgut (Collaborator, Author) commented Apr 29, 2022

> We have the CI stage clangRelWithDebInfoThreadSanitizer, which runs hyriseSystemTest, and thus StressTest.*.
>
> Why does/did the CI not bring up this issue?

Because the referenced stress test does not exist on the master branch.
Once it is added, the CI catches the races:
https://hyrise-ci.epic-hpi.de/blue/organizations/jenkins/hyrise%2Fhyrise/detail/PR-2453/5/pipeline

@julianmenzler (Member)

True. I did not read thoroughly enough, sorry.
