Random Crash in Bitpacking/Columnar when Merging Segments #2384
Thanks for the bug report. The first stacktrace … The second stacktrace does make sense, but looking at the code, I don't see how that could happen. Can you run it with a modified version of tantivy? I could push some changes to a branch to get more context information when the panic occurs. Is the GH repo public?
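Such a patched build would typically just extend the failing assert to report its inputs when the panic occurs, along these lines (a generic sketch with hypothetical function and variable names, not actual tantivy code):

```rust
// Generic sketch of adding context to a panic: turn a bare assert into
// one that prints the offending values (hypothetical names, not the
// actual tantivy code that would be patched).
fn check_fits(value: u64, bit_width: u32) {
    let needed = 64 - value.leading_zeros();
    assert!(
        needed <= bit_width,
        "value {value} needs {needed} bits but bit_width is {bit_width}"
    );
}

fn main() {
    check_fits(255, 8); // 255 needs exactly 8 bits, so this passes
    println!("ok");
}
```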
The code you point at is NOT using SIMD. SIMD bitpacking uses a different layout that prevents efficient random access, and we need random access for columns. Like @PSeitz, I have no clue where it could come from. There are two calls to … The second one happens in code that is rather straightforward; bit_width is obtained via:
As far as I can tell, all numbers exiting this function, and their max, should match the assert. If you can share the segments and the schema triggering the assert, that would be even more helpful. Also, do you have some kind of compiler cache in your CI? Can you try to clean or disable it and see if your problem gets solved?
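To illustrate the two points above: in plain (non-SIMD) bitpacking, value `i` lives at the fixed bit offset `i * bit_width`, which is exactly what makes O(1) random access into a column possible, and the bit width itself is derived from the maximum value so that the assert should always hold. A simplified sketch (hypothetical helper names, not tantivy's actual code):

```rust
// Illustrative sketch of plain (non-SIMD) bitpacking, not tantivy's
// actual code: value i lives at bit offset i * bit_width.

/// Bits needed to represent `max_value` (hypothetical helper name).
fn compute_bit_width(max_value: u64) -> u32 {
    64 - max_value.leading_zeros()
}

/// Pack values LSB-first into a byte buffer.
fn pack(values: &[u64], bit_width: u32) -> Vec<u8> {
    let total_bits = values.len() * bit_width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        // The assert discussed above: every value must fit in bit_width.
        assert!(compute_bit_width(v) <= bit_width);
        let base = i * bit_width as usize;
        for b in 0..bit_width as usize {
            if (v >> b) & 1 == 1 {
                out[(base + b) / 8] |= 1 << ((base + b) % 8);
            }
        }
    }
    out
}

/// Random access: extract value `idx` directly from its bit offset.
fn read_bitpacked(data: &[u8], bit_width: u32, idx: usize) -> u64 {
    let bit_pos = idx * bit_width as usize;
    let (byte_pos, shift) = (bit_pos / 8, bit_pos % 8);
    let mut buf = [0u8; 8];
    let n = (data.len() - byte_pos).min(8);
    buf[..n].copy_from_slice(&data[byte_pos..byte_pos + n]);
    let mask = (1u64 << bit_width) - 1; // assumes bit_width < 64
    (u64::from_le_bytes(buf) >> shift) & mask
}

fn main() {
    let values = [5u64, 3, 7, 0, 6];
    let bit_width = compute_bit_width(*values.iter().max().unwrap());
    assert_eq!(bit_width, 3);
    let packed = pack(&values, bit_width);
    for (i, &v) in values.iter().enumerate() {
        assert_eq!(read_bitpacked(&packed, bit_width, i), v);
    }
    println!("bit_width = {}", bit_width);
}
```

SIMD layouts interleave values across lanes instead, which is why they cannot offer this kind of direct per-index access.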
Thanks for getting back. Sorry that the first stack trace is not more helpful; that's just the atos output for a couple of memory addresses from the crash log, to see whether bitpacking showed up somewhere. It all doesn't make sense to me either, so I tried comparing the build environments, and the differing CPU features were the only difference I could spot so far. (But, as far as I understand, SIMD is not involved, and even if there were compile-time differences, it wouldn't explain the different runtime behavior because of feature detection.) Now I tried running a CI build again with the old config and got an interesting new permutation of the crash this time (right after running the binary on the command line): Here's the build config: rust-toolchain.toml
Cargo.toml
A build with toolchain 1.78.0 and …

Re CI and caching: there should be a fresh GitHub runner for each run, and we don't do any custom caching in the action. For the toolchain setup we've been using this action, then we set up cargo make via this action, run cargo clean, and then build via:
I'll check the GitHub Actions again to see if there could be any effects.
Describe the bug
The following crash happens "randomly" in `merge_thread_0`. (Tried to symbolicate the relevant stack trace addresses from the crash log via `atos`.)

Some more context:

causing this backtrace:
This all makes me think that it must have something to do with the conditional compilation in the bitpacking crate. For instance, the CI machine (but not the local machine) additionally supports the SSE3 instruction set. However, this shouldn't matter, since at runtime the CPU feature detection should select the right implementation. So it doesn't really explain why the CI build would behave differently, yet the stack trace points to this area. I wonder what I could be missing.
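For reference, runtime feature dispatch usually looks like the following sketch, which is why a binary built on a machine with extra CPU features should still take the portable path on an older CPU (illustrative only, not the bitpacking crate's actual code; the scalar function stands in for a real SIMD kernel):

```rust
// Illustrative sketch of runtime CPU feature detection (not the actual
// bitpacking crate code): even if the binary was compiled on an
// SSE3-capable machine, an accelerated path is only taken when the
// feature is detected on the CPU actually running the binary.
fn sum_scalar(xs: &[u32]) -> u32 {
    xs.iter().copied().sum()
}

fn sum(xs: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("sse3") {
            // A real crate would dispatch to an SSE3 kernel here;
            // the scalar path stands in for it in this sketch.
            return sum_scalar(xs);
        }
    }
    sum_scalar(xs)
}

fn main() {
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
    println!("ok");
}
```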
Some questions:
Which version of tantivy are you using?
0.22.0 stable (but also happened on master between 0.21 and 0.22)
To Reproduce
Unfortunately, there's no clear reproduction; the bug seems to happen intermittently at some point when adding a large number of documents to the index while segments are being merged (judging by the stack trace for the crashed `merge_thread_0`).