Crashes during multithreaded workload #35

Open
marvin-j97 opened this issue Jan 17, 2024 · 4 comments

@marvin-j97

On a multithreaded workload (95% reads, 5% inserts), I regularly get a panic when retrieving an item while writes may be in progress. As far as I can tell, it does not occur on single-threaded workloads, and it happens more often with more threads.

There are two kinds of errors I'm getting:

  • Panic in page_node::PageNode::index
  • get_bucket unexpectedly returns the error value BucketMissing

System

  • Ubuntu 22.04 LTS
  • i9 11900K
  • 32 GB RAM
  • Samsung PM9A3 NVMe SSD

Reproduction?

Using https://github.com/marvin-j97/rust-storage-bench, run with:

RUST_BACKTRACE=full cargo run -r -- --out jammdb_test.jsonl --workload task-f --backend jamm-db --fsync --threads 16 --minutes 5 --key-size 8 --value-size 256 --items 100 --cache-size 5000000

You may need to run it multiple times; the failure is very non-deterministic.
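
For readers who want a smaller starting point than the full benchmark, the shape of the workload (many threads, roughly 95% reads and 5% inserts over a small key set) can be sketched as below. This is a minimal sketch under stated assumptions, not the benchmark's actual code: the `Store` trait, `MemStore`, `XorShift`, and `run_workload` are hypothetical names introduced for illustration, and `MemStore` is an in-memory stand-in rather than jammdb. The per-thread generator is seeded so each thread issues the same operation sequence on every run, although thread interleaving remains non-deterministic.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical storage interface; a real reproduction would implement this
/// on top of the database under test.
trait Store: Send + Sync {
    fn get(&self, key: u64) -> Option<Vec<u8>>;
    fn insert(&self, key: u64, value: Vec<u8>);
}

/// In-memory stand-in so the sketch runs without any external crate.
#[derive(Default)]
struct MemStore(RwLock<HashMap<u64, Vec<u8>>>);

impl Store for MemStore {
    fn get(&self, key: u64) -> Option<Vec<u8>> {
        self.0.read().unwrap().get(&key).cloned()
    }
    fn insert(&self, key: u64, value: Vec<u8>) {
        self.0.write().unwrap().insert(key, value);
    }
}

/// Small xorshift64 generator so each thread's operation sequence is repeatable.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Runs `threads` workers, each issuing `ops` operations (about 95% reads,
/// 5% inserts) over `items` keys. Returns (reads, inserts, failed_reads).
fn run_workload(store: Arc<dyn Store>, threads: u64, ops: u64, items: u64) -> (u64, u64, u64) {
    // Preload every key so that a failed read always indicates a bug.
    for key in 0..items {
        store.insert(key, vec![0u8; 256]);
    }
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // Distinct, nonzero seed per thread.
                let mut rng = XorShift(0x9E37_79B9_7F4A_7C15 ^ (t + 1));
                let (mut reads, mut inserts, mut failed) = (0u64, 0u64, 0u64);
                for _ in 0..ops {
                    let key = rng.next_u64() % items;
                    if rng.next_u64() % 100 < 5 {
                        store.insert(key, vec![t as u8; 256]);
                        inserts += 1;
                    } else {
                        reads += 1;
                        if store.get(key).is_none() {
                            failed += 1;
                        }
                    }
                }
                (reads, inserts, failed)
            })
        })
        .collect();
    // Sum the per-thread counters.
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold((0, 0, 0), |a, r| (a.0 + r.0, a.1 + r.1, a.2 + r.2))
}

fn main() {
    let store: Arc<dyn Store> = Arc::new(MemStore::default());
    let (reads, inserts, failed) = run_workload(store, 16, 10_000, 100);
    println!("reads={reads} inserts={inserts} failed_reads={failed}");
}
```

Because every key is preloaded, any nonzero `failed_reads` count would point at the store rather than the workload. Replacing `MemStore` with an implementation backed by the database under test is left out here, since the exact calls involved would be an assumption.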

Stack trace

Panic

thread '<unnamed>' panicked at jammdb-0.11.0/src/page_node.rs:69:22:
INVALID PAGE TYPE FOR INDEX: 4

stack backtrace:
   0:     0x563f6b08a69c - std::backtrace_rs::backtrace::libunwind::trace::ha637c64ce894333a
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x563f6b08a69c - std::backtrace_rs::backtrace::trace_unsynchronized::h47f62dea28e0c88d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x563f6b08a69c - std::sys_common::backtrace::_print_fmt::h9eef0abe20ede486
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x563f6b08a69c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hed7f999df88cc644
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x563f6b0b44e0 - core::fmt::rt::Argument::fmt::h1539a9308b8d058d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/rt.rs:142:9
   5:     0x563f6b0b44e0 - core::fmt::write::h3a39390d8560d9c9
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rs:1120:17
   6:     0x563f6b087ccf - std::io::Write::write_fmt::h5fc9997dfe05f882
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/io/mod.rs:1762:15
   7:     0x563f6b08a484 - std::sys_common::backtrace::_print::h894006fb5c6f3d45
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x563f6b08a484 - std::sys_common::backtrace::print::h23a2d212c6fff936
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x563f6b08bc37 - std::panicking::default_hook::{{closure}}::h8a1d2ee00185001a
  10:     0x563f6b08b99f - std::panicking::default_hook::h6038f2eba384e475
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:292:9
  11:     0x563f6b08c0b8 - std::panicking::rust_panic_with_hook::h2b5517d590cab22e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:779:13
  12:     0x563f6b08bf9e - std::panicking::begin_panic_handler::{{closure}}::h233112c06e0ef43e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:657:13
  13:     0x563f6b08ab66 - std::sys_common::backtrace::__rust_end_short_backtrace::h6e893f24d7ebbff8
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:170:18
  14:     0x563f6b08bd02 - rust_begin_unwind
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
  15:     0x563f6a169d15 - core::panicking::panic_fmt::hbf0e066aabfa482c
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
  16:     0x563f6a32080d - jammdb::page_node::PageNode::index::hb7e48e18bd0cd9e3
  17:     0x563f6a31ec48 - jammdb::cursor::search::he06cf7d0bde89993
  18:     0x563f6a1ef32f - jammdb::bucket::Bucket::get::h648e73f8fb9f722d
  19:     0x563f6a2018d3 - worker::db::DatabaseWrapper::get::h286a7b65683b822d

The second error is not a panic in the library itself but a non-deterministic Err value. The bucket is definitely not missing, since millions of earlier reads succeeded:

thread '<unnamed>' panicked at src/worker/db.rs:196:52:
called `Result::unwrap()` on an `Err` value: BucketMissing

@pjtatlow
Owner

Hey @marvin-j97, do you have a copy of the database file that is giving you this error? That would help a lot in figuring out what went wrong.

@marvin-j97
Author

marvin-j97 commented Jan 19, 2024

@pjtatlow

I ran the benchmark again to reproduce the error. The resulting file was a bit too large for GitHub, so I uploaded it to my S3 bucket:

https://jammdb-debug.s3.eu-central-1.amazonaws.com/data.db

@tgolsson

I'm seeing this as well. For me, the problem seems to be in the database file that gets loaded, since I can trigger it with zero writes: I run one session that works fine for reads and writes (essentially, it only misses and fills the DB). If I then comment out all of the write code and run again, the application still crashes.

@pjtatlow
Owner

@tgolsson that's super interesting... What's happening is that the database file is being written incorrectly, but it's difficult to tell where it's going wrong.

Any chance you have a simple but reproducible test case?
