Run domains in a threadpool #1218

Open
gvsg-rs opened this issue Apr 11, 2024 · 1 comment
@gvsg-rs
Contributor

gvsg-rs commented Apr 11, 2024

Previously, we used tokio::task::block_in_place to run domains. That call blocks the thread it is running on until the task completes, so the executor cannot use that thread to make progress on any other task, which is inefficient. CL-1268 replaced the block_in_place strategy by spawning a native OS thread for each domain, which is how the system works today. This is highly resource-inefficient and has also led to us hitting the upper limit on the number of threads with active tracing spans (4096).

A quote from that CL:

The thing is spawn_blocking spawns a *blocking* task on a *blocking thread*, whereas our Replica is actually asynchronous, so it would not work at all. Moreover the blocking tasks run in a thread pool that has a limited size, and we don't know a priori how high to set it. It defaults to 512, but there is no reason for us not to have more domains, and once we run out, spawning stops.
Spawn blocking performs even worse than block_in_place BTW.

Generally speaking, blocking, I/O-bound work is very well suited to tokio's built-in blocking threadpool. Further, it is now possible to configure the size of the blocking threadpool using the max_blocking_threads method on the runtime builder. We should re-investigate the performance of tokio's spawn_blocking method in the context of domains.

For work that is CPU-bound, we should consider using the rayon crate, which is typically the go-to tool for spawning blocking CPU-bound tasks.
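
To make the pool idea concrete without committing to a crate, here is a rough sketch of a fixed-size worker pool built only on the standard library. It is not rayon and not how Readyset runs domains today; run_on_pool, Job and sum_of_squares are hypothetical names, and it assumes the CPU-bound work can be split into discrete jobs that return.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A boxed unit of CPU-bound work (hypothetical stand-in for domain work).
type Job = Box<dyn FnOnce() -> u64 + Send + 'static>;

/// Example CPU-bound function: 0^2 + 1^2 + ... + n^2.
fn sum_of_squares(n: u64) -> u64 {
    (0..=n).map(|x| x * x).sum()
}

/// Run `jobs` on exactly `workers` OS threads and return the results
/// in the same order as the jobs were given.
fn run_on_pool(workers: usize, jobs: Vec<Job>) -> Vec<u64> {
    assert!(workers > 0, "the pool needs at least one worker");

    let (job_tx, job_rx) = mpsc::channel::<(usize, Job)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(usize, u64)>();
    let total = jobs.len();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while the job runs.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok((idx, job)) => res_tx.send((idx, job())).unwrap(),
                    Err(_) => break, // job queue closed: no more work
                }
            })
        })
        .collect();
    drop(res_tx); // only the workers hold result senders now

    for (idx, job) in jobs.into_iter().enumerate() {
        job_tx.send((idx, job)).unwrap();
    }
    drop(job_tx); // close the queue so idle workers exit

    let mut results = vec![0; total];
    for (idx, value) in res_rx {
        results[idx] = value;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    results
}

fn main() {
    let jobs: Vec<Job> = (0..8u64)
        .map(|n| Box::new(move || sum_of_squares(n)) as Job)
        .collect();
    // Eight jobs share three threads instead of getting a thread each.
    println!("{:?}", run_on_pool(3, jobs));
}
```

The thread count is fixed by the pool size rather than by the number of jobs, which is the property that would keep us under the 4096-thread limit regardless of how many domains exist.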

@altmannmarcelo
Contributor

If we run against a setup that has a high number of tables, we can exceed the limit of threads and Readyset will panic:

Apr 11 17:20:35 ip-10-0-5-246 readyset[2686075]: Thread count overflowed the configured max count. Thread index = 4097, max threads = 4096.
Apr 11 17:20:35 ip-10-0-5-246 readyset[2686075]: thread 'Domain 1493.0.0' panicked at readyset-dataflow/src/domain/mod.rs:732:10:
Apr 11 17:20:35 ip-10-0-5-246 readyset[2686075]: called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(6925), ...)

A workaround is to limit the number of tables, either by:

  • limiting the number of databases in the --upstream-db-url, or

  • limiting the number of tables via --replication-tables or --replication-tables-ignore.
