Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove stream managers when decommissioning partitions #4214

Merged
merged 1 commit into from
May 13, 2024

Conversation

dominiklohmann
Copy link
Member

We noticed that in some deployments, the CPU usage of the tenzir.index thread would rise over time. Most time was spent in std::_Rb_tree_increment—which is an internal of std::map or std::set. But we didn't use that anywhere in the index actor, or so we thought: it turned out that CAF itself uses a std::map for its stream managers, creating a new one for every active partition. When decommissioning an active partition into a passive partition, that entry in the map was not removed, making further interactions with the map increasingly expensive.

This change makes it so we manually erase stale stream managers when decommissioning an active partition. This really should be considered a bug in CAF, but since upstream has replaced the streaming functionality entirely it doesn't make much sense to report or fix it there.

@dominiklohmann dominiklohmann added bug Incorrect behavior engine Core pipeline and storage engine labels May 13, 2024
Copy link
Member

@mavam mavam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YOLO-approving this one.

Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that the outbound stream slot that we receive when we add the path for an active partition is internally created by emplacing into the same stream manager map.

libtenzir/src/index.cpp Outdated Show resolved Hide resolved
@tobim tobim disabled auto-merge May 13, 2024 10:11
We noticed that in some deployments, the CPU usage of the `tenzir.index`
thread would rise over time. Most time was spent in
`std::_Rb_tree_increment`—which is an internal of `std::map` or
`std::set`. But we didn't use that anywhere in the index actor, or so we
thought: it turned out that CAF itself uses a `std::map` for its stream
managers, creating a new one for every active partition. When
decommissioning an active partition into a passive partition, that entry
in the map was not removed, making further interactions with the map
increasingly expensive.

This change makes it so we manually erase stale stream managers when
decommissioning an active partition. This really should be considered a
bug in CAF, but since upstream has replaced the streaming functionality
entirely it doesn't make much sense to report or fix it there.
@tobim tobim force-pushed the topic/stale-strema-managers branch from 622d5d0 to 7a5efd9 Compare May 13, 2024 12:33
@tobim tobim merged commit f695bde into main May 13, 2024
52 checks passed
@tobim tobim deleted the topic/stale-strema-managers branch May 13, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior engine Core pipeline and storage engine
Projects
None yet
3 participants