Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad estimation for temporary sstables (containing GCed data only) causes worst case of 2x space amplification for filters #18283

Closed
Tracked by #17972
raphaelsc opened this issue Apr 17, 2024 · 1 comment
Assignees
Labels
Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed backport/6.0
Milestone

Comments

@raphaelsc
Copy link
Member

We're incorrectly feeding same estimation for regular output sstables in compaction as for temporary sstables containing GCed-data only. That's a way of saying that the output can twice as much keys as input, which is clearly wrong. The output is segregated into either regular output or temporary.

The consequence is that during compaction filters can take twice as much ram space.

What we should do is that regular output will have input_key_estimation * estimated_live_data_ratio. And temporary will have input_key_estimation * (1 - estimated_live_data_ratio).

@raphaelsc raphaelsc added backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed backport/6.0 labels Apr 17, 2024
@raphaelsc
Copy link
Member Author

/cc @denesb

lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue Apr 29, 2024
…sstables

When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for both the
regular output sstables and the temporary sstables. Instead split the
partition estimate between the two based on the droppable tombstone
ratio estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue Apr 29, 2024
…sstables

When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for both the
regular output sstables and the temporary sstables. Instead split the
partition estimate between the two based on the droppable tombstone
ratio estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 3, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 3, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 6, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
@mykaul mykaul added this to the 6.0 milestone May 6, 2024
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 8, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 9, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 9, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
mergify bot pushed a commit that referenced this issue May 11, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes #18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 36b98d8)

# Conflicts:
#	compaction/compaction.cc
mergify bot pushed a commit that referenced this issue May 11, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes #18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 36b98d8)
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 13, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb#18465

(cherry picked from commit d39adf6)
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 13, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes scylladb#18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb#18465

(cherry picked from commit d39adf6)
denesb pushed a commit that referenced this issue May 14, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes #18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes #18465

(cherry picked from commit d39adf6)

Closes #18656
denesb pushed a commit that referenced this issue May 14, 2024
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes #18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes #18465

(cherry picked from commit d39adf6)

Closes #18659
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed backport/6.0
Projects
None yet
5 participants