Shard placement table persistence #18283

Open: ztlpn wants to merge 17 commits into dev from flex-assignment-persistence

ztlpn commented May 7, 2024

Add persistence to shard_placement_table. All shard placement updates are now persisted in the kvstore and restored at startup. Also implement migration from the old placement scheme determined by topic_table, and extend the stress test with migrations, restarts, and post-restart checks.

At this point shard assignments are still ultimately determined by topic_table, but after a restart we can already track ongoing cross-shard transfers better. Everything is also in place for the transition to node-local placement decisions.
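
For readers unfamiliar with the pattern, here is a minimal sketch of "persist every update, restore at startup". It is written in Python for brevity and is not the code in this PR: `PlacementTable`, `set_shard`, `remove`, `restore`, and the `placement/` key prefix are hypothetical names, and a plain dict stands in for the kvstore.

```python
from dataclasses import dataclass, field
from typing import Dict

KEY_PREFIX = "placement/"  # hypothetical key namespace, not Redpanda's


@dataclass
class PlacementTable:
    """Toy stand-in for shard_placement_table (illustration only)."""

    kvstore: Dict[str, int]  # stand-in for a durable key-value store
    placements: Dict[int, int] = field(default_factory=dict)  # group id -> shard

    def set_shard(self, group_id: int, shard: int) -> None:
        # Write to the store before updating memory, so anything
        # visible in memory has already been persisted.
        self.kvstore[f"{KEY_PREFIX}{group_id}"] = shard
        self.placements[group_id] = shard

    def remove(self, group_id: int) -> None:
        # Drop the persisted entry first, then the in-memory one.
        self.kvstore.pop(f"{KEY_PREFIX}{group_id}", None)
        self.placements.pop(group_id, None)

    @classmethod
    def restore(cls, kvstore: Dict[str, int]) -> "PlacementTable":
        # At startup, rebuild the in-memory table from persisted entries.
        table = cls(kvstore=kvstore)
        for key, shard in kvstore.items():
            if key.startswith(KEY_PREFIX):
                table.placements[int(key[len(KEY_PREFIX):])] = shard
        return table
```

In this sketch a restart amounts to discarding the `PlacementTable` object and calling `PlacementTable.restore` on the same store, after which only the placements that were not removed reappear.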

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

ztlpn commented May 7, 2024

/ci-repeat

vbotbuildovich commented May 7, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f536f-edc0-402f-9e52-2ca50e32f1a1:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteTest.topic_delete_test.with_restart=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f536f-edc7-45f1-a473-165a98b13ce2:

"rptest.tests.topic_delete_test.TopicDeleteTest.topic_delete_test.with_restart=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f536f-edc2-43ef-9139-a303bdda8783:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_delete_test.TopicDeleteStressTest.stress_test"

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f536f-edc5-40f5-ad6c-624808f78f3c:

"rptest.tests.topic_delete_test.TopicDeleteTest.topic_delete_orphan_files_test"

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f5377-280d-4c80-bb7a-d2bb9e5ddb04:

"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_cloud_storage_test.disable_delete=True.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_unavailable_test.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_delete_test.TopicDeleteTest.topic_delete_test.with_restart=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f5377-2807-4819-b098-183bc81b69b5:

"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_cloud_storage_test.disable_delete=False.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_installed_snapshots_test"
"rptest.tests.topic_delete_test.TopicDeleteTest.topic_delete_orphan_files_test"

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f5377-2804-42a2-9dde-2ffdbf6632a8:

"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_cloud_storage_test.disable_delete=True.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_delete_test.TopicDeleteStressTest.stress_test"

new failures in https://buildkite.com/redpanda/redpanda/builds/48784#018f5377-280a-4e25-9bbd-469cdfc10350:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_cloud_storage_test.disable_delete=False.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_unavailable_test.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_delete_test.TopicDeleteTest.topic_delete_test.with_restart=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/48797#018f5531-11b1-411c-b5f0-fa728febde49:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/48797#018f5531-11b4-4a59-a548-e8a22eabd741:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48797#018f5538-91ba-481f-a721-ad4d2a0382d9:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48797#018f5538-91b8-463c-b80f-c0ab8e17b0a1:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"

ztlpn force-pushed the flex-assignment-persistence branch from 5341afa to 88610d9 on May 7, 2024 20:59

ztlpn commented May 7, 2024

/ci-repeat

ztlpn force-pushed the flex-assignment-persistence branch 2 times, most recently from 5dcf117 to ce4a1d8 on May 13, 2024 15:11
ztlpn marked this pull request as ready for review on May 13, 2024 15:54
ztlpn force-pushed the flex-assignment-persistence branch from ce4a1d8 to 611e589 on May 14, 2024 13:17
ztlpn force-pushed the flex-assignment-persistence branch from 611e589 to 9479497 on May 17, 2024 11:41
ztlpn requested a review from mmaslankaprv on May 17, 2024 11:43
ztlpn added 12 commits May 17, 2024 13:56
We track group ids in partition assignments and in the current state
because we want to use them as kvstore keys. A group id is just a
number, so it makes a better key than the ntp, and it uniquely
identifies the current incarnation of the ntp: each group id maps to
exactly one ntp, while an ntp can map to several group ids, of which
only one is current.

Make different components log with different prefixes in the test so
that their messages are easier to tell apart.
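
The keying argument in the first commit message above (group id rather than ntp as the kvstore key) can be illustrated with a small sketch. This is an assumption-laden toy, not the PR's implementation: `Ntp`, `GroupRegistry`, `kvstore_key`, and the `placement/` prefix are all hypothetical, and group ids are simply handed out from a counter.

```python
from typing import Dict, NamedTuple


class Ntp(NamedTuple):
    """Namespace/topic/partition triple (illustrative)."""
    namespace: str
    topic: str
    partition: int


class GroupRegistry:
    """Hypothetical bookkeeping of which group id is current for an ntp."""

    def __init__(self) -> None:
        self._next_group = 0
        self._ntp_of_group: Dict[int, Ntp] = {}  # each group id -> exactly one ntp
        self._current_group: Dict[Ntp, int] = {}  # ntp -> its current group id

    def create(self, ntp: Ntp) -> int:
        # Every (re)creation of an ntp gets a fresh group id, so an ntp
        # can accumulate several group ids over its lifetime.
        group = self._next_group
        self._next_group += 1
        self._ntp_of_group[group] = ntp
        self._current_group[ntp] = group
        return group

    def is_current(self, group: int) -> bool:
        ntp = self._ntp_of_group.get(group)
        return ntp is not None and self._current_group.get(ntp) == group


def kvstore_key(group: int) -> bytes:
    # A number encodes to a short fixed-width key, unlike a
    # variable-length ntp string. The prefix is an assumption.
    return b"placement/" + group.to_bytes(8, "big")
```

Because a recreated ntp receives a new group id in this sketch, an entry persisted for the previous incarnation has a different key and can be recognized as stale with `is_current`, which is not possible when the key is the ntp itself.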
ztlpn force-pushed the flex-assignment-persistence branch 2 times, most recently from e1c8fa2 to faa76fd on May 17, 2024 12:07