RADOS: Generalize stretch mode pg temp handling to be usable without stretch mode #56233

Open
wants to merge 4 commits into
base: main

Member

@kamoltat kamoltat commented Mar 15, 2024

ATTENTION:

THIS PR SHOULD BE TESTED IN CONJUNCTION WITH #57381
Recommend merging #57381 before this PR.

In the command ceph osd pool stretch set

<pool> <peering_crush_bucket_count>
<peering_crush_bucket_target> <peering_crush_bucket_barrier> <crush_rule> <size> <min_size>

the user can set the values of peering_crush_bucket_{count|target|barrier}.
Setting peering_crush_bucket_count != 0 makes the pool a stretch pool,
which allows peering to use calc_replicated_acting_stretch.
We can then handle pg_temp better by setting barriers and limits
on how many OSDs should be in a pg_temp.

This enables the specified pool to
handle pg_temp properly during create_acting, as a stretch pool
should.
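
As a rough illustration of the idea (not the actual calc_replicated_acting_stretch code), the sketch below caps how many OSDs from a single barrier bucket may enter the chosen set. The per-bucket cap of ceil(size / peering_crush_bucket_target), the helper name pick_acting, and the OSD-to-bucket mapping are assumptions made for this example only.

```python
import math


def is_stretch_pool(peering_crush_bucket_count):
    # Per the description above: a non-zero count marks the pool as a stretch pool.
    return peering_crush_bucket_count != 0


def pick_acting(candidates, osd_bucket, size, bucket_target):
    """Greedy sketch: take candidates in order, capping OSDs per bucket.

    ASSUMPTION: cap = ceil(size / bucket_target). The real peering logic
    considers much more (e.g. OSD state and log completeness) than this toy does.
    """
    bucket_max = math.ceil(size / bucket_target)
    used = {}      # bucket name -> number of OSDs already chosen from it
    chosen = []
    for osd in candidates:
        bucket = osd_bucket[osd]
        if used.get(bucket, 0) >= bucket_max:
            continue  # this bucket already contributes its maximum
        chosen.append(osd)
        used[bucket] = used.get(bucket, 0) + 1
        if len(chosen) == size:
            break
    return chosen


# Hypothetical layout: 6 OSDs spread across 3 datacenters.
osd_bucket = {0: "dc1", 1: "dc1", 2: "dc2", 3: "dc2", 4: "dc3", 5: "dc3"}
acting = pick_acting([0, 1, 2, 3, 4, 5], osd_bucket, size=3, bucket_target=3)
print(acting)
```

With size 3 and a target of 3 buckets, the cap is one OSD per datacenter, so the second OSD of each datacenter is skipped.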

Users can also use the command:
ceph osd pool stretch show <pool>

to show all the stretch-related information for the pool:

pool: cephfs.a.data
pool_id: 3
is_stretch_pool: 1
peering_crush_bucket_count: 3
peering_crush_bucket_target: 3
peering_crush_bucket_barrier: 8
crush_rule: replicated_rule_custom
size: 3
min_size: 2
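
For scripting against this output, a minimal parsing sketch might look like the following. It assumes the plain "key: value" layout shown above is exactly what the command prints; the function name parse_stretch_show is hypothetical, and a structured output format, if the command offers one, would be a better choice.

```python
def parse_stretch_show(text):
    """Parse 'key: value' lines into a dict, converting numeric values to int."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # ignore blank or unexpected lines
        key, _, value = line.partition(":")
        value = value.strip()
        info[key.strip()] = int(value) if value.isdigit() else value
    return info


# Sample copied from the output shown above.
sample = """\
pool: cephfs.a.data
pool_id: 3
is_stretch_pool: 1
peering_crush_bucket_count: 3
peering_crush_bucket_target: 3
peering_crush_bucket_barrier: 8
crush_rule: replicated_rule_custom
size: 3
min_size: 2
"""

info = parse_stretch_show(sample)
# A pool is a stretch pool exactly when its bucket count is non-zero.
consistent = bool(info["is_stretch_pool"]) == (info["peering_crush_bucket_count"] != 0)
print(info["pool"], consistent)
```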

Users can also unset the stretch pool with the command:
osd pool stretch unset <pool>

However, the pool must be a stretch pool.

Fixes: https://tracker.ceph.com/issues/64802

@kamoltat kamoltat requested a review from a team as a code owner March 15, 2024 22:12
@kamoltat kamoltat changed the title from "[WIP] src/mon/OSDMonitor: Added peering_bucket_count & peering_bucket_barrier in prepare_new_pool" to "[WIP] RADOS: Generalize stretch mode pg temp handling to be usable without stretch mode" Mar 15, 2024
@kamoltat kamoltat self-assigned this Mar 20, 2024
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-64802 branch 6 times, most recently from 3b504ac to 1b24e98 Compare March 25, 2024 13:53
@github-actions github-actions bot added the tests label Mar 25, 2024
@kamoltat kamoltat requested a review from a team as a code owner March 26, 2024 17:22
@kamoltat
Member Author

kamoltat commented Apr 2, 2024

@kamoltat kamoltat changed the title from "[WIP] RADOS: Generalize stretch mode pg temp handling to be usable without stretch mode" to "RADOS: Generalize stretch mode pg temp handling to be usable without stretch mode" Apr 2, 2024
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-64802 branch 6 times, most recently from 069bd14 to ea789dd Compare April 5, 2024 20:32
@kamoltat
Member Author

kamoltat commented May 9, 2024

@gregsfortytwo thank you for your review, I have addressed every comment you've made with the new commit

@kamoltat
Member Author

jenkins test api

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-64802 branch 3 times, most recently from 35ab017 to dd19429 Compare May 15, 2024 18:52
@kamoltat
Member Author

kamoltat commented May 15, 2024

@anthonyeleven was wondering if you have some time to take a look at the docs for this

@kamoltat
Member Author

jenkins test api

@kamoltat
Member Author

I suggest merging #57381 before this PR, since the test for this PR is dependent on #57381

@kamoltat
Member Author

jenkins test make check

@kamoltat
Member Author

jenkins test api

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-64802 branch 2 times, most recently from a872fbf to f8316e1 Compare May 28, 2024 15:40
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-64802 branch 4 times, most recently from dd83bd0 to dcba1fd Compare May 31, 2024 14:21
In the command `ceph osd pool stretch set`

<pool> <peering_crush_bucket_count>
<peering_crush_bucket_target> <peering_crush_bucket_barrier>
<crush_rule> <size> <min_size>

the user can set the values of `peering_crush_bucket_{count|target|barrier}`.
Setting `peering_crush_bucket_count != 0` makes the pool a stretch pool,
which allows peering to use `calc_replicated_acting_stretch`.
We can then handle pg_temp better by setting barriers and limits
on how many OSDs should be in a pg_temp.

This enables the specified pool to
handle pg_temp properly during create_acting, as a stretch pool
should.

Users can also use the command:
`ceph osd pool stretch show <pool>`

to show all the stretch-related information for the pool:

pool: cephfs.a.data
pool_id: 3
is_stretch_pool: 1
peering_crush_bucket_count: 3
peering_crush_bucket_target: 3
peering_crush_bucket_barrier: 8
crush_rule: replicated_rule_custom
size: 3
min_size: 2

Users can also unset the stretch pool with the command:
`ceph osd pool stretch unset <pool>`
However, the pool must be a stretch pool.

Fixes: https://tracker.ceph.com/issues/64802

Signed-off-by: Kamoltat <ksirivad@redhat.com>
Test the following new Ceph CLI commands:

`ceph osd pool stretch set`
`ceph osd pool stretch unset`
`ceph osd pool stretch show`

`qa/workunits/mon/mon-stretch-pool.sh`

creates the stretch cluster
while performing input validation for the CLI
commands mentioned above.

`qa/tasks/stretch_cluster.py`

is in charge of
setting a pool as a stretch pool
and checking whether it prevents PGs
from going active when there are not
enough buckets available in the acting
set for the PGs to go active.
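
The condition this test exercises can be sketched roughly as below: a PG may go active only if its acting set spans at least peering_crush_bucket_count distinct barrier buckets. The helper name enough_buckets and the OSD-to-bucket mapping are hypothetical, and the real check lives in the C++ peering code, not in this test.

```python
def enough_buckets(acting, osd_bucket, bucket_count):
    """Return True if the acting set spans at least bucket_count distinct buckets.

    ASSUMPTION: osd_bucket maps each OSD id to its ancestor bucket of the
    barrier type (for example, its datacenter).
    """
    buckets = {osd_bucket[osd] for osd in acting}
    return len(buckets) >= bucket_count


# Hypothetical layout: 6 OSDs spread across 3 datacenters.
layout = {0: "dc1", 1: "dc1", 2: "dc2", 3: "dc2", 4: "dc3", 5: "dc3"}

spans_three = enough_buckets([0, 2, 4], layout, bucket_count=3)  # dc1, dc2, dc3
spans_two = enough_buckets([0, 1, 2], layout, bucket_count=3)    # only dc1, dc2
print(spans_three, spans_two)
```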

Also, test different MON failover scenarios
after setting the pool as a stretch pool.

`qa/suites/rados/singleton/all/mon-stretch-pool.yaml`

brings the scripts together.

Fixes: https://tracker.ceph.com/issues/64802

Signed-off-by: Kamoltat <ksirivad@redhat.com>
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-64802 branch 4 times, most recently from 4d36186 to 525aecb Compare June 2, 2024 21:44
Test the case where two DCs lose connection with each other
in a 3-AZ stretch cluster with a stretch pool enabled.
Check whether the cluster is accessible and can accept read/write operations.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
5 participants