Fix: osds with no device class should not raise an exception. #37

Open. jmamma wants to merge 1 commit into master.

jmamma commented Apr 9, 2024

There are normal situations where osds will have a crush weight but no device class set.

TheJJ (Owner) commented Apr 10, 2024

uh ok :) when is that? we group devices by class for internal selection optimization, so we can't just ignore the reasons when fixing it properly.

is it for old clusters where there was just one default "class"?

jmamma (Author) commented Apr 10, 2024

We're running Quincy here. We group by device class also.

When I build and activate osds they will automatically have a crush weight but no device class set.
To add capacity to a pool, I then set the device class on the osds one at a time.

During this process some osds on a host will not have a device class and jj will trigger the above exception.
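A minimal sketch of the behaviour being asked for, not the tool's actual code: group osds by device class, and collect osds that have no class separately instead of raising. The record layout and the `group_by_device_class` helper are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical OSD records; the field names are illustrative assumptions.
osds = [
    {"id": 208, "crush_weight": 6.98489, "device_class": None},
    {"id": 209, "crush_weight": 6.98489, "device_class": "nvme"},
    {"id": 210, "crush_weight": 6.98489, "device_class": "nvme"},
]


def group_by_device_class(osds):
    """Return (ids grouped by class, ids of osds without a class)."""
    by_class = defaultdict(list)
    unclassified = []
    for osd in osds:
        device_class = osd.get("device_class")
        if device_class is None:
            # No class set yet: remember the osd, but do not raise.
            unclassified.append(osd["id"])
            continue
        by_class[device_class].append(osd["id"])
    return dict(by_class), unclassified


by_class, unclassified = group_by_device_class(osds)
```

Keeping the classless osds in their own list leaves the per-class grouping intact while still making it visible which osds were skipped.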

jmamma (Author) commented Apr 10, 2024

Example host:

```
 -44         8714.51367  root default
  -3          167.63745      host host1
 208            6.98489          osd.208                 up   1.00000  1.00000  <--- crush weight, but no device class
 209   nvme     6.98489          osd.209                 up   1.00000  1.00000
 210   nvme     6.98489          osd.210                 up   1.00000  1.00000
 211   nvme     6.98489          osd.211                 up   1.00000  1.00000
 212   nvme     6.98489          osd.212                 up   1.00000  1.00000
 213   nvme     6.98489          osd.213                 up   1.00000  1.00000
 214   nvme     6.98489          osd.214                 up   1.00000  1.00000
```

TheJJ (Owner) commented Apr 11, 2024

this osd should clearly have the nvme class! or is it somehow special?
when looking for movement target candidates, this surely is picked when "taking" a nvme class tree bucket, no?

jmamma (Author) commented Apr 11, 2024

We add or remove device class via

ceph osd crush set-device-class <class> <osd.id>
ceph osd crush rm-device-class <osd.id>

It's a manual process which allows us to allocate osds to a pool as needed.

Sometimes we choose not to assign a device class to an osd for operational reasons.
Or when building a new node, the osds will be in the crush map but have no class set.
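To find such osds ahead of time, something like the following could work. It is a sketch under assumptions: the sample data imitates a tree listing with `nodes` entries carrying `name`, `type`, `crush_weight` and an optional `device_class`, and both the layout and the `classless_osds` helper are illustrative rather than taken from the tool.

```python
# Hypothetical tree data shaped like the example host above (assumed layout).
SAMPLE_TREE = {
    "nodes": [
        {"id": -3, "name": "host1", "type": "host"},
        {"id": 208, "name": "osd.208", "type": "osd", "crush_weight": 6.98489},
        {"id": 209, "name": "osd.209", "type": "osd", "crush_weight": 6.98489,
         "device_class": "nvme"},
        {"id": 210, "name": "osd.210", "type": "osd", "crush_weight": 6.98489,
         "device_class": "nvme"},
    ]
}


def classless_osds(tree):
    """Names of osds that have a crush weight but no device class."""
    return [
        node["name"]
        for node in tree["nodes"]
        if node.get("type") == "osd"
        and node.get("crush_weight", 0) > 0
        and not node.get("device_class")
    ]


# Print the command that would assign a class to each classless osd.
for name in classless_osds(SAMPLE_TREE):
    print(f"ceph osd crush set-device-class nvme {name}")
```

The `nvme` class in the printed command is only a placeholder; which class an osd should get, if any, remains an operational decision.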
