mds: drop client metrics during recovery #57084
base: main
Fixes: https://tracker.ceph.com/issues/65660
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
jenkins test make check
@@ -2130,6 +2130,8 @@ void MDSRank::active_start()
 {
   dout(1) << "active_start" << dendl;

+  m_is_active = true;
I don't see where you are resetting it. I'd suggest that you put this code at the end of handle_mds_map:
m_is_active = is_active();
resetting?
IMO no need to reset it.
@batrick BTW, why not just check the MDS' state from the mdsmap instead of adding a new m_is_active? For the lockless case?
You need the mds_lock to look at the MDSMap. We don't want the metrics aggregator to be acquiring that lock generally.
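To illustrate the point about avoiding the lock, here is a minimal sketch, not the code from this PR. It assumes the flag is a std::atomic<bool>; the diff above does not show its type. RankSketch, RankState, rank_lock, set_state and accepts_metrics are hypothetical names invented for the example. The state change happens under the lock, while the metrics path reads only the flag.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for illustration; not the real Ceph classes.
enum class RankState { Replay, Rejoin, Active };

class RankSketch {
public:
  // Map-handling path: takes the rank lock to change state, then
  // publishes "active" through the atomic flag.
  void set_state(RankState s) {
    std::lock_guard<std::mutex> guard(rank_lock);
    state = s;
    if (s == RankState::Active) {
      is_active_flag.store(true, std::memory_order_release);
    }
  }

  // Metrics path: reads only the atomic flag and never takes rank_lock.
  bool accepts_metrics() const {
    return is_active_flag.load(std::memory_order_acquire);
  }

private:
  std::mutex rank_lock;
  RankState state = RankState::Replay;
  std::atomic<bool> is_active_flag{false};
};

// Walks a rank through recovery and checks the gate at each step.
inline void demo_recovery_gate() {
  RankSketch rank;
  assert(!rank.accepts_metrics());   // still recovering: drop metrics
  rank.set_state(RankState::Rejoin);
  assert(!rank.accepts_metrics());   // still recovering: drop metrics
  rank.set_state(RankState::Active);
  assert(rank.accepts_metrics());    // active: metrics are accepted
}
```

In this sketch the flag is only ever set to true, which mirrors the single assignment visible in the diff hunk.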
Makes sense.
I appreciate that as of today an instance of MDSRank may never experience a transition from active -> inactive in a way that would affect this code. However, relying on that fact here creates an implicit dependency on how MDSRank instances should be managed that isn't unit-tested (as of today). We should try to minimize such implicit dependencies, IMO.
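A rough sketch of what recomputing the flag on every map update could look like, along with the kind of transition check that is currently missing. TrackedRank, MapState and on_map_update are hypothetical names for this example, and locking is omitted for brevity; this is not the MDSRank implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical states for illustration; not the real MDS state set.
enum class MapState { Rejoin, Active, Stopping };

class TrackedRank {
public:
  // Called for every map update: the flag is recomputed from the current
  // state instead of only ever being set to true.
  void on_map_update(MapState s) {
    state = s;
    is_active_flag.store(is_active(), std::memory_order_release);
  }

  bool is_active() const { return state == MapState::Active; }

  // Lock-free read intended for the metrics path.
  bool accepts_metrics() const {
    return is_active_flag.load(std::memory_order_acquire);
  }

private:
  MapState state = MapState::Rejoin;
  std::atomic<bool> is_active_flag{false};
};

// Checks both directions: inactive -> active and active -> inactive.
inline void demo_transitions() {
  TrackedRank rank;
  assert(!rank.accepts_metrics());
  rank.on_map_update(MapState::Active);
  assert(rank.accepts_metrics());
  rank.on_map_update(MapState::Stopping);
  assert(!rank.accepts_metrics());   // flag follows the state back down
}
```

With this shape the flag cannot go stale if an active -> inactive transition ever becomes possible, at the cost of one extra store per map update.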
@batrick I think this change will fix the case of connecting to old clients that don't include my previous fixes.