mds: drop client metrics during recovery #57084

batrick · 2024-04-25T00:45:33Z

Fixes: https://tracker.ceph.com/issues/65660

Checklist

Tracker (select at least one)
- References tracker ticket
Component impact
- No impact that needs to be tracked
Documentation (select at least one)
- No doc update is appropriate
Tests (select at least one)
- No tests

Fixes: https://tracker.ceph.com/issues/65660
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

batrick · 2024-05-01T15:22:01Z

jenkins test make check

leonid-s-usov · 2024-05-01T20:32:15Z

src/mds/MDSRank.cc

@@ -2130,6 +2130,8 @@ void MDSRank::active_start()
 {
  dout(1) << "active_start" << dendl;

+  m_is_active = true;


I don't see where you are resetting it. I'd suggest putting this at the end of handle_mds_map:

m_is_active = is_active();

batrick · 2024-05-03

IMO there's no need to reset it.

lxbsz · 2024-05-09

@batrick BTW, why not just check the MDS's state from the mdsmap instead of adding a new m_is_active? Is it for the lockless case?

batrick · 2024-05-09

You need the mds_lock to look at the MDSMap. We don't want the metrics aggregator to be acquiring that lock generally.

lxbsz · 2024-05-09

Makes sense.
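
For readers following the thread, here is a minimal sketch of the lockless pattern being discussed: a flag that is set once when the rank becomes active and can be read by the metrics path without taking mds_lock. This is an illustration only, not the code from this PR. The types RankSketch and MetricsSketch, the accessor is_active_lockless(), and the use of std::atomic<bool> are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for MDSRank; not the actual Ceph class.
struct RankSketch {
  std::mutex mds_lock;                   // protects MDSMap-derived state
  std::atomic<bool> m_is_active{false};  // one-way flag, readable without mds_lock

  // Called (with mds_lock held) once the rank reaches up:active.
  // The flag is never cleared, mirroring the "terminal state" argument.
  void active_start() {
    m_is_active.store(true, std::memory_order_release);
  }

  // Assumed accessor name; safe to call from any thread without mds_lock.
  bool is_active_lockless() const {
    return m_is_active.load(std::memory_order_acquire);
  }
};

// Hypothetical stand-in for the metrics handling path.
struct MetricsSketch {
  const RankSketch& rank;
  int accepted = 0;
  int dropped = 0;

  void handle_client_metrics() {
    if (!rank.is_active_lockless()) {
      ++dropped;  // rank is still recovering: drop the update
      return;
    }
    ++accepted;   // rank is active: process the update
  }
};
```

In this sketch, a metrics update that arrives before active_start() increments dropped, and one that arrives afterwards increments accepted. The metrics path never touches mds_lock, which is the property the thread is asking for.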

leonid-s-usov · 2024-05-01T20:54:47Z

I appreciate that as of today an instance of MDSRank may never experience a transition from active -> inactive in a way that would affect this code. However, relying on that fact here creates an implicit dependency on how MDSRank instances should be managed that isn't unit-tested (as of today). We should try to minimize such implicit dependencies, IMO

batrick · 2024-05-03T14:18:03Z

> I appreciate that as of today an instance of MDSRank may never experience a transition from active -> inactive in a way that would affect this code. However, relying on that fact here creates an implicit dependency on how MDSRank instances should be managed that isn't unit-tested (as of today). We should try to minimize such implicit dependencies, IMO

up:active has always been treated as a "terminal" state for a rank, and that is reflected everywhere in the code. I don't see a benefit in making this somehow resilient to that changing.

lxbsz · 2024-05-09T00:52:02Z

@batrick I think this change will fix the case of connecting to old clients that don't include my previous fixes.

rishabh-d-dave · 2024-05-20T04:42:37Z

@batrick Picking this PR for QA

rishabh-d-dave · 2024-05-20T04:58:01Z

This PR is under test in https://tracker.ceph.com/issues/66125.

leonid-s-usov

I understand that MDS can't deactivate except for shutting down, hence I'll approve as-is.

To reduce ambiguity for future readers, who, like me, could be confused by the naming and why this state never gets reset, I'd suggest the name: m_did_activate. IMO this better encodes the one-way nature of the flag.

rishabh-d-dave · 2024-05-21T12:46:31Z

This PR is under test in https://tracker.ceph.com/issues/66162.

batrick added the cephfs and needs-review labels Apr 25, 2024

batrick requested a review from a team April 25, 2024 00:45

mds: drop client metrics during recovery

83f445c

Fixes: https://tracker.ceph.com/issues/65660
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

batrick force-pushed the i65660 branch from 04260e1 to 83f445c Compare April 25, 2024 00:59

leonid-s-usov requested changes May 1, 2024

lxbsz approved these changes May 9, 2024

rishabh-d-dave added the wip-rishabh-testing label May 20, 2024

leonid-s-usov approved these changes May 20, 2024

batrick added needs-qa and removed needs-review labels May 28, 2024


