-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tools/cephfs: recover alternate_name of dentries from journal #55792
base: main
Are you sure you want to change the base?
Conversation
Obviously, I think tests need written for this but unfortunately that's not immediately easy to do because |
Good catch! |
@vshankar @lxbsz is the |
encode('I', dentry_bl); | ||
encode('i', dentry_bl); | ||
ENCODE_START(2, 1, dentry_bl); | ||
encode(fb.alternate_name, dentry_bl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the encode order, where Inode has alternate_name at the front and Link has it at the back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both cases alternate_name
is a piece of dentry metadata. As a primary link, ino/d_type
are inferred from the inode that's embedded in the dentry. It doesn't really matter in any case (as inode encoding was also made versioned [1] where before it was not [2]) but made sense to everyone at the time.
[1] https://github.com/ceph/ceph/pull/37297/files#diff-c0b626494569135ea1406cfad7ab9929a4df0a1f7d880405b5d2e734db812a03R2207-R2241
[2] https://github.com/ceph/ceph/pull/37297/files#diff-c0b626494569135ea1406cfad7ab9929a4df0a1f7d880405b5d2e734db812a03L2285-L2309
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct.
I think asok could work in this case. How much encoding validation do we currently do? Per this here [1] , this field is currently used for supporting long file names on clients who don't know about fscrypt. I proposed a custom client to validate security posture in [2]. Perhaps we add additional tests to this custom client, to further test when that becomes available. |
I'd really rather not create an asok command just for synthetic manipulation of alternate_name of a dentry, if possible.
No actually |
@batrick Sorry for late reply and I think missed this. Yeah it is. The #54725 is the last piece of the puzzle. But won't affect this tests IMO. |
5c49e33
to
5aa7484
Compare
69a8de6
to
4d9eb97
Compare
Example failure with the new test:
/teuthology/pdonnell-2024-05-16_22:05:38-fs:fscrypt-wip-pdonnell-testing-20240515.023933-debug-distro-default-smithi/7709888/teuthology.log I'll polish this and cleanup the commits. |
816c2ae
to
7cc6bdc
Compare
|
||
self.fs.fail() | ||
|
||
self.fs.journal_tool(['event', 'recover_dentries', 'list'], 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm running this patch on my local dev machine. It gets to this line and fails during decode()
. Have you run this on teuthology? Can you share a log for me to look at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the example without the fix here:
but I don't have a link on hand with this fix.
Are you running an old version of the journal-tool perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've ran it via vstart_runner within branch: i64602.
Also ran the command manually:
./bin/cephfs-journal-tool --debug-mds=20 --debug-ms=1 --debug-objecter=1 --rank cephfs:0 event recover_dentries list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll run this through QA and see what we find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So yes, there is a decode error: https://pulpito.ceph.com/pdonnell-2024-05-21_20:54:38-fs:fscrypt-wip-pdonnell-testing-20240521.172753-debug-distro-default-smithi/
It doesn't make much sense to me. I'll reproduce locally and gdb it.
0b35eab
to
85a19d4
Compare
jenkins test make check arm64 |
This PR is under test in https://tracker.ceph.com/issues/66167. |
rpm: | ||
- fscrypt | ||
deb: | ||
- fscrypt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I remembered in some old distros the fscrypt
package couldn't be found and installed. But for the latest distros we use in qa should be fine.
Let's just keep this in mind.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Test without the fix: 2024-05-16T22:34:21.781 DEBUG:teuthology.orchestra.run.smithi044:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 300 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok --format=json dump tree /dir 0 ... [ { "accounted_rstat": { "rbytes": 4096, "rctime": "2024-05-16T22:34:15.251864+0000", "rfiles": 1, "rsnaps": 0, "rsubdirs": 1, "version": 0 }, "atime": "2024-05-16T22:34:14.498880+0000", "auth_pins": 0, "auth_state": { "replicas": {} }, "authlock": {}, "backtrace_version": 14, "btime": "2024-05-16T22:34:14.498880+0000", "change_attr": 3, "client_caps": [ { "client_id": 5184, "issued": "pAsLsXsFsx", "last_sent": 4, "pending": "pAsLsXsFsx", "wanted": "pAsLsXsFsxcral" } ], "client_ranges": [], "ctime": "2024-05-16T22:34:15.249864+0000", "damage_flags": 0, "dir_layout": { "dir_hash": 2, "unused1": 0, "unused2": 0, "unused3": 0 }, "dirfrags": [ { "auth_pins": 0, "auth_state": { "replicas": {} }, "committed_version": "0", "committing_version": "0", "dentries": [ { "alternate_name": "bUHUwH9E8uiVaf8xZ+zONcB1CToj53x5aUUnKdnNj5U37zbh28l1AaWwHhbOT3HyzqKjmSKKW1o4odQJc7nF9xrKIB8D3b4qqb2Cs6s7t2106hHhQk5/YV7DtpeNPZnorcTqxPM/hExtWHSS4P+S+Dpwj62hMyh/77sGhiW1Filvv1gQjV+sN/GozPNwHgfleadkUs1OkRkYtgWrCjbKP0MayRtiOLrVTRuYyOp/Qt3+XCIyiS87B9bUcOFjWratF+yR0kpJ0RYriix7NKVkBJ0kGWYSCY+PYjiLeMYJBMQcCxW/nwfVku+m6fgFJvb6pjEFxIk9zT5cunSImsjr", "auth_pins": 0, "auth_state": { "replicas": {} }, "inode": 1099511628283, "is_auth": true, "is_freezing": false, "is_frozen": false, "is_new": false, "is_null": false, "is_primary": true, "is_remote": false, "lock": {}, "nref": 2, "path": "dir/bUHUwH9E8uiVaf8xZ+zONcB1CToj53x5aUUnKdnNj5U37zbh28l1AaWwHhbOT3HyzqKjmSKKW1o4odQJc7nF9xrKIB8D3b4qqb2Cs6s7t2106hHhQk5,YV7DtpeNPZnorcTqxPM,hExtWHSS4P+S+Dpwj62hMyh,77sGhiW1Filvv1gQjV+sN,GozPNwHgfleadkUuZ+PMLCaKQXhuid9WvmHanxJnaabYDLj4VEz+EX2WsG", ... # fail + journal recovery 2024-05-16T22:35:31.077 DEBUG:teuthology.orchestra.run.smithi044:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 300 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok --format=json dump tree /dir 0 ... [ { "accounted_rstat": { "rbytes": 4096, "rctime": "2024-05-16T22:34:15.251864+0000", "rfiles": 1, "rsnaps": 0, "rsubdirs": 1, "version": 0 }, "atime": "2024-05-16T22:34:14.498880+0000", "auth_pins": 0, "auth_state": { "replicas": {} }, "authlock": {}, "backtrace_version": 14, "btime": "2024-05-16T22:34:14.498880+0000", "change_attr": 3, "client_caps": [], "client_ranges": [], "ctime": "2024-05-16T22:34:15.249864+0000", "damage_flags": 0, "dir_layout": { "dir_hash": 2, "unused1": 0, "unused2": 0, "unused3": 0 }, "dirfrags": [ { "auth_pins": 0, "auth_state": { "replicas": {} }, "committed_version": "5", "committing_version": "5", "dentries": [ { "alternate_name": "", "auth_pins": 0, "auth_state": { "replicas": {} }, "inode": 1099511628283, "is_auth": true, "is_freezing": false, "is_frozen": false, "is_new": false, "is_null": false, "is_primary": true, "is_remote": false, "lock": {}, "nref": 2, "path": "dir/bUHUwH9E8uiVaf8xZ+zONcB1CToj53x5aUUnKdnNj5U37zbh28l1AaWwHhbOT3HyzqKjmSKKW1o4odQJc7nF9xrKIB8D3b4qqb2Cs6s7t2106hHhQk5,YV7DtpeNPZnorcTqxPM,hExtWHSS4P+S+Dpwj62hMyh,77sGhiW1Filvv1gQjV+sN,GozPNwHgfleadkUuZ+PMLCaKQXhuid9WvmHanxJnaabYDLj4VEz+EX2WsG", Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Right now the cephfs-journal-tool always uses the legacy encoding for dentries which will drop the alternate_name if ever set. Fixes: 39f3440 Fixes: https://tracker.ceph.com/issues/64602 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Right now the cephfs-journal-tool always uses the legacy encoding for dentries which will drop the alternate_name if ever set.
Fixes: https://tracker.ceph.com/issues/64602
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e