Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/non empty dir removal #557

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

tim-nordell-nimbelink
Copy link

For background, see #43. I figured we could add logic easily that prevented the orphan issue that @geky mentioned by scanning for files only in the folder, so this commit adds that logic. E.g. you can remove a folder so long as it doesn't contain other folders with one commit to the filesystem permitting several files to be removed at once.

@tim-nordell-nimbelink
Copy link
Author

One thing I wasn't sure about is the lfs_fs_preporphans(...) count added/removed. Technically we're removing multiple "blocks" at once from the allocated filesystem with this change so I wasn't sure if the lfs_fs_preporphans should reflect that, or if it's supposed to reflect the number of folders. I'm assuming not since I don't think lfs_dir_split(...) touchs lfs_fs_preporphans(...) at all.

I did miss updating the global state from each of the split pages initially, but that has been added in with the new commit block. I also had the linked list of directories slightly wrong, and propagated this up a hair.

@tim-nordell-nimbelink
Copy link
Author

I guess not actually testing it in the compile environment within littlefs was a mistake, nor running the test suite. It's fixed now. :) (I only tried it in my project on my embedded platform which had different compiler settings.)

@geky geky added the needs minor version new functionality only allowed in minor versions label Jun 12, 2021
@geky
Copy link
Member

geky commented Jun 12, 2021

Sorry about the delay between CI jobs, the new "workflow approval for first time contributors" requirement means I need to manually allow the CI to run, which I unfortunately probably won't be able to do promptly.

As far as I'm aware there's no way to give you CI permissions, that would be awfully convenient :/

It looks like a "reentrant" test is failing, which needs the -r flag (these test power cycling and take longer, make test TESTFLAGS+='-r' or ./scripts/test.py -r). This does suggest something might be going wrong with the orphan count, since "orphans" can only happen under powerloss.

@geky
Copy link
Member

geky commented Jun 12, 2021

I'm assuming not since I don't think lfs_dir_split(...) touchs lfs_fs_preporphans(...) at all.

Ah, lfs_dir_split is a bit special. Orphans are metadata-pairs that no longer belong to a directory, and when we split we can create the metadata-pair and put it in the directory's linked-list at the same time, so creating an orphan is not possible.

We need the orphan state for other operations, such as remove, where we need two commits to remove the metadata-pair from both the directory tree and the metadata linked-list.

For example if we remove C here:

          .----.
          | A  |-.
.---------|    |-'
|         '----'
|    .-----'| '-----.
|    v      v       v
| .----.  .----.  .----.
'>| B  |->| C  |->| D  |
  |    |  |    |  |    |
  '----'  '----'  '----'

Technically we're removing multiple "blocks" at once from the allocated filesystem with this change so I wasn't sure if the lfs_fs_preporphans should reflect that

lfs_fs_preporphans is a bit misleading here, on disk there is a single bit to say "has orphans" vs "does not have orphans". This is to avoid an expensive deorphan scan which usually is not needed. The deorphan scan keeps searching for orphans until it is sure there is no more orphans on the metadata-pair linked-list (here).

The reason there is a counter on the driver side is to keep track of how many orphans have been created by different operations so we know if we need to update the orphan bit in lfs_dir_commit.

Wear-leveling/bad-block removal also creates orphans, and can happen on any commit, so this gets a bit messy.

You will need to set the orphan bit, but you don't actually need to write the exact number to the disk, so you could "+1" for multiple and, and "-1" once you have removed the blocks from the linked-list. There is an assert in preporphans (here) that should help protect against mistakes here.

lfs.c Outdated
return 0;
}
#endif

#ifndef LFS_READONLY
static int lfs_rawremove(lfs_t *lfs, const char *path) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POSIX is pretty clear that remove should return ENOTEMPTY if the directory isn't empty. Even though this may seem unhelpful, changing this risks surprising users and an incompatible filesystem.

I think the idea of creating a new function for this is a good one. If there's no prior art perhaps lfs_removeall?

lfs.c Outdated
}

if (lfs_tag_type3(tag) == LFS_TYPE_DIR) {
return LFS_ERR_NOTEMPTY;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is moved to lfs_removeall or equivalent: Perhaps this should return ENOSYS for now (feel free to add LFS_ERR_NOSYS = -38 to the lfs_error enum). This suggests the error code may change.

It would be nice to add support for recursive removal of directories in the future if possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the filesystem correctly, the linked list of "directories" is only contiguous for one directory at a time? E.g. a "split" of a given directory will have all of its links together? Recursive support would potentially require multiple touch points to the linked list structure, but a single folder without subfolders doesn't have this requirement.

However, one could remove entire directories together at a "super" function. That function would need to be semi-recursive. It could keep looking for folders, removing those, and then starting over at the top again. (That way it doesn't have as much of an in-memory recursive footprint and depend on the on-disk state.)

I did also add support in the removal by move, too. New function name for this, too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the filesystem correctly, the linked list of "directories" is only contiguous for one directory at a time? E.g. a "split" of a given directory will have all of its links together?

Yep, directories can't be interleaved.

Recursive support would potentially require multiple touch points to the linked list structure, but a single folder without subfolders doesn't have this requirement.

lfs_fs_deorphan already handles multiple orphans spread out in the linked-list (can be created by wear-leveling/bad-block removal), so I don't think that will be a problem. The main challenge is how to determine if a metadata-pair is an orphan, currently lfs_fs_deorphan only looks up the immediate parent, and is already expensive.

It could keep looking for folders, removing those, and then starting over at the top again. (That way it doesn't have as much of an in-memory recursive footprint and depend on the on-disk state.)

This might work, if you also remove the directory entry to keep track of progress. The goal of littlefs is fixed RAM cost, so really no recursion is allowed at all (I'm being a bit hypocritical at the moment, but working to remove the one recursive function in the codebase). But and tail-recursive function can be written as a loop, so it sounds like it should work.

I did also add support in the removal by move, too. New function name for this, too?

Oh I missed that. Hmm, that's a good question, lfs_renameall certainly doesn't make sense.

Maybe lfs_removerecursive and lfs_renamerecursive?

As an aside that is certainly quite a bit of functionality for one little function.

Feel free to refactor the internals to reuse code between the recursive/POSIX versions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe lfs_removerecursive and lfs_renamerecursive?

Hah, yeah. Before seeing this, I called it "lfs_rename_with_removeall" which is really verbose (and probably too verbose), but does describe it accurately. I'm not sure I care for renamerecursive personally, but I'm not sure I really care for what I called it either.

As an aside that is certainly quite a bit of functionality for one little function.

Feel free to refactor the internals to reuse code between the recursive/POSIX versions.

Yeah, I modified the rawremove(...) to accept a pointer to the bit of code that does the work in the "recursive" side of things. I figured this reuses code between the two fairly easily; is this an okay approach for you? The code will be dropped by a linker with garbage collection if noone references the helper function to put it into an argument.

Thanks,
Tim

@tim-nordell-nimbelink
Copy link
Author

Sorry about the delay between CI jobs, the new "workflow approval for first time contributors" requirement means I need to manually allow the CI to run, which I unfortunately probably won't be able to do promptly.

As far as I'm aware there's no way to give you CI permissions, that would be awfully convenient :/

No worries - I saw that you had to manually run the CI. I wonder if you had pulled the branch into a temporary branch in the official repository if it'd then consider me to be a contributor, or if I need to be in the master history?

It looks like a "reentrant" test is failing, which needs the -r flag (these test power cycling and take longer, make test TESTFLAGS+='-r' or ./scripts/test.py -r). This does suggest something might be going wrong with the orphan count, since "orphans" can only happen under powerloss.

I'll check out these tests cases, thanks.

@geky
Copy link
Member

geky commented Jun 14, 2021

It looks like the branch doesn't work, I'm guessing it needs to be on master since this is what GitHub uses to update the contributors graph.

https://github.com/littlefs-project/littlefs/tree/tim-nordell-nimbelink-perm

Unfortunately they really don't have much documentation over this badge since it originally was just decorative.

@tim-nordell-nimbelink
Copy link
Author

Hi @geky -

It looks like a "reentrant" test is failing, which needs the -r flag (these test power cycling and take longer, make test TESTFLAGS+='-r' or ./scripts/test.py -r). This does suggest something might be going wrong with the orphan count, since "orphans" can only happen under powerloss.

Got it - I didn't realize there were additional tests hidden. I figured out the problem - I was walking the splits for the gstate, but I forgot to xor as I went along. (The fix is here for the test case that was failing: 45d02e5.) I haven't squashed the fixups, but I figured I'd leave them in there this time in case we didn't want any of those additional changes.

I moved the core logic into a helper function, and am passing that as a pointer down into the sub routines. I wasn't sure if you'd want that minor code cost always there or not - e.g. this way it can implicitly be dropped by the linker if a user doesn't use the 2 new user-facing routines.

I haven't added any recursive top-level routines as-of-yet, but I think any recursion due to the linked list nature needs to be done at a layer above what I implemented. However, any recursive routine can now be a bit faster than it could have been previously if added at some point.

It looks like the branch doesn't work, I'm guessing it needs to be on master since this is what GitHub uses to update the contributors graph.

Oh well, I figured it might be worth a shot.

Would you like me to squash the fixups? Anything you'd like changed in the changeset? All of the current test cases now pass (at commit 45d02e5 which is before I split off the function name). I wasn't sure if you wanted any of the test cases moved to the new function names or not, so none of the tests have been changed. I did make a duplicate of the test file that had failed that invokes the removeall function instead.

Thanks,
Tim

@tim-nordell-nimbelink
Copy link
Author

I just realized there are some stylistic things I need to conform to the codebase too. For instance, I'm usually a little more verbose and have { on new lines, so I am realizing I didn't quite conform to the styles of the codebase. I'll get those fixed up later today at some point.

@tim-nordell-nimbelink
Copy link
Author

Maybe lfs_atomic_remove(...) and lfs_atomic_rename(...)? The base ones are atomic (if I understand it correctly), too, but we just handle a wider range of use-cases with this changeset.

Incidentally, in Linux there is a "renameat2(...)" which apparently accepts a flag to exchange the location of two folders with RENAME_EXCHANGE; not quite what is implemented here.

@tim-nordell-nimbelink tim-nordell-nimbelink force-pushed the feature/non_empty_dir_removal branch 2 times, most recently from 9713247 to 901c5c8 Compare June 14, 2021 20:56
@tim-nordell-nimbelink
Copy link
Author

Hi @geky -

I added the lfs_remove_recursive(...) feature. So there are 3 new API functions overall:

  • lfs_atomic_remove(...)
  • lfs_atomic_rename(...)
  • lfs_remove_recursive(...)

The lfs_atomic_*(...) ones do not support folders with nested items in them, but the lfs_remove_recursive(...) one does. If I'm misunderstanding that removing a folder without nested folders is "atomic" from the filesystem perspective (e.g. after reboot you end up with either it not removed, or it completely removed), then it doesn't make sense to differentiate between these. I haven't thoroughly tested the lfs_remove_recursive(...) variant which is introduced in its own commit. I did add a minimal test case for this though into the test suite, but it could probably be more extensively tested.

The lfs_remove_recursive(...) will be a little more efficient than doing it externally outside of the filesystem code since it can walk the tags within a folder and decide to recurse into them without having to reload the whole tree up to that (and it doesn't require looking at the tag's name).

Thanks,
Tim

@tim-nordell-nimbelink tim-nordell-nimbelink marked this pull request as draft June 16, 2021 16:03
@wreyford
Copy link

wreyford commented Jun 16, 2021

I tested this code today, and wow, amazing. Thank you. Hope it gets added.
We accumulate up to 20000 small json files in a folder in a few days, and deleting these with unlink one by one takes over 9.5 hours. Deleting the folder with with 5000 records using rmdir_atomic(), only took me 4040ms. I had to modify the esp_littlefs port I'm using on the ESP32 WROOM 32 E

@tim-nordell-nimbelink
Copy link
Author

@wreyford - Glad you found it useful. We have something similar were we can accumulate several small files and I needed it to unlink more quickly, too.

I did find my recursive removal function is broken, so that's why I marked this back to a draft. The atomic_remove(...) should be fine, though, and maybe I should just leave the PR for those before a recursive solution is added in properly.

…ated

This is in preparation of the next commit for it to be able to consolidate
some of the logic between the two functions.
This routine checks that the folder is empty before removing and is
found in two places.  We can consolidate this for changes to the logic.
@tim-nordell-nimbelink tim-nordell-nimbelink marked this pull request as ready for review June 16, 2021 22:49
@tim-nordell-nimbelink
Copy link
Author

@geky and @wreyford -

The lfs_remove_recursive(...) routine is fixed. I suppose a future change for that would be to remove the top-level folder (and marking it as an orphan), and then go through and systematically remove the inner links from the linked list directory structure like it does. That'd probably allow the whole sub-folder to be atomically removed rather than as it is being done now.

I'm pretty happy about the current state of the code for this changeset. I refactored it a tad so that the common states for a directory removal are held in a structure, and that single structure is pushed around to the helpers rather than all of the constituent components as arguments. That should reduce the code overhead, too, of calling the helper functions.

It'd probably be easier to review the individual commits as they move things through:

  • The first commit in the series moves to using the common structure
  • The second commit pulls out some common code from remove/rename around the usage of that structure
  • The third commit adds the atomic_remove/atomic_rename
  • The fourth commit adds the remove_recursive(...) functionality

It's easier to see the refactoring in the first two commits instead of looking at the overall series as a changeset.

-- Tim

@tim-nordell-nimbelink
Copy link
Author

I realized I had some redundant code in the lfs_remove_recursive(...) codepath. It's consolidated now.

@tim-nordell-nimbelink
Copy link
Author

Hmm, interesting that the global state is broken for arm but not for x86_64. They're both little endian architectures, so... I'll see if I can figure it out.

Introduce the function lfs_atomic_rename(...) and lfs_atomic_remove(...)
that permits the removal of a folder with multiple files within it.  This
fails when the directory contains other directories.
Introduce a function that removes a folder and all of its child folders
recursively.
@tim-nordell-nimbelink
Copy link
Author

tim-nordell-nimbelink commented Jun 21, 2021

Hi @geky -

Doing some testing locally, and it appears that my assumption in some of the code was wrong (based on function names). I had assumed that the function lfs_dir_getgstate(...) set the output array, but it does an xor into the output array. So this code pattern was wrong:

static int lfs_dir_prep_remove_nonempty_folders(lfs_t *lfs, struct lfs_remove_state *p) {
    lfs_gstate_t split_gstate;
...
            // Before we fetch the next block, update our fetched gstate xor
            lfs_dir_getgstate(lfs, &p->dir.m, &split_gstate);
            lfs_gstate_xor(&p->gstate, &split_gstate);
...
}

It should have been:

static int lfs_dir_prep_remove_nonempty_folders(lfs_t *lfs, struct lfs_remove_state *p) {
...
            // Before we fetch the next block, update our fetched gstate xor
            lfs_dir_getgstate(lfs, &p->dir.m, &p->gstate);
...
}

My guess is it was failing inside qemu because of left-over data in the stack that was being xor'd (where I had made the assumption the functio filled it in). This has been fixed with the latest update.

I've also been assuming that I have to collect the global state from each of the splits within a directory for when dropping the directory. Is this assumption correct? I think the lfs_dir_drop(...) only handles the last "split" entry only.

Thanks,
Tim

@mrozekma
Copy link

mrozekma commented Apr 6, 2022

Any idea on the timeline to merge this? Thanks

@geky
Copy link
Member

geky commented Apr 14, 2022

So, I have been sitting on this PR for a while because I've been unsure if it should go in or not. It is a really nice feature leveraging some unique properties of littlefs, and I greatly appreciate the work @tim-nordell-nimbelink has put in to get this PR into a merge-able state.

But I'm concerned with how it will interact with future improvements to littlefs. Specifically that supporting this API might restrict littlefs to the current allocator design (#75). #75 is a bit more pressing because it affects the performance of all operations in littlefs, and risks limiting the maximum possible size of storage littlefs can support. Depending on how #75 is resolved, changing the block allocator to use a bitmap for example, may or may not break this proposal.

So for now I don't think I can bring this in until #75 is resolved. This unfortunately means a much longer timeline before this PR could go in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement needs minor version new functionality only allowed in minor versions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants