
Allow block_size/block_count to be optional, add erase_count, lfs_fs_stat, and lfs_fs_grow #753

Draft: wants to merge 9 commits into base: test-and-bench-runners
Conversation

@geky (Member) commented Dec 5, 2022

This PR moves block_size/block_count into RAM, allowing more flexibility during lfs_mount.

The main benefit is allowing block_size and block_count to be marked as "unknown", using the newly added erase_count to read block_size and block_count from the superblock during lfs_mount.

Unfortunately this isn't as simple as it sounds. In littlefs, as a part of enabling atomic updates, the superblock can live in either block 0 or block 1. But if we don't know the block_size, we don't really know where block 1 begins. This can put us in a catch-22 situation where we need the block_size in order to read the block_size from the superblock.

The solution proposed here is to search possible block_sizes until we find the superblock, optionally using the size of the disk to narrow down the search. Surprisingly, this adds no runtime overhead if we find a littlefs image, but it has some complicated performance implications if the disk is unformatted:

  1. .block_size = BLOCK_SIZE,
    .block_count = BLOCK_COUNT,
    

    If we know the block_size and block_count, we can easily mount the filesystem, erroring with LFS_ERR_INVAL if the block_size/block_count stored in the superblock are different. Worst case $O(1)$.

  2. .block_size = BLOCK_SIZE,
    .block_count = 0, // unknown
    

    If we know the block_size, we can easily mount the filesystem and read the block_count from the superblock. Worst case $O(1)$.

  3. .block_size = 0, // unknown
    .block_count = DISK_SIZE / ERASE_SIZE,
    

    If we don't know the block_size, we need to search for it. Two shortcuts: 1. we know the block_size must be a multiple of the erase_size, and 2. if we know the block_count, we can assume the block_size divides the block_count evenly. Worst case $O(d(n))$.

  4. .block_size = 0, // unknown
    .block_count = 0, // unknown
    

    If we don't know the block_count, or can't rely on the above assumption, the best we can do is search all multiples of the erase_size. This risks taking a long time on large, unformatted disks. Worst case $O(n)$.

The above function, $d(n)$, is the "divisor function", and its analysis is a bit tricky. On average, and for all powers of two, it runs in $O(\log n)$. But there are so-called "highly composite numbers" where it can take at worst $O(\sqrt n)$.

#349 has more discussion about this scheme and alternatives.

I put together a quick benchmark to measure how long lfs_mount takes to fail as the disk grows. We can see the sub-linear runtime if we know the block_count, and linear runtime if both block_size and block_count are unknown. If we know the block_size, lfs_mount reads only ~136 bytes:

[figure: bench-mount plots, lfs_mount time vs disk size (light and dark variants)]

Choosing a random device, the mt25q, shows a max of ~90 MiB/s reads in its datasheet. So for a very rough estimate:

unformatted mount, 1 GiB @ 90 MiB/s

  known bs/bc       => ~1.44 us
  known block_size  => ~1.44 us
  known block_count => ~125 us
  unknown bs/bc     => ~376 ms

With block_size/block_count now variable, this PR also includes a couple of relevant new functions:

  • lfs_fs_stat

    lfs_fs_stat allows you to access the configuration read from littlefs's superblock during mount, as well as other state such as the number of blocks in use (susceptible to over-counting due to CoW).

This is supposed to mirror the statvfs call common in other filesystems.

    struct lfs_fsstat fsstat;
    int err = lfs_fs_stat(lfs, &fsstat);
    if (err) {
        return err;
    }

    printf("block_size:  %u\n", fsstat.block_size);
    printf("block_count: %u\n", fsstat.block_count);
    printf("block_usage: %u\n", fsstat.block_usage); // same as lfs_fs_size
    printf("name_max:    %u\n", fsstat.name_max);
    printf("file_max:    %u\n", fsstat.file_max);
    printf("attr_max:    %u\n", fsstat.attr_max);
  • lfs_fs_grow

    First implemented by @kaetemi, lfs_fs_grow allows you to change the size of the filesystem on disk to take advantage of additional space if it becomes available:

    cfg.block_count = 1024;

    int err = lfs_mount(lfs, &cfg);
    if (err) {
        return err;
    }

    struct lfs_fsstat fsstat;
    err = lfs_fs_stat(lfs, &fsstat);
    if (err) {
        return err;
    }
    printf("block_count: %u\n", fsstat.block_count); // prints 1024

    err = lfs_fs_grow(lfs, 2048);
    if (err) {
        return err;
    }

    err = lfs_fs_stat(lfs, &fsstat);
    if (err) {
        return err;
    }
    printf("block_count: %u\n", fsstat.block_count); // prints 2048

    If you want to automatically resize littlefs to the full disk size every mount, this can be done by using these two calls as your "mount" operation:

    lfs_size_t required_block_count = DISK_SIZE / BLOCK_SIZE;

    // unknown block_count
    cfg.block_count = 0;

    // mount and resize if disk has grown
    int err = lfs_mount(lfs, &cfg);
    if (err) {
        return err;
    }

    err = lfs_fs_grow(lfs, required_block_count);
    if (err) {
        return err;
    }

Relevant discussions:

Note this PR is currently under discussion in #349.

@e107steved commented:

I expressed some concern about additional RAM usage in #349 - it could (at least at first glance) be a significant amount in the context of a small to medium processor that doesn't need the RAM-based config block.
What are the benefits (if any!) of littlefs doing the scan with different block sizes when needed, rather than leaving a user to set up the incredibly simple code loop and create a persistent config block in RAM?

@geky (Member, Author) commented Dec 6, 2022

Don't mind me, just copying the CI output here. I don't have a strong opinion yet on whether this goes in or not:

Tests passed ✓, Code: 16334 B (+4.0%), Stack: 1344 B (+3.1%), Structs: 812 B (+5.2%)

                 Code             Stack            Structs
  Default        16334 B (+4.0%)  1344 B (+3.1%)   812 B (+5.2%)
  Readonly        6154 B (+7.4%)   400 B (+13.6%)  812 B (+5.2%)
  Threadsafe     17178 B (+4.1%)  1344 B (+3.1%)   820 B (+5.1%)
  Migrate        18042 B (+3.8%)  1672 B (+2.0%)   816 B (+5.2%)
  Error-asserts  17090 B (+4.7%)  1344 B (+3.7%)   812 B (+5.2%)

  Coverage
  Lines     2293/2476 lines (+0.3%)
  Branches  1183/1498 branches (-0.4%)

  Benchmarks
  Readed  29303789027 B (+0.0%)
  Proged   1482456139 B (+0.0%)
  Erased   1568290304 B (+0.0%)

This adds two new configuration options: erase_size and erase_count,
allowing block_size and block_count to be loaded from the superblock
during mount. For backwards compatibility these default to block_size and
block_count if zero.

---

Unfortunately this is a bit easier said than done. littlefs keeps its
superblock in the metadata pair located at blocks {0,1}, which is also
where the root directory lives (keep in mind small littlefs images may
only have 2 blocks in total). If we mutate blocks {0,1}, we have to
erase before programming, which means it's possible to have the only
superblock in block 1. This presents a puzzle because how do you find
block 1 if you don't know the size of a block?

One solution presented here is to search for block 1 by trying different
sizes until we find a superblock. This isn't great but there are some
properties of this search that help:

1. If we do find a superblock, the search will never take longer than a
   mount with a known block_size. This is because we stop at block 1,
   searching at most O(block_size) bytes, and metadata fetch is already
   an O(block_size) operation.

   This means the concern is limited to how long it takes to fail when
   littlefs is not present on the disk.

2. We can assume the on-disk block_size is probably a factor of the
   total size of the disk. After a bit of digging into the math, this
   apparently reduces the runtime to the divisor function, d(n), which
   is sublinear.

   According to a blog post by Terence Tao this is bounded by the
   ridiculous O(e^O(log(n)/log(log(n)))):
   https://terrytao.wordpress.com/2008/09/23/the-divisor-bound

   This is apparently somewhere between O(sqrt(n)) and O(log(n)), but
   conveniently O(log(n)) on average and O(log(n)) for powers of 2.

   I've left it as O(d(n)) in the documentation, which might be a bit
   confusing, but I'm not sure how best to capture "mostly log(n)"
   correctly.

3. If we don't know the block_count, or can't rely on the block_size
   dividing the disk size evenly, the best we can do is an O(n) search
   over multiples of the erase_size.

   In this case I've added a warning, so at least it's distinguishable
   from an infinite loop when debugging.
---

This hopefully helps make it clearer that we are using erase_size as the
unit during the block_size search. However, this does mean you need to
set erase_count to zero to allow any littlefs size, which might be a bit
confusing:

  block_size  block_count  erase_size  erase_count
  known       known        known       known       => known bs, O(1)
  known       unknown      known       known       => known bs, O(1)
  unknown     unknown      known       known       => bs search, O(d(n))
  unknown     unknown      known       unknown     => bs search, O(n)
---

Aside from asserts (which can be implemented in block devices), erase_count
provides no useful information beyond what's already known from block_count,
and risks mistakes during configuration.

Each set of known/unknown block_size-related variables has a well-defined
mount behavior:

  block_size  block_count
  known       known       => known bs, O(1)
  known       unknown     => known bs, O(1)
  unknown     known       => bs search, O(d(n))
  unknown     unknown     => bs search, O(n)

If block_size is unknown, block_count uses the erase_size as a unit,
which is the only reasonable option since block_size must be
>= prog/read/cache_size.
---

This ended up needing a bit of API rework since littlefs no longer needs
to know the actual erase_count. We can no longer rely on lfs_config to
contain all the information necessary to configure the block devices.

Changing the lfs_bd_config structs to be required is probably a good
idea anyways as it moves us more towards separating the bds from
littlefs, though we can't quite get rid of the lfs_config parameter
because of the block-device API in lfs_config. Eventually it would be
nice to get rid of it, but that would require API breakage.
---

This is necessary for accessing format-specific information stored in
the superblock after mounting.

As is common in other filesystems, this also provides the filesystem
usage. However, collecting the filesystem usage is more expensive in
littlefs than in other filesystems (and less accurate because of CoW), so
this may need to be cached after mount in the future.
---

- Need to drop cache during block_size search.

- Removed "Corrupted" messages when root is not found to avoid too much debug
  output. This also gets rid of a long-term point of confusion around
  "Corrupted" messages on a mount that is intended to fail.

---
The initial implementation for this was provided by kaetemi, originally
as a mount flag. However, it has been modified here to be self-contained
in an explicit runtime function that can be called after mount.

The reasons for an explicit function:

1. lfs_mount stays a strictly readonly operation, and avoids pulling in
   all of the write machinery.

2. filesystem-wide operations such as lfs_fs_grow can be a bit risky
   and irreversible. The action of growing the filesystem should be very
   intentional.

---

One concern with this change is that this will be the first function
that changes metadata in the superblock. This might break tools that
expect the first valid superblock entry to contain the most recent
metadata.
---

These locally show the expected O(d(n)) growth of the block_size search
when block_count is known.

A rough estimate of runtime for a 1 GiB device with a 90 MiB/s read bandwidth
(using the mt25q for rough numbers):

  known bs/bc       => ~1.44 us
  known block_size  => ~1.44 us
  known block_count => ~125 us
  unknown bs/bc     => ~376 ms
---

- Valgrind-detected memory leak in test_superblocks.

- Valgrind-detected uninitialized lfs->block_size in lfs_init caused by
  an unintended sed replacement.

- Said sed replacement also changed the implementation of the v1 portion
  of lfs_migrate; this is out of scope and caused migrate to fail, which
  fortunately CI noticed.

- We also need to copy the block_size/block_count configs in v1 so that
  low-level bd operations still work.

- le-conversion in test_evil now matters due to the difference in
  LFS_ERR_CORRUPT/LFS_ERR_INVAL paths during lfs_mount. This is good
  because we're now actually testing for loop detection, not pointer
  out-of-bounds, as intended.
@geky (Member, Author) commented Sep 12, 2023

Just an FYI for anyone watching this PR, at the moment I'm considering this PR defunct.

lfs_fs_stat has been merged in #838, and with any luck, unknown block-counts will be merged in #866, and lfs_fs_grow in #872.

The only remaining feature is the detection of unknown block-sizes, but after taking some time away from the problem, I'm not sure detecting unknown block-sizes based on the assumption of the total disk size being divisible is very reasonable in practice. And as @e107steved notes above, it's possible to do the naive search for block-size outside of the filesystem, so I will be leaving this feature out for now.

At the moment it looks like block-size will always be a required configuration option.

Labels: automount, enhancement, needs documentation, needs minor version (new functionality only allowed in minor versions), postponed