--suppress-size still recursively computes file sizes even when sorting is disabled #69

raffimolero · 2023-03-14T10:32:44Z

I'm a simple man. I like pretty colors and file icons. So I installed this to replace tree and exa. I put up an alias to run et -i -I -s size -l 2. all nice and pretty.

Then I tried running the command at my root directory with and realized that wait, this is a tool that was made to calculate disk usage, not just print pretty trees! It took longer than necessary to just print 2 layers of folders.

So I tried --suppress-size. Still, it took the same amount of time.
So then I thought this was using the size data to possibly sort the files. I checked the source code and, yeah, it holds the data until it tries to sort. But it would be nice™ with

struct Context {
    // ...
    #[arg(short, long)]
    max_depth: Option<usize>,
}

impl Context {
    // ...
    fn max_depth(&self) -> usize {
        self.max_depth.unwrap_or(usize::MAX)
    }

    fn walk_depth(&self) -> usize {
        self.max_depth().max(self.level())
    }
}

and a little bit of WalkBuilder::max_depth() in TryFrom<&Context> for WalkParallel then account for max_depth < level by setting the file sizes of deeper folders to 0 and hopefully done?

Now this should let you look at the shallow sizes of directories, if ever that's a thing that you need to do.

The text was updated successfully, but these errors were encountered:

solidiquis · 2023-03-14T15:46:23Z

Yeah I guess this is the opposite problem to what exa has. I originally did not want the --suppress-size option as this was meant to be a disk usage tool as well, but it became demanded enough to the point that I just thought "hey, why not?". The implementation that I approved definitely was just a monkey-patch as the ability to suppress-size and circumvent the overhead associated with how disk usage is calculated would be a bit more of a lift.

So for context on the traversal and disk usage algorithm:

Traversal is done in parallel using WalkParallel. DirEntrys are transformed into Nodes and subsequently sent over a channel to another worker thread that is responsible for grouping children by their parent directory.
Once traversal is complete and all of the DirEntrys are converted to Nodes, we assemble the virtual Tree and recursively traverse to compute disk usage.

The reason we need to recur in both steps is because all knowledge of filesystem hierarchies is gone once traverse in parallel. It theoretically increases throughput when dealing with filesystem I/O but then it's up to you to extrapolate your file structure again using the paths of each Node. With regard to erdtree, I felt as though compartmentalizing the traversal and disk usage calculation into the two steps above would yield good performance because you're not interweaving I/O-bound and CPU-bound workloads and you're allowing the former more costly step (in terms of time) to be done in parallel.

In a world where --suppress-size works perfectly in that you don't have to pay the overhead of double recursion (one for the file-system and one for the virtual Tree), I think we could just throw parallelism out the window and build the Tree in a single-threaded context i.e. use Walk instead of WalkParallel.

This of course is conjecture and I haven't had too much time to consider this super thoughtfully, but this is on my radar. Thanks a bunch for taking an interest in the project and making an issue out of this! I plan to do weekly minor releases for the foreseeable future and depending on how busy I am with work the fix for this will either be out this Sunday or the next. Let me know if you have any thoughts :)

solidiquis mentioned this issue May 28, 2023

avoid indexing every time? #184

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

--suppress-size still recursively computes file sizes even when sorting is disabled #69

--suppress-size still recursively computes file sizes even when sorting is disabled #69

raffimolero commented Mar 14, 2023 •

edited

solidiquis commented Mar 14, 2023 •

edited

--suppress-size still recursively computes file sizes even when sorting is disabled #69

--suppress-size still recursively computes file sizes even when sorting is disabled #69

Comments

raffimolero commented Mar 14, 2023 • edited

solidiquis commented Mar 14, 2023 • edited

raffimolero commented Mar 14, 2023 •

edited

solidiquis commented Mar 14, 2023 •

edited