Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--suppress-size still recursively computes file sizes even when sorting is disabled #69

Open
raffimolero opened this issue Mar 14, 2023 · 1 comment

Comments

@raffimolero
Copy link

raffimolero commented Mar 14, 2023

I'm a simple man. I like pretty colors and file icons. So I installed this to replace tree and exa. I put up an alias to run et -i -I -s size -l 2. all nice and pretty.

Then I tried running the command at my root directory with and realized that wait, this is a tool that was made to calculate disk usage, not just print pretty trees! It took longer than necessary to just print 2 layers of folders.

So I tried --suppress-size. Still, it took the same amount of time.
So then I thought this was using the size data to possibly sort the files. I checked the source code and, yeah, it holds the data until it tries to sort. But it would be nice™ with

struct Context {
    // ...
    #[arg(short, long)]
    max_depth: Option<usize>,
}

impl Context {
    // ...
    fn max_depth(&self) -> usize {
        self.max_depth.unwrap_or(usize::MAX)
    }

    fn walk_depth(&self) -> usize {
        self.max_depth().max(self.level())
    }
}

and a little bit of WalkBuilder::max_depth() in TryFrom<&Context> for WalkParallel then account for max_depth < level by setting the file sizes of deeper folders to 0 and hopefully done?

Now this should let you look at the shallow sizes of directories, if ever that's a thing that you need to do.

@solidiquis
Copy link
Owner

solidiquis commented Mar 14, 2023

Yeah I guess this is the opposite problem to what exa has. I originally did not want the --suppress-size option as this was meant to be a disk usage tool as well, but it became demanded enough to the point that I just thought "hey, why not?". The implementation that I approved definitely was just a monkey-patch as the ability to suppress-size and circumvent the overhead associated with how disk usage is calculated would be a bit more of a lift.

So for context on the traversal and disk usage algorithm:

  1. Traversal is done in parallel using WalkParallel. DirEntrys are transformed into Nodes and subsequently sent over a channel to another worker thread that is responsible for grouping children by their parent directory.
  2. Once traversal is complete and all of the DirEntrys are converted to Nodes, we assemble the virtual Tree and recursively traverse to compute disk usage.

The reason we need to recur in both steps is because all knowledge of filesystem hierarchies is gone once traverse in parallel. It theoretically increases throughput when dealing with filesystem I/O but then it's up to you to extrapolate your file structure again using the paths of each Node. With regard to erdtree, I felt as though compartmentalizing the traversal and disk usage calculation into the two steps above would yield good performance because you're not interweaving I/O-bound and CPU-bound workloads and you're allowing the former more costly step (in terms of time) to be done in parallel.

In a world where --suppress-size works perfectly in that you don't have to pay the overhead of double recursion (one for the file-system and one for the virtual Tree), I think we could just throw parallelism out the window and build the Tree in a single-threaded context i.e. use Walk instead of WalkParallel.

This of course is conjecture and I haven't had too much time to consider this super thoughtfully, but this is on my radar. Thanks a bunch for taking an interest in the project and making an issue out of this! I plan to do weekly minor releases for the foreseeable future and depending on how busy I am with work the fix for this will either be out this Sunday or the next. Let me know if you have any thoughts :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants