bug: strip_tool_version_comments could be faster for a large amount of comments #1195

blopker · 2022-04-06T01:09:19Z

Describe the Bug

I'm not sure if this is a bug or feature, but I noticed the strip_tool_version_comments performance could be improved.

The current code is:

strip_tool_version_comments() {
  local tool_version_path="$1"

  while IFS= read -r tool_line || [ -n "$tool_line" ]; do
    # Remove whitespace before pound sign, the pound sign, and everything after it
    new_line="$(cut -f1 -d"#" <<<"$tool_line" | sed -e 's/[[:space:]]*$//')"

    # Only print the line if it is not empty
    if [[ -n "$new_line" ]]; then
      printf "%s\\n" "$new_line"
    fi
  done <"$tool_version_path"
}

This iterates over every line in .tool-versions, then shells out to cut and sed for each line. I looked around a bit, and it seems like this loop can be replaced with one call to sed for the whole file (found here: https://unix.stackexchange.com/a/157619, which has a good breakdown of what the command is doing). The alternate version is:

strip_tool_version_comments() {
  local tool_version_path="$1"
  sed '/^[[:blank:]]*#/d;s/#.*//' "$tool_version_path"
}

This should handle lines that are only comments and comments at the end of tool version lines, like the original code. As for performance, I tested with a .tool-versions file with 50 lines of comments (excessive, but not unrealistic) and the timing worked like:

Before change: 0.22s user 0.69s system 120% cpu 0.762 total
After change: 0.07s user 0.21s system 99% cpu 0.276 total

After a few more tests, it looks like the original code slows a bit with every line added, where the 1 sed version stays constant. The proposed version is even around 10ms faster for a .tool-versions with only one line in it.

If this change seems interesting I can go ahead and make a PR. Let me know!

Steps to Reproduce

Create a .tool-versions file with 50 lines of comments.
Run time asdf current.
See that it takes ~200ms.

Expected Behaviour

It should not take any more time than a file with no comments.

Actual Behaviour

It gets slower with every comment in the file.

Environment

OS:
Darwin blopker-pro 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000 arm64

SHELL:
zsh 5.8 (x86_64-apple-darwin21.0)

ASDF VERSION:
v0.9.0

ASDF ENVIRONMENT VARIABLES:
ASDF_DIR=/opt/homebrew/opt/asdf/libexec

ASDF INSTALLED PLUGINS:
bats                         https://github.com/timgluz/asdf-bats.git main 299551f
nodejs                       https://github.com/asdf-vm/asdf-nodejs.git master 1bc55cb
python                       https://github.com/danhper/asdf-python.git master 57a4d72
shellcheck                   https://github.com/luizm/asdf-shellcheck.git master 31cb328
shfmt                        https://github.com/luizm/asdf-shfmt.git master f893252



### asdf plugins affected (if relevant)

_No response_

The text was updated successfully, but these errors were encountered:

pedropombeiro · 2022-04-06T17:07:07Z

I did a test on the GitLab Development Kit and here are the findings (on a MacBook Pro M1 Max):

A 3.5x performance increase. Well done!

blopker · 2022-04-06T18:17:32Z

Very cool. @Stratus3D, is this something you'd like a PR for?

…line for performance.

blopker · 2022-04-07T22:44:14Z

I put up a PR for this at #1198. I changed this sed call to make it pass all the tests, and to make it a bit more robust.

Stratus3D · 2022-04-08T14:21:12Z

@blopker yes! Catching up on the discussion around this. I see your PR and am reviewing it now.

Stratus3D · 2022-04-08T14:42:05Z

Great work! On my machine (and with @pedropombeiro well-annotated .tool-versions file), I see a speedup of about 4x with your PR @blopker ! Take a look at these before and after flame graphs:

Before:

After:

Faster and simpler. I don't use comments in my .tool-versions file, so I never would have caught this myself. However even without comments in my simple .tool-versions file I see this speeds up asdf current by about 40%!

Stratus3D · 2022-04-08T14:44:28Z

Closing as this has been fixed by #1198

blopker · 2022-04-08T16:15:36Z

Great, thanks for the merge!

Glad I was able to help out a bit. Sometimes these performance things only reveal themselves under specific conditions. I personally never had any performance issues with asdf.

Nice flamegraph too, are you using perf and speedscope? Might be cool to add how to do performance profiling to the dev docs (unless I missed it and it's already there).

Stratus3D · 2022-04-08T17:35:27Z

@blopker it's just a simple Python script I've got that takes the output of Bash's set -x option and generates the data needed for Brendan Gregg's format that is accepted by speedscope. Then I just open the generated file with speedscope. This is not in the docs as it's somewhat hacky and not perfect, but it's good enough for some high level insight. I guess I could open source the script at some point.

blopker · 2022-04-09T08:47:15Z

That actually sounds more interesting that you're letting on. I've looked for a solution to profile bash and there's no obvious answer, or really any answer at all.

I vote you put it out there!

Stratus3D · 2022-04-11T21:24:18Z

So off the top of my head, there are some limitations to the script:

It can only profile execution time of each expression in the script (memory and other things aren't supported)
It requires manually adding set -x to the script you want to profile
The script reads from STDIN (a pipe) so anything buffering output affects the accuracy of the script (buffering can often be disabled)
It doesn't understand Bash functions (flamegraphs generated display expressions, not Bash function calls)

So overall this script is just a hack.

Still interested? If so, got any suggestions for what I should name this script/repo?

blopker · 2022-04-11T22:09:36Z

My philosophy is that a bad measurement is better than no measurement (not that this is a bad measurement). We have to start somewhere. I'd like to see what it can do! Maybe you can post it to a gist instead of creating a whole repo for now?

Possible names:

flashgraph (flame + bash)
shellflame/shellgraph
setex (because you have to set -x)
ishum (god of fire who was the brother of the sun god Shamash)
Or maybe just "Bash Flamegraph Profiler" :)

Stratus3D · 2022-04-14T23:19:29Z

@blopker I'm busy trying to get a new release of asdf tagged, but after that I am planning on creating a repo with the code. I'll keep you posted.

blopker added the bug label Apr 6, 2022

blopker changed the title ~~bug: strip_tool_version_comments could be faster for large amount of comments~~ bug: strip_tool_version_comments could be faster for a large amount of comments Apr 6, 2022

blopker mentioned this issue Apr 6, 2022

asdf shims adding too much time to use them in a shell prompt #290

Open

blopker added a commit to blopker/asdf that referenced this issue Apr 7, 2022

Fix asdf-vm#1195: remove comments from whole file instead of line by …

2c0858f

…line for performance.

blopker mentioned this issue Apr 7, 2022

fix: remove comments from whole file instead of line by line for performance #1198

Merged

Stratus3D closed this as completed Apr 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: strip_tool_version_comments could be faster for a large amount of comments #1195

bug: strip_tool_version_comments could be faster for a large amount of comments #1195

blopker commented Apr 6, 2022 •

edited

pedropombeiro commented Apr 6, 2022

blopker commented Apr 6, 2022

blopker commented Apr 7, 2022

Stratus3D commented Apr 8, 2022

Stratus3D commented Apr 8, 2022

Stratus3D commented Apr 8, 2022

blopker commented Apr 8, 2022

Stratus3D commented Apr 8, 2022

blopker commented Apr 9, 2022

Stratus3D commented Apr 11, 2022

blopker commented Apr 11, 2022

Stratus3D commented Apr 14, 2022

bug: strip_tool_version_comments could be faster for a large amount of comments #1195

bug: strip_tool_version_comments could be faster for a large amount of comments #1195

Comments

blopker commented Apr 6, 2022 • edited

Describe the Bug

Steps to Reproduce

Expected Behaviour

Actual Behaviour

Environment

pedropombeiro commented Apr 6, 2022

blopker commented Apr 6, 2022

blopker commented Apr 7, 2022

Stratus3D commented Apr 8, 2022

Stratus3D commented Apr 8, 2022

Stratus3D commented Apr 8, 2022

blopker commented Apr 8, 2022

Stratus3D commented Apr 8, 2022

blopker commented Apr 9, 2022

Stratus3D commented Apr 11, 2022

blopker commented Apr 11, 2022

Stratus3D commented Apr 14, 2022

blopker commented Apr 6, 2022 •

edited