Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: strip_tool_version_comments could be faster for a large amount of comments #1195

Closed
blopker opened this issue Apr 6, 2022 · 12 comments
Closed
Labels

Comments

@blopker
Copy link
Contributor

blopker commented Apr 6, 2022

Describe the Bug

I'm not sure if this is a bug or feature, but I noticed the strip_tool_version_comments performance could be improved.

The current code is:

strip_tool_version_comments() {
  local tool_version_path="$1"

  while IFS= read -r tool_line || [ -n "$tool_line" ]; do
    # Remove whitespace before pound sign, the pound sign, and everything after it
    new_line="$(cut -f1 -d"#" <<<"$tool_line" | sed -e 's/[[:space:]]*$//')"

    # Only print the line if it is not empty
    if [[ -n "$new_line" ]]; then
      printf "%s\\n" "$new_line"
    fi
  done <"$tool_version_path"
}

This iterates over every line in .tool-versions, then shells out to cut and sed for each line. I looked around a bit, and it seems like this loop can be replaced with one call to sed for the whole file (found here: https://unix.stackexchange.com/a/157619, which has a good breakdown of what the command is doing). The alternate version is:

strip_tool_version_comments() {
  local tool_version_path="$1"
  sed '/^[[:blank:]]*#/d;s/#.*//' "$tool_version_path"
}

This should handle lines that are only comments and comments at the end of tool version lines, like the original code. As for performance, I tested with a .tool-versions file with 50 lines of comments (excessive, but not unrealistic) and the timing worked like:

Before change: 0.22s user 0.69s system 120% cpu 0.762 total
After change: 0.07s user 0.21s system 99% cpu 0.276 total

After a few more tests, it looks like the original code slows a bit with every line added, where the 1 sed version stays constant. The proposed version is even around 10ms faster for a .tool-versions with only one line in it.

If this change seems interesting I can go ahead and make a PR. Let me know!

Steps to Reproduce

  1. Create a .tool-versions file with 50 lines of comments.
  2. Run time asdf current.
  3. See that it takes ~200ms.

Expected Behaviour

It should not take any more time than a file with no comments.

Actual Behaviour

It gets slower with every comment in the file.

Environment

OS:
Darwin blopker-pro 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000 arm64

SHELL:
zsh 5.8 (x86_64-apple-darwin21.0)

ASDF VERSION:
v0.9.0

ASDF ENVIRONMENT VARIABLES:
ASDF_DIR=/opt/homebrew/opt/asdf/libexec

ASDF INSTALLED PLUGINS:
bats                         https://github.com/timgluz/asdf-bats.git main 299551f
nodejs                       https://github.com/asdf-vm/asdf-nodejs.git master 1bc55cb
python                       https://github.com/danhper/asdf-python.git master 57a4d72
shellcheck                   https://github.com/luizm/asdf-shellcheck.git master 31cb328
shfmt                        https://github.com/luizm/asdf-shfmt.git master f893252


### asdf plugins affected (if relevant)

_No response_
@blopker blopker added the bug label Apr 6, 2022
@blopker blopker changed the title bug: strip_tool_version_comments could be faster for large amount of comments bug: strip_tool_version_comments could be faster for a large amount of comments Apr 6, 2022
@pedropombeiro
Copy link

I did a test on the GitLab Development Kit and here are the findings (on a MacBook Pro M1 Max):

2022-04-06 at 19 05

A 3.5x performance increase. Well done!

@blopker
Copy link
Contributor Author

blopker commented Apr 6, 2022

Very cool. @Stratus3D, is this something you'd like a PR for?

@blopker
Copy link
Contributor Author

blopker commented Apr 7, 2022

I put up a PR for this at #1198. I changed this sed call to make it pass all the tests, and to make it a bit more robust.

@Stratus3D
Copy link
Member

@blopker yes! Catching up on the discussion around this. I see your PR and am reviewing it now.

@Stratus3D
Copy link
Member

Great work! On my machine (and with @pedropombeiro well-annotated .tool-versions file), I see a speedup of about 4x with your PR @blopker ! Take a look at these before and after flame graphs:

Before:

Screen Shot 2022-04-08 at 10 36 05 AM

After:

Screen Shot 2022-04-08 at 10 36 18 AM

Faster and simpler. I don't use comments in my .tool-versions file, so I never would have caught this myself. However even without comments in my simple .tool-versions file I see this speeds up asdf current by about 40%!

@Stratus3D
Copy link
Member

Closing as this has been fixed by #1198

@blopker
Copy link
Contributor Author

blopker commented Apr 8, 2022

Great, thanks for the merge!

Glad I was able to help out a bit. Sometimes these performance things only reveal themselves under specific conditions. I personally never had any performance issues with asdf.

Nice flamegraph too, are you using perf and speedscope? Might be cool to add how to do performance profiling to the dev docs (unless I missed it and it's already there).

@Stratus3D
Copy link
Member

@blopker it's just a simple Python script I've got that takes the output of Bash's set -x option and generates the data needed for Brendan Gregg's format that is accepted by speedscope. Then I just open the generated file with speedscope. This is not in the docs as it's somewhat hacky and not perfect, but it's good enough for some high level insight. I guess I could open source the script at some point.

@blopker
Copy link
Contributor Author

blopker commented Apr 9, 2022

That actually sounds more interesting that you're letting on. I've looked for a solution to profile bash and there's no obvious answer, or really any answer at all.

I vote you put it out there!

@Stratus3D
Copy link
Member

So off the top of my head, there are some limitations to the script:

  • It can only profile execution time of each expression in the script (memory and other things aren't supported)
  • It requires manually adding set -x to the script you want to profile
  • The script reads from STDIN (a pipe) so anything buffering output affects the accuracy of the script (buffering can often be disabled)
  • It doesn't understand Bash functions (flamegraphs generated display expressions, not Bash function calls)

So overall this script is just a hack.

Still interested? If so, got any suggestions for what I should name this script/repo?

@blopker
Copy link
Contributor Author

blopker commented Apr 11, 2022

My philosophy is that a bad measurement is better than no measurement (not that this is a bad measurement). We have to start somewhere. I'd like to see what it can do! Maybe you can post it to a gist instead of creating a whole repo for now?

Possible names:

  • flashgraph (flame + bash)
  • shellflame/shellgraph
  • setex (because you have to set -x)
  • ishum (god of fire who was the brother of the sun god Shamash)
  • Or maybe just "Bash Flamegraph Profiler" :)

@Stratus3D
Copy link
Member

@blopker I'm busy trying to get a new release of asdf tagged, but after that I am planning on creating a repo with the code. I'll keep you posted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants