Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: speed improvements #1441

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from
104 changes: 75 additions & 29 deletions lib/utils.bash
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,45 @@ GREP_COLORS=
ASDF_DIR=${ASDF_DIR:-''}
ASDF_DATA_DIR=${ASDF_DATA_DIR:-''}

fast_basename() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be changed to:?

local path="$1"
path=${path%/}
path=${path##*/}
printf '%s\n' "$REPLY"

The extra tests seem redundant as Bash should perform them internally.

I think it's very worth pursuing having a REPLY-like pattern as shown here to completely eliminate the subshell (I strongly prefer the variable being called REPLY, though). In my experience this gives gigantic perf wins, something similar as what it seems you have alluded to in your other comment. I'm not sure how on-board the core-team is with this, though, which is why I think more discussion is needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the additional subshells can be removed, ty for this recommendation.

Re: $REPLY, I thought that was specifically for read? Using your code sample as-is resulted in an empty output for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is for read and also select, I'm borrowing the convention (more info here). It will work if the function looks something like this:

fast_basename() {
  unset -v REPLY; REPLY=

  local path="$1"
  path=${path%/}
  path=${path##*/}
  REPLY=$path
}

And is called like this:

fast_basename "$executable_path"
local cmd="$REPLY"

The code sample I provided was for existing subshell code, not the REPLY-thing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Ah, there are times like this that would likely be sped up with the elimination of the subshell.

I ran the two fast_basename types (REPLY and printf) with hyperfine on my M1 Air. I also tried this on a Debian server, but generally was unable to get hyperfine to not issue warnings about interference, no matter the combinations tried. Regardless, the timings were always incredibly close. In this example, printf had a slightly higher max, but was otherwise identical.

❯ cat fast_base*.sh
#!/usr/bin/env bash

fast_basename() {
  unset dirpath
  local dirpath="$1"
  dirpath=${dirpath%/}
  dirpath=${dirpath##*/}
  printf "%s\n" $dirpath
}

fast_basename "$1"
#!/usr/bin/env bash

fast_basename() {
  unset -v REPLY; REPLY=
  local path="$1"
  path=${path%/}
  path=${path##*/}
  REPLY=$path
}

fast_basename "$1"

❯ for f in fast_*.sh; do hyperfine -N --warmup 1000 --runs 1000 "./$f $PWD"; done
Benchmark 1: ./fast_basename_printf.sh /Users/sgarland
  Time (mean ± σ):       2.7 ms ±   0.4 ms    [User: 1.3 ms, System: 0.9 ms]
  Range (min … max):     2.5 ms …   5.2 ms    1000 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (3.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 1: ./fast_basename_reply.sh /Users/sgarland
  Time (mean ± σ):       2.7 ms ±   0.4 ms    [User: 1.3 ms, System: 0.9 ms]
  Range (min … max):     2.5 ms …   5.1 ms    1000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Re: the best practices linked, I don't understand the reasoning for some of them.

unset REPLY; REPLY=

If you unset a var (or function), it's gone. In C you may need to then initialize the empty var to a known value, but this isn't C.

unset i

If it's function-scoped, why bother unsetting it? This is always going to print a null for the first echo, and 10 for the second:

function foo() {
  echo $i
  local i=
  for _ in {0..9}; do 
    ((i=i+1))
  done
  echo $i
}

In any case, much of this is quibbling over minute differences, and I'm not that much of a stickler. I care about improving the core performance, and it's clear that you do as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For perf test, you have to include the capture of the variable as a part of the test. So I'm not sure what you're trying to measure when you use printf without a subshell. These were my results:

script.sh
#!/usr/bin/env bash

fast_basename_work() {
	local path="$1"
	path=${path%/}
	printf '%s\n' "{path##*/}"
}

slow_basename() {
	unset -v REPLY
	REPLY=$(basename "$1")
}

fast_basename() {
	unset -v REPLY
	REPLY=$(fast_basename_work "$1")
}

faster_basename() {
	unset -v REPLY; REPLY=
	local path="$1"
	path=${path%/}
	path=${path##*/}
	REPLY=$path
}

main() {
	"$1" "$PWD"
	local _result="$REPLY"
}

main "$@"
> for fn in slow_basename fast_basename faster_basename; do hyperfine -N --warmup 1000 --runs 1000 "./script.sh $fn"; done
Benchmark 1: ./script.sh slow_basename
  Time (mean ± σ):       3.6 ms ±   0.2 ms    [User: 2.0 ms, System: 1.4 ms]
  Range (min … max):     3.4 ms …   8.8 ms    1000 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 1: ./script.sh fast_basename
  Time (mean ± σ):       3.2 ms ±   0.4 ms    [User: 1.6 ms, System: 1.4 ms]
  Range (min … max):     2.9 ms …  15.1 ms    1000 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 1: ./script.sh faster_basename
  Time (mean ± σ):       2.7 ms ±   0.3 ms    [User: 1.3 ms, System: 1.2 ms]
  Range (min … max):     2.5 ms …   8.9 ms    1000 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

For your questions about my self-defined "best practices"

If you unset a var (or function), it's gone. In C you may need to then initialize the empty var to a known value, but this isn't C.

The REPLY= is to be absolutely certain that REPLY is what you think it is. For a single function, this doesn't do anything, but if you have hundreds of functions that set REPLY, this ensures that artifacts of running those other functions don't accidentally interfere with the current function.

The unset -v is necessary because we want to be absolutely sure that the variable is of the correct type - it's fundamentally more correct. See below:

script.sh
# shellcheck shell=bash

fn1() {
	unset -v REPLY
	REPLY=

	printf 'fn1: %s\n' "$(declare -p REPLY)"
	REPLY=(a b)
}

fn2() {
	# unset -v REPLY
	REPLY=

	printf 'fn2: %s\n' "$(declare -p REPLY)"
	REPLY=c
}

main() {
	fn1
	local -a result1="${REPLY[@]}"

	fn2
	local result2="$REPLY"
}

main "$@"

And I can't just do unset -v REPLY because for some reason, people like to enable errunset, and since I'm usually writing Bash libraries, I need to be compatible for the options that they wish to set.

If it's function-scoped, why bother unsetting it? This is always going to print a null for the first echo, and 10 for the second:

Yeah, it's function scoped, and I still don't want that variable to leak into the callers of functions that are called later in that function like so:

otherfn() {
  local i=
  for i in {0..1}; do :; done

  foo
}

So by doing unset -v i, the goal is to have a behavior closer to traditionally scoped variables. Bash variables (with local) are function scoped, but more importantly, they are also dynamically scoped, and I wish to account for that.

But yeah, these are minor differences, and I don't wish to prescribe all of these conventions on the ASDF codebase - just the one that actually significantly improves performance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm not sure what you're trying to measure when you use printf without a subshell.

In the light of day, I also have no idea what I was doing there. Good catch.

And I can't just do unset -v REPLY because for some reason, people like to enable errunset, and since I'm usually writing Bash libraries, I need to be compatible for the options that they wish to set.

An excellent point I hadn't thought of. People love to set -euo pipefail because someone said it was better.

Yeah, it's function scoped, and I still don't want that variable to leak into the callers of functions that are called later in that function like so

Again, an excellent point I hadn't thought of.

I think I'm just going to defer to your recommendations. I thought I knew shell reasonably well, but clearly I've a lot more to learn. Thank you for taking the time to demonstrate your answers with scripts.

unset -v REPLY
local dirpath="$1"
dirpath=${dirpath%/}
dirpath=${dirpath##*/}
REPLY=$dirpath
}

# in order to make this able to call tr if bash v4 is unavailable,
# it requires inputting a tr-like string, e.g. tr '[:lower:]' '[:upper:]'
fast_tr() {
unset -v REPLY
inputArr=("$@")
inputStr="${inputArr[0]}"
if [ ! ${#inputArr[@]} -eq 3 ] && [ ! ${#inputArr[@]} -eq 5 ]; then
printf "%s\n" "ERROR: fast_tr expects an array of 3 or 5 elements, got ${#inputArr[@]}"
return 1
fi
# if bash >= v4, use parameter modification to upper/lower the string
if [ ${BASH_VERSINFO[0]} -gt 3 ]; then
case ${inputArr[1]} in
"[:lower:]")
REPLY="${inputStr^^}"
;;
"[:upper:]")
REPLY="${inputStr,,}"
;;
"*")
printf "%s\n" "ERROR: fast_tr() expects [:lower:] and [:upper:] as arguments"
esac
# and then use parameter substitution to replace characters in the string
# if the 4th or 5th elements are not set, the substitution has no effect
stephanGarland marked this conversation as resolved.
Show resolved Hide resolved
REPLY="${REPLY//${inputArr[3]-''}/${inputArr[4]-''}}"
else
# else use tr with the input array elements as args
printf "%s\n" "${inputStr}" | tr "${inputArr[1]}${inputArr[3]-''}" "${inputArr[2]}${inputArr[4]-''}"
fi
}

asdf_version() {
local version git_rev
version="v$(cat "$(asdf_dir)/version.txt")"
Expand Down Expand Up @@ -99,7 +138,7 @@ list_installed_versions() {
if [ -d "$plugin_installs_path" ]; then
for install in "${plugin_installs_path}"/*/; do
[[ -e "$install" ]] || break
basename "$install" | sed 's/^ref-/ref:/'
fast_basename "$install" && printf "%s\n" "$REPLY"
done
fi
}
Expand Down Expand Up @@ -190,7 +229,9 @@ find_versions() {
version=$(get_version_from_env "$plugin_name")
if [ -n "$version" ]; then
local upcase_name
upcase_name=$(printf "%s\n" "$plugin_name" | tr '[:lower:]-' '[:upper:]_')
local fast_tr_arr=("$plugin_name" "[:lower:]" "[:upper:]" "-" "_")
fast_tr "${fast_tr_arr[@]}"
upcase_name="$REPLY"
local version_env_var="ASDF_${upcase_name}_VERSION"

printf "%s\n" "$version|$version_env_var environment variable"
Expand Down Expand Up @@ -237,7 +278,9 @@ display_no_version_set() {
get_version_from_env() {
local plugin_name=$1
local upcase_name
upcase_name=$(printf "%s\n" "$plugin_name" | tr '[:lower:]-' '[:upper:]_')
local fast_tr_arr=("$plugin_name" "[:lower:]" "[:upper:]" "-" "_")
fast_tr "${fast_tr_arr[@]}"
upcase_name="$REPLY"
local version_env_var="ASDF_${upcase_name}_VERSION"
local version=${!version_env_var:-}
printf "%s\n" "$version"
Expand Down Expand Up @@ -280,7 +323,8 @@ get_custom_executable_path() {

# custom plugin hook for executable path
if [ -x "${plugin_path}/bin/exec-path" ]; then
cmd=$(basename "$executable_path")
fast_basename "$executable_path"
local cmd="$REPLY"
local relative_path
# shellcheck disable=SC2001
relative_path=$(printf "%s\n" "$executable_path" | sed -e "s|${install_path}/||")
Expand All @@ -300,7 +344,8 @@ get_executable_path() {

if [ "$version" = "system" ]; then
path=$(remove_path_from_path "$PATH" "$(asdf_data_dir)/shims")
cmd=$(basename "$executable_path")
fast_basename "$executable_path"
local cmd="$REPLY"
cmd_path=$(PATH=$path command -v "$cmd" 2>&1)
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
Expand All @@ -320,8 +365,9 @@ parse_asdf_version_file() {

if [ -f "$file_path" ]; then
local version
version=$(strip_tool_version_comments "$file_path" | grep "^${plugin_name} " | sed -e "s/^${plugin_name} //")

# this is causing multiple failures in tests/utils.bash, despite seeming to work identically
stephanGarland marked this conversation as resolved.
Show resolved Hide resolved
version=$(awk -v plugin_name="$plugin_name" '!/^#/ { gsub(/#.*/,"",$0); gsub(/ +$/,"",$0)} $1 == plugin_name {print $NF}' "$file_path")
#version=$(strip_tool_version_comments "$file_path" | grep "^${plugin_name} " | sed -e "s/^${plugin_name} //")
if [ -n "$version" ]; then
if [[ "$version" == path:* ]]; then
util_resolve_user_path "${version#path:}"
Expand Down Expand Up @@ -581,9 +627,7 @@ with_plugin_env() {
local path exec_paths
exec_paths="$(list_plugin_exec_paths "$plugin_name" "$full_version")"

# exec_paths contains a trailing newline which is converted to a colon, so no
# colon is needed between the subshell and the PATH variable in this string
path="$(tr '\n' ':' <<<"$exec_paths")$PATH"
path="${exec_paths}:$PATH"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find the comment to be true in testing, and this works fine as-is. Open to proof to the contrary.


# If no custom exec-env transform, just execute callback
if [ ! -f "${plugin_path}/bin/exec-env" ]; then
Expand Down Expand Up @@ -620,10 +664,8 @@ plugin_executables() {

is_executable() {
local executable_path=$1
if [[ (-f "$executable_path") && (-x "$executable_path") ]]; then
return 0
fi
return 1
[ -x "$executable_path" ]
return $?
}

plugin_shims() {
Expand All @@ -634,11 +676,15 @@ plugin_shims() {

shim_plugin_versions() {
local executable_name
executable_name=$(basename "$1")
fast_basename "$1"
executable_name="$REPLY"
local shim_path
shim_path="$(asdf_data_dir)/shims/${executable_name}"
if [ -x "$shim_path" ]; then
grep "# asdf-plugin: " "$shim_path" 2>/dev/null | sed -e "s/# asdf-plugin: //" | uniq
# not sure why uniq was needed here since asdf doesn't let you install the same plugin version twice - maybe for legacy?
# nevertheless, the awk command handles de-duping; if it isn't needed, the sed command is ~3% faster
#sed -n "/# asdf-plugin: /s/# asdf-plugin: //p 2>/dev/null" "$shim_path"
awk '/# asdf-plugin: / { l=$(NF-1) " " $NF; !seen[$0]++; if (seen[$0] < 2) { print l } }' "$shim_path" 2>/dev/null
else
printf "asdf: unknown shim %s\n" "$executable_name"
return 1
Expand All @@ -647,11 +693,12 @@ shim_plugin_versions() {

shim_plugins() {
local executable_name
executable_name=$(basename "$1")
fast_basename "$1"
executable_name="$REPLY"
local shim_path
shim_path="$(asdf_data_dir)/shims/${executable_name}"
if [ -x "$shim_path" ]; then
grep "# asdf-plugin: " "$shim_path" 2>/dev/null | sed -e "s/# asdf-plugin: //" | cut -d' ' -f 1 | uniq
awk '/# asdf-plugin: / { l=$(NF-1) } END { print l} ' "$shim_path" 2>/dev/null
stephanGarland marked this conversation as resolved.
Show resolved Hide resolved
else
printf "asdf: unknown shim %s\n" "$executable_name"
return 1
Expand All @@ -662,12 +709,11 @@ strip_tool_version_comments() {
local tool_version_path="$1"
# Use sed to strip comments from the tool version file
# Breakdown of sed command:
# This command represents 3 steps, separated by a semi-colon (;), that run on each line.
# This command represents 2 steps, separated by a semi-colon (;), that run on each line.
# 1. Delete line if it starts with any blankspace and a #.
# 2. Find a # and delete it and everything after the #.
# 3. Remove any whitespace from the end of the line.
# 2. Find a # and delete it and everything after the #, and remove trailing whitespace.
# Finally, the command will print the lines that are not empty.
sed '/^[[:blank:]]*#/d;s/#.*//;s/[[:blank:]]*$//' "$tool_version_path"
sed '/^[[:blank:]]*#/d; s/\s*#.*$//;' "$tool_version_path"
}

asdf_run_hook() {
Expand All @@ -686,19 +732,18 @@ asdf_run_hook() {
get_shim_versions() {
shim_name=$1
shim_plugin_versions "${shim_name}"
shim_plugin_versions "${shim_name}" | cut -d' ' -f 1 | awk '{print$1" system"}'
shim_plugin_versions "${shim_name%% [0-9]*} system}"
}

preset_versions() {
shim_name=$1
shim_plugin_versions "${shim_name}" | cut -d' ' -f 1 | uniq | xargs -IPLUGIN bash -c ". $(asdf_dir)/lib/utils.bash; printf \"%s %s\n\" PLUGIN \$(get_preset_version_for PLUGIN)"
shim_plugin_versions "${shim_name%% [0-9]*}" | xargs -IPLUGIN bash -c ". $(asdf_dir)/lib/utils.bash; printf \"%s %s\n\" PLUGIN \$(get_preset_version_for PLUGIN)"
}

select_from_preset_version() {
local shim_name=$1
local shim_versions
local preset_versions

shim_versions=$(get_shim_versions "$shim_name")
if [ -n "$shim_versions" ]; then
preset_versions=$(preset_versions "$shim_name")
Expand All @@ -721,7 +766,6 @@ select_version() {

local plugins
IFS=$'\n' read -rd '' -a plugins <<<"$(shim_plugins "$shim_name")"

for plugin_name in "${plugins[@]}"; do
local version_and_path
local version_string
Expand Down Expand Up @@ -751,7 +795,8 @@ select_version() {

with_shim_executable() {
local shim_name
shim_name=$(basename "$1")
fast_basename "$1"
shim_name="$REPLY"
local shim_exec="${2}"

if [ ! -f "$(asdf_data_dir)/shims/${shim_name}" ]; then
Expand All @@ -761,7 +806,6 @@ with_shim_executable() {

local selected_version
selected_version="$(select_version "$shim_name")"

if [ -z "$selected_version" ]; then
selected_version="$(select_from_preset_version "$shim_name")"
fi
Expand Down Expand Up @@ -844,7 +888,9 @@ remove_path_from_path() {
# Output is a new string suitable for assignment to PATH
local PATH=$1
local path=$2
substitute "$PATH" "$path" "" | sed -e "s|::|:|g"
PATH=${PATH//"$path"}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an argument to be made for using the substitute call everywhere to make it more easily understandable, but at the very least, here the sed pipe is unnecessary and this can simply be modified twice.

PATH=${PATH//"::"/":"}
printf "%s" $PATH
}

# @description Strings that began with a ~ are always paths. In
Expand Down