modern/extended POSIX-compliant SemVer RegEx (for bash) #981
I see the website code is hosted in this repo as well. If there's interest, I'll happily turn this into a PR to add a third regular expression at the bottom of the page. |
You will find a test string here: https://regex101.com/r/vkijKf/1/ How does your regex perform against the valid/invalid data? |
Oh right, sorry, I forgot to mention this. It passes the tests. Here's some sample code to run for anyone interested:
#!/usr/bin/env bash
# Regex for a semver digit
D='0|[1-9][0-9]*'
# Regex for a semver pre-release word
PW='[0-9]*[a-zA-Z-][0-9a-zA-Z-]*'
# Regex for a semver build-metadata word
MW='[0-9a-zA-Z-]+'
declare -a MUST_MATCH=("0.0.4" "1.2.3" "10.20.30" "1.1.2-prerelease+meta"
"1.1.2+meta" "1.1.2+meta-valid" "1.0.0-alpha" "1.0.0-beta" "1.0.0-alpha.beta"
"1.0.0-alpha.beta.1" "1.0.0-alpha.1" "1.0.0-alpha0.valid" "1.0.0-alpha.0valid"
"1.0.0-alpha-a.b-c-somethinglong+build.1-aef.1-its-okay" "1.0.0-rc.1+build.1"
"2.0.0-rc.1+build.123" "1.2.3-beta" "10.2.3-DEV-SNAPSHOT" "1.2.3-SNAPSHOT-123"
"1.0.0" "2.0.0" "1.1.7" "2.0.0+build.1848" "2.0.1-alpha.1227" "1.0.0-alpha+beta"
"1.2.3----RC-SNAPSHOT.12.9.1--.12+788" "1.2.3----R-S.12.9.1--.12+meta"
"1.2.3----RC-SNAPSHOT.12.9.1--.12" "1.0.0+0.build.1-rc.10000aaa-kk-0.1"
"99999999999999999999999.999999999999999999.99999999999999999"
"1.0.0-0A.is.legal")
declare -a MUST_NOT_MATCH=("1" "1.2" "1.2.3-0123" "1.2.3-0123.0123" "1.1.2+.123"
"+invalid" "-invalid" "-invalid+invalid" "-invalid.01" "alpha" "alpha.beta"
"alpha.beta.1" "alpha.1" "alpha+beta" "alpha_beta" "alpha." "alpha.." "beta"
"1.0.0-alpha_beta" "-alpha." "1.0.0-alpha.." "1.0.0-alpha..1" "1.0.0-alpha...1"
"1.0.0-alpha....1" "1.0.0-alpha.....1" "1.0.0-alpha......1" "1.0.0-alpha.......1"
"01.1.1" "1.01.1" "1.1.01" "1.2.3.DEV" "1.2-SNAPSHOT"
"1.2.31.2.3----RC-SNAPSHOT.12.09.1--..12+788" "1.2-RC-SNAPSHOT" "-1.0.3-gamma+b7718"
"+justmeta" "9.8.7+meta+meta" "9.8.7-whatever+meta+meta"
"99999999999999999999999.999999999999999999.99999999999999999----RC-SNAPSHOT.12.09.1--------------------------------..12")
function _fatal {
echo -e "\e[31mFATAL\e[0m $*"
exit 1
}
function _ok {
echo -e "\e[32m OK\e[0m $*"
}
echo ">> Testing valid version numbers <<"
for var in "${MUST_MATCH[@]}"; do
if [[ "$var" =~ ^($D)\.($D)\.($D)(-(($D|$PW)(\.($D|$PW))*))?(\+($MW(\.$MW)*))?$ ]]; then
MAJOR="${BASH_REMATCH[1]}"
MINOR="${BASH_REMATCH[2]:-""}"
PATCH="${BASH_REMATCH[3]:-""}"
PRE_RELEASE="${BASH_REMATCH[5]:-""}"
BUILD_METADATA="${BASH_REMATCH[10]:-""}"
_ok "$var -> ($MAJOR) ($MINOR) ($PATCH) ($PRE_RELEASE) ($BUILD_METADATA)"
else
_fatal "regex didn't match '$var'"
fi
done
echo ""
echo ">> Testing invalid version numbers <<"
for var in "${MUST_NOT_MATCH[@]}"; do
if [[ "$var" =~ ^($D)\.($D)\.($D)(-(($D|$PW)(\.($D|$PW))*))?(\+($MW(\.$MW)*))?$ ]]; then
_fatal "regex matched '$var'"
else
_ok "'$var' recognized as invalid"
fi
done
echo ""
_ok "All tests passed"
exit 0
And here's the output:
|
You made my day! I was going nuts yesterday trying to make it work in bash 😆 |
Great job, thanks a lot! I have one question, could you please explain why:
Thank you |
@jwdonahue Is this good to go? |
I think it all looks great, and on behalf of bash coders everywhere, thank you for the effort! My bash-fu is weak and it's really not up to me (I'm not a maintainer). Does bash process only ASCII, or at least just the lower 128 code points of UTF-8? I have made a close inspection and I don't see anything wrong with it. My main concern, as with all regexes, is whether there are any potential perf or runaway concerns wrt the bash regex implementation and this particular regex. The test data we have catches the potential issues that we know about with the other two implementations, such as excessive backtracking, non-termination, or failure to match due to timeouts, and I suspect it covers that aspect for regexes in general, but like I said, my bash-fu is weak. Since there do not seem to be any POSIX-compatible regex test sites to share this on, I think the next step would be to put it in a dedicated GitHub repo with at least a short readme file, and then open a PR here with proposed changes to the FAQ that include a link back to the repo. After a round or two of review of those changes, you should get the attention of the maintainers. |
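On the ASCII question, a minimal sketch that may help: bash hands `=~` patterns to the system's POSIX regex routines, and POSIX leaves range expressions such as `[a-z]` unspecified outside the C (POSIX) locale. Pinning the locale for the duration of the match is one way to keep the ranges byte-based. The helper name `is_semver_c_locale` is hypothetical, and the pattern is the expansion of the `D`, `PW` and `MW` pieces from the test script above; treat this as an illustration rather than part of the proposed regex.

```shell
#!/usr/bin/env bash
# Expanded form of the D / PW / MW building blocks from the test script.
SEMVER_REGEX='^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-((0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?(\+([0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*))?$'

# Hypothetical helper: match with the locale pinned to C so that
# [a-z], [A-Z] and [0-9] only cover ASCII bytes.
is_semver_c_locale() {
    local LC_ALL=C   # assumption: scoped to this function, restored on return
    [[ "$1" =~ $SEMVER_REGEX ]]
}

for candidate in "1.2.3-beta" "1.0.0-é"; do
    if is_semver_c_locale "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

With the locale pinned, the pre-release identifier `é` is rejected because neither of its UTF-8 bytes falls inside the ASCII ranges.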
Alright, the repo is here: https://github.com/har7an/bash-semver-regex Thanks for the feedback, @jwdonahue! |
Hello,
today I was writing an application for a bit of my CI infrastructure that needs to handle semver numbers from applications. Since I prefer to write my CI code in plain bash, I came up with a regex that one can use in bash to match semver. This differs slightly from the example for numbered capture groups, since POSIX regex (which is what bash uses), as far as I know, has no concept of non-capturing groups. Here's what I came up with:
The individual parts are captured as follows:
Here is the fully-expanded regex pattern:
Maybe this will save someone else an hour of playing with regular expressions. :)
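For readers who want to see the expanded pattern and the capture-group positions, here is a sketch reconstructed from the test script posted earlier in this thread (it is not a copy of the original snippets from this post). The `D`, `PW` and `MW` pieces and the `BASH_REMATCH` indices come from that script; the variable `SEMVER_REGEX` and the helper `parse_semver` are hypothetical names added for illustration.

```shell
#!/usr/bin/env bash
# Building blocks, as defined in the test script in this thread.
D='0|[1-9][0-9]*'                   # numeric identifier without leading zeros
PW='[0-9]*[a-zA-Z-][0-9a-zA-Z-]*'   # non-numeric pre-release identifier
MW='[0-9a-zA-Z-]+'                  # build-metadata identifier

# Expand the pieces into one POSIX ERE string and print it.
SEMVER_REGEX="^($D)\\.($D)\\.($D)(-(($D|$PW)(\\.($D|$PW))*))?(\\+($MW(\\.$MW)*))?\$"
printf '%s\n' "$SEMVER_REGEX"

# Hypothetical helper. Every group captures, because POSIX ERE has no
# non-capturing groups, so the useful indices are 1, 2, 3, 5 and 10.
parse_semver() {
    [[ "$1" =~ $SEMVER_REGEX ]] || return 1
    printf 'major=%s minor=%s patch=%s pre=%s build=%s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
        "${BASH_REMATCH[5]}" "${BASH_REMATCH[10]}"
}

parse_semver "1.0.0-rc.1+build.123"
# prints: major=1 minor=0 patch=0 pre=rc.1 build=build.123
```

Groups 4 and 9 hold the optional parts including their leading `-` and `+`, and groups 6 to 8 are only there to express the repetition, which is why the pre-release and build metadata end up at indices 5 and 10.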