
Path globbing syntax is not documented #705

Open
PerMildner opened this issue Feb 20, 2024 · 2 comments

Comments

@PerMildner

Looking at README.md and s5cmd --help, I see no details about the glob syntax.

In particular, it seems s5cmd understands the "double star" ** syntax for matching any folder depth, but this is not mentioned in the README.md examples.

@dzuelke

dzuelke commented Apr 13, 2024

Even a single star matches any folder depth. The asterisk is not bound by path separators:

$ s5cmd ls "s3://foo/*.txt"
2024/04/13 09:38:00               3  bar/baz.txt
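To make "not bound by path separators" concrete, here is a minimal Go sketch contrasting the standard library's path.Match, where * stops at /, with the regular expression that *.txt becomes once * is rewritten as .*. The helper names and the ^ and $ anchors are assumptions for illustration, not s5cmd code.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// shellGlob reports whether key matches pattern under path.Match rules,
// where '*' matches any run of characters except '/'.
func shellGlob(pattern, key string) bool {
	ok, err := path.Match(pattern, key)
	return err == nil && ok
}

// anyDepthTxt is the regular expression "*.txt" turns into when '*' is
// rewritten as ".*" (hypothetical name; the ^ and $ anchors are assumed).
var anyDepthTxt = regexp.MustCompile(`^.*\.txt$`)

func main() {
	for _, key := range []string{"baz.txt", "bar/baz.txt", "a/b/c.txt"} {
		// Only the top-level key passes path.Match; the regexp accepts all three.
		fmt.Printf("%-12s shell=%v regexp=%v\n", key, shellGlob("*.txt", key), anyDepthTxt.MatchString(key))
	}
}
```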

The reason ** works is that every occurrence of * is replaced by the regular expression .* (as you can see, ? is also supported and matches a single character):

s5cmd/strutil/strutil.go, lines 63 to 68 in c1c7ee3:

// WildCardToRegexp converts a wildcarded expression to equivalent regular expression
func WildCardToRegexp(pattern string) string {
	patternRegex := regexp.QuoteMeta(pattern)
	patternRegex = strings.Replace(patternRegex, "\\?", ".", -1)
	return strings.Replace(patternRegex, "\\*", ".*", -1)
}
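A small Go program that reuses the helper quoted above shows why * and ** behave identically: ** simply becomes .*.*, which accepts the same strings as .*. The matches wrapper and its ^...$ anchoring are assumptions of this sketch; the quoted snippet does not show how s5cmd compiles the result.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// WildCardToRegexp mirrors the s5cmd helper quoted above: escape every
// regexp metacharacter, then turn the escaped "?" and "*" into "." and ".*".
func WildCardToRegexp(pattern string) string {
	patternRegex := regexp.QuoteMeta(pattern)
	patternRegex = strings.Replace(patternRegex, "\\?", ".", -1)
	return strings.Replace(patternRegex, "\\*", ".*", -1)
}

// matches anchors the converted pattern at both ends (hypothetical helper;
// the anchoring is assumed, not taken from s5cmd).
func matches(pattern, key string) bool {
	return regexp.MustCompile("^" + WildCardToRegexp(pattern) + "$").MatchString(key)
}

func main() {
	fmt.Println(WildCardToRegexp("*.txt"))  // .*\.txt
	fmt.Println(WildCardToRegexp("**.txt")) // .*.*\.txt
	for _, key := range []string{"baz.txt", "bar/baz.txt", "a/b/c.txt"} {
		// "*" and "**" accept exactly the same keys.
		fmt.Println(key, matches("*.txt", key), matches("**.txt", key))
	}
	// "?" becomes ".", which in Go's regexp syntax also matches "/".
	fmt.Println(matches("bar?baz.txt", "bar/baz.txt")) // true
}
```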

And when the URL contains a "*" or "?", s5cmd gives the S3 API an empty delimiter instead of /, so the listing is not limited to a single folder level:

s5cmd/storage/url/url.go, lines 264 to 270 in c1c7ee3:

if loc := strings.IndexAny(u.Path, globCharacters); loc < 0 {
	u.Delimiter = s3Separator
	u.Prefix = u.Path
} else {
	u.Prefix = u.Path[:loc]
	u.filter = u.Path[loc:]
}

This could be enhanced so that u.Delimiter is also set to / in the else branch unless the URL contains **, but I think that would be crude and incomplete: a URL can combine * and ** wildcards in several ways, so it probably needs more logic in other places as well.

@PerMildner
Author

Thanks for looking at this.

I think what I did not see in the documentation was something that explicitly and clearly says "Even a single star matches any folder depth". Perhaps this is what the Usage section means by "s5cmd supports multiple-level wildcards for all S3 operations", but that is not clear.

Personally, I prefer clear specification-style descriptions in --help and README.md before the examples, rather than relying on the user to guess the meaning from examples, but I am sure not everyone would agree.
