New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Go back to using fnmatch() instead of lots of regexes. #3995
base: master
Are you sure you want to change the base?
Conversation
jenkins build this please |
benchmark please |
I don't think we should do this, even if the benchmarks are promising. |
Not a lot, I would argue, as posix is quite pervasive, but I agree it would be better to avoid. It could be that regex is horrible on clang/libc++ and that is why I see so great differences, but in any case we should replace the current function here with something much faster. Maybe a simplified FreeBSD or GPLv3+ licensed implementation of fnmatch() exists? |
https://github.com/lattera/freebsd/blob/master/lib/libc/gen/fnmatch.c Turned out that was a very short hop away... That at least may be a starting point. I think we can drop the flags that are particular to file and path matching, simplifying the code. |
There's not a lot of justification, but the discussion in PR #2925 does suggest that portability is the main reason for using regular expressions here, especially for downstream clients which need to support Windows–e.g., ResInsight.
Sure, but I'd prefer that we look closer into what's incurring undue costs in the current implementation. If it is mainly the construction of the for (const auto& e : collection) {
if (shmatch(e, patt)) {
use(e);
}
} into something along the lines of const shmatcher m { patt };
for (const auto& e : collection) {
if (m.matches(e)) {
use(e);
}
} instead. In other words, to construct the pattern/matcher once instead of on each call. |
Benchmark result overview:
View result details @ https://www.ytelses.com/opm/?page=result&id=2416 |
The benchmark were not properly executed, any clue @blattms ? |
#include <chrono>
#include <cstring>
#include <iostream>
#include <regex>
#include <string>
#include <fnmatch.h>
bool shmatch(const std::string& pattern, const std::string& symbol)
{
// Shell patterns should implicitly be interpreted as anchored at beginning
// and end.
std::string re_pattern = "^" + pattern + "$";
{
// Shell wildcard '*' should be regular expression arbitrary character '.'
// repeated arbitrary number of times '*'
std::regex re("\\*");
re_pattern = std::regex_replace(re_pattern, re, ".*");
}
{
// Shell wildcard '?' should be regular expression one arbitrary character
// '.'
std::regex re("\\?");
re_pattern = std::regex_replace(re_pattern, re, ".");
}
std::regex regexp(re_pattern);
return std::regex_search(symbol, regexp);
}
struct shmatcher
{
shmatcher(const std::string& pattern)
{
// Shell patterns should implicitly be interpreted as anchored at beginning
// and end.
std::string re_pattern = "^" + pattern + "$";
{
// Shell wildcard '*' should be regular expression arbitrary character '.'
// repeated arbitrary number of times '*'
std::regex re("\\*");
re_pattern = std::regex_replace(re_pattern, re, ".*");
}
{
// Shell wildcard '?' should be regular expression one arbitrary character
// '.'
std::regex re("\\?");
re_pattern = std::regex_replace(re_pattern, re, ".");
}
regex_ = std::regex(re_pattern);
}
bool match(const std::string& symbol)
{
return std::regex_search(symbol, regex_);
}
std::regex regex_;
};
void naive_regex(int num_iter)
{
const auto startTime = std::chrono::system_clock::now();
for (int i = 0; i < num_iter; ++i) {
shmatch("NAME.*", "NAME");
}
std::cout << "Naive regex: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - startTime).count() / 1000.0 << std::endl;
}
void regex_struct(int num_iter)
{
shmatcher matcher{"NAME.*"};
const auto startTime = std::chrono::system_clock::now();
for (int i = 0; i < num_iter; ++i) {
matcher.match("NAME");
}
std::cout << "Struct regex: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - startTime).count() / 1000.0 << std::endl;
}
void use_fnmatch(int num_iter)
{
const auto startTime = std::chrono::system_clock::now();
int count = 0;
for (int i = 0; i < num_iter; ++i) {
count += fnmatch("NAME.*", "NAME", 0);
}
std::cout << "Fnmatch: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - startTime).count() / 1000.0 << " (" << count << ")" << std::endl;
}
int main(int argc, char** argv)
{
if (argc < 2) {
std::cerr << "need number of iterations" << std::endl;
return 1;
}
int num_iter = std::atoi(argv[1]);
naive_regex(num_iter);
regex_struct(num_iter);
use_fnmatch(num_iter);
return 0;
} ./shmatch 50000
So approximately reduced from 100x to 10x. |
This cuts a lot of time off Norne for me, and is noticable even on SPE9.
Norne summaries:
Ignoring random variations in runtimes there is still a significant difference to pre/post.