Simplify anything-but #32

Open
timbray opened this issue Sep 4, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@timbray
Collaborator

timbray commented Sep 4, 2022

What is your idea?

The redesign of ByteMachine that came in with wildcards can efficiently represent byte-range transitions. Given that, anything-but and anything-but-prefix can be represented directly and efficiently in a ByteMachine, which would simplify the rule-matching code path.
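
To illustrate the idea, here is a minimal sketch of how byte-range transitions could encode anything-but and anything-but-prefix directly in an automaton. It is not the project's ByteMachine: the class `AnythingButSketch`, its methods, and the `{lo, hi, target}` transition encoding are hypothetical names chosen for this example, and it handles only a single non-empty excluded string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the ByteMachine class from the project. */
class AnythingButSketch {
    // Each transition is {lo, hi, target}: taken when lo <= byte <= hi (unsigned).
    private final List<List<int[]>> transitions = new ArrayList<>();
    private final List<Boolean> accepting = new ArrayList<>();

    private int addState(boolean accept) {
        transitions.add(new ArrayList<>());
        accepting.add(accept);
        return transitions.size() - 1;
    }

    private void addRange(int from, int lo, int hi, int target) {
        if (lo <= hi) { // skip empty ranges (below 0x00 or above 0xFF)
            transitions.get(from).add(new int[] {lo, hi, target});
        }
    }

    /** prefixMode=false: anything-but; prefixMode=true: anything-but-prefix. */
    static AnythingButSketch build(String excluded, boolean prefixMode) {
        if (excluded.isEmpty()) {
            throw new IllegalArgumentException("excluded value must be non-empty");
        }
        AnythingButSketch m = new AnythingButSketch();
        byte[] bytes = excluded.getBytes(StandardCharsets.UTF_8);
        int start = m.addState(true);    // state 0: nothing consumed yet
        int differs = m.addState(true);  // value has already diverged: always a match
        m.addRange(differs, 0x00, 0xFF, differs);
        int current = start;
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            boolean last = (i == bytes.length - 1);
            // Only the state reached after consuming the whole excluded value rejects.
            int next = m.addState(!last);
            m.addRange(current, 0x00, b - 1, differs); // bytes below the expected one
            m.addRange(current, b, b, next);           // the expected byte
            m.addRange(current, b + 1, 0xFF, differs); // bytes above the expected one
            current = next;
        }
        if (!prefixMode) {
            // Plain anything-but: a longer value differs from the excluded one.
            m.addRange(current, 0x00, 0xFF, differs);
        }
        // In prefix mode the final state keeps no transitions, so it is a dead end.
        return m;
    }

    boolean matches(String value) {
        int state = 0;
        for (byte raw : value.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            int next = -1;
            for (int[] t : transitions.get(state)) {
                if (t[0] <= b && b <= t[1]) {
                    next = t[2];
                    break;
                }
            }
            if (next < 0) {
                return false; // dead end: the value starts with the excluded prefix
            }
            state = next;
        }
        return accepting.get(state);
    }
}
```

With this construction, `AnythingButSketch.build("foo", false)` rejects only `"foo"`, while `AnythingButSketch.build("foo", true)` rejects every value that starts with `"foo"`. No separate bookkeeping of failed matches is needed, because acceptance is decided entirely by the state in which the traversal ends.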

Would you be willing to make the change?

Maybe

timbray added the enhancement label on Sep 4, 2022
@schenksj
Contributor

schenksj commented Jan 29, 2023

@timbray Do you still see this as an opportunity? The anything-but-prefix matching (and my proposal for anything-but-suffix) seems to simply take advantage of the StateMachine transitions in the excerpt below: if they match, they get tagged as failed.

    for (ByteMatch match : nextTrans.getMatches()) {
        switch (match.getPattern().type()) {
            case EXACT:
            case EQUALS_IGNORE_CASE:
            case WILDCARD:
                if (valIndex == (val.length - 1)) {
                    transitionTo.add(match.getNextNameState());
                }
                break;
            case NUMERIC_EQ:
                // only matches at last character
                if (fieldValueIsNumeric && valIndex == (val.length - 1)) {
                    transitionTo.add(match.getNextNameState());
                }
                break;
            case PREFIX:
                transitionTo.add(match.getNextNameState());
                break;
            case SUFFIX:
            case EXISTS:
                // we already harvested these matches via separate functions due to special matching
                // requirements, so just ignore them here.
                break;
            case NUMERIC_RANGE:
                // as soon as you see the match, you've matched
                Range range = (Range) match.getPattern();
                if ((fieldValueIsNumeric && !range.isCIDR) || (!fieldValueIsNumeric && range.isCIDR)) {
                    transitionTo.add(match.getNextNameState());
                }
                break;
            case ANYTHING_BUT:
                AnythingBut anythingBut = (AnythingBut) match.getPattern();
                // only applies if at last character
                if (valIndex == (val.length - 1) && anythingBut.isNumeric() == fieldValueIsNumeric) {
                    failedAnythingButs.add(match.getNextNameState());
                }
                break;
            case ANYTHING_BUT_PREFIX:
                failedAnythingButs.add(match.getNextNameState());
                break;
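
The excerpt only shows states being added to `failedAnythingButs`. Presumably a later step, not shown here, turns that set into actual matches by taking every anything-but state the machine knows about and keeping those that did not fail. A rough sketch of that assumed step follows. The class `FailedAnythingButSketch`, the method `resolve`, and the use of strings as stand-ins for name states are hypothetical and only serve the illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the assumed "subtract the failed ones" step. */
class FailedAnythingButSketch {
    /**
     * @param transitionTo     states already matched positively (EXACT, PREFIX, ...)
     * @param allAnythingButs  every anything-but target state known to the machine (assumed)
     * @param failed           anything-but targets whose excluded value was seen
     * @return the positive matches plus every anything-but target that did not fail
     */
    static Set<String> resolve(Set<String> transitionTo,
                               Set<String> allAnythingButs,
                               Set<String> failed) {
        Set<String> result = new HashSet<>(transitionTo);
        for (String state : allAnythingButs) {
            if (!failed.contains(state)) {
                result.add(state); // the excluded value was not seen, so the rule matches
            }
        }
        return result;
    }
}
```

This is the extra bookkeeping that building anything-but directly into the automaton would remove: the match would fall out of the traversal itself rather than from a set difference computed afterwards.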

@timbray
Collaborator Author

timbray commented Jan 29, 2023

Absolutely. The way anything-but and suffix matching are done is sort of hacky (I can say that because anything-but is my own gross hack), and I suspect quite a bit of code could be discarded if someone were willing to buckle down and replace the hacks with principled automaton-building. I'm pretty sure performance would improve too.
