New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance filter parser #41
base: main
Are you sure you want to change the base?
Conversation
941daff
to
080e335
Compare
Have a look at |
2337de5
to
176b442
Compare
fuzz: elapsed: 3m30s, execs: 1483038 (7807/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m33s, execs: 1507525 (8160/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m36s, execs: 1530168 (7550/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m39s, execs: 1553639 (7821/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m42s, execs: 1576623 (7661/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m45s, execs: 1595193 (6190/sec), new interesting: 95 (total: 1559)
fuzz: elapsed: 3m48s, execs: 1609919 (4910/sec), new interesting: 97 (total: 1561)
fuzz: elapsed: 3m51s, execs: 1630670 (6917/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m54s, execs: 1648763 (6029/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m57s, execs: 1667104 (6113/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m0s, execs: 1685923 (6255/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m3s, execs: 1704380 (6154/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m6s, execs: 1723764 (6479/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m9s, execs: 1744018 (6752/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m12s, execs: 1758080 (5418/sec), new interesting: 100 (total: 1564)
--- PASS: FuzzParser (251.66s)
PASS
ok github.com/icinga/noma/internal/filter 251.870s |
089cfc5
to
877a020
Compare
6cbb497
to
67991f2
Compare
8b2c4eb
to
3603283
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just went a bit over the code and needed to educate myself about yacc again. Feel free to argue on each of my comments.
internal/filter/lexer.go
Outdated
lex.Scanner.Error = lex.ScanError | ||
|
||
// Enable parsers error verbose to get more context of the parsing failures | ||
yyErrorVerbose = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out, this line resp. writing this internal variable results in data races.
As the tests are set to run in parallel, this Parse
function is called concurrently and there are racing writes on yyErrorVerbose
.
How bad is this? In this specific context, I wouldn't be really concerned but this can become worse later on.
As this is more off an global variable configuration, I would suggest setting this in an init
function or check with goyacc
directly?
==================
WARNING: DATA RACE
Write at 0x0000009e228f by goroutine 9:
github.com/icinga/icinga-notifications/internal/filter.Parse()
/home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424
github.com/icinga/icinga-notifications/internal/filter.TestParser.func2()
/home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:86 +0x4f
testing.tRunner()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
testing.(*T).Run.func1()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44
Previous write at 0x0000009e228f by goroutine 8:
github.com/icinga/icinga-notifications/internal/filter.Parse()
/home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424
github.com/icinga/icinga-notifications/internal/filter.TestParser.func1()
/home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:16 +0x44
testing.tRunner()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
testing.(*T).Run.func1()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44
Goroutine 9 (running) created at:
testing.(*T).Run()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a
github.com/icinga/icinga-notifications/internal/filter.TestParser()
/home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:83 +0x68
testing.tRunner()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
testing.(*T).Run.func1()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44
Goroutine 8 (running) created at:
testing.(*T).Run()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a
github.com/icinga/icinga-notifications/internal/filter.TestParser()
/home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:13 +0x4b
testing.tRunner()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
testing.(*T).Run.func1()
/nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44
==================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea 🤷 how to set this using goyacc
, but for now I will just use init()
as discussed offline.
3603283
to
8f8936d
Compare
// identifiersMatcher contains a compiled regexp and is used by the Lexer to match filter identifiers. | ||
// Currently, it allows to match any character except a LogicalOp and CompOperator. | ||
var identifiersMatcher = regexp.MustCompile("[^!&|~<>=()]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to work with a positive list here. Doing it like this, there is only very limited room for expansion. For example, if the regex would have been [^!&|<>=()]
before (missing the ~
), adding ~
there would have made previously valid identifiers invalid. Yes, this means that more characters have to be escaped/urlencoded, but I don't think that this would be bad, it would just have to be consistent with what the web module writes into the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand on why don't doing it the way it is, and I don't want to list all the valid chars because I don't know them all.
internal/filter/lexer.go
Outdated
// Lexer is used to tokenize the filter input into a set literals. | ||
// This is just a wrapper around the Scanner type and implements the yyLexer interface used by the parser. | ||
type Lexer struct { | ||
scanner.Scanner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fully understanding this yet, but https://pkg.go.dev/text/scanner says:
By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different identifier and white space characters.
I don't see any such customization, which sounds like this does way more than it has to and even should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go white spaces are just like in PHP and the IPL parser also skips all these characters, just like this lexer does, so I won't do anything in this regard. For other customizations, see the inline comments in the Parse()
function.
filter_chain: conditions_expr logical_op maybe_negated_condition_expr | ||
{ | ||
$$ = reduceFilter($2, $1, $3) | ||
} | ||
| conditions_expr %prec PREFER_SHIFTING_LOGICAL_OP | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there an asymmetry in the operands regarding possible negation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I got confused by the very similar names conditions_expr
and condition_expr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a better name that fits this use case, I will accept it.
filter_chain: conditions_expr logical_op maybe_negated_condition_expr | ||
{ | ||
$$ = reduceFilter($2, $1, $3) | ||
} | ||
| conditions_expr %prec PREFER_SHIFTING_LOGICAL_OP | ||
; | ||
|
||
conditions_expr: maybe_negated_condition_expr logical_op maybe_negated_condition_expr | ||
{ | ||
$$ = reduceFilter($2, $1, $3) | ||
} | ||
| maybe_negated_condition_expr %prec PREFER_SHIFTING_LOGICAL_OP | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing something or does this limit a filter_chain
to at most 3 condition_expr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does, which limits filter_rule
to 6 condition_expr
, so parsing a & b & c & d & e & f & g
fails (spaces for readability, parsing fails with and without them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I noticed that last time too, but didn't have time to push all the fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed with 5484f91
8f8936d
to
ff0ab0c
Compare
eebcabd
to
1ccb78e
Compare
1ccb78e
to
e5cd4e2
Compare
By default, the scanner mode is set to `scanner.GoTokens` which contains way more flags than are actually needed. This commit now sets this to only recognize identifiers and `scanner#Scan()` will return all other unrecognized tokens as they are.
e5cd4e2
to
98471a8
Compare
After all the testing we did last week, it turns out that the current implementation of the filter
Parser
is very buggy and sometimes crashes with very simple filter strings. This PR solves this by usinggoyacc
generatedParser
. This is similar to how we parse Icinga2DSL
withflex/bison
. The grammar rules can be found in theparser.y
file and you can find theLexer
and other parsing utils in thelexer.go
file.For local testing/debugging purposes you can re-build the parser with the following command.
go env -w GOBIN=/Users/yhabteab/go/bin go install golang.org/x/tools/cmd/goyacc@latest go generate './...'
This will also generate a
parser.output
file which contains all the parser states and is useful to resolve someshift/reduce
conflicts.