Enhance filter parser #41

yhabteab · 2023-04-27T16:34:23Z

After all the testing we did last week, it turns out that the current implementation of the filter Parser is very buggy and sometimes crashes with very simple filter strings. This PR solves this by using goyacc generated Parser. This is similar to how we parse Icinga2 DSL with flex/bison. The grammar rules can be found in the parser.y file and you can find the Lexer and other parsing utils in the lexer.go file.

For local testing/debugging purposes you can re-build the parser with the following command.

go env -w GOBIN=/Users/yhabteab/go/bin
go install golang.org/x/tools/cmd/goyacc@latest
go generate './...'

This will also generate a parser.output file which contains all the parser states and is useful to resolve some shift/reduce conflicts.

julianbrost · 2023-05-02T08:58:28Z

However, I haven't pushed the auto-generated parser yet because it involves a lot of code changes that don't need to be reviewed, but when this PR is done, we can think about merging this parser here or auto-generating it every time when building noma, basically just like the Icinga2 *.ti files.

Have a look at go generate. Also, common practice in Go is to commit these files to the repo so that you don't have to bother with the these dependencies unless you actually want to change the parser.

yhabteab · 2023-05-03T10:01:54Z

fuzz: elapsed: 3m30s, execs: 1483038 (7807/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m33s, execs: 1507525 (8160/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m36s, execs: 1530168 (7550/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m39s, execs: 1553639 (7821/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m42s, execs: 1576623 (7661/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m45s, execs: 1595193 (6190/sec), new interesting: 95 (total: 1559)
fuzz: elapsed: 3m48s, execs: 1609919 (4910/sec), new interesting: 97 (total: 1561)
fuzz: elapsed: 3m51s, execs: 1630670 (6917/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m54s, execs: 1648763 (6029/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m57s, execs: 1667104 (6113/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m0s, execs: 1685923 (6255/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m3s, execs: 1704380 (6154/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m6s, execs: 1723764 (6479/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m9s, execs: 1744018 (6752/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m12s, execs: 1758080 (5418/sec), new interesting: 100 (total: 1564)
--- PASS: FuzzParser (251.66s)
PASS
ok  	github.com/icinga/noma/internal/filter	251.870s

oxzi

I just went a bit over the code and needed to educate myself about yacc again. Feel free to argue on each of my comments.

internal/filter/lexer.go

internal/filter/parser_test.go

oxzi · 2023-10-27T13:19:01Z

internal/filter/lexer.go

+	lex.Scanner.Error = lex.ScanError
+
+	// Enable parsers error verbose to get more context of the parsing failures
+	yyErrorVerbose = true


Turns out, this line resp. writing this internal variable results in data races.
As the tests are set to run in parallel, this Parse function is called concurrently and there are racing writes on yyErrorVerbose.

How bad is this? In this specific context, I wouldn't be really concerned but this can become worse later on.
As this is more off an global variable configuration, I would suggest setting this in an init function or check with goyacc directly?

================== WARNING: DATA RACE Write at 0x0000009e228f by goroutine 9: github.com/icinga/icinga-notifications/internal/filter.Parse() /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424 github.com/icinga/icinga-notifications/internal/filter.TestParser.func2() /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:86 +0x4f testing.tRunner() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238 testing.(*T).Run.func1() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44 Previous write at 0x0000009e228f by goroutine 8: github.com/icinga/icinga-notifications/internal/filter.Parse() /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424 github.com/icinga/icinga-notifications/internal/filter.TestParser.func1() /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:16 +0x44 testing.tRunner() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238 testing.(*T).Run.func1() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44 Goroutine 9 (running) created at: testing.(*T).Run() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a github.com/icinga/icinga-notifications/internal/filter.TestParser() /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:83 +0x68 testing.tRunner() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238 testing.(*T).Run.func1() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44 Goroutine 8 (running) created at: testing.(*T).Run() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a github.com/icinga/icinga-notifications/internal/filter.TestParser() /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:13 +0x4b testing.tRunner() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238 testing.(*T).Run.func1() /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44 ==================

No idea 🤷 how to set this using goyacc, but for now I will just use init() as discussed offline.

internal/filter/lexer.go

internal/filter/parser.y

internal/filter/types.go

internal/filter/lexer.go

julianbrost · 2023-10-31T13:12:48Z

internal/filter/lexer.go

+// identifiersMatcher contains a compiled regexp and is used by the Lexer to match filter identifiers.
+// Currently, it allows to match any character except a LogicalOp and CompOperator.
+var identifiersMatcher = regexp.MustCompile("[^!&|~<>=()]")


I think it would make sense to work with a positive list here. Doing it like this, there is only very limited room for expansion. For example, if the regex would have been [^!&|<>=()] before (missing the ~), adding ~ there would have made previously valid identifiers invalid. Yes, this means that more characters have to be escaped/urlencoded, but I don't think that this would be bad, it would just have to be consistent with what the web module writes into the database.

I don't really understand on why don't doing it the way it is, and I don't want to list all the valid chars because I don't know them all.

julianbrost · 2023-10-31T13:43:56Z

internal/filter/lexer.go

+// Lexer is used to tokenize the filter input into a set literals.
+// This is just a wrapper around the Scanner type and implements the yyLexer interface used by the parser.
+type Lexer struct {
+	scanner.Scanner


I'm not fully understanding this yet, but https://pkg.go.dev/text/scanner says:

By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different identifier and white space characters.

I don't see any such customization, which sounds like this does way more than it has to and even should do.

Go white spaces are just like in PHP and the IPL parser also skips all these characters, just like this lexer does, so I won't do anything in this regard. For other customizations, see the inline comments in the Parse() function.

julianbrost · 2023-10-31T14:02:20Z

internal/filter/parser.y

+filter_chain: conditions_expr logical_op maybe_negated_condition_expr
+	{
+		$$ = reduceFilter($2, $1, $3)
+	}
+	| conditions_expr %prec PREFER_SHIFTING_LOGICAL_OP
+	;


Why is there an asymmetry in the operands regarding possible negation?

Ah, I got confused by the very similar names conditions_expr and condition_expr.

If you have a better name that fits this use case, I will accept it.

julianbrost · 2023-10-31T14:03:06Z

internal/filter/parser.y

+filter_chain: conditions_expr logical_op maybe_negated_condition_expr
+	{
+		$$ = reduceFilter($2, $1, $3)
+	}
+	| conditions_expr %prec PREFER_SHIFTING_LOGICAL_OP
+	;
+
+conditions_expr: maybe_negated_condition_expr logical_op maybe_negated_condition_expr
+	{
+		$$ = reduceFilter($2, $1, $3)
+	}
+	| maybe_negated_condition_expr %prec PREFER_SHIFTING_LOGICAL_OP
+	;


Am I missing something or does this limit a filter_chain to at most 3 condition_expr?

Yes, it does, which limits filter_rule to 6 condition_expr, so parsing a & b & c & d & e & f & g fails (spaces for readability, parsing fails with and without them).

Yes, I noticed that last time too, but didn't have time to push all the fixes.

Should be fixed with 5484f91

By default, the scanner mode is set to `scanner.GoTokens` which contains way more flags than are actually needed. This commit now sets this to only recognize identifiers and `scanner#Scan()` will return all other unrecognized tokens as they are.

cla-bot bot added the cla/signed CLA is signed by all contributors of a PR label Apr 27, 2023

yhabteab force-pushed the enhanced-filter-parser branch 3 times, most recently from 941daff to 080e335 Compare May 2, 2023 07:50

yhabteab changed the title ~~[Draft]: Enhance filter parser~~ Enhance filter parser May 2, 2023

yhabteab requested a review from julianbrost May 2, 2023 08:12

yhabteab force-pushed the enhanced-filter-parser branch 2 times, most recently from 2337de5 to 176b442 Compare May 3, 2023 10:00

yhabteab force-pushed the enhanced-filter-parser branch 5 times, most recently from 089cfc5 to 877a020 Compare May 3, 2023 16:32

yhabteab force-pushed the enhanced-filter-parser branch 7 times, most recently from 6cbb497 to 67991f2 Compare October 16, 2023 06:51

yhabteab force-pushed the enhanced-filter-parser branch 7 times, most recently from 8b2c4eb to 3603283 Compare October 26, 2023 14:52

julianbrost mentioned this pull request Oct 27, 2023

Trigger time based escalations #68

Merged

oxzi reviewed Oct 27, 2023

View reviewed changes

yhabteab force-pushed the enhanced-filter-parser branch from 3603283 to 8f8936d Compare October 27, 2023 15:40

julianbrost reviewed Oct 31, 2023

View reviewed changes

yhabteab force-pushed the enhanced-filter-parser branch from 8f8936d to ff0ab0c Compare November 6, 2023 12:02

yhabteab force-pushed the enhanced-filter-parser branch 2 times, most recently from eebcabd to 1ccb78e Compare November 20, 2023 15:32

yhabteab force-pushed the enhanced-filter-parser branch from 1ccb78e to e5cd4e2 Compare December 1, 2023 09:02

yhabteab and others added 22 commits April 15, 2024 16:28

Introduce filter Chain helper methods

e82add0

tests: Use ~ as an operator for wildcald matching

6ec0579

Introduce NewChain() & NewCondition() funcs

130eafb

spew.Sdump() filter & add some assertions in FuzzParser()

6ab031e

Introduce goyacc filter parsing grammer rules

dd4cbfc

Introduce Lexer type and other parsing utils

d8a6b45

Add extended filter/parser tests & adjust expected error messages

3a4780e

Add auto-generated goyacc filter Parser

fc30a9c

Set yylex.rule only once

22b5f0c

Reset Lexer#rule on error & use init() for initializing global confg

16f9bdc

Name regex variable after its usage

447f4f1

Fix accessing to non-existent logical operators

4d9c8c7

Set the scanner mode to ScanIdents

bc0f570

By default, the scanner mode is set to `scanner.GoTokens` which contains way more flags than are actually needed. This commit now sets this to only recognize identifiers and `scanner#Scan()` will return all other unrecognized tokens as they are.

Drop superfluous T_STRING token

789b2bc

Add test cases that triggers scan error

a57004d

Url decode identifiers in place & cleanup some type/token defs

68102c1

Resolve filter_rule in brackets instead of filter_chain

3f4f97a

Provide token friendly names in err messages

00254a4

Fully & correctly resolve complex filter strings

c49ac18

Add .gitignore file

3cb7989

Add test case for all the filter evaluation methods

9ff22cd

Fix mixed tab & space indentention

98471a8

yhabteab force-pushed the enhanced-filter-parser branch from e5cd4e2 to 98471a8 Compare April 16, 2024 12:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance filter parser #41

Enhance filter parser #41

yhabteab commented Apr 27, 2023 •

edited

julianbrost commented May 2, 2023

yhabteab commented May 3, 2023

oxzi left a comment

oxzi Oct 27, 2023

yhabteab Oct 27, 2023 •

edited

julianbrost Oct 31, 2023

yhabteab Nov 6, 2023

julianbrost Oct 31, 2023

yhabteab Nov 6, 2023

julianbrost Oct 31, 2023

julianbrost Nov 3, 2023

yhabteab Nov 6, 2023

julianbrost Oct 31, 2023

julianbrost Nov 3, 2023

yhabteab Nov 6, 2023

yhabteab Nov 6, 2023

Enhance filter parser #41

Are you sure you want to change the base?

Enhance filter parser #41

Conversation

yhabteab commented Apr 27, 2023 • edited

julianbrost commented May 2, 2023

yhabteab commented May 3, 2023

oxzi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yhabteab Oct 27, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yhabteab commented Apr 27, 2023 •

edited

yhabteab Oct 27, 2023 •

edited