Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance filter parser #41

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Enhance filter parser #41

wants to merge 22 commits into from

Conversation

yhabteab
Copy link
Member

@yhabteab yhabteab commented Apr 27, 2023

After all the testing we did last week, it turns out that the current implementation of the filter Parser is very buggy and sometimes crashes with very simple filter strings. This PR solves this by using goyacc generated Parser. This is similar to how we parse Icinga2 DSL with flex/bison. The grammar rules can be found in the parser.y file and you can find the Lexer and other parsing utils in the lexer.go file.

For local testing/debugging purposes you can re-build the parser with the following command.

go env -w GOBIN=/Users/yhabteab/go/bin
go install golang.org/x/tools/cmd/goyacc@latest
go generate './...'

This will also generate a parser.output file which contains all the parser states and is useful to resolve some shift/reduce conflicts.

@cla-bot cla-bot bot added the cla/signed CLA is signed by all contributors of a PR label Apr 27, 2023
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 3 times, most recently from 941daff to 080e335 Compare May 2, 2023 07:50
@yhabteab yhabteab changed the title [Draft]: Enhance filter parser Enhance filter parser May 2, 2023
@yhabteab yhabteab requested a review from julianbrost May 2, 2023 08:12
@julianbrost
Copy link
Collaborator

However, I haven't pushed the auto-generated parser yet because it involves a lot of code changes that don't need to be reviewed, but when this PR is done, we can think about merging this parser here or auto-generating it every time when building noma, basically just like the Icinga2 *.ti files.

Have a look at go generate. Also, common practice in Go is to commit these files to the repo so that you don't have to bother with the these dependencies unless you actually want to change the parser.

@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 2 times, most recently from 2337de5 to 176b442 Compare May 3, 2023 10:00
@yhabteab
Copy link
Member Author

yhabteab commented May 3, 2023

fuzz: elapsed: 3m30s, execs: 1483038 (7807/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m33s, execs: 1507525 (8160/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m36s, execs: 1530168 (7550/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m39s, execs: 1553639 (7821/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m42s, execs: 1576623 (7661/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m45s, execs: 1595193 (6190/sec), new interesting: 95 (total: 1559)
fuzz: elapsed: 3m48s, execs: 1609919 (4910/sec), new interesting: 97 (total: 1561)
fuzz: elapsed: 3m51s, execs: 1630670 (6917/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m54s, execs: 1648763 (6029/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m57s, execs: 1667104 (6113/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m0s, execs: 1685923 (6255/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m3s, execs: 1704380 (6154/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m6s, execs: 1723764 (6479/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m9s, execs: 1744018 (6752/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m12s, execs: 1758080 (5418/sec), new interesting: 100 (total: 1564)
--- PASS: FuzzParser (251.66s)
PASS
ok  	github.com/icinga/noma/internal/filter	251.870s

@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 5 times, most recently from 089cfc5 to 877a020 Compare May 3, 2023 16:32
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 7 times, most recently from 6cbb497 to 67991f2 Compare October 16, 2023 06:51
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 7 times, most recently from 8b2c4eb to 3603283 Compare October 26, 2023 14:52
Copy link
Member

@oxzi oxzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went a bit over the code and needed to educate myself about yacc again. Feel free to argue on each of my comments.

internal/filter/lexer.go Outdated Show resolved Hide resolved
internal/filter/lexer.go Outdated Show resolved Hide resolved
internal/filter/parser_test.go Show resolved Hide resolved
lex.Scanner.Error = lex.ScanError

// Enable parsers error verbose to get more context of the parsing failures
yyErrorVerbose = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, this line resp. writing this internal variable results in data races.
As the tests are set to run in parallel, this Parse function is called concurrently and there are racing writes on yyErrorVerbose.

How bad is this? In this specific context, I wouldn't be really concerned but this can become worse later on.
As this is more off an global variable configuration, I would suggest setting this in an init function or check with goyacc directly?

==================
WARNING: DATA RACE
Write at 0x0000009e228f by goroutine 9:
  github.com/icinga/icinga-notifications/internal/filter.Parse()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424
  github.com/icinga/icinga-notifications/internal/filter.TestParser.func2()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:86 +0x4f
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44

Previous write at 0x0000009e228f by goroutine 8:
  github.com/icinga/icinga-notifications/internal/filter.Parse()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424
  github.com/icinga/icinga-notifications/internal/filter.TestParser.func1()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:16 +0x44
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44

Goroutine 9 (running) created at:
  testing.(*T).Run()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a
  github.com/icinga/icinga-notifications/internal/filter.TestParser()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:83 +0x68
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44

Goroutine 8 (running) created at:
  testing.(*T).Run()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a
  github.com/icinga/icinga-notifications/internal/filter.TestParser()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:13 +0x4b
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44
==================

Copy link
Member Author

@yhabteab yhabteab Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea 🤷 how to set this using goyacc, but for now I will just use init() as discussed offline.

internal/filter/lexer.go Show resolved Hide resolved
internal/filter/parser.y Show resolved Hide resolved
internal/filter/parser.y Outdated Show resolved Hide resolved
internal/filter/parser.y Outdated Show resolved Hide resolved
internal/filter/types.go Outdated Show resolved Hide resolved
internal/filter/types.go Outdated Show resolved Hide resolved
internal/filter/lexer.go Show resolved Hide resolved
Comment on lines +13 to +15
// identifiersMatcher contains a compiled regexp and is used by the Lexer to match filter identifiers.
// Currently, it allows to match any character except a LogicalOp and CompOperator.
var identifiersMatcher = regexp.MustCompile("[^!&|~<>=()]")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to work with a positive list here. Doing it like this, there is only very limited room for expansion. For example, if the regex would have been [^!&|<>=()] before (missing the ~), adding ~ there would have made previously valid identifiers invalid. Yes, this means that more characters have to be escaped/urlencoded, but I don't think that this would be bad, it would just have to be consistent with what the web module writes into the database.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand on why don't doing it the way it is, and I don't want to list all the valid chars because I don't know them all.

Comment on lines 53 to 77
// Lexer is used to tokenize the filter input into a set literals.
// This is just a wrapper around the Scanner type and implements the yyLexer interface used by the parser.
type Lexer struct {
scanner.Scanner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fully understanding this yet, but https://pkg.go.dev/text/scanner says:

By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different identifier and white space characters.

I don't see any such customization, which sounds like this does way more than it has to and even should do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go white spaces are just like in PHP and the IPL parser also skips all these characters, just like this lexer does, so I won't do anything in this regard. For other customizations, see the inline comments in the Parse() function.

Comment on lines +108 to +137
filter_chain: conditions_expr logical_op maybe_negated_condition_expr
{
$$ = reduceFilter($2, $1, $3)
}
| conditions_expr %prec PREFER_SHIFTING_LOGICAL_OP
;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there an asymmetry in the operands regarding possible negation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I got confused by the very similar names conditions_expr and condition_expr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a better name that fits this use case, I will accept it.

Comment on lines +108 to +144
filter_chain: conditions_expr logical_op maybe_negated_condition_expr
{
$$ = reduceFilter($2, $1, $3)
}
| conditions_expr %prec PREFER_SHIFTING_LOGICAL_OP
;

conditions_expr: maybe_negated_condition_expr logical_op maybe_negated_condition_expr
{
$$ = reduceFilter($2, $1, $3)
}
| maybe_negated_condition_expr %prec PREFER_SHIFTING_LOGICAL_OP
;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something or does this limit a filter_chain to at most 3 condition_expr?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does, which limits filter_rule to 6 condition_expr, so parsing a & b & c & d & e & f & g fails (spaces for readability, parsing fails with and without them).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed that last time too, but didn't have time to push all the fixes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed with 5484f91

@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 2 times, most recently from eebcabd to 1ccb78e Compare November 20, 2023 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla/signed CLA is signed by all contributors of a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants