Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and report invalid grammars #101

Open
igordejanovic opened this issue Apr 1, 2023 · 0 comments
Open

Detect and report invalid grammars #101

igordejanovic opened this issue Apr 1, 2023 · 0 comments
Labels

Comments

@igordejanovic
Copy link
Member

igordejanovic commented Apr 1, 2023

Recent change [faaeaeb] (See the warning note in the changelog) makes the parser behave consistently but some pathological grammars now can lead the parser to loop endlessly (e.g. nesting Optional inside ZeroOrMore). This is frustrating for the user.

This issue will track the development of checker during parser construction which would catch those pathological cases and report the problem early.

In general, these cases should be checked:

  1. Direct or indirect left recursion. Leads to an infinite loop.
  2. Rules that always succeed (infallible) in OrderedChoice (Optional, ZeroOrMore, RegExMatch that match empty, sequences of the previous). This will not stop the parser from working but will prevent other alternatives to be tried. Thus, this might just be a warning and not a hard error. Or, we could insist on fixing this by e.g. removing unreachable choice alternatives.
  3. Rules that may succeed without consuming input (nullable) inside Zero/OneOrMore as it may lead to an infinite loop.

Algorithms:
Check 1 (left recursion):

  • Do a depth first search over the parser model.
  • For ordered choices follow each branch
  • For sequences follow leftmost rule, if the current rule in a sequence is nullable follow the next one also (hidden left recursion)
  • Keep a path of rules. If current rule was visited before report an error.

Check 2 (infallible in OrderedChoice):

  • Mark each rule during construction with infallible flag (in constructors)
  • Infallible rules are:
    • StringMatch with empty match
    • RegexMatch that can match empty (test by trying to match '')
    • ZeroOrMore
    • Optional
    • Sequence of infallible rules
    • OrderedChoice with an infallible branch
  • If any of the branch, except the last, in OrderedChoice is infallible report an error. Suggest removing unreachable choices (those that follows the infallible) as a fix.

Check 3 (nullable in repetitions):

  • Mark each rule during construction as nullable flag (in constructor):
  • Nullable rules are:
    • StringMatch with empty match
    • RegexMatch that can match empty (test by trying to match '')
    • Syntax predicates (lookaheads) - And and Not
    • Optional
    • ZeroOrMore
    • Sequence of nullable rules
    • OrderedChoice where any of the branch is nullable (some cases would be prevented by check 2, so it should run first)
  • If Zero/OneOrMore contains nullable rule report an error.

Note: These checks introduce an additional overhead. Since, the grammar won't change in production these check should be run only if the parser is in debug mode.

igordejanovic referenced this issue Apr 1, 2023
If a branch in ordered choice has a potentially non-consuming successful
alternative (Optional, ZeroOrMore, Not, And), it will succeed if
alternative succeeds and thus will not try further choices.
@igordejanovic igordejanovic changed the title Detect pathological grammars which can lead the parser to an infinite loop Detect some pathological grammars Apr 1, 2023
@igordejanovic igordejanovic changed the title Detect some pathological grammars Detect and report invalid grammars Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant