WIP: SLR parse table generation #27

mpsijm · 2019-07-08T10:24:02Z

@udesou Just to get some feedback, and also to check whether this will not conflict with your upcoming refactorings.

I've implemented SLR parse table generation in sdf2table based on http://compilers.iecc.com/comparch/article/01-04-079, as was linked in the FIXME comment. Some remarks:

The First and Follow sets are calculated per symbol (rather than per production) and stored as a character class.
- For SLR, this is precise enough. In order to know which symbols can follow another symbol, you need to check all productions for that symbol anyway.
- For LR(1), there will probably be more benefit in calculating the First set per production. This should not be difficult when using the symbolProductionsMapping of the grammar.
The refactoring of character classes of Refactor CharacterClass: use ICharacterClass where possible #26 is not used yet, as experimenting with SLR parse table generation was my inspiration for this refactoring. (EDIT: done!)
In the end, we will probably want to allow the end-user to switch between different types of parse table generation.
I have no idea what contextual symbols are, but I could work around them (see 1c2518b#diff-bbb27858751954065a9ac33b038c11e3R146). Did I do this correctly?

Thanks in advance for your feedback! 🙂

udesou · 2019-07-09T01:37:28Z

Great stuff!
Don't worry about my refactorings, atm I'm just shifting interfaces around to remove the dependency between JSGLR (1 and 2) and sdf2table.
I think a good project (not necessarily for you) would be benchmarking all cases (LR(0), SLR and LR(1)) with respect to parse table generation time, but also parse time. The original sdf2table does SLR if I'm not wrong, but I remember reading some paper saying that the difference between all of those was minimal in terms of parse time, but they were considering GLR, not SGLR.
Finally, contextual symbols are like traditional non-terminals but with context tokens indicating priority and associativity conflicts. For data-dependent parsing, the contexts are used to solve conflicts at parse time, so they should be the same as the original non-terminal.
For disambiguation by grammar transformation, you might need to calculate the sets for the contextual symbols themselves, since they get new productions (I believe that in most cases, the first/follow sets for the contextual symbols are actually smaller than the original non-terminal).
Section 3.1 of http://delivery.acm.org/10.1145/3140000/3136020/sle17-sle17main12.pdf gives a short overview on contextual grammars, but if you wanna know more, chapter 2 of my thesis has a proper description of what those symbols are.

@udesou

**The Problem:** In `NormGrammarReader.processSymbol`, a cache is maintained for symbols that are already processed. However, this cache did not take into account that Symbols with and without a Label should be treated as equal (e.g. `tail:Stm` and `Stm`, see org.spoofax.jsglr2.integration/src/main/resources/grammars/layout-sensitive.sdf3). These labels are only used to reference the symbols from the layout constraints, but do not actually contribute to the meaning of a production. **Why is this a problem?** For SLR parse table generation (see metaborg#27), all grammar symbols are assumed to be unique. However, because Symbols with or without a Label were not treated as equal, the "graph" used to calculate the First- and Follow-sets became disconnected. This in turn caused some symbols to have empty First-/Follow-sets. The productions belonging to these symbols would not get any reduce actions in the parse table, because if the Follow-set is empty, no lookahead is allowed. **Current solution:** When processing symbols in the `NormGrammarReader`, any Label constructor is removed from the symbol term. **Discussion:** I have tried looking around where these labels are used (I am assuming layout constraints, but I couldn't find how they are processed in NormGrammarReader). Depending on their usage, it might be a good idea to apply this filter in Stratego already, e.g. in the strategy `module-to-normal-form`. @udesou any feedback is appreciated, as always 🙂

This EOFSymbol was introduced when changing the EOF value from 256 to -1

mpsijm force-pushed the slr-generation branch from 1c2518b to f9bbfac Compare July 8, 2019 16:12

mpsijm force-pushed the slr-generation branch 3 times, most recently from 3f93359 to a1809ec Compare July 18, 2019 12:08

mpsijm force-pushed the slr-generation branch from a1809ec to 6f513b7 Compare August 16, 2019 09:39

mpsijm force-pushed the slr-generation branch from 6f513b7 to e68c54c Compare September 3, 2019 11:34

mpsijm force-pushed the slr-generation branch from e68c54c to 961359f Compare September 13, 2019 13:07

mpsijm force-pushed the slr-generation branch from 961359f to ddc6784 Compare November 4, 2019 14:08

mpsijm mentioned this pull request Nov 4, 2019

Remove duplicate Symbols caused by a Label #33

Merged

mpsijm force-pushed the slr-generation branch from ddc6784 to 9fa417d Compare November 12, 2019 08:54

mpsijm force-pushed the slr-generation branch from 9fa417d to f2d4c12 Compare November 25, 2019 11:41

mpsijm force-pushed the slr-generation branch 2 times, most recently from 7d49367 to 582dcc1 Compare February 25, 2020 14:42

mpsijm force-pushed the slr-generation branch from 582dcc1 to 1584228 Compare March 12, 2020 11:15

mpsijm force-pushed the slr-generation branch 2 times, most recently from f9083b1 to 084742e Compare November 24, 2020 12:28

mpsijm force-pushed the slr-generation branch from 084742e to a18f39f Compare April 30, 2021 10:27

mpsijm and others added 3 commits June 2, 2021 15:13

Implement SLR parse table generation

b3688a3

Correctly calculate contains-the-follow-of

9dad2bc

Use new EOFSymbol in SLR parsing

000e0ab

This EOFSymbol was introduced when changing the EOF value from 256 to -1

mpsijm force-pushed the slr-generation branch from a18f39f to 000e0ab Compare June 2, 2021 13:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: SLR parse table generation #27

WIP: SLR parse table generation #27

mpsijm commented Jul 8, 2019 •

edited

udesou commented Jul 9, 2019

WIP: SLR parse table generation #27

Are you sure you want to change the base?

WIP: SLR parse table generation #27

Conversation

mpsijm commented Jul 8, 2019 • edited

udesou commented Jul 9, 2019

mpsijm commented Jul 8, 2019 •

edited