Simplify the parser #1596

certik · 2019-08-06T18:07:28Z

Several simplifications were made:

parse_implicit_mul() cannot return an error, so an assert was added and parser.yy was simplified accordingly.
parser.yy: merged all the rules that could be merged; changed the syntax to make it more readable; removed the rcp_static_cast casts, as they are not needed.
The simplified grammar gives a slight speedup:

$ ./benchmarks/parsing 
parse('0') = 0: 25us
33ms
(x + y - sin(x)/(-4 + z**2) - x**(y**z))**5001
83ms
(x + y - sin(x)/(-4 + z**2) - x**(y**z))**5001
19ms
101ms

certik · 2019-08-06T18:22:00Z

Here is an alternative formatting:

st_expr
    : expr { $$ = $1; p.res = $$; }
    ;

expr
    : expr '+' expr { $$ = add($1, $3); }
    | expr '-' expr { $$ = sub($1, $3); }
    | expr '*' expr { $$ = mul($1, $3); }
    | expr '/' expr { $$ = div($1, $3); }
    | expr POW expr { $$ = pow($1, $3); }
    | expr '<' expr { $$ = Lt($1, $3); }
    | expr '>' expr { $$ = Gt($1, $3); }
    | expr LE expr { $$ = Le($1, $3); }
    | expr GE expr { $$ = Ge($1, $3); }
    | expr EQ expr { $$ = Eq($1, $3); }
    | expr '|' expr {
        set_boolean s;
        s.insert(rcp_static_cast<const Boolean>($1));
        s.insert(rcp_static_cast<const Boolean>($3));
        $$ = logical_or(s); }
    | expr '&' expr {
        set_boolean s;
        s.insert(rcp_static_cast<const Boolean>($1));
        s.insert(rcp_static_cast<const Boolean>($3));
        $$ = logical_and(s); }
    | expr '^' expr {
        vec_boolean s;
        s.push_back(rcp_static_cast<const Boolean>($1));
        s.push_back(rcp_static_cast<const Boolean>($3));
        $$ = logical_xor(s); }
    | '(' expr ')' { $$ = $2; }
    | '-' expr %prec UMINUS { $$ = neg($2); }
    | '~' expr %prec NOT {
        $$ = logical_not(rcp_static_cast<const Boolean>($2)); }
    | IDENTIFIER { $$ = p.parse_identifier($1); }
    | NUMERIC { $$ = p.parse_numeric($1); }
    | IDENTIFIER '(' expr_list ')' { $$ = p.functionify($1, $3); }
    | IMPLICIT_MUL {
        auto tup = p.parse_implicit_mul($1);
        $$ = mul(std::get<0>(tup), std::get<1>(tup)); }
    | IMPLICIT_MUL POW expr {
        auto tup = p.parse_implicit_mul($1);
        $$ = mul(std::get<0>(tup), pow(std::get<1>(tup), $3)); }
    ;

expr_list
    : expr_list ',' expr { $$ = $1; $$.push_back($3); }
    | expr { $$ = vec_basic(1, $1); }
    ;

@isuruf compared to the one in the PR, which one do you think looks better? I can't quite decide where to put the }.

certik · 2019-08-06T18:23:37Z

Here is another alternative. I probably like this one the most so far:

st_expr
    : expr { $$ = $1; p.res = $$; }
    ;

expr
    : expr '+' expr { $$ = add($1, $3); }
    | expr '-' expr { $$ = sub($1, $3); }
    | expr '*' expr { $$ = mul($1, $3); }
    | expr '/' expr { $$ = div($1, $3); }
    | expr POW expr { $$ = pow($1, $3); }
    | expr '<' expr { $$ = Lt($1, $3); }
    | expr '>' expr { $$ = Gt($1, $3); }
    | expr LE expr { $$ = Le($1, $3); }
    | expr GE expr { $$ = Ge($1, $3); }
    | expr EQ expr { $$ = Eq($1, $3); }
    | expr '|' expr {
            set_boolean s;
            s.insert(rcp_static_cast<const Boolean>($1));
            s.insert(rcp_static_cast<const Boolean>($3));
            $$ = logical_or(s); }
    | expr '&' expr {
            set_boolean s;
            s.insert(rcp_static_cast<const Boolean>($1));
            s.insert(rcp_static_cast<const Boolean>($3));
            $$ = logical_and(s); }
    | expr '^' expr {
            vec_boolean s;
            s.push_back(rcp_static_cast<const Boolean>($1));
            s.push_back(rcp_static_cast<const Boolean>($3));
            $$ = logical_xor(s); }
    | '(' expr ')' { $$ = $2; }
    | '-' expr %prec UMINUS { $$ = neg($2); }
    | '~' expr %prec NOT {
            $$ = logical_not(rcp_static_cast<const Boolean>($2)); }
    | IDENTIFIER { $$ = p.parse_identifier($1); }
    | NUMERIC { $$ = p.parse_numeric($1); }
    | IDENTIFIER '(' expr_list ')' { $$ = p.functionify($1, $3); }
    | IMPLICIT_MUL {
            auto tup = p.parse_implicit_mul($1);
            $$ = mul(std::get<0>(tup), std::get<1>(tup)); }
    | IMPLICIT_MUL POW expr {
            auto tup = p.parse_implicit_mul($1);
            $$ = mul(std::get<0>(tup), pow(std::get<1>(tup), $3)); }
    ;

expr_list
    : expr_list ',' expr { $$ = $1; $$.push_back($3); }
    | expr { $$ = vec_basic(1, $1); }
    ;

certik · 2019-08-06T19:50:38Z

I like the last one the most, so I pushed it in. The { and } are kind of hidden, and the double indentation makes it easy to see at a glance which part is the grammar rule and which part is the semantic action.

This approach also encourages keeping the semantic action short. Ideally, code like

    | expr '|' expr {
            set_boolean s;
            s.insert(rcp_static_cast<const Boolean>($1));
            s.insert(rcp_static_cast<const Boolean>($3));
            $$ = logical_or(s); }

should probably be replaced with something like:

    | expr '|' expr { $$ = make_or($1, $3); }
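A minimal sketch of what such a helper could look like. Note that make_or is only a suggested name in this thread, and the types below (Basic, Boolean, BoolSymbol, Or, set_boolean, logical_or) are toy stand-ins built on std::shared_ptr so the example compiles on its own; they are assumptions for illustration and not SymEngine's real classes or signatures. The point is only that the set construction and the casts move out of the grammar file into one small function.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>

// Toy stand-ins for illustration only (NOT SymEngine's real classes).
struct Basic {
    virtual ~Basic() = default;
    virtual std::string str() const = 0;
};
struct Boolean : Basic {};

using BasicPtr = std::shared_ptr<const Basic>;
using BoolPtr = std::shared_ptr<const Boolean>;

// Order operands by printed form so that duplicates collapse in the set.
struct ByStr {
    bool operator()(const BoolPtr &a, const BoolPtr &b) const {
        return a->str() < b->str();
    }
};
using set_boolean = std::set<BoolPtr, ByStr>;

struct BoolSymbol : Boolean {
    std::string name;
    explicit BoolSymbol(std::string n) : name(std::move(n)) {}
    std::string str() const override { return name; }
};

struct Or : Boolean {
    set_boolean args;
    explicit Or(set_boolean a) : args(std::move(a)) {}
    std::string str() const override {
        std::string out;
        for (const auto &arg : args) {
            if (!out.empty()) out += " | ";
            out += arg->str();
        }
        return out;
    }
};

// Toy logical_or: a single operand is returned unchanged.
BoolPtr logical_or(const set_boolean &s) {
    if (s.size() == 1) return *s.begin();
    return std::make_shared<Or>(s);
}

// Hypothetical helper: holds the logic that the semantic action
// for `expr '|' expr` currently spells out inline.
BasicPtr make_or(const BasicPtr &a, const BasicPtr &b) {
    set_boolean s;
    s.insert(std::static_pointer_cast<const Boolean>(a));
    s.insert(std::static_pointer_cast<const Boolean>(b));
    return logical_or(s);
}
```

With a helper of this shape, the rules for '&' and '^' could presumably be shortened the same way, leaving every semantic action in the grammar as a one-liner.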

isuruf · 2019-08-07T04:18:58Z

symengine/parser/parser.cpp

-    } else {
-        sym = parse_identifier(lexpr);
-    }
+    SYMENGINE_ASSERT(lexpr.length() > 0);


I'd rather this throw an exception even in Release mode instead of silently ignoring the problem.

certik · Aug 7, 2019

Can you give an example of when this can happen? I couldn't figure one out.

certik · Aug 7, 2019

And if we cannot find any user input that triggers it, then the assert statement is appropriate. Asserts check that the assumptions in our code are consistent and valid; they do not run in Release mode, because there we assume our code is correct.
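The two options being weighed here can be sketched side by side. This is an illustration only: it uses the standard assert macro as a stand-in for SYMENGINE_ASSERT, and the function names and the exception type are assumptions, not code from the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Debug-only check: the assert is compiled out when NDEBUG is defined,
// which is the usual setting for Release builds.
void check_with_assert(const std::string &lexpr) {
    assert(lexpr.length() > 0);
    (void)lexpr; // avoids an unused-parameter warning when the assert is compiled out
}

// Always-on check: throws in every build type, so an unexpected
// input is reported to the caller instead of being ignored.
void check_with_exception(const std::string &lexpr) {
    if (lexpr.empty()) {
        throw std::invalid_argument("empty implicit-mul token");
    }
}
```

The assert is the right tool if the condition can only fail because of a bug in the parser itself; the exception is the safer choice if some user input could reach it.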

isuruf · Aug 7, 2019

If I remember correctly, there was some ambiguity in the grammar where 2e10 was parsed as an implicit mul.

certik · Aug 7, 2019

I'll see if I can trigger it. Too bad there wasn't a test for this: all parsing tests pass. We need to test this.

certik · Aug 7, 2019

Quite frankly, I don't like this implicit mul parsing, due to such ambiguities. Do you think people want implicit mul to be parsed?

isuruf · Aug 7, 2019

> Do you think people want implicit mul to be parsed?

Yes, Julia people use this feature heavily in SymEngine.jl.

bjodah · Aug 7, 2019

FWIW, I'm no fan of implicit mul.

certik · Aug 7, 2019

While we are at it, we should also fix #1462. We can even have a few different parsers if needed.
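To make the 2e10 ambiguity raised in this thread concrete, here is a small sketch of a naive implicit-mul splitter. The function split_implicit_mul is hypothetical and is not the parse_implicit_mul() from the PR; it simply cuts a token at the first letter or underscore. Whether the real lexer can ever hand such a token to the implicit-mul rule is exactly the open question above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: split a token such as "2x" into its leading
// numeric part and the trailing identifier, cutting at the first
// letter or underscore.
std::pair<std::string, std::string>
split_implicit_mul(const std::string &tok) {
    std::size_t i = 0;
    while (i < tok.size()
           && !std::isalpha(static_cast<unsigned char>(tok[i]))
           && tok[i] != '_') {
        ++i;
    }
    return {tok.substr(0, i), tok.substr(i)};
}
```

Under this rule "2x" splits into "2" and "x", as intended, but "2e10" splits into "2" and "e10", i.e. 2 times a symbol named e10, rather than the floating-point literal 2e10. A regression test for the parser would want to pin down which of the two readings is produced.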

certik · 2019-09-09T20:27:47Z

I don't have time to finish this before a release, so let's do this PR after the release.

certik added 15 commits August 6, 2019 11:19

1c6e804 Simplify parse_implicit_mul() and the parser
cb0e669 Update generated file
b8c896a Get rid of rcp_static_cast()
b04d0f3 Simplify the grammar (slight speedup)
e4d4e2f Update generated file
d7498ab Move things together
bdcb6c9 Fix the shift/reduce conflict
ee175c1 Regenerate
7df06e6 Use better formatting
92e3c30 Regenerate
372c0ce Simplify func rule
e254b7c Polish the syntax
6afdbf9 Update
64f0eb7 Simplify a few lines
f0a38da Update generated file

certik requested a review from isuruf August 6, 2019 18:07

certik added 2 commits August 6, 2019 12:26

4f1a90d Better formatting
5fd6c92 Update generated file

certik added 5 commits August 6, 2019 14:05

4b70f67 Enable warnings in Bison
cec1fde Fix all warnings
f6cc107 Regenerate
5f8eea5 Consolidate semantic declaration lines
6e0ca77 Regenerate

isuruf reviewed Aug 7, 2019


lkeegan mentioned this pull request May 5, 2021

add sbml infix parsing and printing #1785

Merged
