Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the parser #1596

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

Simplify the parser #1596

wants to merge 22 commits into from

Conversation

certik
Copy link
Contributor

@certik certik commented Aug 6, 2019

Several simplifications were done:

  • parse_implicit_mul() cannot return an error, added an assert and simplified the parser.yy
  • parser.yy: merged all the rules that could be merged; changed syntax that makes it more readable; rcp_static_cast casts were removed as they are not needed
  • There is a slight speedup due to the simplified grammar:
$ ./benchmarks/parsing 
parse('0') = 0: 25us
33ms
(x + y - sin(x)/(-4 + z**2) - x**(y**z))**5001
83ms
(x + y - sin(x)/(-4 + z**2) - x**(y**z))**5001
19ms
101ms

@certik certik requested a review from isuruf August 6, 2019 18:07
@certik
Copy link
Contributor Author

certik commented Aug 6, 2019

Here is an alternative formatting:

st_expr
    : expr { $$ = $1; p.res = $$; }
    ;

expr
    : expr '+' expr { $$ = add($1, $3); }
    | expr '-' expr { $$ = sub($1, $3); }
    | expr '*' expr { $$ = mul($1, $3); }
    | expr '/' expr { $$ = div($1, $3); }
    | expr POW expr { $$ = pow($1, $3); }
    | expr '<' expr { $$ = Lt($1, $3); }
    | expr '>' expr { $$ = Gt($1, $3); }
    | expr LE expr { $$ = Le($1, $3); }
    | expr GE expr { $$ = Ge($1, $3); }
    | expr EQ expr { $$ = Eq($1, $3); }
    | expr '|' expr {
        set_boolean s;
        s.insert(rcp_static_cast<const Boolean>($1));
        s.insert(rcp_static_cast<const Boolean>($3));
        $$ = logical_or(s); }
    | expr '&' expr {
        set_boolean s;
        s.insert(rcp_static_cast<const Boolean>($1));
        s.insert(rcp_static_cast<const Boolean>($3));
        $$ = logical_and(s); }
    | expr '^' expr {
        vec_boolean s;
        s.push_back(rcp_static_cast<const Boolean>($1));
        s.push_back(rcp_static_cast<const Boolean>($3));
        $$ = logical_xor(s); }
    | '(' expr ')' { $$ = $2; }
    | '-' expr %prec UMINUS { $$ = neg($2); }
    | '~' expr %prec NOT {
        $$ = logical_not(rcp_static_cast<const Boolean>($2)); }
    | IDENTIFIER { $$ = p.parse_identifier($1); }
    | NUMERIC { $$ = p.parse_numeric($1); }
    | IDENTIFIER '(' expr_list ')' { $$ = p.functionify($1, $3); }
    | IMPLICIT_MUL {
        auto tup = p.parse_implicit_mul($1);
        $$ = mul(std::get<0>(tup), std::get<1>(tup)); }
    | IMPLICIT_MUL POW expr {
        auto tup = p.parse_implicit_mul($1);
        $$ = mul(std::get<0>(tup), pow(std::get<1>(tup), $3)); }
    ;

expr_list
    : expr_list ',' expr { $$ = $1; $$.push_back($3); }
    | expr { $$ = vec_basic(1, $1); }
    ;

@isuruf compared to the one in the PR, which one do you think looks better? I can't quite decide where to put the }.

@certik
Copy link
Contributor Author

certik commented Aug 6, 2019

Here is another alternative, I probably like this one the most so far:

st_expr
    : expr { $$ = $1; p.res = $$; }
    ;

expr
    : expr '+' expr { $$ = add($1, $3); }
    | expr '-' expr { $$ = sub($1, $3); }
    | expr '*' expr { $$ = mul($1, $3); }
    | expr '/' expr { $$ = div($1, $3); }
    | expr POW expr { $$ = pow($1, $3); }
    | expr '<' expr { $$ = Lt($1, $3); }
    | expr '>' expr { $$ = Gt($1, $3); }
    | expr LE expr { $$ = Le($1, $3); }
    | expr GE expr { $$ = Ge($1, $3); }
    | expr EQ expr { $$ = Eq($1, $3); }
    | expr '|' expr {
            set_boolean s;
            s.insert(rcp_static_cast<const Boolean>($1));
            s.insert(rcp_static_cast<const Boolean>($3));
            $$ = logical_or(s); }
    | expr '&' expr {
            set_boolean s;
            s.insert(rcp_static_cast<const Boolean>($1));
            s.insert(rcp_static_cast<const Boolean>($3));
            $$ = logical_and(s); }
    | expr '^' expr {
            vec_boolean s;
            s.push_back(rcp_static_cast<const Boolean>($1));
            s.push_back(rcp_static_cast<const Boolean>($3));
            $$ = logical_xor(s); }
    | '(' expr ')' { $$ = $2; }
    | '-' expr %prec UMINUS { $$ = neg($2); }
    | '~' expr %prec NOT {
            $$ = logical_not(rcp_static_cast<const Boolean>($2)); }
    | IDENTIFIER { $$ = p.parse_identifier($1); }
    | NUMERIC { $$ = p.parse_numeric($1); }
    | IDENTIFIER '(' expr_list ')' { $$ = p.functionify($1, $3); }
    | IMPLICIT_MUL {
            auto tup = p.parse_implicit_mul($1);
            $$ = mul(std::get<0>(tup), std::get<1>(tup)); }
    | IMPLICIT_MUL POW expr {
            auto tup = p.parse_implicit_mul($1);
            $$ = mul(std::get<0>(tup), pow(std::get<1>(tup), $3)); }
    ;

expr_list
    : expr_list ',' expr { $$ = $1; $$.push_back($3); }
    | expr { $$ = vec_basic(1, $1); }
    ;

@certik
Copy link
Contributor Author

certik commented Aug 6, 2019

I like the last one the most, so I pushed it in. The { and } are kind of hidden, and the double indentation ensures that one can visually quickly see what is the grammar rule and what is the semantic action.

This approach also encourages to keep the semantic action short: ideally code like

    | expr '|' expr {
            set_boolean s;
            s.insert(rcp_static_cast<const Boolean>($1));
            s.insert(rcp_static_cast<const Boolean>($3));
            $$ = logical_or(s); }

should probably be replaced with something like:

    | expr '|' expr { $$ = make_or($1, $3); }

} else {
sym = parse_identifier(lexpr);
}
SYMENGINE_ASSERT(lexpr.length() > 0);
Copy link
Member

@isuruf isuruf Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather this throw an exception even in Release mode instead of silently ignoring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example when this can happen? I couldn't figure out any.

Copy link
Contributor Author

@certik certik Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we cannot figure out when this could happen for some user input, then the assert statement is appropriate. Asserts are for ensuring that our assumptions in the code are consistent/valid, but those do not run in release mode, because we assume our code is correct in release mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, there was some ambiguity in the grammar where 2e10 was parsed as an implicit mul.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can trigger it. Too bad there wasn't a test for this. All parsing tests pass. We need to test this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite frankly, I don't like this implicit mul parsing, due to such ambiguities. Do you think people want implicit mul to be parsed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think people want implicit mul to be parsed?

Yes, julia people use this feature heavily in SymEngine.jl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm no fan of implicit mul.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are at it, we should also fix #1462. We can even have a few different parsers if needed.

@certik
Copy link
Contributor Author

certik commented Sep 9, 2019

I don't have time to finish this before a release, let's do this PR after the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants