Simple syntax extensions to shorten grammars #545

Closed
reverofevil opened this issue Dec 1, 2017 · 9 comments

@reverofevil

reverofevil commented Dec 1, 2017

While PEG.js is already a great tool for parsing, it requires a lot of boilerplate inside actions. Since two cases account for most of the actions we write, better syntax for them should probably be considered.

AST node creation

```
start  = _ expr:expr1 { return expr; }

expr1  = left:expr2 type:[+-] _ right:expr1 { return {type, left, right}; }
       / expr2

expr2  = left:expr3 type:[*/] _ right:expr2 { return {type, left, right}; }
       / expr3

expr3  = "(" _ expr:expr1 ")" _ { return expr; }
       / intlit

intlit = value:$[0-9]+ _ { return {type: 'int', value: parseInt(value, 10)}; }

_      = [ \t\n\r]*
```

(Right-associative operators are used for simplicity.) In this example, all the `{ return {type, left, right}; }` actions are extraneous. When a sequence contains labeled expressions but no action that uses those labels, it would be useful to imply object creation: the sequence would return an object with one property per label.
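A minimal sketch of what such an implied action could compute, assuming the generator knows each sequence element's label (or `null` for an unlabeled element such as `_`). The function `impliedNode` and its parameters are hypothetical illustrations, not part of PEG.js:

```javascript
// Hypothetical sketch: the object an implied action could build for a
// sequence of labeled expressions. Not part of the PEG.js API.
function impliedNode(labels, results) {
  if (labels.length !== results.length) {
    throw new Error("each sequence element needs a label slot (null if unlabeled)");
  }
  const node = {};
  labels.forEach((label, i) => {
    // Unlabeled elements (like the whitespace rule `_`) are dropped.
    if (label !== null) node[label] = results[i];
  });
  return node;
}

// `left:expr2 type:[+-] _ right:expr1` matched against "1 + 2":
const node = impliedNode(
  ["left", "type", null, "right"],
  [{ type: "int", value: 1 }, "+", [" "], { type: "int", value: 2 }]
);
console.log(JSON.stringify(node));
// {"left":{"type":"int","value":1},"type":"+","right":{"type":"int","value":2}}
```

The result carries the same properties as the hand-written `{ return {type, left, right}; }` action; only the key order differs.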

The only case where implied object creation isn't enough for building an AST is when we need to put `type: 'smth'` into the generated object. While

```
smth = type:{ return "smth"; } this:this that:that
```

is a perfectly workable option, some syntactic sugar might be useful here too. Here's an example, assuming that `type` is the usual name for a node's type tag (`type` is actually a poor name, because it's useful for typed languages; `kind` would be better, but it's rarely used), and that users won't create AST nodes in the alternatives of the `/` choice operator:

```
smth := this:this that:that
```
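One way to read the proposed `:=` form is as a purely textual shorthand for the longhand quoted above. The sketch below uses a hypothetical `desugarTypedRule` helper and assumes `type` as the default tag name, as the proposal does; it only handles single-line rules, whereas a real implementation would more likely work on the grammar's AST:

```javascript
// Hypothetical sketch: rewrite the proposed `name := sequence` form into the
// longhand `name = type:{ return "name"; } sequence` quoted in the proposal.
// String-based and single-line only; `tagName` defaults to "type".
function desugarTypedRule(line, tagName = "type") {
  const match = /^\s*([A-Za-z_]\w*)\s*:=\s*(.+?)\s*$/.exec(line);
  if (match === null) return line; // not a `:=` rule; leave it unchanged
  const [, ruleName, sequence] = match;
  return `${ruleName} = ${tagName}:{ return ${JSON.stringify(ruleName)}; } ${sequence}`;
}

console.log(desugarTypedRule("smth := this:this that:that"));
// smth = type:{ return "smth"; } this:this that:that
```

Passing `"kind"` as the second argument would produce the `kind` tag instead, in line with the naming remark above.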

Cherry-picking

Actions like `{ return expr; }` happen a lot, because every lexeme in a programming language has to be followed by something like `_` to skip whitespace. This is tolerable at the top level of a rule, but things like

```
exprX = (a:[+-] _ { return a; })* exprY
```

are really annoying. It would be nice to have a better syntax, such as

```
exprX = (@[+-] _)* exprY
```

where the `@` sign means "leave this as the only scalar result", i.e. the sequence returns that expression's value instead of an array. At most one expression in a sequence should carry the `@` prefix, and it shouldn't be used in the same sequence as labeled expressions.

Alternatively, the `:` sign could be used instead, to keep `@` free for future use, but that would require making the current `name:expr` syntax whitespace-sensitive, which is a compatibility issue.
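The proposed `@` semantics and its two restrictions can be modeled as a function that computes a sequence's result from its elements' results. This is a hypothetical sketch of the proposal, not PEG.js code: `sequenceResult` and the `{ label, pluck }` element descriptors are illustrative names, and the implied-object idea is ignored here. Without `@`, a PEG.js sequence with no action yields the array of its elements' results, which the sketch keeps as the fallback:

```javascript
// Hypothetical model of the proposed `@` prefix for a single sequence.
// Each element is described as { label: string | null, pluck: boolean }.
function sequenceResult(elements, results) {
  const plucked = [];
  elements.forEach((element, i) => {
    if (element.pluck) plucked.push(i);
  });
  if (plucked.length > 1) {
    throw new Error("at most one element of a sequence may be prefixed with @");
  }
  if (plucked.length === 1) {
    if (elements.some((element) => element.label !== null)) {
      throw new Error("@ may not be combined with labeled expressions");
    }
    return results[plucked[0]]; // the plucked value replaces the whole array
  }
  return results; // plain PEG.js behaviour: the array of all element results
}

// `(@[+-] _)*` applied to "+ - ": each repetition matches the sequence once.
const elements = [
  { label: null, pluck: true },  // @[+-]
  { label: null, pluck: false }, // _
];
const repetitions = [["+", [" "]], ["-", [" "]]];
console.log(repetitions.map((results) => sequenceResult(elements, results)));
// [ '+', '-' ]
```

The repetition then yields the operators alone, which is what the hand-written `(a:[+-] _ { return a; })*` returns today.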

@rafaelclp

rafaelclp commented Dec 19, 2017

+1 for the `@` sign. After writing a few dozen rules, having to add `{ return a; }` to simple rules just because of the trailing `_` has already become quite annoying, and it makes the grammar harder to read.

Also: #235

@reverofevil
Author

@rafaelclp Thanks for the link! I probably picked up the idea there back in 2014, when I made my own PEG parser generator. (There is no better explanation for why I chose the same `@` sign.)

@futagoza
Member

@polkovnikov-ph Since I'm planning to use `@` for annotations and import statements, I was going to use it for this feature as well, but after a while I started wondering: wouldn't it be confusing to use the same symbol (`@`) for all of these? Since I plan to use `%` for external rule calls (see stage 2 of #523), the next best symbols that also fit nicely into the current grammar are `#` and `::`:

```
exprX = (#[+-] _)* exprY
exprY = ::expression ![+-]
```

What do you think?

@reverofevil
Author

reverofevil commented Jan 22, 2018

@futagoza `::` looks more consistent. Even `:` would work. Since the library is still in a pre-release version, it's still possible to make this backwards-incompatible change, so that people don't have to type `:` twice. I don't think there are many places where people wrote `e :expression` anyway.

@futagoza
Member

I chose `::` over `:` for one reason only: to avoid confusing the two. It's not about backwards compatibility.

@Mingun
Contributor

Mingun commented Jan 22, 2018

Just a note: this problem also comes up in #11 and #427, so I think some of these issues should be closed as duplicates.

@reverofevil
Author

@futagoza I'm fine with pretty much any way of doing this; the specific character doesn't matter. It will never be worse than having to write `{ return ; }` :)

@reverofevil
Author

@Mingun I think this issue (#545) should be closed, because I read all the issues in 2013 and compiled them into another project. There are a lot of good comments under those issues.

@futagoza
Member

futagoza commented Jan 22, 2018

#235 and #427 are the same as this one, but #11 concerns a different matter.

Edit: Added a note to the OP's comment on #235 that references this issue.


4 participants