adding indentation support
kristianmandrup committed Mar 29, 2017
1 parent 205c55d commit 8346695
Showing 7 changed files with 792 additions and 561 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
browser/*
examples/*.js
node_modules/*
.vscode
133 changes: 133 additions & 0 deletions README.md
@@ -503,9 +503,142 @@ environments:
* Safari
* Opera

Fork changes
------------

This fork is designed for experimenting with indentation-based grammars, based on this [issue comment](https://github.com/pegjs/pegjs/issues/217#issuecomment-286595368) on that topic.

It adds the following to `src/parser.pegjs`:

```js
{
const OPS_TO_PREFIXED_TYPES = {
+: "++": "increment_match",
+: "--": "decrement_match"

PrefixedOperator
+: / "++"
+: / "--"

SuffixedOperator
+: / "+" !"+"
```

This enables our grammar to support `++Indent` and `--Indent`:

- `++Indent` tells the parser to increase the minimum number of matches required for `Indent`
- `--Indent` tells the parser to decrease the minimum number of matches required for `Indent`
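
One way to picture the two operators is as a counter attached to the rule. The sketch below is not the fork's implementation: `IndentCounter` and `matchIndents` are hypothetical names, and it assumes that "minimum number of matches" means the number of `Indent` matches a line has to start with, where one indent is a tab or two spaces.

```javascript
// Hypothetical model of the ++Indent / --Indent semantics described above.
// None of these names exist in PEG.js; this only illustrates the counter idea.
class IndentCounter {
  constructor() {
    this.required = 0; // minimum number of Indent matches currently required
  }

  increment() { // models ++Indent
    this.required += 1;
  }

  decrement() { // models --Indent
    if (this.required === 0) {
      throw new Error("--Indent without a matching ++Indent");
    }
    this.required -= 1;
  }
}

// Consumes `counter.required` indents from the start of `line` and returns
// the remainder, or null when the line is not indented deeply enough.
// Assumption: one indent is a tab or two spaces.
function matchIndents(line, counter) {
  let pos = 0;
  for (let i = 0; i < counter.required; i++) {
    if (line[pos] === "\t") {
      pos += 1;
    } else if (line.startsWith("  ", pos)) {
      pos += 2;
    } else {
      return null;
    }
  }
  return line.slice(pos);
}

const counter = new IndentCounter();
counter.increment();                       // as if the parser had passed ++Indent
console.log(matchIndents("  S", counter)); // prints S
console.log(matchIndents("S", counter));   // prints null
```

Throwing on an unbalanced `decrement()` is a design choice of this sketch; it keeps a stray `--Indent` from silently driving the counter below zero.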

A simple indentation-based grammar can then be expressed as:

```js
Start
= Statements

Statements
= Statement*

Statement
= Indent* @(S / I)

S
= "S" EOS {
return "S";
}

I
= "I" EOL ++Indent @Statements --Indent
/ "I" EOS { return []; }

Indent "indent"
= "\t"
/ !__ " "

__ "white space"
= " \t"
/ " "

EOS
= EOL
/ EOF

EOL
= "\n"

EOF
= !.
```

More details can be found [here](https://gist.github.com/dmajda/04002578dd41ae8190fc).

Indentation-based PEG.js Grammar
--------------------------------

Describes a simple indentation-based language. A program in this language is
a possibly empty list of the following statements:

* S (simple)

  Consists of the letter `S`.

* I (indent)

  Consists of the letter `I`, optionally followed by a newline and a list
  of statements indented by one indentation level (2 spaces) relative to
  the I statement itself.

Statements are terminated by a newline or `EOF`.

Example: `indentation/samples/simple-indent.js.txt`

```bash
I
S
I
S
S
```
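
To make the language description concrete, here is a small hand-written parser for the same language in plain JavaScript. It is independent of PEG.js and of this fork: `parseProgram` is a hypothetical helper, the input string is an illustrative program, and the code assumes exactly two spaces per indentation level and `\n` line endings. It mirrors the grammar's return values: `"S"` for an S statement and an array of nested statements for an I statement.

```javascript
// Hand-written sketch of the S / I language described above; not PEG.js output.
// Assumes two spaces per indentation level and "\n" as the line terminator.
function parseProgram(source) {
  const lines = source.split("\n");
  if (lines[lines.length - 1] === "") lines.pop(); // tolerate a trailing newline
  let index = 0;

  // Parses consecutive statements indented by exactly `level` levels.
  function parseStatements(level) {
    const prefix = "  ".repeat(level);
    const statements = [];
    while (index < lines.length) {
      const line = lines[index];
      if (!line.startsWith(prefix)) break; // dedent: the enclosing block resumes
      const text = line.slice(prefix.length);
      if (text === "S") {
        index += 1;
        statements.push("S");
      } else if (text === "I") {
        index += 1;
        statements.push(parseStatements(level + 1)); // nested block, possibly empty
      } else {
        throw new Error(`Line ${index + 1}: unexpected input ${JSON.stringify(line)}`);
      }
    }
    return statements;
  }

  return parseStatements(0);
}

const sample = "I\n  S\n  I\n    S\n    S\n";
console.log(JSON.stringify(parseProgram(sample))); // prints [["S",["S","S"]]]
```

The recursion keeps the current indentation level as a function argument, which is roughly the state that `++Indent` and `--Indent` are meant to carry inside the generated parser.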

The grammar needs to be compiled without caching.

To generate an indentation parser, first rebuild PEG.js itself:

```bash
$ gulp parser
$ ll -a lib/
```

`ll -a` lets you check whether the timestamps were updated, and thus whether the compiler generated new files.

Now try generating an *indentation parser*:

```bash
$ bin/pegjs -o indentation/parsers/indent-parser.js indentation/grammars/simple-indent.pegjs
```

This should (hopefully) generate a parser in `indentation/parsers/indent-parser.js`.

Sadly, I currently get this *error*, but it's getting close ;)

```bash
Cannot read property 'apply' of undefined
```

Development
-----------

To contribute or experiment with your own patches and tweaks:
- make changes in the `/src` folder
- add examples to test your changes in `/examples` and tests in `/test`

To generate a new PEG parser, run `gulp parser`, which generates `.js` files in `/lib`.
The binary `pegjs` can be found in `/bin` and uses `lib/peg` to execute:

`let peg = require("../lib/peg")`

Testing
-------

Tests can be found in `/test` and are written with Chai.

Resources
---------
* [Project website](https://pegjs.org/)
* [Wiki](https://github.com/pegjs/pegjs/wiki)
* [Source code](https://github.com/pegjs/pegjs)
35 changes: 35 additions & 0 deletions indentation/grammars/simple-indent.pegjs
@@ -0,0 +1,35 @@
Start
= Statements

Statements
= Statement*

Statement
= Indent* statement:(S / I) { return statement; }

S
= "S" EOS {
return "S";
}

I
= "I" EOL ++Indent statements:Statements --Indent { return statements; }
/ "I" EOS { return []; }

Indent "indent"
= "\t"
/ !__ " "

__ "white space"
= " \t"
/ " "

EOS
= EOL
/ EOF

EOL
= "\n"

EOF
= !.
Empty file.
5 changes: 5 additions & 0 deletions indentation/samples/simple-indent.js.txt
@@ -0,0 +1,5 @@
I
S
I
S
S

3 comments on commit 8346695

@futagoza futagoza commented on 8346695 Mar 30, 2017

The opcode error you mentioned in pegjs#217 (comment) is from generate-js.js#L386 or generate-js.js#L708. You're getting this error because the predicate operations are first converted to PEG.js bytecode before being generated into parse functions.

If you don't want to mess with the bytecode or JavaScript generators, I suggest adding an expansion-based transform pass in the compiler:

  1. When pass starts, create variable: const newRules = [];
  2. Using visitor.js look for increment_match and decrement_match
  3. Get the maximum amount (e.g. n) for rules that will be incremented or decremented
  4. Based on the new n, generate the AST for new rules: Indent$3 = Indent Indent Indent
  5. Save the generated AST rule: newRules.push( Indent$3 );
  6. Generate AST for a rule_ref node: const reference = new RuleRef( Indent$3.name, node.location );
  7. Finish the visit by replacing the current expression: swapExpression( node, reference );
  8. When the pass is finishing, add the generated AST rules: ast.rules.push( ...newRules );

You should also throw a fatal error when a decrement_match takes n below 0, as this will have been caused by extra decrement_match expressions.
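
The steps above could be sketched roughly as follows. This is only an illustration over a simplified AST: the node shapes mimic PEG.js (`rule`, `sequence`, `choice`, `rule_ref`), but `expandIndentation` and its traversal are hypothetical and do not use `visitor.js`. It also assumes that `increment_match` and `decrement_match` nodes wrap a `rule_ref` in an `expression` property, that `n` is tracked in textual order only, and that a decrement is replaced by a reference to the rule for the new, lower `n`.

```javascript
// Sketch of an expansion-based pass over a simplified AST.
// expandIndentation is a hypothetical name, not part of PEG.js.
function expandIndentation(ast) {
  const newRules = new Map(); // generated rules, keyed by name (step 1)
  const depth = new Map();    // current n for each incremented rule name

  // Builds (once) a rule such as Indent$3 = Indent Indent Indent (steps 4-5).
  function expandedRule(name, n) {
    const ruleName = name + "$" + n;
    if (!newRules.has(ruleName)) {
      newRules.set(ruleName, {
        type: "rule",
        name: ruleName,
        expression: {
          type: "sequence",
          elements: Array.from({ length: n }, () => ({ type: "rule_ref", name })),
        },
      });
    }
    return ruleName;
  }

  // Visits a node and returns its replacement (steps 2, 3, 6 and 7).
  function visit(node) {
    switch (node.type) {
      case "sequence":
        return { ...node, elements: node.elements.map(visit) };
      case "choice":
        return { ...node, alternatives: node.alternatives.map(visit) };
      case "increment_match":
      case "decrement_match": {
        const name = node.expression.name;
        const delta = node.type === "increment_match" ? 1 : -1;
        const n = (depth.get(name) || 0) + delta;
        if (n < 0) {
          throw new Error('Unbalanced decrement_match for rule "' + name + '"');
        }
        depth.set(name, n);
        return { type: "rule_ref", name: expandedRule(name, n) };
      }
      default:
        return node; // rule_ref, literal and other leaves stay as they are
    }
  }

  const rules = ast.rules.map((rule) => ({ ...rule, expression: visit(rule.expression) }));
  return { ...ast, rules: [...rules, ...newRules.values()] }; // step 8
}

const ast = {
  type: "grammar",
  rules: [{
    type: "rule",
    name: "I",
    expression: {
      type: "sequence",
      elements: [
        { type: "literal", value: "I" },
        { type: "increment_match", expression: { type: "rule_ref", name: "Indent" } },
        { type: "rule_ref", name: "Statements" },
        { type: "decrement_match", expression: { type: "rule_ref", name: "Indent" } },
      ],
    },
  }],
};
console.log(expandIndentation(ast).rules.map((r) => r.name).join(", ")); // prints I, Indent$1, Indent$0
```

Because `n` is resolved statically here, a recursive rule such as `I` always expands to the same depth, which is one reason a purely expansion-based pass may fall short of real nested indentation.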

If you want, I can submit a PR that should add the 2 (check and transform) passes to get this feature working.

@kristianmandrup

@futagoza Thanks a lot! Please submit the PR :) I will also take a look at your fork. Will take me a little while to get up to speed on the internals and writing lexers/grammars/ASTs again. Been a long time since Computer Science. Thanks again!!

@futagoza futagoza commented on 8346695 Apr 2, 2017

Hey @kristianmandrup, sorry for the late reply, been preoccupied with a few things.

I've managed to implement a pass that does what I described above, but it seems the implementation only works on rules that have 1 expression that's an `expression *` or `expression +`, and then replaces the `++ expression` or `-- expression` with the expanded result. Not what I was aiming for 🤣

I suggest digging into the bytecode or JS generators to see if you, or someone who understands them, can implement this in one of them. I'm afraid I can't help with the parser generator myself, as I'm more of a language designer than a language implementer, and the code in there just baffles me right now, but I'm still studying it part time 😄

Update

I've pushed the changes into a new branch in my copy of pegjs: futagoza/pegjs/commits/indentation

You might want to look at the first 3 commits I did before taking a look at the relevant ON-TOPIC commit 😝
