
Releases: ruby/prism

v0.1.0

07 Oct 17:36
4ae838d

Creating a new release to give a status update of where we're at, and where we're going.

I want to say first that there are a lot of things in progress, and a lot of things that will be removed/changed by the time we move forward. Those things include:

  • Currently the entire parser is modeled as a Pratt parser, which is straight up incorrect. The Ruby grammar doesn't work that way. That being said, it's much easier to add new nodes and test things without getting into the intricate details of where they are allowed to show up in the tree. So for now, it's staying. But it will eventually be pared down to just what is known as the "arg" production rule in the current CRuby grammar.
  • We're generating a lot of code at the moment. Some of that code I'm going to want to manage manually. For example, we're currently generating the functions that allocate every node type, and they have some basic knowledge of how to store their location information. This is all well and good, but the details of how to create these nodes are a little more complicated than our templating engine allows. Eventually I'd like to not be generating that stuff. But since it's very easy to generate new nodes right now, I'm leaving it.
  • There are numerous bugs that I'm just ignoring at the moment. For example, when you parse a class node that has a superclass, the constant path gets parsed as a < method call. There are lots of these examples; we'll get to them all eventually.
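
To make the first and last points above concrete, here is a minimal sketch of a Pratt-style expression parser. It is only an illustration, not the project's code: the BINDING_POWER table, the parse_expression method, and the array-based :call nodes are all made up for this example. It shows why a purely expression-driven parser reads the < in a class header as an ordinary binary operator.

```ruby
# A toy Pratt parser over pre-split tokens.
# Every name here is hypothetical and not taken from the project.
BINDING_POWER = { "<" => 10, "+" => 20, "*" => 30 }.freeze

def parse_expression(tokens, min_power = 0)
  # Prefix position: an identifier, constant, or number.
  left = tokens.shift

  # Keep consuming infix operators that bind more tightly than min_power.
  while (power = BINDING_POWER[tokens.first]) && power > min_power
    operator = tokens.shift
    right = parse_expression(tokens, power)
    left = [:call, left, operator, right]
  end

  left
end

# Multiplication binds tighter than addition.
p parse_expression(%w[1 + 2 * 3])

# With no notion of "we are in a class header", Foo < Bar is just a call.
p parse_expression(%w[Foo < Bar])
```

Nothing in the loop knows which grammar rule it is inside, which is what makes adding nodes easy and also what makes the superclass case come out wrong.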

That being said, here are some features that have been added since this project was started that are the beginnings of what we're going to have in the final product:

  • We're generating a shared library that has no dependencies on external projects or libraries. This is in place, being generated by a custom Makefile. At the moment it's called librubyparser, but I'm open to literally any other name.
  • The shared library basically has two workflows:
    • yp_parse - accepts a parser, returns a pointer to the root node of the tree that was parsed. Parse errors are added to the parser's error list as parsing is performed. The user is then free to use this node as they please.
    • yp_serialize - accepts a parser, a node, and a buffer and serializes the node to a binary string on the buffer. The user is then free to use this buffer as they please.
  • We're generating a Ruby native extension library that allows interacting with the shared library from a Ruby context. We're using this for testing our parsing, which makes it not only helpful but necessary. This library includes definitions for all of the nodes in the tree much like syntax tree. All of the nodes can be queried, walked, and deconstructed. With the nodes in place, the library also provides a deserialization procedure for reading the binary string dumped by yp_serialize.
  • We've begun providing more documentation, and I intend to add a lot more. Some nodes have documentation now, but I want every node to have clear, concise examples before this thing gets shipped. We've also been adding documentation to all of the C functions.
  • We have some basic error recovery, with plans for much more as we get further underway. At the moment, if a token is expected in a particular position, we can recover from that by replacing it with a missing token. We have decent error reporting now, which can be accessed through the C and Ruby interfaces.
  • We're comparing our lex output to ripper's at the moment and getting close to parity. There's a lot of state stored in the CRuby lexer (probably to make it easier to interface with bison) that we're not storing, which means it's difficult to get full compatibility, but we'll get there eventually.
  • We're tracking the scope of local variables in various scope nodes through the tree (currently one at the top level, one for each class node, and one for each module node). This is necessary for parsing super complicated examples like the ones we saw in TRICK this year.
  • A lot of the nodes in the tree have been simplified and made more semantic than their ripper/syntax tree counterparts. For example, Assignment is a node in syntax tree, but in YARP it has been split into {Class,Global,Local,Instance}VariableWrite and CallNode where appropriate. On the other hand, some nodes have been eliminated by collapsing them into common ones, like If and IfModifier being a part of the same node. The #1 goal here is making it easier and faster to compile once this is integrated into CRuby.
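
The missing-token recovery mentioned in the list above can be sketched roughly as follows. This is an assumption-laden toy, not the library's implementation: Token, ToyParser, expect, and parse_parenthesized are hypothetical names, and the real parser is written in C.

```ruby
# Hypothetical token and parser types for illustration only; these are not
# the project's actual data structures.
Token = Struct.new(:type, :value)

class ToyParser
  attr_reader :errors

  def initialize(tokens)
    @tokens = tokens
    @errors = []
  end

  # Consume the next token if it has the expected type. Otherwise record an
  # error and return a synthetic :missing token without consuming anything,
  # so the caller can keep building its node.
  def expect(type)
    if @tokens.first&.type == type
      @tokens.shift
    else
      @errors << "expected #{type}"
      Token.new(:missing, nil)
    end
  end

  # Parses "( identifier )" into a small array-based node.
  def parse_parenthesized
    opening = expect(:lparen)
    name = expect(:identifier)
    closing = expect(:rparen)
    [:parens, opening.type, name.value, closing.type]
  end
end

# The closing parenthesis is absent from the input.
parser = ToyParser.new([Token.new(:lparen, "("), Token.new(:identifier, "foo")])
node = parser.parse_parenthesized
p node
p parser.errors
```

The caller still gets a complete node back, with the gap marked by a :missing token, and the problem shows up in the error list instead of aborting the parse.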

There are a ton of things we're still working on, but the ones that are top of mind for the near future include:

  • Error recovery at the node level. This involves tracking context as a stack and allowing parent nodes to recover from unexpected tokens if one is found. This largely mirrors the approach described here.
  • Just more nodes. We have a ton of stuff still to implement. I'd like to get method definitions in place soon, because that'll start to allow us to parse very basic full files. Also because method definitions involve a ton of different subnodes like *, **, &, massign, etc.
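
As a rough idea of what node-level recovery with a context stack could look like, here is a small sketch. It is my own guess at the general shape, not the approach from the linked description and not the planned implementation: OPENERS, TERMINATORS, RecoveringParser, and the token representation (symbols for delimiters, strings for identifiers) are all hypothetical.

```ruby
# Hypothetical sketch: each opening token starts a context, and each context
# names the token that closes it.
OPENERS = { lparen: :parens, lbracket: :brackets }.freeze
TERMINATORS = { parens: :rparen, brackets: :rbracket }.freeze

class RecoveringParser
  attr_reader :errors

  def initialize(tokens)
    @tokens = tokens
    @contexts = [] # stack of contexts currently being parsed
    @errors = []
  end

  # Parses a delimited list. Assumes the next token is an opening delimiter.
  def parse_list
    context = OPENERS.fetch(@tokens.shift)
    @contexts.push(context)
    elements = []

    # Stop at end of input or at any token that closes an open context,
    # including a parent's, so the parent gets a chance to handle it.
    until @tokens.empty? || closes_open_context?(@tokens.first)
      if OPENERS.key?(@tokens.first)
        elements << parse_list
      elsif @tokens.first.is_a?(String)
        elements << @tokens.shift
      else
        # No open context wants this token, so report it and skip it.
        @errors << "unexpected #{@tokens.shift}"
      end
    end

    @contexts.pop
    if @tokens.first == TERMINATORS[context]
      @tokens.shift
    else
      @errors << "expected #{TERMINATORS[context]}"
    end

    [context, *elements]
  end

  private

  def closes_open_context?(token)
    @contexts.any? { |context| TERMINATORS[context] == token }
  end
end

# The bracket list is never closed; the ")" belongs to the outer list.
parser = RecoveringParser.new([:lparen, "a", :lbracket, "b", :rparen])
tree = parser.parse_list
p tree
p parser.errors
```

Here the inner list gives up when it sees a token its parent is waiting for, reports its own missing terminator, and the outer list finishes normally.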