
Add a way to capture the delimiters in a delimited repeat. #259

Open
tomprince opened this issue Apr 26, 2021 · 6 comments

@tomprince

I'm looking at migrating full-moon to use rust-peg for parsing. However, since it captures the entire text (including whitespace and comments), I need to be able to capture the delimiters as well as the main items if I were to use ** or ++.

@kevinmehall
Owner

You could do something like:

rule list<I, S>(item: rule<I>, sep: rule<S>) -> (Option<I>, Vec<(S, I)>)
        = first:item() items:(s:sep() i:item() { (s, i) })* { (Some(first), items) }
        / { (None, vec![]) }

rule use_it() = list(<expr()>, <comma()>)

which is kind of like what ** expands to.
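
For reference, a rough, self-contained sketch of how this might be wired into a grammar over str (the expr, comma, and exprs names are just placeholders):

    peg::parser! {
        grammar example() for str {
            rule comma() -> &'input str = $(",")
            rule expr() -> &'input str = $(['a'..='z']+)

            rule list<I, S>(item: rule<I>, sep: rule<S>) -> (Option<I>, Vec<(S, I)>)
                = first:item() items:(s:sep() i:item() { (s, i) })* { (Some(first), items) }
                / { (None, vec![]) }

            pub rule exprs() -> (Option<&'input str>, Vec<(&'input str, &'input str)>)
                = list(<expr()>, <comma()>)
        }
    }

    fn main() {
        // The first item comes back on its own; each separator is paired with the item it precedes.
        assert_eq!(
            example::exprs("a,b,c").unwrap(),
            (Some("a"), vec![(",", "b"), (",", "c")])
        );
    }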

I would be interested to hear your experience and pain points in using this library for a lossless parser. Are you producing a typed or untyped syntax tree?

@tomprince
Author

You could do something like: [...]

It looks like the use of rule<...> as a type of rule argument isn't documented anywhere.

I would be interested to hear your experience and pain points in using this library for a lossless parser.

I've only just started working on converting the existing hand-built parser to peg, so I don't know what pain points I'll run into. This is the first major one.

A couple of minor points:

  • I often have rule fragments like (a:e1 b:e2 {(a,b)}). It would be nice if I could instead just say (e1 e2).
  • Parsing against a complex [T] [1] requires defining a helper trait with a bunch of methods and using the undocumented ## to call them. I'm not sure if there is something that could be done to make this more ergonomic.[2] (A simplified sketch of the helper-trait approach is shown below.)

[1] I'm adapting an existing split lexer + parser that clusters trivia like whitespace/comments with the adjacent tokens before parsing, so I'm parsing these token clusters (which also include position information), but only care about the root token for determining the parse.
[2] I realized as I was writing this that I could also use [token] {? if token ... } but something like

rule number() -> TokenReference<'text>
    = [token] {? if let TokenType::Number { number } = *token {
            Ok(token.with_value(number))
        } else {
            Err("not a number")
        }
     }

still feels a little bit awkward.
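
For comparison, here is roughly what the helper-trait + ## approach mentioned above looks like, heavily simplified. The token types and the number method are illustrative (not full-moon's), and I'm assuming the shape ## expects: the method receives the current position and returns a peg::RuleResult.

    use peg::RuleResult;

    // Illustrative token types, not full-moon's.
    #[derive(Clone, Debug)]
    pub enum TokenType {
        Number(f64),
        Symbol(String),
        // other kinds elided
    }

    #[derive(Clone, Debug)]
    pub struct Token {
        pub ty: TokenType,
        // trivia and position info elided
    }

    // Helper trait whose methods the grammar calls via `##number()`.
    // Each method takes the current position and reports either a match
    // (the new position plus a value) or a failure.
    pub trait TokenInput {
        fn number(&self, pos: usize) -> RuleResult<f64>;
    }

    impl TokenInput for [Token] {
        fn number(&self, pos: usize) -> RuleResult<f64> {
            match self.get(pos) {
                Some(Token { ty: TokenType::Number(n), .. }) => RuleResult::Matched(pos + 1, *n),
                _ => RuleResult::Failed,
            }
        }
    }

    peg::parser! {
        grammar tokens() for [Token] {
            pub rule number() -> f64 = ##number()
        }
    }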

Are you producing a typed or untyped syntax tree?

I'm not sure what you mean by this?

@tomprince
Author

I would be interested to hear your experience and pain points in using this library for a lossless parser.

I just discovered that I can't implement ParseLiteral for my [T]. I was going to experiment with using this to allow matching symbols in the parser using string literal syntax. Though, even if I could, that would allow me to write a grammar with an invalid symbol that would only be detected at runtime.

@godmar

godmar commented Jul 3, 2021

You could do something like:

rule list<I, S>(item: rule<I>, sep: rule<S>) -> (Option<I>, Vec<(S, I)>)
        = first:item() items:(s:sep() i:item() { (s, i) })* { (Some(first), items) }
        / { (None, vec![]) }

rule use_it() = list(<expr()>, <comma()>)

which is kind of like what ** expands to.

I also have a use case where I'd like to collect the delimiters.
For instance, in a bash-style shell grammar, pipelines are separated by & or ; and within a pipeline, commands may be separated by | or |&. Before stumbling on this issue, my solution required 4 rules instead of 1 in each case; in general, with n choices of delimiters, it would be 2*n rules if I'm seeing this correctly.

So adding syntactic sugar may be useful. Also, it should probably return the separator that follows an item rather than the separator that precedes it (at least for my use case).

I'm currently successfully using the list<> rule given above. Very elegant.
For reference, the resulting code is:

    pub rule cmdline() -> Result<CommandLine, &'input str>
      = delimited_cmdline: list(<pipeline()>, <pipeline_separator()>) {
            let (pipe0, rest) = delimited_cmdline;
            let mut pipelines = vec![pipe0?];

            for (i, (sep, pipe)) in rest.into_iter().enumerate() {
                if matches!(sep, "&") {
                    let last = &mut pipelines[i];
                    last.bg_job = true;
                }
                pipelines.push(pipe.unwrap());
            }

            Ok(CommandLine {
                pipelines
            })
        }

    rule pipeline_separator() -> &'input str
        = $(";") / $("&")

@kevinmehall
Owner

Also, it should probably return the separator that follows an item rather than the separator that precedes it (at least for my use case).

Yeah, one argument against making this some kind of built-in syntax is the number of different return types you might want, depending on how the separators associate with the items and whether empty lists and leading/trailing separators should be allowed:

  • (I, Vec<(S, I)>)
  • (Vec<(I, S)>, I)
  • Vec<(I, Option<S>)>
  • Vec<(Option<S>, I)>
  • (Option<I>, Vec<(S, I)>)
  • (Vec<(I, S)>, Option<I>)
  • etc

(where I is the item and S is the separator)
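
For illustration, one of those variants written as a user-defined rule in the same style as list above: a non-empty list where each separator is paired with the item it follows, with an optional trailing separator allowed (an untested sketch):

    rule list_trailing<I, S>(item: rule<I>, sep: rule<S>) -> Vec<(I, Option<S>)>
        = first:item() rest:(s:sep() i:item() { (s, i) })* trailing:sep()? {
            // Pair each separator with the item that precedes it; only the last
            // item may end up without a separator.
            let mut out = Vec::new();
            let mut prev = first;
            for (s, i) in rest {
                out.push((prev, Some(s)));
                prev = i;
            }
            out.push((prev, trailing));
            out
        }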

@godmar

godmar commented Jul 19, 2021

The better alternative may then in fact be to improve the documentation for the technique that uses rule<...> arguments; if the example is included in the README, users should be able to quickly create whichever variant is best for them.
