Improve documentation and examples how to handle strings properly #314

jbx1 · 2022-07-24T18:29:53Z

The documentation only shows examples of parsing numbers and single characters. Almost all the tests also don't parse strings, which makes it hard to know what one needs to do, especially if one is a bit of a beginner in Rust and rust-peg.

The information about the 'input lifetime annotation is a bit elusive (not documented), and it is not clear how this affects the lifetime annotations needed for any structs receiving the parsed str or Vec of str.

It would also be great if there were some recommendations as to how strings should be parsed, and if zero-copy can be achieved in any way.

Some proper documentation with a few examples of parsing singular or vec of strings (with operators such as ** and ++) would be really helpful.


kevinmehall · 2022-07-25T04:12:20Z

The $() operator returns an &'input str slice of the input string corresponding to the text matched by the expression inside, and is zero-copy:

pub rule alphanumeric1() -> &'input str = $(['a'..='z' | 'A'..='Z' | '0'..='9']+)

though if you want to copy it into an owned String you can do so in an action:

pub rule alphanumeric2() -> String = v:$(['a'..='z' | 'A'..='Z' | '0'..='9']+) { v.to_owned() }

You can compose these into something that parses a sequence of strings:

pub rule alphanumeric_seq1() -> Vec<&'input str> = alphanumeric1() ** ","
pub rule alphanumeric_seq2() -> Vec<String> = alphanumeric2() ** ","

or inline the rule if you don't want the separate rule:

pub rule alphanumeric_seq2a() -> Vec<String> = (v:$(['a'..='z' | 'A'..='Z' | '0'..='9']+) { v.to_owned() }) ** ","
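
For comparison, here is a minimal hand-written sketch in plain Rust (std only, no peg) of what the borrowed and owned variants amount to. The function names alphanumeric_seq_borrowed and alphanumeric_seq_owned are made up for illustration. The code only approximates alphanumeric1() ** "," applied to a whole input and is not what peg generates.

```rust
/// Zero-copy: every element is a slice pointing into `input`.
/// Returns None if any piece is empty or contains a non-alphanumeric char.
fn alphanumeric_seq_borrowed(input: &str) -> Option<Vec<&str>> {
    // `**` also accepts zero elements, so empty input yields an empty Vec.
    if input.is_empty() {
        return Some(Vec::new());
    }
    input
        .split(',')
        .map(|piece| {
            let ok = !piece.is_empty() && piece.chars().all(|c| c.is_ascii_alphanumeric());
            if ok { Some(piece) } else { None }
        })
        .collect()
}

/// Owned: same parse, then each slice is copied into its own String.
fn alphanumeric_seq_owned(input: &str) -> Option<Vec<String>> {
    alphanumeric_seq_borrowed(input)
        .map(|pieces| pieces.into_iter().map(str::to_owned).collect())
}
```

The borrowed version allocates only the Vec itself, but its result cannot outlive the input. The owned version allocates one String per element and has no such restriction.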

If by "string" you mean something like a quoted string literal, it gets a little more complicated to handle escape sequences rather than a simple slice of the input:

    pub rule double_quoted_string() -> String
      = "\"" s:double_quoted_character()* "\"" { s.into_iter().collect() }

    rule double_quoted_character() -> char
      = [^ '"' | '\\' | '\r' | '\n' ]
      / "\\n" { '\n' }
      / "\\u{" value:$(['0'..='9' | 'a'..='f' | 'A'..='F']+) "}" {?
            u32::from_str_radix(value, 16).ok().and_then(char::from_u32).ok_or("valid unicode code point")
        }
      / expected!("valid escape sequence")
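
To make the escape handling concrete, here is a hand-written plain-Rust sketch (std only, no peg) that decodes the same small language: a double-quoted string with \n and \u{...} escapes. The function name parse_double_quoted and the error strings are assumptions for illustration. It does not reproduce peg's error reporting, and like the grammar above it does not handle \" or \\.

```rust
/// Decodes a double-quoted literal with `\n` and `\u{HEX}` escapes.
/// Returns the decoded contents, or a short description of what was expected.
fn parse_double_quoted(input: &str) -> Result<String, &'static str> {
    let mut chars = input.chars();
    if chars.next() != Some('"') {
        return Err("opening quote");
    }
    let mut out = String::new();
    loop {
        match chars.next() {
            // Closing quote ends the literal.
            Some('"') => break,
            Some('\\') => match chars.next() {
                Some('n') => out.push('\n'),
                Some('u') => {
                    if chars.next() != Some('{') {
                        return Err("valid escape sequence");
                    }
                    // Collect hex digits up to the closing brace.
                    let mut hex = String::new();
                    loop {
                        match chars.next() {
                            Some('}') => break,
                            Some(c) if c.is_ascii_hexdigit() => hex.push(c),
                            _ => return Err("valid escape sequence"),
                        }
                    }
                    let ch = u32::from_str_radix(&hex, 16)
                        .ok()
                        .and_then(char::from_u32)
                        .ok_or("valid unicode code point")?;
                    out.push(ch);
                }
                _ => return Err("valid escape sequence"),
            },
            // Raw line breaks and end of input are not allowed inside the literal.
            Some('\r') | Some('\n') | None => return Err("closing quote"),
            Some(c) => out.push(c),
        }
    }
    // The literal must span the whole input.
    if chars.next().is_some() {
        return Err("end of input");
    }
    Ok(out)
}
```

The result has to be an owned String here, because once an escape is decoded the output no longer matches any contiguous slice of the input.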

Hope that helps. Leaving this issue open for these examples to be integrated somewhere in the documentation.

jbx1 · 2022-07-28T16:47:13Z

That's great. Maybe a bit more details about the semantics of the 'input lifetime would be helpful.

kevinmehall · 2022-07-29T04:06:38Z

The 'input lifetime is simply the lifetime of the input argument in the generated parse function. So a rule like

pub rule x() -> Vec<&'input str> = ($(['a'..='z'])) ** ","

expands into a function like

fn x<'input>(input: &'input str) -> Result<Vec<&'input str>, ParseError>

In #299 (probably for 0.9), the name will be customizable instead of hard-coded, making it seem a little less magical.
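
The same reasoning applies to structs that receive parsed slices: the struct needs a lifetime parameter, and the function returning it ties that parameter to the input. Below is a minimal plain-Rust sketch (std only, no peg). KeyValue and parse_key_value are hypothetical names, and the function is hand-written rather than generated.

```rust
/// A struct that borrows from the parsed input instead of copying it.
#[derive(Debug, PartialEq)]
struct KeyValue<'input> {
    key: &'input str,
    value: &'input str,
}

/// Same signature shape as above: the result borrows from `input`,
/// so it cannot outlive the string that was parsed.
fn parse_key_value<'input>(input: &'input str) -> Result<KeyValue<'input>, &'static str> {
    // Split at the first '='; both halves are slices of `input`.
    let (key, value) = input.split_once('=').ok_or("'='")?;
    Ok(KeyValue { key, value })
}
```

Inside a grammar, the corresponding rule would presumably declare its return type as KeyValue<'input> and build the struct in an action from $() captures.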

YingboMa · 2023-04-12T03:08:58Z

How can we match unicode identifiers? Is it possible to use unicode-ident in the grammar?

kevinmehall · 2023-04-12T04:23:21Z

Yes, [ ] patterns allow an if guard, like Rust's match arms, so you can do something like

rule identifier() -> &'input str = $([c if is_xid_start(c)] [c if is_xid_continue(c)]*)
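
As a rough illustration of what that rule matches, here is a hand-written plain-Rust sketch (std only, no peg) that returns the longest identifier-like prefix of the input as a zero-copy slice. The predicates is_ident_start and is_ident_continue are stand-ins built on char::is_alphabetic and char::is_alphanumeric. They are assumptions for this example and only approximate the XID_Start and XID_Continue properties that unicode-ident implements.

```rust
// Stand-in predicates (assumption): approximate XID_Start / XID_Continue
// with std's Unicode-aware character classes.
fn is_ident_start(c: char) -> bool {
    c.is_alphabetic()
}

fn is_ident_continue(c: char) -> bool {
    c.is_alphanumeric()
}

/// Returns the longest identifier prefix of `input` as a slice of it,
/// or None if the first character cannot start an identifier.
fn identifier(input: &str) -> Option<&str> {
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, c)) if is_ident_start(c) => {}
        _ => return None,
    }
    // Byte offset of the first char that cannot continue the identifier.
    let end = chars
        .find(|&(_, c)| !is_ident_continue(c))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some(&input[..end])
}
```

Using char_indices keeps the slice boundary on a valid UTF-8 character boundary, which matters as soon as identifiers contain non-ASCII characters.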

kevinmehall added the docs label Oct 11, 2023
