Take and/or TakeUpTo parsers #341

randomeizer · 2024-03-21T13:37:04Z

This is a pitch for adding one or possibly both parsers to the core library. There is nothing in the library that currently lets you do what these do. They are both achieving the same basic goal with a slightly different approach.

They were inspired by this thread in Slack, and I figured the solution might be worth sharing.

For example, the goal from the thread was parsing episode names. This is a simpler example, with a standard format of " - Ep ".

This is actually quite hard to parse right now, since name could contain anything (including "-", digits, etc). The main rule is that it always ends with " - ". For example, "Testing, 1, 2, 3 - a Brief History - Ep 2".

struct EpisodeTitle {
  let name: String
  let number: Int
}

`Take`

The Take parser is provided with two parsers: a 'taker' and a 'terminator'. The results for both are returned as a tuple. Here's how it would parse EpisodeTitle:

let parser: some Parser<Substring, EpisodeTitle> = Parse(EpisodeTitle.init) {
  Take {
    Prefix(1...).map(.string)
  } upTo: {
    " - "
    Int.parser()
    End()
  }
}

let episodeTitle = try parser.parse("Testing, 1, 2, 3 - a Brief History - Ep 2")

Because Take's output is a tuple, it's signature looks like this in our example:

let takeParser: some Parser<Substring, (String, Int)> = Take {
  Prefix(1...).map(.string)
} upTo: {
  " - "
  Int.parser()
  End()
}

This is a bit odd, but does generally boil down into single-level tuples thanks to how parsers work in the library.

`TakeUpTo`

This variant just lets you specify the terminator Parser, and anything before that is returned unmodified.

This is a somewhat simpler implementation, but with the key limitation that it doesn't let you have any complex parsing before the terminator. Also, it because it doesn't return the value parsed by the terminator, it can't return that value either, so it has to be parsed again.

For example:

let parser: some Parser<Substring, EpisodeTitle> = Parse(EpisodeTitle.init) {
  TakeUpTo {
    " - "
    Int.parser()
    End()
  }.map(.string)
  " - "
  Int.parser()
  End()
}

let episodeTitle = try parser.parse("Testing, 1, 2, 3 - a Brief History - Ep 2")

Alternate approaches

I was unable to find anything which would allow parsing in situations like this, but I might have missed something. I attempted to use:

Parser(Episode.init) {
  Many {
    Prefix(1)
  } terminator: {
    " - "
    Int.parser()
    End()
  }.map(.string)
  " - "
  Int.parser()
  End()
}

However, Many doesn't actively check the terminator - it just checks if the next input matches the terminator once the main parser stops matching. As such, just eats all the text, including the episode details.

Anyway, would like to hear your thoughts. Of the two, Take is more flexible, but also more complicated, and still has some rough edges I'd like to sand off.

Along with some test cases.

mbrandonw · 2024-03-22T03:01:04Z

Hi @randomeizer, these are definitely powerful parsers, and we do have a version of this directly in our pointfreeco codebase for parsing our transcript markdown files.

However, the main reason we haven't brought it into swift-parsing is because it is super inefficient. It has no choice but to repeatedly run the inner parser on a bunch of prefixes of the input until it finds a match. For that reason we're not sure want to include it in the library, but it's definitely something people can define on their own. And maybe we could even include a version of it in a case study.

randomeizer · 2024-03-25T02:44:51Z

Yeah, it definitely isn't efficient, but there are many ways of writing it even less efficiently.

It should definitely be clearly documented as such, but sometimes there is no other way to parse something, unfortunately.

randomeizer added 2 commits March 21, 2024 22:48

Initial implementations of Take and TakeUpTo parsers

6fa1cff

Along with some test cases.

Fixes for TakeUpTo

d856737

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Take and/or TakeUpTo parsers #341

Take and/or TakeUpTo parsers #341

randomeizer commented Mar 21, 2024 •

edited

mbrandonw commented Mar 22, 2024

randomeizer commented Mar 25, 2024

Take and/or TakeUpTo parsers #341

Are you sure you want to change the base?

Take and/or TakeUpTo parsers #341

Conversation

randomeizer commented Mar 21, 2024 • edited

Take

TakeUpTo

Alternate approaches

mbrandonw commented Mar 22, 2024

randomeizer commented Mar 25, 2024

randomeizer commented Mar 21, 2024 •

edited

`Take`

`TakeUpTo`