Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is Parser0#repSep not possible? #205

Open
nightscape opened this issue Apr 25, 2021 · 6 comments
Open

Why is Parser0#repSep not possible? #205

nightscape opened this issue Apr 25, 2021 · 6 comments

Comments

@nightscape
Copy link

Hi all,

I'm trying to port scala-uri to cats-parse.
One difficulty I'm facing is with parsing path parts including empty ones:

a/b/c
/a/b/c
a//c
/a//c

What I would like to do is something like this (simplified)

  def _path_segment: Parser0[String] = Parser.until0(charIn("/?#"))
  def _path: Parser[String] = (Parser.char('/').? ~ _path_segment.repSep0(char('/')).string

but this doesn't work, as Parser0 doesn't have repSep or repSep0 defined.
I understand that Parser0#rep is problematic, as one could easily run the empty parser infinitely, but in my naive thinking this problem shouldn't exist with repSep, right?

Or is there a nicer way to do this in cats-parse?

@martijnhoekstra
Copy link
Contributor

martijnhoekstra commented Apr 29, 2021

A combinator to repeat a Parser0 with a separator that's a Parser doesn't exist in the library, but looks safe to me. They could look like

def rep0sep0[A](data: Parser0[A], separator: Parser[Any]): Parser0[List[A]] =
  (data.? ~ (separator *> data).rep0).map { case (a, as) => a ++: as }

def rep0sep[A](data: Parser0[A], separator: Parser[Any]): Parser0[NonEmptyList[A]] = (data ~ (separator *> data).rep0).map {
  case (a, as) => NonEmptyList(a, as)
}

Then your use case becomes

val sep = char('/')
val part = until0(charIn("/?#"))
val path = sep.? *> rep0sep0(part, sep)
val _path = path.string

@johnynek
Copy link
Collaborator

We could add a function like this. I'd be happy to do that.

@johnynek
Copy link
Collaborator

johnynek commented May 6, 2021

By the way, I think a totally empty path is not a valid path is it?

So, even this function isn't great since it will match nothing.

It seems like a better way to write this parser would be to consider a few cases:

  1. nonEmpty string with a tail that has /a/b/c... where the a, b, c can be empty
  2. / followed by a/b/c... where a, b, c can be empty

In this way, you still get a Parser[Path] and not a Parser0.

@johnynek
Copy link
Collaborator

johnynek commented May 6, 2021

I thought about adding this code... but then I realized that we probably want to be a bit more thoughtful since we have min and max accepting versions of repSep, we probably want to mirror that.

But then, if you repeat 2 or more times with a sep that is non-empty you get a Parser not a Parser0, so maybe you want to retain that, and have require min >= 2, then you have three cases: 0 items (Parser0), 1 item (Parser0), 2 items (has 1 sep, so it is a Parser).

In this way, you have (2 or more) | onetime.map(_ :: Nil) | pure(Nil) which would naturally degrade back down to Parser0 by the last two ors.

@zsluedem
Copy link
Contributor

zsluedem commented Oct 9, 2021

But then, if you repeat 2 or more times with a sep that is non-empty you get a Parser not a Parser0, so maybe you want to retain that, and have require min >= 2, then you have three cases: 0 items (Parser0), 1 item (Parser0), 2 items (has 1 sep, so it is a Parser).

I have some difficulties understanding why we need Parser0 and Parser actually. What is the shortcoming to merge this two thing together?

@johnynek
Copy link
Collaborator

johnynek commented Oct 9, 2021

Unbounded repetition on a Parser0 isn't safe: it can just run forever allocating more memory parsing an infinite number of items making no progress.

This is not an uncommon error when working with Parser combinators.

The blog post linked in the read me of this repo covers this:

https://posco.medium.com/designing-a-parsing-library-in-scala-d5076de52536

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants