Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add caret and spans like Trifecta (#234) #238

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

add caret and spans like Trifecta (#234) #238

wants to merge 2 commits into from

Conversation

re-xyr
Copy link

@re-xyr re-xyr commented Jun 12, 2021

Resolves #234; adds Parser(0)#withCaret, span, withSpan and Parser.caret.

I'm unsure about the performance of these combinators but I expect it to be not very good. This is due to recomputation in LocationMap each time the combinators are used. If this is much of a concern, this PR may be incapable. A probable optimization is to store the current Position (incl. row, col) in the State and update by need when the combinators are called (this is the approach of Megaparsec).

From Megaparsec's getSourcePos combinator which functions alike:

Return the current source position. This function is not cheap, do not call it e.g. on matching of every token, that's a bad idea. Still you can use it to get SourcePos to attach to things that you parse.

core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
@@ -668,6 +668,18 @@ class ParserTest extends munit.ScalaCheckSuite {
parseTest(Parser.stringIn(List("foo", "foobar", "foofoo", "foobat")).string, "foobal", "foo")
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add generators that create any new elements to the AST (in this PR that would be input, but in a follow up it might be WithCaret or something). That way we can check all the other properties in the presence of this new node.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to address this to make sure all the previous laws pass.

see for instance:

def defer(g: GenT[Parser]): GenT[Parser] =

as an example of how we want to transform some generators to make withSpan and spanOf and add them to here:

(1, rec.map { gen => GenT(!gen.fa) }),

and:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you added the generators, but I don't see that they were added to the root generator so we don't exercise all the existing laws in the presence of these operations.

@johnynek
Copy link
Collaborator

Thanks for sending this PR! I think we will find a way to merge something close to this.

@re-xyr
Copy link
Author

re-xyr commented Jun 15, 2021

I am thinking about Span. Do you think (Caret, Caret) will be better than a separate datatype?

core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
/** return the result of the parser together
* with the position of the starting of the consumption
*/
def careted0[A](pa: Parser0[A]): Parser0[(A, Caret)] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is confusing. The caret point before the A, but we return it after. Also, I wonder if we should bother to make this method since it is really just (caret ~ pa).map(_.swap).

For instance, if we change the order, as I would suggest as a mnemonic to remember the caret comes first, then it is just (caret ~ pa) which is probably too small for a method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that pa was primary and the caret was secondary so the result of pa comes first, but yes that would cause confusion about where is the caret exactly. If you prefer putting the caret at the first element I'll change it.

I think pa.careted is more readable than (caret ~ pa) in the sense that it associates tighter in sight which indicates it is somehow more of a single parsing unit. Indeed we can leave this out and let the user make the choice whether to write this helper instead since it is really short.

core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
override def parseMut(state: State): Caret = {
val (row, col) = state.locmap
.toLineCol(state.offset)
.getOrElse(throw new IllegalStateException("the offset is larger than the input size"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will throw when you are at the EOF right?

I think we need to fix that somehow.

I guess we can say that EOF should return toLineCol(eof - 1).map { case (row, col) => (row, col + 1) }

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll take a look at how I can solve that first.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added GenT for spanning combinators but not yet added to the list in gen(0). Will add after this is solved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#286 fixes this issue.

@@ -668,6 +668,18 @@ class ParserTest extends munit.ScalaCheckSuite {
parseTest(Parser.stringIn(List("foo", "foobar", "foofoo", "foobat")).string, "foobal", "foo")
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to address this to make sure all the previous laws pass.

see for instance:

def defer(g: GenT[Parser]): GenT[Parser] =

as an example of how we want to transform some generators to make withSpan and spanOf and add them to here:

(1, rec.map { gen => GenT(!gen.fa) }),

and:

@@ -668,6 +668,18 @@ class ParserTest extends munit.ScalaCheckSuite {
parseTest(Parser.stringIn(List("foo", "foobar", "foofoo", "foobat")).string, "foobal", "foo")
}

test("Parser.careted and spanned works") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I'd like to see a law like pa.parse(str) == pa.withSpan.map(_._1).parse(str) and pa.spanOf.parse(str) == pa.withSpan.map(_._2).parse(str)

something like that.


case class Caret(offset: Int, row: Int, col: Int)
object Caret {
implicit val eqCatsCaret: Eq[Caret] = Eq.fromUniversalEquals
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about making an Order instead and order by the offset? We might want to sort by carets.


case class Span(from: Caret, to: Caret)
object Span {
implicit val eqCatsCaret: Eq[Span] = Eq.fromUniversalEquals
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we could make an Order here where we compare first by from, then to.

implicit val eqCatsCaret: Eq[Caret] = Eq.fromUniversalEquals
}

case class Span(from: Caret, to: Caret)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment if from and to are inclusive/exclusive? I assume to is exclusive.

/** return the result of the parser together
* with the position of the starting of the consumption
*/
def withCaret: Parser0[(A, Caret)] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about call this def caretBefore: Parser0[(Caret, A)] or something?

I'm really nervous withCaret and the fact that the caret is on the right is going to suggest to people that the caret is after the parse.

@@ -668,6 +668,18 @@ class ParserTest extends munit.ScalaCheckSuite {
parseTest(Parser.stringIn(List("foo", "foobar", "foofoo", "foobat")).string, "foobal", "foo")
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you added the generators, but I don't see that they were added to the root generator so we don't exercise all the existing laws in the presence of these operations.

@johnynek
Copy link
Collaborator

do you think you will be able to address the comments I left?

@re-xyr
Copy link
Author

re-xyr commented Aug 19, 2021

Sorry, I think I've been busy in August. I may be able to address them in September.

@johnynek
Copy link
Collaborator

I'm planning to publish a new version soon if you have time to pick this back up.

@johnynek johnynek mentioned this pull request Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API to read source position
2 participants