Non-string inputs? #281

Open
rossabaker opened this issue Oct 16, 2021 · 1 comment

Comments

@rossabaker
Member

Has any thought been given to abstracting over the input type? I'm thinking specifically of binary inputs like Array[Byte], fs2.Chunk, scodec.bits.ByteVector, or java.nio.ByteBuffer. I'm struggling to compete with an HTTP/1 parser that works on Array[Byte].

The obvious answer is scodec. The old fs2-http parser built on it, while beautiful, is also much slower. I'm dreaming of a scodec with the cats-parse mutability trick.

I spiked on it a bit. Problems I encountered:

  • To remain compatible, we have to do unspeakable things with the String- and Char-based parsers when the underlying type is binary. If we added binary parsers, we'd have to do unspeakable things when the underlying type is characters.
  • Some desired types, like BitVector, are Long-indexed instead of Int. This ripples at least into State and Expectation.
  • A typeclass for inputs seems the right way, but maintaining compatibility gets even harder.
  • All of this could be overcome with a parallel BinParser and BinParser0, but the duplication is an awful shame. It might not even be the same library anymore.
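
To make the typeclass bullet concrete, here is a minimal sketch of what an input typeclass with Long indexing might look like. Every name in it (ParserInput, InputOps, takeWhileLength) is hypothetical and not part of cats-parse, and it ignores the compatibility and performance problems listed above.

```scala
// Hypothetical typeclass: I is the input type, E its element type.
// Indexing is Long so that Long-indexed inputs (e.g. a BitVector) could fit.
trait ParserInput[I, E] {
  def size(in: I): Long
  def elemAt(in: I, index: Long): E
}

object ParserInput {
  implicit val stringInput: ParserInput[String, Char] =
    new ParserInput[String, Char] {
      def size(in: String): Long = in.length.toLong
      def elemAt(in: String, index: Long): Char = in.charAt(index.toInt)
    }

  implicit val bytesInput: ParserInput[Array[Byte], Byte] =
    new ParserInput[Array[Byte], Byte] {
      def size(in: Array[Byte]): Long = in.length.toLong
      def elemAt(in: Array[Byte], index: Long): Byte = in(index.toInt)
    }
}

object InputOps {
  // Counts the leading elements satisfying `p`, written once for any input type.
  def takeWhileLength[I, E](in: I)(p: E => Boolean)(implicit ev: ParserInput[I, E]): Long = {
    val n = ev.size(in)
    var i = 0L
    while (i < n && p(ev.elemAt(in, i))) i += 1
    i
  }
}
```

The same takeWhileLength call works on a String with a Char predicate and on an Array[Byte] with a Byte predicate, which also shows the awkward part: Char-based and Byte-based combinators stay distinct even though the input is abstracted.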

A more modest abstraction is to accept CharSequence as input, at which point we can wrap binary inputs with something like Netty's AsciiString. It's still abusive with respect to Char vs. Byte. It also doesn't help with HTTP/2, where we might benefit from a BitVector.
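
As a rough illustration of the CharSequence idea, the sketch below wraps an Array[Byte] in a read-only CharSequence view without copying. ByteCharSeq is a hypothetical helper written for this example; it is not Netty's AsciiString and not part of cats-parse, and it assumes every byte maps to a single ISO-8859-1 character.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper: a zero-copy CharSequence view over a slice of a byte array.
// Assumption: each byte is exactly one character (ISO-8859-1 / ASCII).
final class ByteCharSeq(bytes: Array[Byte], offset: Int, len: Int) extends CharSequence {
  def length(): Int = len

  def charAt(i: Int): Char = {
    if (i < 0 || i >= len) throw new IndexOutOfBoundsException(i.toString)
    (bytes(offset + i) & 0xff).toChar // mask so bytes >= 0x80 do not sign-extend
  }

  def subSequence(start: Int, end: Int): CharSequence = {
    if (start < 0 || end > len || start > end)
      throw new IndexOutOfBoundsException(s"$start, $end")
    new ByteCharSeq(bytes, offset + start, end - start)
  }

  override def toString: String =
    new String(bytes, offset, len, StandardCharsets.ISO_8859_1)
}

object ByteCharSeq {
  def apply(bytes: Array[Byte]): ByteCharSeq = new ByteCharSeq(bytes, 0, bytes.length)
}
```

The widening from Byte to Char in charAt is exactly the "abusive" part mentioned above: the parser sees characters, but the data is really bytes.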

This is probably all a terrible idea, but I thought I'd ask.

@johnynek
Collaborator

johnynek commented Oct 22, 2021

I think there are a few angles:

  1. Can we unify stream and batch parsing? I don't think you are asking about this directly, but it may be somewhat related. I think the answer is no. Here is some discussion: https://discord.com/channels/632277896739946517/867087707536097350/899742140387696690
  2. Can we abstract beyond String (e.g. to CharSequence or even a generic type)? I think we can, but I assume it may cost performance (due to loss of inlining) and possibly break compatibility.

I could imagine a way to be mostly, if not entirely, source compatible by having something like:

abstract class ParserModule {
  type Input
  protected def ... // all the core Input-related code here

  // all of the current implementation here
  class Parser0[A] {
    ...
  }

  class Parser[A] extends Parser0[A] {
    ...
  }

  object Parser {
    ...
  }
}

package cats

object parse extends ParserModule {
  type Input = String
  def ... // implement some String-specific code here
}

Then you can do:

package cats.parse

// Input is an abstract type member of ParserModule, not a type parameter
object charseq extends ParserModule {
  type Input = CharSequence
  ...
}

If we did this, and could get the tests to compile without any changes and performance within a few percent, I think it would be worth publishing a new version 0.4.x that breaks binary compatibility.
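
For a feel of how the module layout might behave in practice, here is a toy, self-contained version. It is only a sketch: the Parser class, charWhere, and the inputLength/charAt primitives are invented for this example and bear no relation to the real cats-parse internals, and it says nothing about the inlining cost raised in point 2.

```scala
abstract class ParserModule {
  type Input

  // Core Input-related primitives, supplied by each concrete module (hypothetical).
  protected def inputLength(in: Input): Int
  protected def charAt(in: Input, i: Int): Char

  // Toy parser written once against the abstract Input:
  // takes an input and an offset, returns the result and the next offset.
  final class Parser[A](val run: (Input, Int) => Option[(A, Int)]) {
    def parse(in: Input): Option[A] = run(in, 0).map(_._1)

    // Sequence two parsers, pairing their results.
    def ~[B](that: Parser[B]): Parser[(A, B)] =
      new Parser[(A, B)]((in, i) =>
        run(in, i).flatMap { case (a, j) =>
          that.run(in, j).map { case (b, k) => ((a, b), k) }
        })
  }

  object Parser {
    def charWhere(p: Char => Boolean): Parser[Char] =
      new Parser[Char]((in, i) =>
        if (i < inputLength(in) && p(charAt(in, i))) Some((charAt(in, i), i + 1))
        else None)
  }
}

object StringParsers extends ParserModule {
  type Input = String
  protected def inputLength(in: Input): Int = in.length
  protected def charAt(in: Input, i: Int): Char = in.charAt(i)
}

object CharSeqParsers extends ParserModule {
  type Input = CharSequence
  protected def inputLength(in: Input): Int = in.length()
  protected def charAt(in: Input, i: Int): Char = in.charAt(i)
}
```

User code written against StringParsers.Parser would compile unchanged against CharSeqParsers.Parser, which is the source-compatibility property being aimed for. Parsers from the two modules have different path-dependent types, so they cannot be mixed by accident.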
