Add streaming API which works with the basic and compat APIs #25

Open
hkratz opened this issue Apr 26, 2021 · 2 comments
hkratz commented Apr 26, 2021

Currently the basic API can only validate a complete slice. A streaming API with init(), update(), and finish_validation() functions would allow input to be validated on the fly, chunk by chunk, as it arrives.

With the compat API, this can currently be emulated, if awkwardly, by using the Utf8Error::valid_up_to() method to track how far the given slice is valid.
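
A minimal sketch of that emulation follows. It is an illustration only, not the crate's API: `StreamingValidator`, `InvalidUtf8`, `update()` and `finish()` are hypothetical names, and `std::str::from_utf8` stands in for the compat validation function so the example has no dependencies (the assumption is that the compat `Utf8Error` exposes the same `valid_up_to()` and `error_len()` methods as the standard library's). The awkward part is visible in the code: an incomplete trailing code point has to be carried over and re-validated together with the next chunk, which costs a copy per chunk.

```rust
use std::str;

/// Error carrying the offset of the first invalid byte in the whole stream.
/// Hypothetical type, introduced for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidUtf8 {
    pub offset: usize,
}

/// Hypothetical streaming validator emulated on top of a whole-slice API.
pub struct StreamingValidator {
    /// Bytes of an incomplete trailing code point from the previous chunk.
    pending: Vec<u8>,
    /// Number of bytes confirmed valid so far.
    valid_bytes: usize,
}

impl StreamingValidator {
    pub fn new() -> Self {
        StreamingValidator { pending: Vec::new(), valid_bytes: 0 }
    }

    /// Validates the next chunk, prefixed by any carried-over bytes.
    pub fn update(&mut self, chunk: &[u8]) -> Result<(), InvalidUtf8> {
        // Copying every chunk is the price of emulating streaming this way.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        match str::from_utf8(&buf) {
            Ok(_) => {
                self.valid_bytes += buf.len();
                Ok(())
            }
            Err(e) => {
                let valid = e.valid_up_to();
                self.valid_bytes += valid;
                match e.error_len() {
                    // `None` means the input ended inside a code point:
                    // keep the tail and wait for the next chunk.
                    None => {
                        self.pending = buf[valid..].to_vec();
                        Ok(())
                    }
                    // `Some(_)` means a definitely invalid sequence.
                    Some(_) => Err(InvalidUtf8 { offset: self.valid_bytes }),
                }
            }
        }
    }

    /// Ends validation; leftover bytes mean the stream was truncated.
    /// Returns the total number of valid bytes on success.
    pub fn finish(self) -> Result<usize, InvalidUtf8> {
        if self.pending.is_empty() {
            Ok(self.valid_bytes)
        } else {
            Err(InvalidUtf8 { offset: self.valid_bytes })
        }
    }
}
```

A caller would invoke `update()` once per chunk and `finish()` at end of input. A native streaming API could avoid the per-chunk copy by keeping the validator state between calls instead of re-validating carried-over bytes.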

@hkratz hkratz added the enhancement New feature or request label Apr 26, 2021
@hkratz hkratz self-assigned this May 9, 2021
hkratz commented May 14, 2021

This is partially implemented in v0.1.3 as a low-level API in simdutf8::basic::imp.

Still missing:

  • compat API with early validation failure and exact error information.
  • Safe API with implementation auto-selection.

dralley commented Jul 13, 2022

Question: would it be possible to implement a transparent wrapper around BufReader with such an API? I'm thinking about using it for quick-xml, which I think it would be well suited for. Essentially:

  • Raw bytes get progressively validated as UTF-8 as they stream out of the BufReader
  • The XML parsing operates on raw bytes, searching for the standard characters such as < and >
  • Knowing that the input is valid UTF-8, we can safely use std::str::from_utf8_unchecked() as needed so long as we use the known character boundaries from parsing
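
The idea in the list above could look roughly like the following sketch. Everything here is an assumption made for illustration: `Utf8CheckedReader` and `next_chunk()` are hypothetical names, `std::str::from_utf8` stands in for a SIMD-accelerated validator, and the wrapper hands out validated chunks rather than implementing `Read`, so it is less transparent than what the question asks for. Bytes of an incomplete trailing code point are held back until the next read completes them, so everything returned is known to be valid UTF-8.

```rust
use std::io::{self, BufRead};
use std::str;

/// Hypothetical wrapper that validates bytes as they leave a `BufRead`.
pub struct Utf8CheckedReader<R> {
    inner: R,
    /// Held-back tail of the previous chunk plus newly read bytes.
    buf: Vec<u8>,
    /// Length of the prefix of `buf` returned by the previous call.
    returned: usize,
}

impl<R: BufRead> Utf8CheckedReader<R> {
    pub fn new(inner: R) -> Self {
        Utf8CheckedReader { inner, buf: Vec::new(), returned: 0 }
    }

    /// Returns the next validated chunk, or `None` at end of input.
    /// Invalid or truncated UTF-8 is reported as `InvalidData`.
    pub fn next_chunk(&mut self) -> io::Result<Option<&str>> {
        // Drop what the caller has already seen; keep any incomplete tail.
        self.buf.drain(..self.returned);
        self.returned = 0;
        loop {
            let available = self.inner.fill_buf()?;
            if available.is_empty() {
                // End of input: leftover bytes are a truncated code point.
                return if self.buf.is_empty() {
                    Ok(None)
                } else {
                    Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "input ends inside a UTF-8 code point",
                    ))
                };
            }
            let n = available.len();
            self.buf.extend_from_slice(available);
            self.inner.consume(n);

            let valid = match str::from_utf8(&self.buf) {
                Ok(_) => self.buf.len(),
                // Incomplete code point at the end: hold it back.
                Err(e) if e.error_len().is_none() => e.valid_up_to(),
                Err(_) => {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "invalid UTF-8 in stream",
                    ))
                }
            };
            if valid > 0 {
                self.returned = valid;
                // Validated a second time to keep the sketch free of `unsafe`;
                // this is where from_utf8_unchecked() could be used instead.
                return Ok(Some(
                    str::from_utf8(&self.buf[..valid]).expect("validated above"),
                ));
            }
            // Only part of a code point so far: read more input.
        }
    }
}
```

A parser built on top of this would scan the returned chunks for `<` and `>` and slice at those positions. Because both are ASCII, they always fall on character boundaries within validated text.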
