Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify JsonStreamReader by always collecting string and number values in Vec / String #32

Open
Marcono1234 opened this issue Dec 13, 2023 · 1 comment
Labels
enhancement New feature or request internal Issue or pull request which is internal and has no user-visible effect performance Problem with performance or suggestion for performance improvement

Comments

@Marcono1234
Copy link
Owner

Problem solved by the enhancement

The current JsonStreamReader implementation tries to serve string and number values from the reader buffer, and only if that is not possibly (e.g. in case of escape sequences) falls back to collecting the value in a Vec (see JsonStreamReader::value_bytes_buf).

Enhancement description

Consider always collecting the value in a Vec / String and then depending on whether a borrowed or owned string is requested, either return a reference to this buffer, or replace the buffer with an empty one and return it (maybe shrinking / copying it in case its capacity is extremely larger than its length).

This would have the advantage that it will simplify the JsonStreamReader implementation, and possibly also allows removing the JSON reader data buffer and only relying on the underlying reader for buffering, if desired (see also #19 (comment)).

However, it has to be checked if the performance is noticeably negatively affected by this approach.

@Marcono1234 Marcono1234 added enhancement New feature or request internal Issue or pull request which is internal and has no user-visible effect performance Problem with performance or suggestion for performance improvement labels Dec 13, 2023
@Marcono1234
Copy link
Owner Author

Marcono1234 commented Feb 4, 2024

Could probably have a 'bytes consumer' trait similar to this:

trait BytesConsumer {
    fn hasSpaceFor(&self, bytesCount: usize) -> bool;
    
    fn addByte(&mut self, b: u8);
    
    fn addBytes(&mut self, b: &[u8]);
}

Where for Vec<u8> and DiscardingBytesConsumer the hasSpaceFor method always unconditionally returns true, but for &mut [u8] it checks how much space is left.

Could then use hasSpaceFor as loop condition before reading the bytes. That will hopefully allow reusing the same code for regular value reading and StringValueReader / transfer_to and skipping. Currently there is code duplication for this.

Regarding #21, could maybe have a 'drop guard' in the reading implementation which in case of an error or panic clears the Vec<u8> or &[u8] so that the incomplete UTF-8 data cannot be observed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request internal Issue or pull request which is internal and has no user-visible effect performance Problem with performance or suggestion for performance improvement
Projects
None yet
Development

No branches or pull requests

1 participant