
Add byteLength method and hasState property #258

Open

wants to merge 19 commits into master
Conversation

Meigyoku-Thmn

I would like to add the byteLength method and the hasState property.

byteLength method:

  • Guaranteed to be faster than iconv.encode(...).length, because it doesn't allocate an entire buffer.
  • Use case: you might have a very long string and not want to allocate an equally long buffer just to encode it, yet you still want to write that string into a binary file with a byte-length prefix (perhaps in 7-bit format, like the corresponding method in .NET). You can use byteLength to compute and write the prefix first, then encode the string gradually over its substrings with an encoder (see the sketch below).
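
A hypothetical usage sketch: the byteLength name comes from this PR, but the exact signature (iconv.byteLength(str, encoding)) is an assumption, and out/writeStringRecord/writeLengthPrefix are stand-ins for your own stream and length-prefix logic.

```js
const iconv = require('iconv-lite');

// out is assumed to be a writable stream; writeLengthPrefix is a stand-in
// for whatever prefix format you use (e.g. a 7-bit varint as in .NET).
function writeStringRecord(out, str) {
  const len = iconv.byteLength(str, 'utf8'); // no intermediate buffer allocated
  writeLengthPrefix(out, len);
  const encoder = iconv.getEncoder('utf8');  // then encode gradually, chunk by chunk
  const CHUNK = 64 * 1024;
  for (let i = 0; i < str.length; i += CHUNK) {
    out.write(encoder.write(str.slice(i, i + CHUNK)));
  }
  const tail = encoder.end();                // flush any remaining encoder state
  if (tail && tail.length) out.write(tail);
}
```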

hasState property:

  • When decoding binary data, or encoding text to write to a binary file, you might want to know whether the decoder or encoder has any accumulated state inside it, so you can decide whether there is an error (e.g. a truncated byte sequence) or not (see the sketch below).
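
A hypothetical sketch of the error-detection use case; it assumes hasState is exposed on decoder instances (my reading of the proposal), and readBinaryChunks is a stand-in for your input source.

```js
const iconv = require('iconv-lite');

const decoder = iconv.getDecoder('utf8');
let text = '';
for (const chunk of readBinaryChunks()) {  // stand-in for reading a file
  text += decoder.write(chunk);
}
// Leftover internal state here means a multi-byte sequence was cut off,
// i.e. the input was truncated or malformed:
if (decoder.hasState) {
  throw new Error('incomplete byte sequence at end of input');
}
const tail = decoder.end();
if (tail) text += tail;
```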

ashtuchkin and others added 19 commits July 14, 2020 13:13
 * Add two backends: node & web (a rough sketch of the idea follows this list)
 * Convert core lib files to use the backends (and not use Buffer)
 * Convert the utf16 codec as an example
 * Add testing for both the node side and webpack
 * Bump the Node.js minimum supported version to 4.5.0 and modernize some
   existing code. This will allow us to get rid of
   safer-buffer, our only dependency.
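
Purely illustrative sketch of the two-backend idea; the property names (allocBytes, bytesToResult) are hypothetical, not iconv-lite's actual backend API. The point is that codecs ask the backend for byte storage instead of touching Buffer directly, so the same codec code can run in Node and in browsers.

```js
const nodeBackend = {
  allocBytes: (len) => Buffer.alloc(len),    // Node: back storage with Buffer
  bytesToResult: (bytes, len) => bytes.slice(0, len),
};

const webBackend = {
  allocBytes: (len) => new Uint8Array(len),  // Browser: plain Uint8Array
  bytesToResult: (bytes, len) => bytes.subarray(0, len),
};
```
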
Three major reasons for reimplementing UTF-16 rather than using the native codec:
 1. We want to remove StringDecoder & Buffer references due to ashtuchkin#235.
 2. StringDecoder handles surrogates inconsistently on Node v6-9.
 3. The string_decoder NPM module gives strange results when processing chunks -
    it sometimes prepends '\u0000', likely due to a bug.

Performance was and is a major concern here. The decoder shouldn't be affected because it uses
backend methods directly. The encoder is affected by the introduction of a character-level loop
(sketched below). It's still very fast (~450Mb/s), so I'm not too worried. If needed, we can make it
about 4x faster in Node.js by introducing a dedicated backend method. Browser speeds will be the same.
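
For reference, a character-level UTF-16-LE encode loop of the kind described might look like the following; this is an illustrative sketch only, not the actual patch (the real encoder goes through the backend abstraction).

```js
// JS strings are already sequences of UTF-16 code units, surrogates included,
// so each unit is simply written out as two little-endian bytes.
function encodeUtf16le(str) {
  const bytes = new Uint8Array(str.length * 2);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    bytes[2 * i] = unit & 0xff;     // low byte first (little-endian)
    bytes[2 * i + 1] = unit >>> 8;  // then high byte
  }
  return bytes;
}
```
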
…uite

To do that, I've added a generation step and stored the data in the test/tables/ folder.