Problem reading little-endian Unicode #62

Open
samalone opened this issue Apr 13, 2014 · 1 comment

@samalone

I have an NSString that was converted from little-endian Unicode NSData. The first character of the string is the Unicode byte order mark (BOM), which in little-endian UTF-16 is the byte sequence 0xFF 0xFE.

The first time _loadMoreIfNecessary calls initWithBytes:length:encoding:, the BOM is in the buffer and the buffer is read correctly. However, when the second buffer is converted there is no BOM, and the data is treated as big-endian. This means that the second and all subsequent buffers of data are corrupted.

In one sense, the bug is that _loadMoreIfNecessary converts each buffer of text independently, rather than maintaining conversion context from one buffer to the next. In general, text encodings require context to handle multi-byte characters, byte order marks, and the like. A more robust version of this function would use the lower-level Text Encoding Converter, which maintains context from one buffer to the next.
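
To make the failure mode concrete, here is a minimal sketch in Python rather than Objective-C. It is an analogy, not the parser's code: `decode_standalone` is a hypothetical stand-in for a per-buffer conversion that honors a BOM when one is present and otherwise assumes big-endian, which is the behavior described above. The 8-byte chunk size is also just an assumption for illustration.

```python
import codecs


def decode_standalone(chunk: bytes) -> str:
    """Hypothetical per-buffer conversion: use the BOM if the chunk starts
    with one, otherwise assume big-endian (no memory of earlier chunks)."""
    if chunk.startswith(b"\xff\xfe"):
        return chunk[2:].decode("utf-16-le")
    if chunk.startswith(b"\xfe\xff"):
        return chunk[2:].decode("utf-16-be")
    return chunk.decode("utf-16-be")


text = "a,b\nc,d"
# Little-endian UTF-16 with a leading BOM, as in the report.
data = b"\xff\xfe" + text.encode("utf-16-le")
chunks = [data[i:i + 8] for i in range(0, len(data), 8)]

# Each chunk converted on its own: only the first chunk sees the BOM.
naive = "".join(decode_standalone(c) for c in chunks)

# A stateful decoder remembers the byte order found in the first chunk.
decoder = codecs.getincrementaldecoder("utf-16")()
stateful = "".join(decoder.decode(c) for c in chunks)
stateful += decoder.decode(b"", final=True)

print(repr(naive))     # first chunk is fine, the rest is byte-swapped
print(repr(stateful))  # matches the original text
```

The first chunk decodes correctly in both cases; only the stateful decoder keeps the remaining chunks intact.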

But an easier fix might be to change initWithCSVString: to use a fixed encoding such as NSUTF16BigEndianStringEncoding rather than calling [csv fastestEncoding]. That call evaluates to NSUnicodeStringEncoding, which is ambiguous about byte order. I believe that using an unambiguous encoding would prevent the error, even if it's not as general a solution as using Text Encoding Converter.
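
The same idea can be sketched in Python, again as an analogy rather than the parser's actual code. When both the encoder and every per-chunk decode name an explicit byte order, no BOM is needed and no chunk depends on an earlier one. The helper name `decode_fixed` and the chunk size are assumptions made for this example.

```python
def decode_fixed(data: bytes, encoding: str, chunk_size: int) -> str:
    """Decode each chunk independently using one explicit encoding.

    Assumes chunk_size is even, so that no 16-bit code unit is split
    across two chunks.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return "".join(chunk.decode(encoding) for chunk in chunks)


text = "a,b\nc,d"
# Explicit big-endian UTF-16: Python writes no BOM for this codec.
data = text.encode("utf-16-be")

decoded = decode_fixed(data, "utf-16-be", 8)
print(decoded == text)
```

This sidesteps the byte-order ambiguity only. Independent per-chunk decoding can still fail if a chunk boundary falls inside a surrogate pair, which is one reason a stateful converter remains the more general fix.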

@davedelong
Owner

The convenience initializers now use a fixed encoding (NSUTF8StringEncoding), but this would still be an issue for NSInputStreams provided to the designated initializer.
