Issue with broken surrogate pairs #65

Closed


IanWoods1993

The test in the class BrokenSurrogatePair demonstrates an issue that appears to be rooted in the SMILE format. The byte array at the top of the test includes a sequence for a character outside the BMP (https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane). Writing the value to a JSON node works without problems, but writing it in SMILE format fails with an exception:
com.fasterxml.jackson.databind.JsonMappingException: Broken surrogate pair: first char 0xd92f, second 0x20; illegal combination

This branch isn't intended to fix this problem, just demonstrate its existence in the hopes that it can be more easily identified and fixed. Thanks!

@cowtowncoder
Member

@IanWoods1993 Ah! Ok, ignore notes in patch part then -- test case is what matters. Gotcha.

As to problem itself, I'll have a look to make sure I understand.
Thank you for reporting this.

@cowtowncoder
Member

Hmmh. The content you are trying to write is not valid Unicode: it contains an unmatched surrogate (the high surrogate 0xd92f is followed by 0x20 rather than by a low surrogate), which cannot be legally encoded as UTF-8. So I think the exception makes sense here.

It is odd that this did not trigger a decoding error on the JSON side, however; I will see why this happens.
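
For readers following along, here is a minimal sketch of what "unmatched surrogate" means at the Java `char` level. It is not the check Jackson performs internally; the class and method names (`SurrogateCheck`, `indexOfUnpairedSurrogate`) are hypothetical and only the standard `java.lang.Character` helpers are used. The sample string mirrors the combination named in the exception message, a high surrogate 0xd92f followed by a space.

```java
// Hypothetical helper, not part of Jackson: locates the first surrogate
// char that is not part of a valid high+low pair.
final class SurrogateCheck {

    /** Returns the index of the first unpaired surrogate, or -1 if none. */
    static int indexOfUnpairedSurrogate(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                boolean hasLow = i + 1 < text.length()
                        && Character.isLowSurrogate(text.charAt(i + 1));
                if (!hasLow) {
                    return i; // high surrogate with no low surrogate after it
                }
                i++; // valid pair: skip the low half
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String broken = "a\uD92F b";      // high surrogate followed by a space
        String valid = "a\uD83D\uDE00b";  // a well-formed surrogate pair
        System.out.println(indexOfUnpairedSurrogate(broken));
        System.out.println(indexOfUnpairedSurrogate(valid));
    }
}
```

A caller could run a check like this before handing text to an encoder, so that the failure is reported with the offending index instead of surfacing deep inside the generator.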

@IanWoods1993
Author

Thanks so much for looking at this so quickly. Really appreciate it.

@cowtowncoder
Member

@IanWoods1993 No prob, problems wrt low-level encoding/decoding are high priority for me.
I will create a jackson-core issue as a follow-up here.
It is also possible that read/write features to allow broken surrogates would make sense (that is, to write illegal surrogate pairs, or to decode them) -- mostly because Java Strings and chars do allow such combinations to be created.
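
As a rough illustration of that last point, the sketch below uses only the JDK (no Jackson classes) to show that a Java `String` happily holds a lone surrogate, and that the outcome then depends on how strict the encoder is. The class and method names (`StrictUtf8`, `encodesCleanly`, `lenientBytes`) are made up for this example. A `CharsetEncoder` with its default settings reports the malformed input, while `String.getBytes` silently substitutes the charset's replacement byte.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: contrasts strict and lenient UTF-8 encoding
// of a String that contains an unpaired surrogate.
final class StrictUtf8 {

    /** Strict path: a fresh encoder reports malformed input by default. */
    static boolean encodesCleanly(String text) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false; // unpaired surrogate (or other unencodable input)
        }
    }

    /** Lenient path: String.getBytes replaces malformed input. */
    static byte[] lenientBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = "a\uD92F b"; // legal as a String, illegal as UTF-8
        System.out.println(encodesCleanly(broken));
        System.out.println(new String(lenientBytes(broken), StandardCharsets.UTF_8));
    }
}
```

An opt-in feature of the kind discussed above would presumably pick one of these two behaviours per generator or parser; which one, and under what name, is left open in this thread.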
