Poison 5.0 fails to decode Unicode surrogate pairs, Poison 4.0.1 succeeds #217

Open
adworse opened this issue Oct 2, 2023 · 5 comments

adworse commented Oct 2, 2023

Reproduction:

Poison.decode!("{\"description\":\"\\uD83D\\uDD32\\uD83D\\uDD34\\uD83D\\uDD33\"}")

wisq commented Oct 7, 2023

Yeah, running into this as well.

Poison 5.0:

iex(1)> ~S{"\ud83d\udc69\ud83c\udffd\u200d\ud83d\udcbb"} |> Poison.decode()
{:error,
 %Poison.ParseError{
   data: "\"\\ud83d\\udc69\\ud83c\\udffd\\u200d\\ud83d\\udcbb\"",
   skip: 32,
   value: "\\ud83d"
 }}

Poison 4.0.1:

iex(1)> ~S{"\ud83d\udc69\ud83c\udffd\u200d\ud83d\udcbb"} |> Poison.decode()
{:ok, "👩🏽‍💻"}

@irisTa56

It seems related to a zero-width joiner between two surrogate pairs.
Note that I could reproduce @wisq's example, but couldn't reproduce @adworse's example.

Shorter examples I tried on Poison 5.0.0:

# with a zero-width joiner
iex(1)> Poison.decode(~S("\uD83D\uDC68\u200D\uD83D\uDC76"))
{:error,
 %Poison.ParseError{
   data: "\"\\uD83D\\uDC68\\u200D\\uD83D\\uDC76\"",
   skip: 20,
   value: "\\uD83D"
 }}
# without a zero-width joiner
iex(2)> Poison.decode(~S("\uD83D\uDC68\uD83D\uDC76"))
{:ok, "👨👶"}
# with a zero-width joiner but the following character is not a surrogate pair
iex(3)> Poison.decode(~S("\uD83D\uDC6E\u200D\u2642"))
{:ok, "👮‍♂"}
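
For reference, here is a minimal sketch of how `\uXXXX` escapes in a JSON string are expected to map to code points, including the zero-width joiner case above. It is independent of Poison and is not Poison's actual parser; the module name `SurrogateSketch` and its functions are hypothetical, and it assumes the input contains only `\uXXXX` escapes.

```elixir
defmodule SurrogateSketch do
  import Bitwise

  # Combine a UTF-16 high/low surrogate pair into a single Unicode code point.
  def combine(hi, lo) when hi in 0xD800..0xDBFF and lo in 0xDC00..0xDFFF do
    0x10000 + ((hi - 0xD800) <<< 10) + (lo - 0xDC00)
  end

  # Decode the body of a JSON string (without the surrounding quotes).
  # Assumption: the body consists only of \uXXXX escapes.
  def decode_escapes(body) do
    ~r/\\u([0-9A-Fa-f]{4})/
    |> Regex.scan(body, capture: :all_but_first)
    |> Enum.map(fn [hex] -> String.to_integer(hex, 16) end)
    |> to_codepoints()
    |> List.to_string()
  end

  # A high surrogate directly followed by a low surrogate forms one code point.
  defp to_codepoints([hi, lo | rest])
       when hi in 0xD800..0xDBFF and lo in 0xDC00..0xDFFF,
       do: [combine(hi, lo) | to_codepoints(rest)]

  # Any other unit (for example U+200D, the zero-width joiner) passes through
  # unchanged. A lone surrogate would make List.to_string/1 raise.
  defp to_codepoints([unit | rest]), do: [unit | to_codepoints(rest)]
  defp to_codepoints([]), do: []
end
```

With this sketch, `SurrogateSketch.decode_escapes(~S(\uD83D\uDC68\u200D\uD83D\uDC76))` yields the three code points U+1F468, U+200D, U+1F476, which is the string a decoder would be expected to return for the first example.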

Owner

devinus commented Mar 5, 2024

All of these examples fail in the browser using JSON.parse, except @adworse's original example, which Poison 5.0 also parses correctly.

Poison 5.0 passes all spec tests, so I'm wary of allowing strings to parse that won't parse in a browser environment.


wisq commented Mar 5, 2024

Both Firefox and Chrome seem fine with my example.
[Two screenshots attached: "Screenshot 2024-03-05 at 18 30 19" and "Screenshot 2024-03-05 at 18 30 50"]

Owner

devinus commented Mar 6, 2024

@wisq You're right, I must have somehow tested them wrong. I'm investigating this and @irisTa56's solution.
