Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sketch for UTF-16 support #95

Closed
wants to merge 2 commits into from
Closed

Conversation

lemire
Copy link
Member

@lemire lemire commented Jan 22, 2023

This cannot be merged because we have no testing for UTF-16.

The proposed strategy here is to convert everything in UTF-8. Note that our entire codebase so far is UTF-8 specific: it makes no sense to accept anything but UTF-8 unless we do a fair amount of reengineering.

It is common these days to have the core engine run in UTF-8 and to view UTF-16 as something you convert from or to.

@lemire lemire requested a review from anonrig January 22, 2023 19:19
Comment on lines +23 to +24
uint32_t value_one = 1;
bool is_little_endian = (reinterpret_cast<char*>(&value_one)[0] == 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is already included in the main branch (with your latest commit)

@anonrig
Copy link
Member

anonrig commented Jan 22, 2023

From my understanding, encoding is only supported in URL spec due to the percent_encode function that takes encoding as a parameter. I tried looking for test fixtures provided by the URL spec that focuses on any other input than UTF-8.

What is ambiguous is that percent-encode after encoding, with encoding (parameter) is only used inside query state. Let us keep the encoding parameter as it is and add encoding support to percent_encode preferably as a parameter.

@lemire
Copy link
Member Author

lemire commented Jan 22, 2023

@anonrig It should be documented that the input is always expected to be UTF-8 and that the encoding parameter refers to something else.

@lemire lemire linked an issue Jan 22, 2023 that may be closed by this pull request
@lemire
Copy link
Member Author

lemire commented Jan 23, 2023

Closing.

@lemire lemire closed this Jan 23, 2023
@anonrig anonrig deleted the dlemire/sketch_for_utf16_support branch March 21, 2023 18:11
@anonrig anonrig restored the dlemire/sketch_for_utf16_support branch March 21, 2023 18:11
@anonrig anonrig deleted the dlemire/sketch_for_utf16_support branch March 21, 2023 18:11
@mertcanaltin
Copy link

greetings I want to work on this subject

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for non-UTF-8 inputs
3 participants