Use codepoint index for indices/1, index/1 and rindex/1 #3065

Open
wants to merge 1 commit into
base: master

wader
Member

@wader wader commented Mar 12, 2024

Previously, a byte index was used.

Fixes #1430, fixes #1624, fixes #3064.

while ((p = _jq_memmem(p, (jstr + jlen) - p, idxstr, idxlen)) != NULL) {
a = jv_array_append(a, jv_number(p - jstr));
while (lp < p) {
Member Author

To make this even more efficient, I guess we would need to count codepoints inside memmem somehow.
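As a rough illustration of counting codepoints during the search itself, here is a minimal sketch. It is not jq's implementation: `codepoint_indices` and `is_continuation` are hypothetical names, the search is a naive `memcmp` scan rather than an optimized `memmem`, and both inputs are assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of jq: nonzero if b is a UTF-8
 * continuation byte (bit pattern 10xxxxxx). */
static int is_continuation(unsigned char b) {
  return (b & 0xC0) == 0x80;
}

/* Naive single-pass search. Stores the codepoint index of each match of
 * needle in hay into out (at most max_out entries are written) and returns
 * the total number of matches found. Assumes both inputs are valid UTF-8. */
static size_t codepoint_indices(const char *hay, size_t hlen,
                                const char *needle, size_t nlen,
                                size_t *out, size_t max_out) {
  size_t found = 0;
  size_t cp = 0; /* codepoint index of the lead byte at position i */
  assert(hay != NULL && needle != NULL);
  if (nlen == 0 || nlen > hlen)
    return 0;
  for (size_t i = 0; i + nlen <= hlen; i++) {
    if (is_continuation((unsigned char)hay[i]))
      continue; /* inside a codepoint: cannot be a match start */
    if (memcmp(hay + i, needle, nlen) == 0) {
      if (found < max_out)
        out[found] = cp;
      found++;
    }
    cp++; /* one more codepoint has been passed */
  }
  return found;
}
```

For the haystack "aéb" (bytes `61 C3 A9 62`) and the needle "b", this reports index 2, where a byte-based search would report 3. The trade-off in this sketch is that it gives up whatever optimizations the real `memmem` has in exchange for never rescanning the haystack.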

@wader
Member Author

wader commented Mar 12, 2024

I haven't entirely convinced myself yet that it is fine to look for matches using the byte representation. Assuming both the needle and the haystack are valid UTF-8, I think it should be fine because of UTF-8's self-synchronization property.
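The self-synchronization argument can be checked with a small sketch. A valid UTF-8 needle starts with a lead byte, and lead bytes never share a value with continuation bytes (`10xxxxxx`), so a byte-level match in a valid haystack can only begin on a codepoint boundary. The names `is_cont_byte` and `misaligned_matches` below are hypothetical and not part of jq.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if b is a UTF-8 continuation byte. */
static int is_cont_byte(unsigned char b) {
  return (b & 0xC0) == 0x80;
}

/* Counts byte-level matches of needle in hay that start in the middle of
 * a codepoint, i.e. on a continuation byte. */
static size_t misaligned_matches(const char *hay, size_t hlen,
                                 const char *needle, size_t nlen) {
  size_t bad = 0;
  assert(hay != NULL && needle != NULL);
  if (nlen == 0 || nlen > hlen)
    return 0;
  for (size_t i = 0; i + nlen <= hlen; i++) {
    if (memcmp(hay + i, needle, nlen) == 0 &&
        is_cont_byte((unsigned char)hay[i]))
      bad++;
  }
  return bad;
}
```

With the haystack "é€é" (bytes `C3 A9 E2 82 AC C3 A9`) and the valid needle "é", the count is 0. With the invalid needle `A9` (a bare continuation byte), both matches are misaligned, which suggests the guarantee only holds as long as the needle is valid UTF-8.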

Update: now looking at jv_string_slice

jq/src/jv.c

Line 1374 in c95b34f

jv jv_string_slice(jv j, int start, int end) {
I'm not sure anymore whether one can assume strings are valid UTF-8, or whether the invalid UTF-8 checks are not really needed.
