utf8makevalid : test to identify sequence length and possible values not sufficient #118

JPDelprat · 2023-12-17T21:35:49Z

Hello,

In utf8makevalid, you use the following test to identify a 4-byte sequence:

`if (0xf0 == (0xf8 & *read))`

This is not correct if you assume the input can be any (possibly invalid) string. The mask matches every byte from f0 to f7, but only f0..f4 are valid lead bytes in the f0-ff range.
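To illustrate the point, the C sketch below compares the quoted mask test with the RFC 3629 rule that only f0..f4 may start a 4-byte sequence, and counts the bytes the mask accepts anyway. The function names are hypothetical and not part of utf8.h.

```c
/* The test quoted from utf8makevalid: true for every byte 0xF0..0xF7. */
static int mask_says_four_byte_lead(unsigned char b) {
  return 0xf0 == (0xf8 & b);
}

/* RFC 3629: only 0xF0..0xF4 can start a well-formed 4-byte sequence;
   0xF5..0xFF never appear in valid UTF-8. */
static int is_valid_four_byte_lead(unsigned char b) {
  return b >= 0xf0 && b <= 0xf4;
}

/* Hypothetical helper: count the bytes in 0xF0..0xFF that the mask treats
   as 4-byte lead bytes even though they are not valid lead bytes. */
static int count_mask_false_positives(void) {
  int count = 0;
  unsigned b;
  for (b = 0xf0; b <= 0xff; ++b) {
    if (mask_says_four_byte_lead((unsigned char)b) &&
        !is_valid_four_byte_lead((unsigned char)b)) {
      ++count;
    }
  }
  return count;
}
```

`count_mask_false_positives()` returns 3, for the bytes f5, f6 and f7.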

Moreover, the valid lead bytes in that range do not all allow the same values for the second byte. For example, after f0 the second byte must be in 90..bf, not 80..bf.
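A stricter check could validate the lead byte and the second-byte range together. The sketch below is a hypothetical helper (not an existing utf8.h function) that follows the well-formed byte ranges of RFC 3629. Besides f0 (second byte 90..bf), it also narrows e0, ed and f4, whose second-byte ranges are restricted as well.

```c
#include <stddef.h>

/* Hypothetical helper (not part of utf8.h): returns the length (1..4) of the
   well-formed UTF-8 sequence starting at s, or 0 if the bytes at s do not
   form one. n is the number of bytes available. Ranges follow RFC 3629. */
static size_t wellformed_seq_len(const unsigned char *s, size_t n) {
  unsigned char lo = 0x80, hi = 0xbf; /* default range for the 2nd byte */
  size_t len, i;

  if (n == 0) return 0;
  if (s[0] <= 0x7f) return 1;
  if (s[0] >= 0xc2 && s[0] <= 0xdf) {
    len = 2;
  } else if (s[0] >= 0xe0 && s[0] <= 0xef) {
    len = 3;
    if (s[0] == 0xe0) lo = 0xa0;      /* reject overlong forms */
    else if (s[0] == 0xed) hi = 0x9f; /* reject UTF-16 surrogates */
  } else if (s[0] >= 0xf0 && s[0] <= 0xf4) {
    len = 4;
    if (s[0] == 0xf0) lo = 0x90;      /* reject overlong forms */
    else if (s[0] == 0xf4) hi = 0x8f; /* reject code points above U+10FFFF */
  } else {
    return 0; /* 0x80..0xC1 and 0xF5..0xFF never start a valid sequence */
  }

  if (n < len) return 0;                  /* truncated sequence */
  if (s[1] < lo || s[1] > hi) return 0;   /* lead-byte-specific 2nd byte */
  for (i = 2; i < len; ++i) {
    if (s[i] < 0x80 || s[i] > 0xbf) return 0; /* plain continuation bytes */
  }
  return len;
}
```

For example, f0 90 80 80 (U+10000) yields 4, while f0 8f bf bf (an overlong form) and f4 90 80 80 (above U+10FFFF) yield 0.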

Regards

sheredom · 2023-12-23T20:49:47Z

I'd happily accept a PR that tightened this up with the supporting testing!
