Question: Should input string be UTF-8 normalized? #127

dermoth · 2022-01-04T15:50:38Z

Hi,

I came across this tool while searching for my own version of a NATO speller. Of course I got curious. I'm not really a web dev, so mine is much more simplistic, but one thing I noticed is that you don't seem to apply Unicode normalization to the input before converting to NATO. Normalization would allow removing accents before conversion, or copying them as-is for NATO, and it is required to get consistent results for other alphabets that include accented characters. See my code (web page) for an example.

Testing

To test, you can paste this into your browser's JS console to generate the NFC and NFD versions of accented characters (using é and Ë as examples):

'é'.normalize('NFC')
'é'.normalize('NFD')
'Ë'.normalize('NFC')
'Ë'.normalize('NFD')

Then copy/paste the output into https://cryptii.com/pipes/nato-phonetic-alphabet
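Since the two forms look identical on screen, it can help to print their code points before pasting them. A minimal sketch (the `codePoints` helper is a hypothetical name, not part of cryptii or my code):

```javascript
// List the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const composed = '\u00e9'.normalize('NFC');   // é as a single code point
const decomposed = '\u00e9'.normalize('NFD'); // e followed by a combining acute accent

console.log(codePoints(composed));   // [ 'U+00E9' ]
console.log(codePoints(decomposed)); // [ 'U+0065', 'U+0301' ]
```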

The issues

The decomposed (NFD) é prints as Echo ́, that is, Echo followed by a stray combining accent. In my version I strip the accents from the decomposed form; they can be matched using /[\u0300-\u036f]/g.
The diaeresis of the decomposed (NFD) Ë doesn't print at all; I see a square box instead.
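The accent stripping mentioned above can be sketched roughly as follows. This is only an illustration under my own assumptions (`stripAccents` is a hypothetical name, and the range only covers the Combining Diacritical Marks block U+0300 to U+036F), not how cryptii does or should do it:

```javascript
// Decompose to NFD so every accent becomes a separate combining mark,
// then drop the marks in the U+0300..U+036F range.
function stripAccents(str) {
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripAccents('\u00e9'));    // "e"
console.log(stripAccents('\u00cb'));    // "E"
console.log(stripAccents('No\u00ebl')); // "Noel"
```

The result is the same whether the input arrives in NFC or NFD, which is the consistency I'm after.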

I think the second issue is caused by the way you iterate over the characters; see line 54 of my code, which is how I loop over multiplanar Unicode characters. Indexing into a string directly iterates over each individual element of the decomposed form.
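To illustrate the difference between the two ways of iterating, here is a small sketch (the `byIndex` and `byCodePoint` names are made up for this example). Note that code-point iteration on its own still yields a combining mark as a separate element, so it would probably need to be paired with NFC normalization or accent stripping:

```javascript
// Plain indexing walks UTF-16 code units.
function byIndex(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) out.push(str[i]);
  return out;
}

// The string iterator (used by Array.from and for...of) walks code points.
function byCodePoint(str) {
  return Array.from(str);
}

const astral = '\u{1D49C}'; // MATHEMATICAL SCRIPT CAPITAL A, outside the BMP
console.log(byIndex(astral).length);     // 2 (a surrogate pair, split in half)
console.log(byCodePoint(astral).length); // 1

const decomposedE = '\u00cb'.normalize('NFD'); // E + combining diaeresis (U+0308)
console.log(byCodePoint(decomposedE).length);                  // 2
console.log(byCodePoint(decomposedE.normalize('NFC')).length); // 1
```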

For further reading about normalisation forms: https://unicode.org/reports/tr15/

Bug #17 also needs to be taken into consideration. Normalization could be done in a standalone UTF-8 codec; otherwise, the spelling alphabet codec could expose it as a parameter.

dermoth added the question label Jan 4, 2022
