GlyphFixer characters converted twice again provide bad output #90

prankard · 2022-02-14T12:19:19Z

Hey, firstly thanks for the repo. It's very useful, even essential, when viewing or dealing with anything Arabic in Unity.
Also, I don't speak or read Arabic, which made this hard to debug, so please consider these changes carefully.

While implementing some text from a client, I was given a curious line of text that was not rendering in my app. At first I thought some characters were missing when rendering the text with RTLTMP. That wasn't the case: RTLTMP was in fact trying to convert Arabic characters that were already joined.
So I've been struggling with rendering 'العالم' and 'اﻟﻌﺎﻟﻢ' (same graphical/converted output, different input characters).

If we take the word 'العالم', its six characters use these four distinct Unicode code points:
0627, 0644, 0639, 0645
This renders correctly and gets converted by RTLTMP.

However, the word 'اﻟﻌﺎﻟﻢ' uses these five distinct code points:
0627, FEDF, FECC, FE8E, FEE2
These appear correct here in your browser and even in TMP (as it's just displaying the Unicode as-is), but RTLTMP tries to run GlyphFixer on them again and changes them to characters that don't exist in the font.
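To make the difference between the two inputs visible, here is a minimal sketch that dumps the code units of each string. It assumes the two words are spelled exactly as they appear above; `CodePointDump` and `ToHex` are names made up for this illustration and are not part of RTLTMP.

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Formats each UTF-16 code unit as four uppercase hex digits.
    // Every character discussed here is in the Basic Multilingual Plane,
    // so one char corresponds to one code point.
    public static string ToHex(string text)
    {
        return string.Join(", ", text.Select(c => ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Escapes are used so the source file encoding cannot alter the input.
        // Assumption: these are the spellings of the two words in this issue.
        string plain = "\u0627\u0644\u0639\u0627\u0644\u0645";
        string shaped = "\u0627\uFEDF\uFECC\uFE8E\uFEDF\uFEE2";

        Console.WriteLine(ToHex(plain));  // 0627, 0644, 0639, 0627, 0644, 0645
        Console.WriteLine(ToHex(shaped)); // 0627, FEDF, FECC, FE8E, FEDF, FEE2
    }
}
```

The dump lists every character in order, so repeated letters show up more than once.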

From reading the 'Arabic script in Unicode' wiki page, it looks like (but again, I'm not sure or well versed) the Arabic Presentation Forms A and B blocks contain only contextual forms and ligatures. In my head, that means that if a character is in that range, we shouldn't convert it.

My solution would be to change this line so that it skips anything at or above 0xFB50:
if (iChar < 0xFB50 && TextUtils.IsGlyphFixedArabicCharacter((char)iChar))

However, I'm not sure whether this alone will fix it, or whether Arabic Presentation Forms A and B need to be fixed for other reasons (or whether I have misunderstood).
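The `iChar < 0xFB50` comparison skips everything from U+FB50 upwards, not only the two presentation-forms blocks. A narrower variant could name the blocks explicitly. The sketch below is only an illustration: `IsArabicPresentationForm` is a hypothetical helper, not something RTLTMP provides, and the ranges are the Unicode block boundaries for Arabic Presentation Forms-A (U+FB50 to U+FDFF) and Forms-B (U+FE70 to U+FEFF).

```csharp
using System;

public static class PresentationForms
{
    // Hypothetical helper: true when the code point lies in
    // Arabic Presentation Forms-A (U+FB50..U+FDFF) or
    // Arabic Presentation Forms-B (U+FE70..U+FEFF).
    public static bool IsArabicPresentationForm(int codePoint)
    {
        return (codePoint >= 0xFB50 && codePoint <= 0xFDFF)
            || (codePoint >= 0xFE70 && codePoint <= 0xFEFF);
    }

    public static void Main()
    {
        Console.WriteLine(IsArabicPresentationForm(0x0644)); // False: base Arabic lam
        Console.WriteLine(IsArabicPresentationForm(0xFEDF)); // True: lam, initial form
        Console.WriteLine(IsArabicPresentationForm(0xFE20)); // False: between the two blocks
    }
}
```

Whether the narrower check is actually preferable depends on what else the fixer is expected to handle above U+FB50, which I can't judge from here.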

There could be a potential bug when Arabic characters (0600-06FF) are combined with Arabic Presentation Forms characters (FB50-FEFF) in the same word, but I think it should be fine: in text like this, the plain Arabic characters that sit alongside ligature/contextual presentation-form characters are normally meant to be isolated, so they won't be GlyphFixed anyway.

A useful unit test for this issue would be to convert the text twice and ensure the second conversion produces the same output as the first.
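As a rough sketch of that check, the helper below applies a fixer twice and compares the results. Everything here is illustrative: `IsIdempotent`, `SafeFix`, and `BrokenFix` are made-up names, and the two toy fixers only stand in for the real GlyphFixer, whose entry point I have not assumed.

```csharp
using System;

public static class IdempotenceCheck
{
    // True when running the fixer on its own output changes nothing.
    public static bool IsIdempotent(Func<string, string> fix, string input)
    {
        string once = fix(input);
        string twice = fix(once);
        return string.Equals(once, twice, StringComparison.Ordinal);
    }

    // Toy fixer: maps base lam (U+0644) to its initial form (U+FEDF)
    // and leaves already-shaped characters alone.
    public static string SafeFix(string s)
    {
        return s.Replace('\u0644', '\uFEDF');
    }

    // Toy fixer that mimics the bug: an already-shaped lam is converted
    // again, here into U+FFFD as a stand-in for a glyph missing from the font.
    public static string BrokenFix(string s)
    {
        return s.Replace('\uFEDF', '\uFFFD').Replace('\u0644', '\uFEDF');
    }

    public static void Main()
    {
        string input = "\u0627\u0644\u0639\u0627\u0644\u0645";
        Console.WriteLine(IsIdempotent(SafeFix, input));   // True
        Console.WriteLine(IsIdempotent(BrokenFix, input)); // False
    }
}
```

In a Unity project this would presumably become an NUnit test that passes the library's actual fixing call in place of the toy functions.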

I hope this makes sense coming from a non-Arabic speaker. Let me know if I've got the wrong idea about any of this.
