GlyphFixer characters converted twice again provide bad output #90

Open
prankard opened this issue Feb 14, 2022 · 0 comments

Hey, firstly thanks for the repo. It's very useful, even essential, when displaying or handling anything Arabic in Unity.
Also, I don't speak or read Arabic, which made this hard to debug, so please consider these changes carefully.

While implementing some text from a client, I was given a curious line of text that was not rendering in my app. At first I thought characters were missing when rendering the text with RTLTMP, but that wasn't the case: RTLTMP was in fact trying to convert Arabic characters that were already in their joined forms.
So I've been struggling with rendering 'العالم' and 'اﻟﻌﺎﻟﻢ' (same graphical/converted output, different input characters).

The word 'العالم' is made up of these four distinct Unicode code points:
0627, 0644, 0639, 0645
This renders correctly and gets converted by RTLTMP.

However, 'اﻟﻌﺎﻟﻢ' is made up of these five distinct code points:
0627, FEDF, FECC, FE8E, FEE2
These appear correct here in your browser and even in TMP (as it just displays the Unicode as-is), but RTLTMP tries to run GlyphFixer on them again and converts them to characters that don't exist in the font.
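
For anyone else who can't read Arabic and wants to check which code points a string really contains, here is a minimal sketch of a hex dump helper. `CodePointDump` and `ToHex` are names made up for this example and are not part of RTLTMP. It assumes every character is in the Basic Multilingual Plane, which holds for both ranges discussed in this issue.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDump
{
    // Returns each UTF-16 code unit of the input as a 4-digit uppercase hex string.
    // Assumption: all characters are in the Basic Multilingual Plane, so one
    // char corresponds to one code point (true for U+0600-06FF and U+FB50-FEFF).
    public static List<string> ToHex(string text)
    {
        var result = new List<string>();
        foreach (char c in text)
        {
            result.Add(((int)c).ToString("X4"));
        }
        return result;
    }
}
```

Calling `string.Join(", ", CodePointDump.ToHex(text))` on each of the two strings makes the difference visible even though they look identical on screen.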

From reading the Arabic Script in Unicode wiki page, it looks like (though again I'm not sure or well versed) contextual forms and ligatures are only encoded in Arabic Presentation Forms A and B. In my head, that means that if a character is in that range, we shouldn't convert it.

My solution would be to change this line to:
if (iChar < 0xFB50 && TextUtils.IsGlyphFixedArabicCharacter((char)iChar))

However, I'm not sure whether this alone will fix it, whether Arabic Presentation Forms A and B need to be fixed for other reasons, or whether I have misunderstood something.
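
The same guard could also be written as an explicit range check, which might make the intent easier to review. This is only a sketch: `ArabicRanges`, `IsPresentationForm` and `IsBaseArabic` are hypothetical names, not existing RTLTMP helpers, and the block boundaries are the ones the Unicode charts give for Arabic (0600-06FF), Arabic Presentation Forms-A (FB50-FDFF) and Arabic Presentation Forms-B (FE70-FEFF).

```csharp
public static class ArabicRanges
{
    // True for code points in Arabic Presentation Forms-A (U+FB50..U+FDFF)
    // or Arabic Presentation Forms-B (U+FE70..U+FEFF).
    public static bool IsPresentationForm(int codePoint)
    {
        return (codePoint >= 0xFB50 && codePoint <= 0xFDFF)
            || (codePoint >= 0xFE70 && codePoint <= 0xFEFF);
    }

    // True for code points in the basic Arabic block (U+0600..U+06FF).
    public static bool IsBaseArabic(int codePoint)
    {
        return codePoint >= 0x0600 && codePoint <= 0x06FF;
    }
}
```

Note that this is narrower than `iChar < 0xFB50`, which also skips every code point above FB50 that is outside the two presentation blocks. I don't know which of the two behaviours is the better fit for GlyphFixer.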

There could be a potential bug when Arabic characters (0600-06FF) are combined with Arabic Presentation Forms characters (FB50-FEFF) in the same word. I think it should be fine, though, as such text normally consists of presentation-form characters (ligatures and contextual forms) alongside Arabic characters that are isolated and so won't be GlyphFixed anyway.

A useful unit test for this issue would be to convert the text twice and ensure it's the same output.
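
A rough sketch of what that check could look like, independent of any test framework. `IdempotenceCheck` and `IsIdempotent` are names invented for this example, and `convert` stands in for whatever RTLTMP call performs the fixing, which I have deliberately not named here because I'm not sure of the exact API.

```csharp
using System;

public static class IdempotenceCheck
{
    // Returns true when running convert on its own output
    // leaves that output unchanged (ordinal comparison).
    public static bool IsIdempotent(Func<string, string> convert, string input)
    {
        string once = convert(input);
        string twice = convert(once);
        return string.Equals(once, twice, StringComparison.Ordinal);
    }
}
```

A test would pass the fixing call as `convert` and assert that the result is true for 'العالم' and for other sample strings.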

I hope this makes sense coming from a non-Arabic speaker. Let me know if I've got the wrong idea about any of this.
