Right-to-left text layout support #113

Closed
lojjic opened this issue Apr 5, 2021 · 11 comments

@lojjic
Collaborator

lojjic commented Apr 5, 2021

In lieu of a full advanced text shaping solution (e.g. harfbuzz.wasm) I'd like some basic out-of-the-box support for RTL layout. Typr includes some level of support for Arabic glyph substitutions already, though I don't know how complete that is.

I've added some very basic RTL layout/wrapping logic already. Let's use this issue to track bugs with that and other gaps in support.

Temporary test page: https://troika-examples.netlify.app/#text-rtl

@boulabiar

boulabiar commented Apr 6, 2021

First, I want to thank you very much for working on this. Supporting Arabic and RTL layouts will be useful for many people.
I have run some first tests; standard Arabic text (with no Tachkil) is mostly well supported in the Cairo, Lemonada, and Scheherazade fonts.

I was testing these 2 rules for Arabic:

  1. Whether the 3 written forms of each character are correct (one at the beginning, one in the middle, one at the end of a word), along with the connections between letters (ligatures).
  2. Tachkil, which is the set of marks indicating pronunciation ُ َ ً ٌ (not used in most text found on the internet, except in rare cases).

In Mirza, some internal letters are not connected (the ending form of the letter is used instead of the internal one, or vice versa).
arabicTachkil

With Tachkil, some fonts worked fine while others changed the form of the character next to the mark. Some worked with text I typed into the box but not with copied text.

If I use non-Arabic characters such as the parentheses "(" and ")", they are switched (they need to be reversed).

This is a quick test I've made; I need to check more and give you more details about where things get weird. (I also need to check the fonts, since some fonts don't provide the needed characters.)

@lojjic
Collaborator Author

lojjic commented Apr 6, 2021

Great, thanks! I'm happy to hear that it's got a decent start.

It's interesting that the result for word-position substitutions varies by font. The word-position detection logic in Typr is always the same, so there must be something different in how those fonts encode their substitutions that Typr doesn't handle. I'll look into Mirza specifically to see if I can determine a difference.

Since I don't know these characters, and thus can't determine correct vs. incorrect myself, it would be immensely helpful if you could give me some targeted test cases with expected results, maybe just single words, something like:

Input text: xxx
Should look like: [image]
Looks correct in font A: [image]
Looks incorrect in font B: [image]

As for the parentheses, I think that's the Paired Brackets part of the Bidi algorithm. I'm not sure yet if that's something I'll tackle on my own, but I'll definitely look into it.
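
To make the parentheses problem concrete, here is a minimal sketch of the mirroring step of the Unicode Bidirectional Algorithm (rule L4 in UAX #9): a character that has a mirrored counterpart is swapped when it sits at an odd (right-to-left) embedding level. The function name `mirrorAtRtlLevels`, the small pair table, and the hand-written `levels` arrays are assumptions for illustration only. This is not Troika's or Typr's code, and it does not attempt the separate Paired Brackets resolution step.

```javascript
// Hypothetical helper, not part of any library mentioned in this thread.
// Only a few ASCII pairs are listed; the complete data is in Unicode's
// BidiMirroring.txt.
const MIRROR_PAIRS = {
  '(': ')', ')': '(',
  '[': ']', ']': '[',
  '{': '}', '}': '{',
  '<': '>', '>': '<'
}

// `levels` holds one bidi embedding level per code point of `text`.
// Odd levels are right-to-left, even levels are left-to-right.
function mirrorAtRtlLevels(text, levels) {
  return [...text]
    .map((ch, i) => (levels[i] % 2 === 1 && MIRROR_PAIRS[ch]) || ch)
    .join('')
}

// All four code points at level 1 (RTL): the parentheses swap.
console.log(mirrorAtRtlLevels('(\u05D0\u05D1)', [1, 1, 1, 1]) === ')\u05D0\u05D1(') // true
// At level 0 (LTR) nothing changes.
console.log(mirrorAtRtlLevels('(ab)', [0, 0, 0, 0])) // (ab)
```

Without this step, reversing an RTL run for display leaves each parenthesis facing the wrong way, which matches the "switched" appearance reported above.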

@lojjic
Collaborator Author

lojjic commented Apr 8, 2021

I've pushed code with some rough bidirectional layout support. Right now it's purely manual using LRO/RLO/PDF control characters to define directional ranges. Full automatic bidi is much more complicated and I'm still getting my head wrapped around its scope, but being able to lay out the ranges (with line wrapping and selection!) is an important start.

image
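
A minimal sketch of the manual approach described above, assuming the text is assembled as a plain JavaScript string before it is handed to the renderer. U+202D (LRO), U+202E (RLO), and U+202C (PDF) are the standard Unicode code points for these controls; the helper names `forceLtr`, `forceRtl`, and `wrapDigitRuns` are hypothetical and not part of Troika.

```javascript
const LRO = '\u202D' // LEFT-TO-RIGHT OVERRIDE
const RLO = '\u202E' // RIGHT-TO-LEFT OVERRIDE
const PDF = '\u202C' // POP DIRECTIONAL FORMATTING (closes the nearest override)

// Hypothetical helpers: mark a substring as an explicit directional range.
const forceLtr = (s) => LRO + s + PDF
const forceRtl = (s) => RLO + s + PDF

// Wrap each run of ASCII digits so it is laid out left-to-right,
// even when it appears inside right-to-left text.
const wrapDigitRuns = (s) => s.replace(/[0-9]+/g, (digits) => forceLtr(digits))

const marked = wrapDigitRuns('\u0643\u0645 2024')
console.log(marked === '\u0643\u0645 \u202D2024\u202C') // true
```

The control characters are invisible, so escaping them as `\u202D` and `\u202C` in source code makes the ranges easier to review than pasting the raw characters.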

@boulabiar

boulabiar commented Apr 8, 2021

I'm really sorry I didn't post feedback yesterday. I thought about doing a full test over the weekend, but I think it's better to do things in steps.
Let's start with fonts that work very well (there may be some problems in some fonts). I used the Scheherazade font, but Cairo and Lemonada give the same result.
The Mirza and Amiri fonts always show disconnected letters.
The Noto Sans and Roboto fonts don't work at all.

In the picture below, red marks the wrong form of a letter and green marks the right form.
The problem appears only when the text contains Tachkil (vowel marks) or a Latin or number character.

  1. Instead of the final form, we get an internal form.
  2. Inside the word, instead of the beginning form, we get the internal form (some letters inside the word have no ligature).
  3. When a number comes directly after the word (كم2), we keep the ending form.
  4. Numbers are reversed.
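
The positional forms discussed in this list can be sketched in code. The following is an illustrative simplification, not Typr's, opentype.js's, or Troika's implementation: it assigns each Arabic letter one of the OpenType feature tags `isol`, `init`, `medi`, or `fina` by looking at its nearest neighbours, skipping Tachkil marks as "transparent" so that they do not break the joining. The function names and the tiny joining-type table are assumptions; the authoritative data is Unicode's ArabicShaping.txt.

```javascript
// Letters that join only to the preceding letter (right-joining), e.g. alef, dal, ra, waw.
const RIGHT_JOINING = new Set([
  0x0622, 0x0623, 0x0624, 0x0625, 0x0627, 0x0629,
  0x062F, 0x0630, 0x0631, 0x0632, 0x0648
])

// Rough classification for the basic Arabic block only (a sketch, not complete):
// 'T' transparent marks, 'R' right-joining, 'D' dual-joining, 'U' non-joining.
function joiningType(cp) {
  if ((cp >= 0x064B && cp <= 0x065F) || cp === 0x0670) return 'T'
  if (RIGHT_JOINING.has(cp)) return 'R'
  // Tatweel (U+0640) is join-causing in Unicode; it is lumped in with 'D' here.
  if (cp >= 0x0620 && cp <= 0x064A && cp !== 0x0621) return 'D'
  return 'U' // spaces, digits, Latin letters, hamza, everything else
}

function positionalForms(text) {
  const chars = [...text]
  const types = chars.map((ch) => joiningType(ch.codePointAt(0)))
  // Joining type of the nearest non-transparent neighbour in the given direction.
  const neighbour = (i, step) => {
    for (let j = i + step; j >= 0 && j < types.length; j += step) {
      if (types[j] !== 'T') return types[j]
    }
    return 'U'
  }
  return types.map((type, i) => {
    if (type !== 'D' && type !== 'R') return null
    const joinsPrev = neighbour(i, -1) === 'D'
    const next = neighbour(i, 1)
    const joinsNext = type === 'D' && (next === 'D' || next === 'R')
    if (joinsPrev && joinsNext) return 'medi'
    if (joinsPrev) return 'fina'
    if (joinsNext) return 'init'
    return 'isol'
  })
}

// kaf + meem + "2": the digit does not join, so meem takes its final form.
console.log(positionalForms('\u0643\u0645' + '2')) // [ 'init', 'fina', null ]
// beh + kasra + seen + meem: the kasra is skipped when looking for neighbours.
console.log(positionalForms('\u0628\u0650\u0633\u0645')) // [ 'init', null, 'medi', 'fina' ]
```

Treating the marks as ordinary non-joining characters instead of skipping them would yield isolated or final forms next to every mark, which is consistent with the symptoms in items 1 and 2.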

arabThree

Text I used:
كم2.
كم 2
بِسم اللَّه الرحمن الرحيم
بِسمِ اللَّهِ الرَّحمٰنِ الرَّحيمِ

This answer contains a picture of how the letters are drawn:
https://www.quora.com/How-can-anyone-read-Arabic-as-the-letters-are-all-connected-to-each-other/answer/Hashem-Mohamed-4

@lojjic
Collaborator Author

lojjic commented Apr 8, 2021

Thank you so much for this marked up testcase, that's immensely helpful!!! It really helps me understand things.

Typr's logic for detecting word position is definitely faulty; I've overridden it with logic adapted from opentype.js and the result now seems much better:

image

I'll contribute that Typr fix back upstream after further testing.

The "numbers are reversed" issue will be handled with the BiDi work I've started. For now that can be worked around with explicit LRO/PDF characters.

Keep these kinds of testcases coming! 🤩

@boulabiar

That was fast.
Well, I haven't found anything else that needs fixing, except what the BiDi work you mentioned should cover (numbers and parentheses are widely used in Arabic text).
Can you show an example of how to use the LRO/PDF characters? I was unable to reproduce the mixed text example myself.

One last thing, which is not related to Arabic text but may be related to SDF rendering: some characters have black inside where 2 characters are connected, like here:
image
image
and sometimes within the same character:
image
This is only visible with the Lemonada font. Scheherazade and Cairo work fine (maybe because the characters connect in the right spot).
(It looks like a boolean operation in a vector rendering tool.)

And thanks again for your work.

@lojjic
Collaborator Author

lojjic commented Apr 11, 2021

Thanks! I'm currently working on adding a full bidi algorithm implementation which I think should clear up all the other issues you described so far.

The "BiDi 1" text in the example's dropdown has an example of LRO/PDF, but don't worry about that for now, it's just a stopgap and isn't really correct anyway. True bidi will be better.

The boolean fill issue with that font is the same as discussed in #57 I think.

@lojjic
Collaborator Author

lojjic commented Apr 15, 2021

We now have full bidi support!

image

There are a couple of bidi snippets in the example page, but please give it some testing with your own mixed RTL+LTR text.

This turned into a classic example of me going down a rabbit hole; I didn't find a suitable JS bidi implementation and didn't want to bring in fribidi.wasm, so I decided to take a swing at a new JS implementation as a nights and weekends project. Behold https://github.com/lojjic/bidi-js! I need to add some docs there but it's fully compliant according to the official bidi tests, quite small (~10kb) and pretty speedy though it could probably be optimized more.

I'm feeling really happy with this solution and how little it adds to the bundle size. I think we're very close on full RTL support now. I need to revisit the joining forms logic though; I realized that the logic I adapted from opentype.js only handles Arabic scripts but not others that also do joining.
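
For readers unfamiliar with what a bidi implementation produces, here is a small sketch of the final reordering step of the Unicode Bidirectional Algorithm (rule L2 in UAX #9). It assumes the per-character embedding levels have already been resolved, which is the hard part that a full implementation handles; here they are written by hand. The function name `reorderVisual` is hypothetical and this is not the bidi-js API.

```javascript
// Rule L2: from the highest level down to the lowest odd level, reverse every
// contiguous run of characters whose level is at or above the current level.
function reorderVisual(chars, levels) {
  const order = chars.map((_, i) => i) // visual position -> logical index
  let max = 0
  let minOdd = Infinity
  for (const l of levels) {
    if (l > max) max = l
    if (l % 2 === 1 && l < minOdd) minOdd = l
  }
  for (let level = max; level >= minOdd; level--) {
    let i = 0
    while (i < levels.length) {
      if (levels[i] < level) { i++; continue }
      let j = i
      while (j + 1 < levels.length && levels[j + 1] >= level) j++
      // Reverse the slice order[i..j] in place.
      for (let a = i, b = j; a < b; a++, b--) {
        [order[a], order[b]] = [order[b], order[a]]
      }
      i = j + 1
    }
  }
  return order.map((logicalIndex) => chars[logicalIndex])
}

// Uppercase letters stand in for RTL letters at level 1; the digits inside
// the RTL run sit at level 2, so they keep their left-to-right order.
console.log(reorderVisual([...'XY12'], [1, 1, 2, 2]).join('')) // 12YX
// Purely LTR text (all level 0) is left untouched.
console.log(reorderVisual([...'abc'], [0, 0, 0]).join('')) // abc
```

The first call shows why resolved levels fix the earlier "numbers are reversed" report: the digit run is reversed once on its own and once with the surrounding RTL run, so its digits end up in their original order.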

@lojjic
Collaborator Author

lojjic commented Apr 16, 2021

I've pushed a more complete implementation of joining-type detection; the logic I'd adapted from opentype.js proved to be incomplete. The new implementation actually embeds a highly compressed version of the Unicode joining type definitions, so it should now handle all joinable characters in Arabic and otherwise. It also gives a decent speed bump over the Typr code.

@MichaelHazani since you volunteered to test Hebrew, I think this is ready for you now. You can use this test page where I've added a couple Hebrew fonts to the "font" dropdown, and you can type in your own text. Thanks!

@michaelybecker

Looks great!
("Well, it seems the test is a success. Punctuation is where it should be; right-alignment looks good. Both fonts display Hebrew the way it should be displayed. Switching to English, i.e. this word, doesn't break alignment. Well done!")
image

@lojjic
Collaborator Author

lojjic commented Apr 19, 2021

I've released v0.41.0 with the work done here so far. There are undoubtedly other RTL scripts that will need additional specialized handling, but this gives a solid enough baseline that I think we can handle those on a case-by-case basis. And there's always the possibility of allowing an optional Harfbuzz plugin (#91) for some of the more advanced/obscure cases.

Thank you again @boulabiar and @MichaelHazani for your invaluable help here!!! 🎉

3 participants