Space synthesis breaks Mongolian shaping in cascade through subsetted Noto Sans Mongolian despite unicode-range #4503
Comments
We already do that, don't we? harfbuzz/src/hb-ot-shape-normalize.cc Line 196 in 258f2a2
If I read this code right, it asks for a U+0020 space glyph through the glyph lookup callback function. That succeeds when we try the first (from the bottom) of the Noto Sans Mongolian subsets from the CSS. But what we would need is a callback the other way round: ask whether it is okay to synthesize a glyph for U+202F from the subset with unicode-range U+0000-00FF. The synthesized space breaks the run: that glyph counts as shaped by the Latin subset, so the remaining unshaped parts, when shaped later, no longer form the connection with the U+202F.
How would your callback know what to answer? Ways I see around this are:
The callback would look at the unicode-range of the current subset and allow synthesis only if U+202F is in that range. So it would effectively disable synthesis for U+202F when the Latin subset is in use.
Synthesis in a way violates unicode-range, as the font is used for a codepoint that may be outside unicode-range. In a way, similar to how we restrict the results of
Do you mean on the HarfBuzz side in shaping or clustering?
This is already a "problem" because of the composition/decomposition we do. So you might get a letter shaped that is outside of the unicode-range. I want to avoid adding a new callback if we can find another way.
I meant on the HarfBuzz side.
Works for me if this can be addressed inside of HarfBuzz. Agree that composition/decomposition blurs those lines, too.
@jfkthame WDYT about not replacing NNBSP (U+202F) ever?
I'd be hesitant to do that -- as long as we have a general behavior of synthesizing fallbacks for known Unicode "space" characters that aren't supported by the chosen font, we should do our best to support all of them. Making this a special-case exception for U+202F in Mongolian script would be OK, I guess. But really the caller should be choosing the appropriate font before calling the shaper.
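For illustration, the special-case exception mentioned above could look roughly like the following Python sketch. This is not HarfBuzz code: the function name `may_synthesize_space` is hypothetical, the script is passed as an ISO 15924 tag, and "known Unicode space" is approximated here by general category `Zs`, which is an assumption rather than the set HarfBuzz actually handles.

```python
import unicodedata

NNBSP = 0x202F  # NARROW NO-BREAK SPACE


def may_synthesize_space(codepoint: int, script: str) -> bool:
    """Hypothetical policy: decide whether a fallback space may be synthesized."""
    # Assumption: only space separators (category Zs) are fallback candidates.
    if unicodedata.category(chr(codepoint)) != "Zs":
        return False
    # The special case discussed in the thread: never synthesize U+202F
    # when the run is in Mongolian script ("Mong").
    if codepoint == NNBSP and script == "Mong":
        return False
    return True
```

With this policy, U+202F would still get a synthesized fallback in other scripts, and other space characters would be unaffected in Mongolian runs.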
That's a chicken & egg issue because of the normalization step we do in HB; hence the shaper-driven approach Chrome takes.
See details in https://crbug.com/1499787
When shaping
--unicodes 180e,1821,202f,1836,1822
with Noto Sans Mongolian subsetted to Latin, the U+202F space is synthesized from the space in the ASCII range, which then breaks shaping down the line when shaping with the Mongolian subset.
Space synthesis is generally useful and we don't want to simply switch it off, but it would be useful if HarfBuzz could be told to stay within a specified unicode range, or to call back to the client to ask "can_synthesize_space_for?" with the codepoint from the input buffer for which a space is about to be synthesized.
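As a rough sketch of what the client side of such a callback might decide, here is the check in Python. No such callback exists in HarfBuzz today; `can_synthesize_space_for` is only the name floated above, and the Mongolian subset range below is an assumption for illustration (only the Latin range U+0000-00FF is taken from this discussion).

```python
from typing import List, Tuple

# Inclusive (first, last) code point pairs, mirroring a CSS unicode-range list.
UnicodeRange = List[Tuple[int, int]]

# From the thread: the Latin subset covers U+0000-00FF.
LATIN_SUBSET: UnicodeRange = [(0x0000, 0x00FF)]

# Assumption for illustration only; the real Mongolian subset range may differ.
MONGOLIAN_SUBSET: UnicodeRange = [(0x1800, 0x18AF), (0x202F, 0x202F)]


def can_synthesize_space_for(codepoint: int, subset_range: UnicodeRange) -> bool:
    """Allow synthesis only if the code point lies in the subset's unicode-range."""
    return any(first <= codepoint <= last for first, last in subset_range)


# U+202F is outside the Latin subset's range, so synthesis would be refused
# there and the character would stay unresolved until a later subset is tried.
print(can_synthesize_space_for(0x202F, LATIN_SUBSET))
print(can_synthesize_space_for(0x202F, MONGOLIAN_SUBSET))
```

Under these assumptions the Latin subset would decline to synthesize U+202F, leaving it to be shaped together with the surrounding Mongolian letters.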
Your thoughts are welcome on this issue.