Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variation selector as clusters and fallback - improving emoji presentation control in shaper-driven segmentation #4398

Open
drott opened this issue Sep 5, 2023 · 18 comments · May be fixed by #4529

Comments

@drott
Copy link
Collaborator

drott commented Sep 5, 2023

Spun off from: w3c/csswg-drafts#9291

Context: I am thinking about how to improve emoji fallback that takes variation selectors into account and how the shaper-driven fallback in Blink could be made into producing outcomes that respect variation selector control at least for emoji text and image presentation.

Hence, question upfront: How does HarfBuzz treat variation selectors in fallback?

  1. Does it treat them as clusters and return unshaped clusters if the variation selector sequence cannot be matched? or
  2. Does it internally fall back to using the potentially non-matching base character and proceed with shaping, marking the cluster as shaped with the non-variation-matching glyph id and proceed?

From the discussion in that issue, it's clear that UAX 29 does define the variation selector sequences as part of the grapheme cluster. So I am wondering what we would gain if we would modify HarfBuzz to reject unshaped VS clusters and force / prioritize system fallback for those. - If we would then have suitable monochromatic (text presentation) and emoji presentation fallback available, my thinking is that we could achieve a better control over glyph presentation style in those situations?

CC @shivamidow

@behdad
Copy link
Member

behdad commented Sep 5, 2023

[There was an issue about this but I can't find.]

What HarfBuzz does is this:

  • Try the cmap subtable version 14 if it's available; that will consume the variation-selector.
  • If it wasn't available, the variation selector will stay in the glyph stream and GSUB can ligate it.
  • Any lingering variation-selectors will be replaced with zero-width space glyph at the end since they are default-ignorables.

So to the shaper-driven fallback, the cluster looks fully shape even if the variation-selector was simply ignored.

What I can suggest for a fix is to add API hb_buffer_set_not_found_variation_selector_glyph such that you can detect lingering ones by using a custom number (or even just 0). It would make the cluster look unshaped and fallback happens. If you use 0 and no font supports the sequence, then you get a box shown. If you use a custom number you can hide it.

Would that work?

@drott
Copy link
Collaborator Author

drott commented Sep 5, 2023

Do you mean that the proposed API would only affect step 3? And it would only have an effect if used, otherwise the current behavior as you describe would persist (i.e. zwspace replacement in step3?) ?

I am thinking, in the last resort situation, we would want the existing behavior. In other words, when we have exhausted (a future) VS-aware fallback, we would want HarfBuzz to proceed with whatever is in the last resort font at the end of fallback. If yes, then I think such and addition would work. (Not super urgent, I don't have extra cycles to implement VS aware fallback immediately, but I think this API would be useful.)

So what would happen to the base glyph in the glyph stream? That would be there, but the next glyph id would not be replaced by a zero-width space, and they would have the same character index - so the shaper-driven post-shape analysis would flag this cluster as failed, that's what you mean right?

@drott
Copy link
Collaborator Author

drott commented Sep 5, 2023

[There was an issue about this but I can't find.]

Any chance you meant #1292, but that's not exactly filed with regards to adding API.

@khaledhosny
Copy link
Collaborator

Or #2804

Wouldn’t the proposed API break variation selectors other than vs-15/vs-16 that we want to ignore if not supporting by the current font instead of fallback? Or would it up to the client using this API to decide when to try font fallback and when not?

@behdad
Copy link
Member

behdad commented Sep 5, 2023

#515

@drott
Copy link
Collaborator Author

drott commented Sep 5, 2023

Yes, I read #515 again. I am bit a concerned about hb_buffer_set_not_found_variation_selector_glyph though in the sense of doing the client having to figure out an unused or suitable glyph id, which is not a super sharp differentiation in the glyph output since it's "in-band"? Is there still flag space available? Do we want to extend it?

I suppose a naive algorithm might be to go use max(glyph_id) if that's not taken but it's a potential pitfall - but works okay-ish. One thing that came to my mind if it has to be in-band: if HarfBuzz would return or make available for query a map with other gids that are mapped to signifying special purpose things in the output and HB would choose those GIDs itself in a non-conflicting way?

@behdad
Copy link
Member

behdad commented Sep 5, 2023

Yeah I was thinking -1, -2, etc. Those are guaranteed to not be valid glyph indices even with the proposed 24bit glyph extentions.

@behdad
Copy link
Member

behdad commented Sep 5, 2023

Or #2804

Wouldn’t the proposed API break variation selectors other than vs-15/vs-16 that we want to ignore if not supporting by the current font instead of fallback? Or would it up to the client using this API to decide when to try font fallback and when not?

The #515 suggests that it's desirable to fallback for CJK+varselector as well.

@behdad
Copy link
Member

behdad commented Sep 5, 2023

Do you mean that the proposed API would only affect step 3? And it would only have an effect if used, otherwise the current behavior as you describe would persist (i.e. zwspace replacement in step3?) ?

Correct.

I am thinking, in the last resort situation, we would want the existing behavior. In other words, when we have exhausted (a future) VS-aware fallback, we would want HarfBuzz to proceed with whatever is in the last resort font at the end of fallback. If yes, then I think such and addition would work. (Not super urgent, I don't have extra cycles to implement VS aware fallback immediately, but I think this API would be useful.)

The rendering code should ignore glyphs having the not-found-variation-selector value. That takes care of last-resort.

So what would happen to the base glyph in the glyph stream? That would be there, but the next glyph id would not be replaced by a zero-width space, and they would have the same character index - so the shaper-driven post-shape analysis would flag this cluster as failed, that's what you mean right?

Correct.

@drott
Copy link
Collaborator Author

drott commented Sep 5, 2023

The rendering code should ignore glyphs having the not-found-variation-selector value. That takes care of last-resort.

Do you mean with that that the rendering code should treat them as unshaped clusters or do something with the shape result to make them renderable? Outputting unavailable glyphs may lead to unexpected rendering depending on graphics API, for example may lead to TOFU displays on its own.

My understanding is that for last resort then the Chrome rendering code would just not set hb_buffer_set_not_found_variation_selector_glyph and let HB insert the zero-width spaces. I would prefer to mess with shape results as little as possible after they come back from shaping.

@behdad
Copy link
Member

behdad commented Sep 6, 2023

The rendering code should ignore glyphs having the not-found-variation-selector value. That takes care of last-resort.

Do you mean with that that the rendering code should treat them as unshaped clusters or do something with the shape result to make them renderable? Outputting unavailable glyphs may lead to unexpected rendering depending on graphics API, for example may lead to TOFU displays on its own.

Yeah I meant remove those when outputting the glyphstring. But you said you prefer to not do that.

My understanding is that for last resort then the Chrome rendering code would just not set hb_buffer_set_not_found_variation_selector_glyph and let HB insert the zero-width spaces. I would prefer to mess with shape results as little as possible after they come back from shaping.

Fair. If you can do that, that would solve the problem as well.

@drott
Copy link
Collaborator Author

drott commented Dec 15, 2023

CC @tursunova

@drott
Copy link
Collaborator Author

drott commented Dec 18, 2023

Change in #4529 looks good to me. But I am curious to hear @khaledhosny and @jfkthame's opinion on this, in particular because they were both quite vocal in #515. In that issue, @khaledhosny stated that "there was already enough HarfBuzz support" - @khaledhosny what did you mean by that - do you see a way to implement VS fallback without any API additions?

I think for Chrome the approach with this API would work.

@khaledhosny
Copy link
Collaborator

Change in #4529 looks good to me. But I am curious to hear @khaledhosny and @jfkthame's opinion on this, in particular because they were both quite vocal in #515. In that issue, @khaledhosny stated that "there was already enough HarfBuzz support" - @khaledhosny what did you mean by that - do you see a way to implement VS fallback without any API additions?

I think for Chrome the approach with this API would work.

The linked comment suggests using a custom get_variation_glyph callback that handles this, wouldn’t this work for you? e.g. returning true while setting the glyph to 0 or any other custom glyph id that you can handle later. You would get a single glyph for the sequence unlike #4398 where you would still get the base glyph + custom glyph id for the variation selector, but this probably make the client code simpler.

@khaledhosny
Copy link
Collaborator

khaledhosny commented Dec 30, 2023

Here is a quick try of the suggestion above:

#include <hb.h>
#include <stdio.h>
#include <stdlib.h>

constexpr hb_codepoint_t UNSUPPORTED_VS = static_cast<hb_codepoint_t>(-1);

hb_bool_t get_variation_glyph(hb_font_t *font,
                              void *font_data,
                              hb_codepoint_t unicode,
                              hb_codepoint_t variation_selector,
                              hb_codepoint_t *glyph,
                              void *user_data)
{
    hb_font_t *parent = static_cast<hb_font_t *>(font_data);
    if (!hb_font_get_variation_glyph(parent, unicode, variation_selector, glyph))
        *glyph = UNSUPPORTED_VS;
    return true;
}

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        printf("usage: %s font-file\n", argv[0]);
        return 1;
    }

    hb_blob_t *blob = hb_blob_create_from_file(argv[1]);
    hb_face_t *face = hb_face_create(blob, 0);
    hb_font_t *parent = hb_font_create(face);
    hb_font_t *font = hb_font_create_sub_font(parent);

    hb_font_funcs_t *funcs = hb_font_funcs_create();
    hb_font_funcs_set_variation_glyph_func(funcs, get_variation_glyph, nullptr, nullptr);
    hb_font_set_funcs(font, funcs, parent, nullptr);

    hb_buffer_t *buffer = hb_buffer_create();

    char32_t text[] = U"#\uFE0F#\uFE0E#";
    hb_buffer_add_utf32(buffer, reinterpret_cast<uint32_t *>(text), -1, 0, -1);
    hb_buffer_guess_segment_properties(buffer);

    hb_shape(font, buffer, NULL, 0);

    unsigned int glyph_count;
    hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buffer, &glyph_count);
    for (unsigned int i = 0; i < glyph_count; i++)
    {
        if (info[i].codepoint == UNSUPPORTED_VS)
        {
            printf("[UNSUPPORTED]: U+%04X, U+%04X\n", text[info[i].cluster], text[info[i].cluster + 1]);
        }
        else
        {
            char name[128];
            hb_font_glyph_to_string(font, info[i].codepoint, name, 128);
            printf("%s\n", name);
        }
    }

    hb_buffer_destroy(buffer);
    hb_font_destroy(font);

    return 0;
}

With a font that does not support emoji VSs, this gives:

[UNSUPPORTED]: U+0023, U+FE0F
[UNSUPPORTED]: U+0023, U+FE0E
numbersign

With NotoColorEmoji it gives:

gid3191
[UNSUPPORTED]: U+0023, U+FE0E
gid3191

(code probably should check of it is color font or not and ignore one of the variation selectors or the other based on it)

@drott
Copy link
Collaborator Author

drott commented Jan 2, 2024

Thanks for the example. I understand what you meant now. I think technically this code might work for the purpose, but I still see benefit in making that an explicit API call, because that makes then intent clearer, rather than placing the controlling logic into one of the callbacks.

You would get a single glyph for the sequence unlike #4398 where you would still get the base glyph + custom glyph id for the variation selector, but this probably make the client code simpler.

This sentence I did not fully understand, could you elaborate a bit? What's the difference in the output glyph stream with option a) custom get_variation_glyph override returning a not-found-placeholder glyph, and b) Behdad's suggested API?

@khaledhosny
Copy link
Collaborator

You would get a single glyph for the sequence unlike #4398 where you would still get the base glyph + custom glyph id for the variation selector, but this probably make the client code simpler.

This sentence I did not fully understand, could you elaborate a bit? What's the difference in the output glyph stream with option a) custom get_variation_glyph override returning a not-found-placeholder glyph, and b) Behdad's suggested API?

With my get_variation_glyph callback you will get a single glyph for the base character + VS, with Behdad’s API you will get two glyphs: the nominal glyph for the base character and the special glyph ID for the VS (that is what I gather from reading the code, I didn’t actually try it).

@behdad
Copy link
Member

behdad commented Jan 2, 2024

With my get_variation_glyph callback you will get a single glyph for the base character + VS, with Behdad’s API you will get two glyphs: the nominal glyph for the base character and the special glyph ID for the VS (that is what I gather from reading the code, I didn’t actually try it).

Correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants