Unicode VS and advocating fonts/layout engines #766

Open
kojiishi opened this issue Apr 6, 2024 · 7 comments

Comments

kojiishi commented Apr 6, 2024

This issue is a continuation of kojiishi/unicode-auto-spacing#17.
@macnmm @kenlunde @asmusf @kidayasuo were mentioned in the original discussion.

macnmm commented Apr 7, 2024

As I replied in the CSS thread, I am saying that Japanese layout behavior logic currently depends on knowledge of the actual glyph designs in fonts and their layout intent, and that there are no standards mapping a given codepoint to a glyph that has:

  • a standard width (e.g. full width monospaced, half-width monospaced, half-width proportional, etc)
  • a standard spacing behavior as defined in the JIS X 4051 standard, including minimum, desired, maximum, compression priority, expansion priority.
  • a standard vertical writing posture, which could be derived from item 1 if we agree a posture can be inferred from a width class, but having it separate is most flexible

Fonts, having glyphs for several writing scripts at once, are all over the place in how they design default glyphs for a given codepoint and how they add alternates. This makes it very difficult for layout engines not to rely on unique and fragile layout heuristics.
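
A minimal sketch of the three per-glyph attributes listed above, written as a plain data structure. Every name here (`WidthClass`, `VerticalPosture`, `Spacing`, `GlyphLayoutIntent`) is hypothetical, the posture values are an assumption, and the numbers are placeholders rather than values taken from JIS X 4051 or from any font.

```python
from dataclasses import dataclass
from enum import Enum


class WidthClass(Enum):
    # The width classes named in the list above (hypothetical identifiers).
    FULL_WIDTH_MONOSPACED = "full-width monospaced"
    HALF_WIDTH_MONOSPACED = "half-width monospaced"
    HALF_WIDTH_PROPORTIONAL = "half-width proportional"


class VerticalPosture(Enum):
    # Assumed posture values; the comment above does not enumerate them.
    UPRIGHT = "upright"
    ROTATED = "rotated"


@dataclass(frozen=True)
class Spacing:
    # Amounts in em; priorities are ordinal. All values are illustrative.
    minimum: float
    desired: float
    maximum: float
    compression_priority: int
    expansion_priority: int

    def __post_init__(self):
        # A spacing record only makes sense if the three amounts are ordered.
        if not (self.minimum <= self.desired <= self.maximum):
            raise ValueError("expected minimum <= desired <= maximum")


@dataclass(frozen=True)
class GlyphLayoutIntent:
    codepoint: int
    width_class: WidthClass
    spacing: Spacing
    posture: VerticalPosture  # kept separate from width_class for flexibility


# Placeholder record for U+3001 IDEOGRAPHIC COMMA; the numbers are made up.
example = GlyphLayoutIntent(
    codepoint=0x3001,
    width_class=WidthClass.FULL_WIDTH_MONOSPACED,
    spacing=Spacing(minimum=0.0, desired=0.5, maximum=0.5,
                    compression_priority=1, expansion_priority=3),
    posture=VerticalPosture.UPRIGHT,
)
```

Keeping `posture` as its own field mirrors the third bullet: it could be inferred from the width class, but storing it separately is the more flexible choice.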

asmusf commented Apr 7, 2024

Which brings us back to viewing the problem as one of defining the meaning and interpretation of SPACE CHARACTERS in the context of EA layout. There are many space characters of different width, from zero to some large widths.

Unicode is a plain text standard, therefore, we need to focus on questions like:

  • what space characters are required in certain situations
  • what space characters, if any, may be expected to be ignored by layout engines
  • what space characters may represent a request to the layout engine to override some behavior

In my view, there's nothing wrong with treating space characters (most of them) as "requests", rather than as "commands". There may be a few that we designate as "overrides" in which case we would strongly expect them to be honored.

There's a separate question of whether we want to describe a "default" heuristic with definite spacing values and spacing contexts (and assuming particular glyph designs). If such a default could lead to text in common situations being presented as expected (whether for common UI fonts, or common default office document fonts), that might be a reason to capture that information.

It would then also allow other cases to be specified as delta.

However, our foremost task is to help settle what space characters are supposed to signal when each of them is present, and where they are required, optional, or discouraged. (And which ones should carry an override intent.)

I read Nat's comment in a way that says, if the font choices and the glyph design in the chosen fonts are unusual enough, the layout algorithm would have to be designed to match, so that a "generic" solution that starts with prescribing spacing and adjustment of spacing is of limited generality. But it would be nice if such algorithms could be applied to the same text backbone, just as we like to have HTML elements and contents different from the CSS so the latter can be swapped out w/o changes to the former.

kojiishi commented Apr 8, 2024

@macnmm Apologies in advance, but I'm not sure I understand your points yet.

IIUC, you think adding VS to distinguish full/half helps resolve whether a code point is full or half. Is that correct?

If yes, I'm not opposed to the idea, but if you read the "Overview and Scope" section of L2/24-057, I think you will find that it's beyond the scope of this document. Currently, it's for reliable document interchange, a legible default, and to be the foundation for high-level protocols to build their features upon. These scopes don't require distinguishing full/half accurately. It also mentions that publishing-material quality is out of scope. The scope is similar to UAX50, which also doesn't require VS to distinguish full/half today, and IIUC the scope is currently supported.

If you want to proceed with the idea, I'd suggest you define the new scope and goals, and write up a proposal. The proposal would then claim that it can extend the scope of UAX11, UAX50, and Auto Spacing for, I'm not sure what your exact scope is, but I guess something like compatibility with existing documents. The UTC has accepted Emoji presentation VS and CJK quotes VS, so if you can drive the discussion, I think there's a possibility.

Also IIUC, @asmusf's reply is about space characters, so I think they're better for #771 and #772.
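
For context on the full/half question, here is a small sketch that reads the East Asian Width property (UAX11) through Python's standard `unicodedata` module. The sample characters and the helper name `width_report` are chosen only for illustration, and the values returned depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Illustrative samples: ASCII letter, fullwidth letter, hiragana,
# halfwidth katakana, and a quotation mark.
samples = ["A", "\uFF21", "\u3042", "\uFF71", "\u201C"]


def width_report(chars):
    """Return (code point, character name, East Asian Width) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.east_asian_width(c))
        for c in chars
    ]


for row in width_report(samples):
    print(*row)
```

U+201C LEFT DOUBLE QUOTATION MARK reports `A` (ambiguous), which is the kind of case where the code point alone does not settle full versus half width.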

macnmm commented May 6, 2024

In response to @asmusf above, I have not thought of the space characters in Unicode as "requests" per se, but as functional to layout in a couple of special ways: one, as delimiters for line breaking (excepting the joining ones), and two, as specific spacers that can also be expanded if needed for [Latin-like] justification (excepting the fixed ones). Neither of these special behaviors applies when spacing Latin and Japanese from each other, for example. We need a new behavior for that, and so far it is accomplished solely by the layout engine without help from the Unicode model. Why is it distinct from the existing space character behaviors? Japanese-Latin (J-L) spacing generally has a different desired width than the space character in the font (the J-L spacing is not used to delimit Latin words); it can be compressed when fitting more characters on the line, and that needs to be done in a certain order; expansion is also allowed, but perhaps not first, and not to the extent that space character expansion is allowed.
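
The "compressed in a certain order" behavior can be sketched as follows. This is a toy model under stated assumptions, not any layout engine's algorithm: the `Gap` type, the `compress` function, the gap kinds, the priorities, and all amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    kind: str        # hypothetical label, e.g. "j-latin" or "word-space"
    desired: float   # desired width in em (made-up value)
    minimum: float   # smallest allowed width in em (made-up value)
    priority: int    # lower number = compressed earlier (assumed ordering)


def compress(gaps, excess):
    """Remove up to `excess` em from the gaps, one priority level at a time.

    Returns the new widths and whatever excess could not be absorbed.
    """
    widths = [g.desired for g in gaps]
    remaining = excess
    for prio in sorted({g.priority for g in gaps}):
        idx = [i for i, g in enumerate(gaps) if g.priority == prio]
        room = sum(gaps[i].desired - gaps[i].minimum for i in idx)
        if room <= 0:
            continue
        take = min(remaining, room)
        for i in idx:
            # Share the reduction in proportion to each gap's available room.
            share = (gaps[i].desired - gaps[i].minimum) / room
            widths[i] -= take * share
        remaining -= take
        if remaining <= 0:
            break
    return widths, remaining


line = [
    Gap("j-latin", desired=0.25, minimum=0.0, priority=1),
    Gap("j-latin", desired=0.25, minimum=0.0, priority=1),
    Gap("word-space", desired=0.25, minimum=0.125, priority=2),
]
widths, leftover = compress(line, excess=0.625)
```

Here the two priority-1 gaps are fully compressed before the priority-2 gap gives up anything. Which kind of gap should go first is exactly the kind of rule the comment says needs to be specified, so the ordering used here is only a placeholder.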

asmusf commented May 7, 2024

@macnmm: how do you override the automagical action of the layout engine if it produces nonsense? Should we suggest using ZWJ or ZW Space or ZWNBSP (or any other character) to indicate that the author is aware that the algorithm will try to put a space but should desist?

We have a number of those characters that we use for line-breaking and word-breaking because the algorithms aren't perfect.

To me, it would not be enough to have an option in the user interface of some high-end design software, which then squirrels away that override in a proprietary file format. That's not interoperable. (Or leaving it to markup).

Also, should we expect that any fixed-width spaces (if present) are honored (and to what degree) by the layout software? So, if I insert a THIN SPACE, will that result in different output?

Just to be clear, I'm happy with the default of not doing anything in the ordinary case; I'm talking about the exceptional case where "doing nothing" leads to an undesired outcome and some guidance by the user may be needed.

macnmm commented May 14, 2024

Generally my stance is that if you wish to override the layout engine's behavior, doing so at the plain text layer is a bit strange -- the layout engine should provide overrides for any controversial behaviors, should they be implementing those behaviors properly... I agree that where to break the line or join words can be overridden in the text layer. But spacing and justification seem to me to belong in the layout and layout overrides layer.

asmusf commented May 14, 2024

We are underspecified in Unicode on what an author can expect as a result of making a choice among the many different space characters. This is not just an East Asian problem, but a bit of a general one.

If we can't agree on a recommendation of which character to choose for which purpose, then we stop having interoperability. Because the net effect is that everyone uses "whatever seems to work", but different implementations may give unexpected results if presented with the wrong input sequence.

This may be fine for creating frozen layout format documents, but is not ideal for stuff that gets rendered in any environment different from the authoring system.

Now, I'm quite comfortable with being somewhat general. But the various space characters already differ to the degree that they participate in spacing adjustments and justification. (Or should differ, anyway, in any well-behaved layout, independent of the precise details).

99.9% are going to be either U+0020 SPACE or U+00A0 NO-BREAK SPACE, and these are the ones that are handled most flexibly by any layout system. No need to change that (other than suggesting when not to insert a SPACE to mimic the result of a layout system in plain text).

It's the dozen or so other spaces for which we need to describe better which of their characteristics will be respected (fully or in tendency) by a well-behaved layout system and to give indications when to use or not to use any of them in plain text (that is later to be rendered).
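
To see which characters are in question, the sketch below lists every code point whose General Category is `Zs` (Space Separator), using Python's standard `unicodedata` module. The helper name `space_separators` is only for illustration, and the exact list depends on the Unicode version bundled with the Python build.

```python
import sys
import unicodedata


def space_separators():
    """Return all characters with General Category Zs (Space Separator)."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


for ch in space_separators():
    # Print code point, name, and East Asian Width for each space separator.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"EAW={unicodedata.east_asian_width(ch)}")
```

Zero-width characters such as U+200B ZERO WIDTH SPACE are format characters (`Cf`), so they do not appear in this list even though they come up elsewhere in the thread.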
