internal IdentityEncoder should be more clear with rune handling #360

gunnsth · 2020-05-23T11:06:36Z

The IdentityEncoder is used to represent the Identity-H and Identity-V encodings, which map 2-byte character codes to 2-byte CIDs. The PDF specification describes Identity-H as:

"The horizontal identity mapping for 2-byte CIDs; may be used with CIDFonts using any Registry, Ordering, and Supplement values. It maps 2-byte character codes ranging from 0 to 65,535 to the same 2-byte CID value, interpreted high-order byte first."
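As a rough illustration of the quoted mapping, here is a minimal Go sketch of Identity-H decoding. The `CID` type and the `decodeIdentityH` function are hypothetical names chosen for illustration, not the library's API, and the sketch assumes the input is the raw byte string passed to a text-showing operator.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// CID is a hypothetical type for a character identifier in a CIDFont.
// It selects a glyph; it is NOT a Unicode code point.
type CID uint16

// decodeIdentityH splits data into 2-byte character codes and maps each code
// to the CID with the same numeric value, reading the high-order byte first.
func decodeIdentityH(data []byte) ([]CID, error) {
	if len(data)%2 != 0 {
		return nil, errors.New("identity-h: odd number of bytes")
	}
	cids := make([]CID, 0, len(data)/2)
	for i := 0; i < len(data); i += 2 {
		cids = append(cids, CID(binary.BigEndian.Uint16(data[i:i+2])))
	}
	return cids, nil
}

func main() {
	cids, err := decodeIdentityH([]byte{0x00, 0x24, 0x01, 0x02})
	if err != nil {
		panic(err)
	}
	fmt.Println(cids) // [36 258]
}
```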

When used with TrueType CIDFonts, the CID values typically map directly to GIDs (glyph indices), and the CID value carries no Unicode meaning. It is therefore confusing that IdentityEncoder implements the TextEncoder interface, with methods such as CharcodeToRune that return a "rune" which is not a Unicode code point at all, but just the integer value of the CID. This invites misuse and can easily lead to problems.
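A small sketch of the pitfall, assuming the common case where the font's CIDToGIDMap is Identity. The `CID` and `GID` types and `identityCIDToGID` are hypothetical. Reinterpreting CID 36 as a rune yields U+0024 (`$`), which has nothing to do with the glyph being drawn; recovering the real text would need a separate lookup such as the font's ToUnicode CMap.

```go
package main

import "fmt"

// Hypothetical distinct types, so CIDs, GIDs and runes cannot be mixed implicitly.
type CID uint16
type GID uint16

// identityCIDToGID models a CIDToGIDMap of /Identity: the GID equals the CID.
func identityCIDToGID(c CID) GID { return GID(c) }

func main() {
	c := CID(0x24)
	g := identityCIDToGID(c)
	// Converting the CID to a rune compiles, but the result is just U+0024 '$',
	// which is unrelated to whatever glyph GID 36 draws in the font.
	naive := rune(c)
	fmt.Printf("CID %d -> GID %d; naive rune %q\n", c, g, naive)
}
```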

We probably need to clarify the terminology and possibly split up the TextEncoder interface. Identity-H should simply map bytes to CIDs. If a CIDToGIDMap is defined, it also needs to be applied when mapping CIDs to glyphs.
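One possible shape for the split, sketched in Go. Everything here (`CharCode`, `CIDDecoder`, `identityH`, `CIDToGIDMap.GID`) is hypothetical and only meant to show the separation: the encoder maps character codes to CIDs, and the CIDToGIDMap lookup is a distinct step. Per the PDF specification, a CIDToGIDMap stream stores the GID for CID c in bytes 2c and 2c+1, high-order byte first, while the name Identity means the GID equals the CID. Returning `ok=false` for a CID past the end of the stream is this sketch's own choice.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical distinct types for the three kinds of values involved.
type CharCode uint16
type CID uint16
type GID uint16

// CIDDecoder is a hypothetical narrow interface: it maps character codes to
// CIDs and makes no claim about Unicode.
type CIDDecoder interface {
	CharcodeToCID(code CharCode) (CID, bool)
}

// identityH is a hypothetical Identity-H implementation: code value == CID value.
type identityH struct{}

func (identityH) CharcodeToCID(code CharCode) (CID, bool) { return CID(code), true }

// CIDToGIDMap models the CIDToGIDMap entry of a Type 2 CIDFont.
// A nil stream stands for the name /Identity.
type CIDToGIDMap struct {
	stream []byte // decoded stream contents, or nil for /Identity
}

// GID looks up the glyph index for c (2 bytes per CID, high-order byte first).
func (m CIDToGIDMap) GID(c CID) (GID, bool) {
	if m.stream == nil {
		return GID(c), true
	}
	off := 2 * int(c)
	if off+2 > len(m.stream) {
		return 0, false // sketch's choice: report CIDs outside the stream
	}
	return GID(binary.BigEndian.Uint16(m.stream[off : off+2])), true
}

func main() {
	var dec CIDDecoder = identityH{}
	cid, _ := dec.CharcodeToCID(0x0002)
	m := CIDToGIDMap{stream: []byte{0x00, 0x00, 0x00, 0x05, 0x00, 0x07}}
	gid, ok := m.GID(cid)
	fmt.Println(cid, gid, ok) // 2 7 true
}
```

Keeping `CID` and `GID` as distinct named types means the Go compiler rejects passing one where the other is expected unless there is an explicit conversion, which makes the CID-versus-glyph-versus-rune distinction visible in the API.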

gunnsth added the style/design label Jun 2, 2020
