You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
And the current code selects the first one (unicodeEncoding):
pidPsid := u32(table, offset)
// We prefer the Unicode cmap encoding. Failing to find that, we fall
// back onto the Microsoft cmap encoding.
if pidPsid == unicodeEncoding {
bestOffset, bestPID, ok = offset, pidPsid>>16, true
break
} else if pidPsid == microsoftSymbolEncoding ||
pidPsid == microsoftUCS2Encoding ||
pidPsid == microsoftUCS4Encoding {
bestOffset, bestPID, ok = offset, pidPsid>>16, true
// We don't break out of the for loop, so that Unicode can override Microsoft.
}
and none of the >U+FFFF characters are available.
Should we prefer microsoftUCS4Encoding to the
16-bit-only unicodeEncoding ?
An alternative is to also accept PID = 0 (Unicode), PSID = 4 (Unicode 2.0, full repertoire, i.e. not restricted to the Basic Multilingual Plane). FWIW, ttx shows me 5 cmap subtables, not 4, for HanaMinB.ttf.
Also, the code as is prefers Unicode to Microsoft cmap encodings, but I can't remember the reason why, and maybe we don't need to. We should probably prefer cmap format 12 tables over cmap format 4, though, for the greater (non-BMP) range.
I'm dumping the glyphs from HanaMinB.ttf
( available at
https://osdn.net/frs/redir.php?m=pumath&f=%2Fhanazono-font%2F64385%2Fhanazono-20160201.zip
), where most of the characters are > U+FFFF.
Enclosed please find the output of
ttfdump -t cmap HanaMinB.ttf
According to the ttfdump output, this ttf file contains 4 cmap
subtables, covering the 4 encodings defined in truetype.go:
unicodeEncoding = 0x00000003 // PID = 0 (Unicode), PSID = 3 (Unicode 2.0)
microsoftSymbolEncoding = 0x00030000 // PID = 3 (Microsoft), PSID = 0 (Symbol)
microsoftUCS2Encoding = 0x00030001 // PID = 3 (Microsoft), PSID = 1 (UCS-2)
microsoftUCS4Encoding = 0x0003000a // PID = 3 (Microsoft), PSID = 10 (UCS-4)
And the current code selects the first one (unicodeEncoding):
pidPsid := u32(table, offset)
// We prefer the Unicode cmap encoding. Failing to find that, we fall
// back onto the Microsoft cmap encoding.
if pidPsid == unicodeEncoding {
bestOffset, bestPID, ok = offset, pidPsid>>16, true
break
} else if pidPsid == microsoftSymbolEncoding ||
pidPsid == microsoftUCS2Encoding ||
pidPsid == microsoftUCS4Encoding {
bestOffset, bestPID, ok = offset, pidPsid>>16, true
// We don't break out of the for loop, so that Unicode can override Microsoft.
}
and none of the >U+FFFF characters are available.
Should we prefer microsoftUCS4Encoding to the
16-bit-only unicodeEncoding ?
HanaMinB.ttf-dump-cmap.txt
The text was updated successfully, but these errors were encountered: