|
7 | 7 | \usepackage{fancyhdr}
|
8 | 8 | \usepackage{graphicx}
|
9 | 9 | \usepackage{hhline}
|
| 10 | +\usepackage{todonotes} |
| 11 | + |
| 12 | +\usepackage{draftwatermark} |
| 13 | +\SetWatermarkText{Draft} |
| 14 | +\SetWatermarkScale{4} |
| 15 | + |
| 16 | +\usepackage{geometry} |
| 17 | +\geometry{legalpaper, margin=1in} |
| 18 | + |
| 19 | +\usepackage[hidelinks]{hyperref} |
| 20 | +\hypersetup{ |
| 21 | + colorlinks=true, |
| 22 | + linkcolor=blue, |
| 23 | + filecolor=magenta, |
| 24 | + urlcolor=cyan |
| 25 | +} |
10 | 26 |
|
11 | 27 | \usepackage{xcolor}
|
12 | 28 | \definecolor{light-gray}{gray}{0.95}
|
13 | 29 |
|
14 | 30 | \title{Unicode in Terminals \\
|
15 | 31 | a proposal for standardizing basic Unicode features}
|
16 | 32 | \author{Christian Parpart}
|
17 |
| -\date{2020-07-27 (draft, revision 0)} |
| 33 | +\date{2021-09-04 (draft, revision 1)} |
18 | 34 |
|
19 | 35 | \newcommand{\code}[1]{\colorbox{light-gray}{\texttt{#1}}}
|
20 | 36 |
|
| 37 | +\newcommand{\Unicode}{\textbf{Unicode 13}} |
| 38 | + |
21 | 39 | \newcommand{\DECRQM}[1]{\code{CSI ? #1 \$ p}}
|
22 | 40 | \newcommand{\DECSET}[1]{\code{CSI ? #1 h}}
|
23 | 41 | \newcommand{\DECRST}[1]{\code{CSI ? #1 l}}
|
24 | 42 |
|
25 |
| -\newcommand\VtModeNum{2027} % Grapheme cluster mode Id |
26 |
| -\newcommand{\GCON}{\DECSET{\VtModeNum{}}} % DECSM for enabling grapheme cluster processing |
27 |
| -\newcommand{\GCOFF}{\DECRST{\VtModeNum{}}} % DECRM for disabling grapheme cluster processing |
| 43 | +\newcommand\VtModeNum{2027} % mode Id that is used by this specification |
| 44 | +\newcommand{\GCON}{\DECSET{\VtModeNum{}}} % DECSM for enabling grapheme cluster processing |
| 45 | +\newcommand{\GCOFF}{\DECRST{\VtModeNum{}}} % DECRM for disabling grapheme cluster processing |
28 | 46 | \newcommand{\GCTEST}{\DECRQM{\VtModeNum{}}} % DECRQM for requesting current grapheme cluster processing mode
|
29 | 47 |
|
30 | 48 | \begin{document}
|
|
35 | 53 |
|
36 | 54 | \section{History and current state}
|
37 | 55 |
|
38 |
| -Historically, only 7-bit characters were supported by terminals and different languages by selecting |
39 |
| -their respective code pages. |
40 |
| -Later on |
| 56 | +Historically, only 7-bit characters with C0 control codes |
| 57 | +were supported by terminals and different languages |
| 58 | +by selecting their respective code pages. |
41 | 59 |
|
42 |
| -\begin{itemize} |
43 |
| - \item Back in the days: 7bit ASCII text, 8bit ASCII text, many code pages for switching character set |
44 |
| - \item Then Unicode came, the one to rule them all. But Terminals are incompatible. |
45 |
| - \item Unicode UTF-8 came, could be incooperated into terminals, |
46 |
| -\end{itemize} |
| 60 | +Later on, this was extended to 8-bit characters along with C1 control codes. |
47 | 61 |
|
48 |
| -TBD. |
| 62 | +With the introduction of Unicode there was no need to have code pages anymore, |
| 63 | +but the Unicode spec was not explicitly designed to also cover terminals, |
| 64 | +except that C0 and C1 codepoints were preserved. |
49 | 65 |
|
50 |
| -... |
| 66 | +With UTF-8 it became possible to at least pass Unicode characters to the |
| 67 | +terminal, but the rendering of some characters, as well as their respective |
| 68 | +cursor placement, is not defined in the Unicode standard. |
51 | 69 |
|
52 |
| -Is Grapheme cluster handling an issue? Only when the application makes assumptions about |
53 |
| -the cursor placement after having sent out a sequence of Unicode codepoints that form a grapheme |
54 |
| -cluster. |
| 70 | +Also, Unicode introduced codepoint sequences that map to |
| 71 | +a single user-perceived character, so-called grapheme clusters. |
| 72 | +Terminals have never attempted to formalize how to deal with |
| 73 | +grapheme clusters, variation selectors, their East Asian Width, or |
| 74 | +emoji and emoji presentation handling. |
55 | 75 |
|
56 |
| -\section{Backwards Compatibility} |
| 76 | +This spec tries to address some of the problems terminals are facing |
| 77 | +with Unicode today. |
57 | 78 |
|
58 |
| -TBD. |
| 79 | +\section{Backwards Compatibility} |
59 | 80 |
|
60 | 81 | basic points are:
|
61 | 82 | Everything is disabled by default, so legacy apps don't break more than they
|
62 | 83 | used to break already.
|
63 | 84 |
|
| 85 | +Backwards compatibility is retained by leaving everything as undefined |
| 86 | +as it is without this specification. |
| 87 | + |
| 88 | +The application can test for the availability of this feature |
| 89 | +and has to explicitly enable it in order to be guaranteed the set of |
| 90 | +properties defined in this document. |
| 91 | + |
64 | 92 | \section{Future Compatibility and Stability}
|
65 | 93 |
|
66 |
| -TBD. |
| 94 | +Unicode itself had a major breakage between versions 8 and 9 |
| 95 | +with regard to some codepoints having their East Asian Width changed. |
| 96 | + |
| 97 | +It is feared that this may happen again at any time in the future, although |
| 98 | +there has been no other width change since then. |
| 99 | + |
| 100 | +This specification requires a few Unicode algorithms to be implemented. |
| 101 | +These may or may not change in the future. |
| 102 | + |
| 103 | +\todo{Pass on version using sub-parameters with the unicode version |
| 104 | +or just allocate a new mode number in case of major changes?} |
67 | 105 |
|
68 | 106 | \section{Mode Detection}
|
69 | 107 |
|
70 |
| -\GCTEST can be used to test the which mode is currently active or if this feature is not available |
71 |
| -at all - such as with non-supporting terminals or with terminals that have this support disabled. |
| 108 | +\GCTEST{} can be used to test whether this mode is currently active |
| 109 | +or whether this feature is not active (or not even available at all), |
| 110 | +such as with non-supporting terminals |
| 111 | +or with terminals that have this support disabled. |
72 | 112 |
|
73 | 113 | \section{Mode Switching}
|
74 | 114 |
|
75 | 115 | \begin{itemize}
|
76 |
| - \item \GCON{} for enabling grapheme clustering |
77 |
| - \item \GCOFF{} for disabling grapheme clustering |
| 116 | + \item \GCON{} for ensuring conformance to all rules as defined by this specification |
| 117 | + \item \GCOFF{} for undefined behavior |
78 | 118 | \end{itemize}
|
79 | 119 |
|
80 | 120 | \section{Feature Detection}
|
81 | 121 |
|
82 |
| -\DECRQM{\VtModeNum} can not just be used for testing the current mode but this VT sequence will also |
83 |
| -respond with a specific code indicating that this mode (and thus this feature) is not supported. |
| 122 | +\GCTEST{} can be used for testing the current state of this mode. |
| 123 | +If this mode is not supported at all, this will be indicated in the reply as |
| 124 | +well. |
84 | 125 |
|
| 126 | +\todo{Do we want to also expose the feature availability via \code{DA1}?} |
85 | 127 | The \code{DA1} could be extended to also indicate support, but \code{DECRQM} is sufficient.
|
86 | 128 |
|
87 |
| -There is a \textbf{feature detection} spec in the works, that could be used in the future for |
88 |
| -detecting this feature, too. |
| 129 | +\section{Semantics} |
89 | 130 |
|
90 |
| -\section{Column width of a grapheme cluster} |
| 131 | +The following set of semantics \textbf{MUST} be adhered to if this mode is enabled. |
| 132 | +If the mode \code{\VtModeNum} is not set, the behavior is as undefined as |
| 133 | +if this specification were not implemented at all, in order to retain |
| 134 | +the behavior of current terminals and their legacy applications. |
91 | 135 |
|
92 |
| -\begin{itemize} |
93 |
| - \item TODO |
94 |
| - \item TBD |
95 |
| -\end{itemize} |
| 136 | +\subsection{Grapheme Cluster} |
| 137 | + |
| 138 | +With this mode enabled, the terminal \textbf{MUST} support grapheme clusters |
| 139 | +in conformance with the algorithm described in \ref{ref:UTS-29}. |
| 140 | + |
| 141 | +This implies that every consecutively written character on the terminal |
| 142 | +stream that is non-breakable as per \ref{ref:UTS-29} will |
| 143 | +always end up in the same terminal grid cell. |
| 144 | + |
| 145 | +Therefore, extending a grapheme cluster with consecutively added codepoints |
| 146 | +will not move the cursor, except for variation selector 16 (VS16), which may |
| 147 | +change the width of the grapheme cluster to wide (2 grid cells). |
| 148 | + |
| 149 | +When the cursor moves to a grid cell that contains a complete or incomplete |
| 150 | +grapheme cluster, this grid cell's contents will be erased and overwritten |
| 151 | +rather than textually concatenated. |
| 152 | + |
| 153 | +Therefore cursor movement semantics of the terminal remain unchanged. |
| 154 | + |
| 155 | +\subsection{Emoji} |
| 156 | + |
| 157 | +Emoji symbols are always rendered in a square aspect ratio |
| 158 | +(as proposed by \ref{ref:UTS-51}), |
| 159 | +implying an East Asian Width of Wide (2 grid cells). |
| 160 | + |
| 161 | +ZWJ emoji are required to be displayed as a single image with a width of 2 |
| 162 | +grid cells. |
| 163 | + |
| 164 | +The alternate display of ZWJ emoji in a decomposed sequence of sub-images |
| 165 | +must not be used as a fallback as it will break cursor movement guarantees. |
| 166 | + |
| 167 | +If a ZWJ emoji cannot be rendered, the display behavior is undefined; |
| 168 | +for example, a Unicode replacement character \code{U+FFFD} could be |
| 169 | +displayed instead. |
| 170 | + |
| 171 | +In emoji presentation, the cursor will always move by 2 grid cells. |
| 172 | + |
| 173 | +The contents of the skipped grid cell are undefined. \todo{really? Maybe we want to be explicit here.} |
| 174 | +Good practice, though, would be to have this cell cleared and its SGR set |
| 175 | +to the currently active SGR attributes. |
| 176 | + |
| 177 | +\subsection{Variation Selector 16} |
| 178 | + |
| 178 | +VS16 promotes the grapheme cluster to emoji presentation, |
| 179 | +implying that this will force the grapheme cluster's width to be 2, |
| 180 | +which may cause that symbol to be reflowed to the next line |
| 181 | +if it is at the right margin and AutoWrap mode is set. |
| 183 | + |
| 184 | +\subsection{Variation Selector 15} |
| 185 | + |
| 186 | +VS15 forces the grapheme cluster to emoji text presentation. |
| 187 | +This will \textbf{NOT} change the underlying width |
| 188 | +but only change the display to prefer textual non-colored presentation. |
| 189 | + |
| 190 | +This matches the behavior of today's web browsers and should thus |
| 191 | +feel most intuitive to users. |
| 192 | + |
| 193 | +The cursor will therefore still move by 2 grid cells (skipping 1) |
| 194 | +if the symbol's default presentation is emoji. |
| 195 | + |
| 196 | +\subsection{Margins and AutoWrap with Emoji} |
| 197 | + |
| 198 | +Emoji written at the right margin with AutoWrap mode disabled |
| 199 | +may be rendered in half or not be displayed at all. |
| 200 | +This behavior is undefined to ease implementation and adoption |
| 201 | +of this specification. |
96 | 202 |
|
97 | 203 | \section{Performance Considerations}
|
98 | 204 |
|
99 |
| -Maybe mention "Blink's Text Stack" (or Contour's text stack) and how they deal with caching. |
| 205 | +The grapheme cluster segmentation algorithm is expensive. |
| 206 | +But performance optimizations can be applied with the assumption |
| 207 | +that most of the inbound text will most likely be US-ASCII. |
| 208 | + |
| 209 | +\todo{Maybe mention "Blink's Text Stack" (or Contour's text stack) and how they deal with caching.} |
100 | 210 |
|
101 | 211 | \section{References}
|
102 | 212 |
|
103 | 213 | \begin{itemize}
|
104 |
| - \item DECRQM, https://vt100.net/docs/vt510-rm/DECRQM.html |
105 |
| - \item DECSM, https://vt100.net/docs/vt510-rm/SM.html |
106 |
| - \item DECRM, https://vt100.net/docs/vt510-rm/RM.html |
107 |
| - \item Grapheme segmentation algorithm, URL to Unicode TR and section, |
108 |
| - https://unicode.org/reports/tr29/\#Grapheme\_Cluster\_Boundaries |
| 214 | + \item \label{ref:DECRQM}DECRQM, https://vt100.net/docs/vt510-rm/DECRQM.html |
| 215 | + \item \label{ref:DECSM}DECSM, https://vt100.net/docs/vt510-rm/SM.html |
| 216 | + \item \label{ref:DECRM}DECRM, https://vt100.net/docs/vt510-rm/RM.html |
109 | 217 | \item Maybe also URL to "Blink's Text Stack",
|
110 |
| - https://chromium.googlesource.com/chromium/src/+/master/third\_party/blink/renderer/platform/fonts/README.md |
111 |
| - or the one from Contour: |
112 |
| - https://github.com/christianparpart/contour/blob/master/docs/text-stack.md |
| 218 | + \url{https://chromium.googlesource.com/chromium/src/+/master/third\_party/blink/renderer/platform/fonts/README.md} |
| 219 | + or the one from Contour for the additional terminal context: |
| 220 | + \url{https://github.com/christianparpart/contour/blob/master/docs/text-stack.md} |
| 221 | + \item \label{ref:UTS-29}UTS 29, Grapheme segmentation algorithm |
| 222 | + \url{https://unicode.org/reports/tr29/\#Grapheme\_Cluster\_Boundary\_Rules} |
| 223 | + \item \label{ref:UTS-51}UTS 51, Unicode Emoji |
| 224 | + \url{https://unicode.org/reports/tr51/\#Display}, paragraph 2 |
113 | 225 | \end{itemize}
|
114 | 226 |
|
115 | 227 | \end{document}
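The mode detection, mode switching, and feature detection sections of the spec above could be exercised from an application roughly as follows. This is only a sketch under stated assumptions: the helper names (`decrqm`, `decset`, `decrst`, `parse_decrpm`) are hypothetical and not part of the proposal, `CSI` is written in its 7-bit form `ESC [`, and the reply format `CSI ? Ps ; Pm $ y` with its status values is taken from the VT510 DECRQM documentation that the spec already lists in its references.

```python
import re
from typing import Optional

MODE = 2027  # mode number used by the spec (\VtModeNum)

def decrqm(mode: int = MODE) -> str:
    """Build the DECRQM query, CSI ? <mode> $ p."""
    return f"\x1b[?{mode}$p"

def decset(mode: int = MODE) -> str:
    """Build the DECSET sequence, CSI ? <mode> h."""
    return f"\x1b[?{mode}h"

def decrst(mode: int = MODE) -> str:
    """Build the DECRST sequence, CSI ? <mode> l."""
    return f"\x1b[?{mode}l"

# DECRPM reply: CSI ? <mode> ; <status> $ y
_DECRPM = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")

# Status values as documented for DECRPM in the VT510 manual.
_STATES = {
    0: "not recognized",
    1: "set",
    2: "reset",
    3: "permanently set",
    4: "permanently reset",
}

def parse_decrpm(reply: str, mode: int = MODE) -> Optional[str]:
    """Return the reported state, or None if the reply is not a
    well-formed DECRPM answer for the requested mode."""
    match = _DECRPM.fullmatch(reply)
    if match is None or int(match.group(1)) != mode:
        return None
    return _STATES.get(int(match.group(2)))
```

An application would write `decrqm()` to the terminal, read the reply, and only send `decset()` when `parse_decrpm` does not report `"not recognized"`. Terminals that do not implement DECRQM at all may send no reply, so a real client would presumably also need a timeout.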
|
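The performance section of the spec suggests optimizing for mostly US-ASCII input. One possible shape of such a fast path is sketched below. The function name `split_ascii_prefix` is hypothetical, and the approach is an assumption rather than anything the draft prescribes: a run of printable ASCII is treated as one single-cell grapheme cluster per character, and only the remainder is handed to a full UTS 29 segmenter. The last ASCII character before other input is deliberately kept in the slow part, because a following combining codepoint may extend it.

```python
from typing import Tuple

def split_ascii_prefix(text: str) -> Tuple[str, str]:
    """Split text into (fast, slow).

    fast: leading printable US-ASCII characters that can be written
          one per grid cell without running grapheme segmentation.
    slow: the rest, to be passed through the full segmentation.
    """
    n = 0
    while n < len(text) and 0x20 <= ord(text[n]) <= 0x7E:
        n += 1
    # If anything follows the ASCII run, the run's last character may be
    # the base of a grapheme cluster, so leave it to the slow path.
    if 0 < n < len(text):
        n -= 1
    return text[:n], text[n:]
```

For input arriving in chunks, a combining codepoint at the start of the next chunk would still have to extend the previously written cell; that case is outside this sketch.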