Commit a321315

Updates and adaptions to recent discussion in contour-terminal/contour#404
1 parent b117f57 commit a321315

3 files changed

+171
-59
lines changed

Makefile

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ clean:
1111

1212
${TARGET_DIR}/${BASENAME}.pdf: $(SOURCE_FILES)
1313
@mkdir -p ${TARGET_DIR}
14-
@cd spec && latexmk -pdflatex ${BASENAME}.tex \
15-
-aux-directory=../${TARGET_DIR} -output-directory=../${TARGET_DIR}
14+
@pdflatex -aux_directory=${TARGET_DIR} -output-directory=${TARGET_DIR} $^
15+
@pdflatex -aux_directory=${TARGET_DIR} -output-directory=${TARGET_DIR} $^
1616

1717
.PHONY: all clean

README.md

Lines changed: 15 additions & 15 deletions
@@ -2,32 +2,32 @@
22

33
**IMPORTANT: THIS PROJECT IS IN ALPHA STAGE & ACTIVE DEVELOPMENT**
44

5-
Let's make Unicode support in terminal emulators better - not perfect, but better.
5+
Let's make Unicode support in terminal emulators better -not perfect- but better.
66

77
For that I'd like to introduce a small spec that at least tries to tackle **some**
88
basics that would greatly help user experience.
99

1010
Of course, the terminal emulator is not enough, terminal applications have
1111
to catch up, too. But without support from terminals, the applications
12-
cannot even start doing so. This draft spec tries to fix that.
12+
cannot even start doing so. This project tries to fix that.
1313

14-
## Goal of this repository
14+
## Goal of this project repository
1515

16-
It would be nice if this repository serves as a communication hub for improving this spec
17-
that ideally enough terminal emulators will adopt so we could call this the future defacto image protocol
18-
for terminals, so that developers have it easier in the future on how to get images into their
19-
terminal applications.
16+
It would be nice if this repository serves as a communication hub for
17+
improving this spec that ideally enough terminal emulators will adopt,
18+
so we could call this the future extension for terminals.
2019

2120
## How to contribute
2221

23-
Everybodies point of view is valuable, whether terminal emulator developer, terminal application or
24-
toolkit developer, or a user.
22+
Everybody's point of view is valuable, whether terminal emulator developer,
23+
terminal application or toolkit developer, or a user.
2524

26-
While getting this spec in shape, I'd like to get your feedback to find a common
27-
concensus that most of us can agree on with the goal to get an adoption as broad as possible.
25+
While getting this spec in shape, I'd like to get your feedback to find
26+
a common consensus that most of us can agree on with the goal to get
27+
an adoption as broad as possible.
2828

29-
Sure, this won't happen in a day or even 2 years. But someone has to start at some point,
30-
so more can follow.
29+
Sure, this won't happen in a day or in a year.
30+
But someone has to start at some point, so more can follow.
3131

3232
## This spec is NOT
3333

@@ -37,8 +37,8 @@ so more can follow.
3737

3838
## This spec will
3939

40-
- Enable users to make use of Ligatures and Emoji without sacrifice.
41-
- Have legacy applications as well as newer ones respecting this spec compatible in one terminal.
40+
- enable users to make use of programming ligatures and Emoji without sacrifice.
41+
- have legacy applications as well as newer ones respecting this spec compatible in one terminal.
4242

4343
## Roadmap
4444

spec/terminal-unicode-core.tex

Lines changed: 154 additions & 42 deletions
@@ -7,24 +7,42 @@
77
\usepackage{fancyhdr}
88
\usepackage{graphicx}
99
\usepackage{hhline}
10+
\usepackage{todonotes}
11+
12+
\usepackage{draftwatermark}
13+
\SetWatermarkText{Draft}
14+
\SetWatermarkScale{4}
15+
16+
\usepackage{geometry}
17+
\geometry{legalpaper, margin=1in}
18+
19+
\usepackage[hidelinks]{hyperref}
20+
\hypersetup{
21+
colorlinks=true,
22+
linkcolor=blue,
23+
filecolor=magenta,
24+
urlcolor=cyan
25+
}
1026

1127
\usepackage{xcolor}
1228
\definecolor{light-gray}{gray}{0.95}
1329

1430
\title{Unicode in Terminals \\
1531
a proposal to standardizing basic Unicode features}
1632
\author{Christian Parpart}
17-
\date{2020-07-27 (draft, revision 0)}
33+
\date{2021-09-04 (draft, revision 1)}
1834

1935
\newcommand{\code}[1]{\colorbox{light-gray}{\texttt{#1}}}
2036

37+
\newcommand{\Unicode}{\textbf{Unicode 13}}
38+
2139
\newcommand{\DECRQM}[1]{\code{CSI ? #1 \$ p}}
2240
\newcommand{\DECSET}[1]{\code{CSI ? #1 h}}
2341
\newcommand{\DECRST}[1]{\code{CSI ? #1 l}}
2442

25-
\newcommand\VtModeNum{2027} % Grapheme cluster mode Id
26-
\newcommand{\GCON}{\DECSET{\VtModeNum{}}} % DECSM for enabling grapheme cluster processing
27-
\newcommand{\GCOFF}{\DECRST{\VtModeNum{}}} % DECRM for disabling grapheme cluster processing
43+
\newcommand\VtModeNum{2027} % mode Id that is used by this specification
44+
\newcommand{\GCON}{\DECSET{\VtModeNum{}}} % DECSM for enabling grapheme cluster processing
45+
\newcommand{\GCOFF}{\DECRST{\VtModeNum{}}} % DECRM for disabling grapheme cluster processing
2846
\newcommand{\GCTEST}{\DECRQM{\VtModeNum{}}} % DECRQM for requesting current grapheme cluster processing mode
2947

3048
\begin{document}
@@ -35,81 +53,175 @@
3553

3654
\section{History and current state}
3755

38-
Historically, only 7-bit characters were supported by terminals and different languages by selecting
39-
their respective code pages.
40-
Later on
56+
Historically, only 7-bit characters with C0 control codes
57+
were supported by terminals and different languages
58+
by selecting their respective code pages.
4159

42-
\begin{itemize}
43-
\item Back in the days: 7bit ASCII text, 8bit ASCII text, many code pages for switching character set
44-
\item Then Unicode came, the one to rule them all. But Terminals are incompatible.
45-
\item Unicode UTF-8 came, could be incooperated into terminals,
46-
\end{itemize}
60+
Later on, this was extended to 8-bit ASCII along with C1 control codes.
4761

48-
TBD.
62+
With the introduction of Unicode there was no need for code pages anymore,
63+
but the Unicode spec was not explicitly designed to also cover terminals,
64+
except that C0 and C1 codepoints were preserved.
4965

50-
...
66+
With Unicode UTF-8 it was possible to at least pass Unicode characters to the
67+
terminal, but rendering of a few characters as well as their respective
68+
cursor placement is not defined in the Unicode standard.
5169

52-
Is Grapheme cluster handling an issue? Only when the application makes assumptions about
53-
the cursor placement after having sent out a sequence of Unicode codepoints that form a grapheme
54-
cluster.
70+
Also, Unicode introduced codepoint sequences that map to
71+
a single user-perceived character, so-called grapheme clusters.
72+
The terminal has never attempted any formalization on how to deal with
73+
grapheme clusters, variation selectors, their East Asian Width, nor
74+
emoji and emoji presentation handling.
5575

56-
\section{Backwards Compatibility}
76+
This spec tries to address some of the problems terminals are suffering
77+
with Unicode today.
5778

58-
TBD.
79+
\section{Backwards Compatibility}
5980

6081
basic points are:
6182
Everything is disabled by default, so legacy apps don't break more than they
6283
used to break already.
6384

85+
Backwards compatibility is retained by leaving everything as undefined
86+
as it is without this specification.
87+
88+
The application can test for the availability of this feature
89+
and has to explicitly enable it in order to get the set of properties
90+
as defined in this document guaranteed.
91+
6492
\section{Future Compatibility and Stability}
6593

66-
TBD.
94+
Unicode itself had a major breakage between versions 8 and 9
95+
with regard to some codepoints having their East Asian Width changed.
96+
97+
It is feared that this may happen at any time in the future again, although,
98+
there were no other width change since then.
99+
100+
This specification requires a few Unicode algorithms to be implemented.
101+
These may or may not change in the future.
102+
103+
\todo{Pass on version using sub-parameters with the unicode version
104+
or just allocate a new mode number in case of major changes?}
67105

68106
\section{Mode Detection}
69107

70-
\GCTEST can be used to test the which mode is currently active or if this feature is not available
71-
at all - such as with non-supporting terminals or with terminals that have this support disabled.
108+
\GCTEST{} can be used to test whether this mode is currently active
109+
or whether this feature is not active (or not even available at all),
110+
such as with non-supporting terminals
111+
or with terminals that have this support disabled.
72112

73113
\section{Mode Switching}
74114

75115
\begin{itemize}
76-
\item \GCON{} for enabling grapheme clustering
77-
\item \GCOFF{} for disabling grapheme clustering
116+
\item \GCON{} for ensuring conformance to all rules as defined by this specification
117+
\item \GCOFF{} for undefined behavior
78118
\end{itemize}
79119

80120
\section{Feature Detection}
81121

82-
\DECRQM{\VtModeNum} can not just be used for testing the current mode but this VT sequence will also
83-
respond with a specific code indicating that this mode (and thus this feature) is not supported.
122+
\GCTEST{} can be used for testing the current state of this mode.
123+
If this mode is not supported at all, the reply will indicate that as
124+
well.
84125

126+
\todo{Do we want to also expose the feature availability via \code{DA1}?}
85127
The \code{DA1} could be extended to also indicate support, but \code{DECRQM} is sufficient.
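
The mode switching and detection sequences above can be sketched in Python. This is a minimal sketch, not part of the specification: the helper names are hypothetical, only the 7-bit \code{ESC [} form of CSI is handled, and the reply format (\code{CSI ? Pd ; Ps \$ y}, with Ps of 0 meaning "not recognized") is taken from the VT510 DECRQM/DECRPM documentation rather than from this document.

```python
import re

MODE = 2027          # mode number used by this specification
CSI = "\x1b["        # 7-bit CSI only; the 8-bit form (0x9B) is not handled here

def enable_seq(mode: int = MODE) -> str:
    """DECSET: CSI ? Pd h"""
    return f"{CSI}?{mode}h"

def disable_seq(mode: int = MODE) -> str:
    """DECRST: CSI ? Pd l"""
    return f"{CSI}?{mode}l"

def query_seq(mode: int = MODE) -> str:
    """DECRQM: CSI ? Pd $ p"""
    return f"{CSI}?{mode}$p"

# DECRPM reply as documented for the VT510: CSI ? Pd ; Ps $ y
_REPLY = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")

_STATES = {
    0: "not recognized",
    1: "set",
    2: "reset",
    3: "permanently set",
    4: "permanently reset",
}

def parse_reply(reply: str, mode: int = MODE):
    """Return the reported state for `mode`, or None if the reply does not match."""
    m = _REPLY.fullmatch(reply)
    if m is None or int(m.group(1)) != mode:
        return None
    return _STATES.get(int(m.group(2)))
```

An application would write \code{query\_seq()} to the terminal, read the reply from its input, and only send \code{enable\_seq()} if the parsed state is not "not recognized". A terminal that does not answer at all would need a timeout, which this sketch leaves out.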
86128

87-
There is a \textbf{feature detection} spec in the works, that could be used in the future for
88-
detecting this feature, too.
129+
\section{Semantics}
89130

90-
\section{Column width of a grapheme cluster}
131+
The following set of semantics \textbf{MUST} be adhered to if this mode is enabled.
132+
If the mode \code{\VtModeNum} is not set, the behavior is as undefined as
133+
if this specification were not implemented at all, in order to retain
134+
behavior of current terminals and their legacy applications.
91135

92-
\begin{itemize}
93-
\item TODO
94-
\item TBD
95-
\end{itemize}
136+
\subsection{Grapheme Cluster}
137+
138+
With this mode enabled, the terminal \textbf{MUST} support grapheme clusters
139+
in conformance with the algorithm described in \ref{ref:UTS-29}.
140+
141+
This implies that every consecutively written character on the terminal
142+
stream that is non-breakable as per \ref{ref:UTS-29} will
143+
always end up in the same terminal grid cell.
144+
145+
Therefore, extending a grapheme cluster with consecutively added codepoints
146+
will not move the cursor except for variation selector 16 (VS16) that may
147+
have caused the width of the grapheme cluster to change to wide (2 grid cells).
148+
149+
When the cursor moves to a grid cell that contains a complete or incomplete
150+
grapheme cluster, this grid cell's contents will be erased and overwritten
151+
rather than textually concatenated.
152+
153+
Therefore cursor movement semantics of the terminal remain unchanged.
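
To illustrate how consecutive codepoints end up in one grid cell, here is a rough Python sketch. It is deliberately NOT the UTS 29 algorithm: it only treats combining marks, ZWJ, the variation selectors, and whatever follows a ZWJ as extending the current cluster. Regional indicators, Hangul, and emoji modifiers are not handled, and the function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"
VS15 = "\ufe0e"
VS16 = "\ufe0f"

def extends_cluster(prev: str, ch: str) -> bool:
    """Rough stand-in for the UTS 29 boundary rules (assumption, see lead-in)."""
    if ch in (ZWJ, VS15, VS16):
        return True
    if unicodedata.combining(ch) != 0:
        return True
    if unicodedata.category(ch) in ("Mn", "Me"):
        return True
    # Simplification: anything directly after a ZWJ joins the cluster.
    return prev == ZWJ

def split_clusters(text: str) -> list:
    """Group codepoints so that each list entry would occupy one grid cell."""
    clusters = []
    for ch in text:
        if clusters and extends_cluster(clusters[-1][-1], ch):
            clusters[-1] += ch   # extend the cell's content, cursor stays put
        else:
            clusters.append(ch)  # start a new cell
    return clusters
```

For example, \code{split\_clusters("e\textbackslash u0301x")} groups the letter and its combining accent into one entry and leaves \code{x} in the next. A conforming terminal would replace \code{extends\_cluster} with the full UTS 29 rules.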
154+
155+
\subsection{Emoji}
156+
157+
Emoji symbols are always rendered in square aspect ratio
158+
(as proposed by \ref{ref:UTS-51}),
159+
implying an East Asian Width of Wide, 2 grid cells.
160+
161+
ZWJ emoji are required to be displayed as a single image with a width of 2
162+
grid cells.
163+
164+
The alternate display of ZWJ emoji in a decomposed sequence of sub-images
165+
must not be used as a fallback as it will break cursor movement guarantees.
166+
167+
If a ZWJ emoji cannot be rendered the display behavior is undefined -
168+
for example, a Unicode replacement character \code{U+FFFD} could be
169+
displayed instead.
170+
171+
In emoji presentation, the cursor will always move by 2 grid cells.
172+
173+
The contents of the skipped grid cell is undefined. \todo{really? Maybe we want to be explicit here.}
174+
Good practice, though, would be to have this cell be cleared and its SGR set
175+
to the currently active SGR attributes.
176+
177+
\subsection{Variation Selector 16}
178+
179+
VS16 promotes the grapheme cluster to emoji presentation,
180+
implying that this will force the grapheme cluster's width to be 2,
181+
which may possibly cause reflowing of that symbol to the next line
182+
if the cursor is at the right margin and AutoWrap mode is set.
183+
184+
\subsection{Variation Selector 15}
185+
186+
VS15 forces the grapheme cluster to text presentation.
187+
This will \textbf{NOT} change the underlying width
188+
but only change the display to prefer textual non-colored presentation.
189+
190+
This matches the behavior of today's web browsers and should thus
191+
feel most intuitive to users.
192+
193+
The cursor will thus still move by 2 grid cells (thus having 1 skipped)
194+
if the symbol has the default presentation of emoji.
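
The width rules for VS16 and VS15 can be summarized in a small Python sketch. It assumes the cluster has already been segmented, uses the East Asian Width of the first codepoint from Python's bundled Unicode data as the base width, and treats any VS16 in the cluster as forcing width 2. The function name is hypothetical, and the result depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata

VS16 = "\ufe0f"  # requests emoji presentation, forces width 2
# VS15 (U+FE0E) requests text presentation and deliberately has no effect here.

def cluster_width(cluster: str) -> int:
    """Number of grid cells the cursor advances for one grapheme cluster."""
    if VS16 in cluster:
        return 2
    if unicodedata.east_asian_width(cluster[0]) in ("W", "F"):
        return 2
    return 1
```

With this sketch, U+2764 alone has width 1, U+2764 followed by VS16 has width 2, and U+1F600 followed by VS15 keeps its width of 2 even though it is displayed in text presentation.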
195+
196+
\subsection{Margins and AutoWrap with Emoji}
197+
198+
Emoji written at the right margin with AutoWrap mode disabled
199+
may or may not be rendered in half or not be displayed at all.
200+
This behavior is undefined to ease implementation and adoption
201+
of this specification.
96202

97203
\section{Performance Considerations}
98204

99-
Maybe mention "Blink's Text Stack" (or Contour's text stack) and how they deal with caching.
205+
The grapheme cluster segmentation algorithm is expensive.
206+
But performance optimizations can be applied with the assumption
207+
that most of the inbound text will most likely be US-ASCII.
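
One possible shape of such an optimization, sketched in Python under stated assumptions: a run of printable US-ASCII characters can bypass segmentation because each one is its own cluster of width 1, except that the last character of the run is held back when a non-ASCII codepoint follows, since that codepoint might be a combining mark. The function name is hypothetical, and text split across separate writes is not considered.

```python
def ascii_fast_path_len(text: str) -> int:
    """Length of the leading run that can skip grapheme cluster segmentation."""
    n = 0
    # Printable US-ASCII is 0x20 (space) through 0x7E (tilde).
    while n < len(text) and " " <= text[n] <= "~":
        n += 1
    # A following non-ASCII codepoint may extend the last ASCII character,
    # so leave that character to the slow path.
    if 0 < n < len(text) and ord(text[n]) >= 0x80:
        n -= 1
    return n
```

The remainder of the input, starting at the returned index, would go through the regular control sequence parser and cluster segmentation.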
208+
209+
\todo{Maybe mention "Blink's Text Stack" (or Contour's text stack) and how they deal with caching.}
100210

101211
\section{References}
102212

103213
\begin{itemize}
104-
\item DECRQM, https://vt100.net/docs/vt510-rm/DECRQM.html
105-
\item DECSM, https://vt100.net/docs/vt510-rm/SM.html
106-
\item DECRM, https://vt100.net/docs/vt510-rm/RM.html
107-
\item Grapheme segmentation algorithm, URL to Unicode TR and section,
108-
https://unicode.org/reports/tr29/\#Grapheme\_Cluster\_Boundaries
214+
\item \label{ref:DECRQM}DECRQM, https://vt100.net/docs/vt510-rm/DECRQM.html
215+
\item \label{ref:DECSM}DECSM, https://vt100.net/docs/vt510-rm/SM.html
216+
\item \label{ref:DECRM}DECRM, https://vt100.net/docs/vt510-rm/RM.html
109217
\item Maybe also URL to "Blink's Text Stack",
110-
https://chromium.googlesource.com/chromium/src/+/master/third\_party/blink/renderer/platform/fonts/README.md
111-
or the one from Contour:
112-
https://github.com/christianparpart/contour/blob/master/docs/text-stack.md
218+
\url{https://chromium.googlesource.com/chromium/src/+/master/third\_party/blink/renderer/platform/fonts/README.md}
219+
or the one from Contour for the additional terminal context:
220+
\url{https://github.com/christianparpart/contour/blob/master/docs/text-stack.md}
221+
\item \label{ref:UTS-29}UTS 29, Grapheme segmentation algorithm
222+
\url{https://unicode.org/reports/tr29/\#Grapheme\_Cluster\_Boundary\_Rules}
223+
\item \label{ref:UTS-51}UTS 51, Unicode Emoji
224+
\url{https://unicode.org/reports/tr51/\#Display}, paragraph 2
113225
\end{itemize}
114226

115227
\end{document}
