New principle: Text-based syntaxes should be designed to be usable by humans #453

LeaVerou · 2023-09-11T10:42:34Z

This came up in my discussion with @eemeli about MessageFormat 2, whose syntax is designed under the assumption that it will primarily be generated by tooling.

IMO any text-based format will inevitably be written and/or edited by humans, so the design of its syntax should always take the experience of human authors into account. We've failed at this before (e.g. SVG path syntax), and the result has been humans editing the format and hating every minute of it. We already have formats that are machine-friendly: binary formats. The primary reason to define text-based formats is to make them editable by humans, so we may as well go all the way.

The tradeoff, of course, is how to balance human usability against goals like parsing efficiency and file-size efficiency, both of which tend to be inversely correlated with it. What strategy would be best? Two syntaxes, one machine-friendly and one human-friendly? But then you end up with the same problem, as humans may still need to edit the machine-friendly format generated by a tool. Should we just recommend that formats where efficiency is important be binary? That also seems suboptimal. Maybe we should have a vaguer recommendation like "try to balance these concerns with human usability"? Let's discuss.

LeaVerou · 2023-09-11T18:07:26Z

@hober also shared this on Slack https://tess.oconnor.cx/2006/02/argumentum-ad-adminiculum

aphillips · 2023-10-05T22:06:10Z

I disagree that MF2 has an assumption that the syntax will primarily be generated by tooling. Most MF2 participants are aware that MF2 messages are likely to be authored by humans in resource formats and edited by translators in translation tools (or text editors). Many of the most vexing technical issues we have faced as a group hinge entirely on how humans will understand and interact with the message syntax--we have ample negative experience with escaping requirements, embedding, quoting, syntax balancing (i.e. { requires } and the like), and so on. There is a rather hot debate currently about how to handle certain aspects of whitespace--a situation in which machines will make no errors, but it is authors' expectations that need to be served.

This is not to say that I disagree with this "new principle" discussion. On the contrary, I think it is super important.

(FWIW, I am the chair of the Unicode MessageFormat2 Working Group)

martinthomson · 2024-02-06T20:19:39Z

Points from discussion:

If a human might produce a format, they probably will.
Text formats are deliberately intended for human production; they are inherently less efficient than binary. Embrace the fact that this is for humans and accept that the format will be inefficient.
Even if tooling will likely be used to produce the format, tooling will also be used to support human authoring.
Humans value flexibility; anticipate the sorts of flexibility that humans might want. Contrast JSON (inflexible, and this is bad) with HTML (too flexible, and this is also bad).
Human-centric formats could have tolerant error handling. CSS and HTML errors tend to have a localized effect; they don't cause the entire document to be invalid. XML documents, by contrast, fail to parse on errors like that.
Error handling is not trivial to retrofit once a format becomes popular. It isn't just browsers, but also the entire ecosystem of tools that need to be updated.
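
The contrast between localized and whole-document failure in the points above can be sketched in a few lines. This is only an illustration, not how any browser or XML parser is implemented: the `name: value;` syntax merely resembles CSS declarations, and the function names `parse_strict` and `parse_tolerant` are hypothetical.

```python
def parse_strict(text: str) -> dict[str, str]:
    """Hard-failure style: the first malformed declaration rejects everything."""
    result = {}
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, sep, value = chunk.partition(":")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"malformed declaration: {chunk!r}")
        result[name.strip()] = value.strip()
    return result


def parse_tolerant(text: str) -> tuple[dict[str, str], list[str]]:
    """Localized style: a malformed declaration is dropped and reported."""
    result, skipped = {}, []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, sep, value = chunk.partition(":")
        if not sep or not name.strip() or not value.strip():
            skipped.append(chunk)  # the damage stays confined to this chunk
            continue
        result[name.strip()] = value.strip()
    return result, skipped


# A hand-written input with one typo (missing colon in the second declaration).
source = "color: red; font-size 12px; margin: 0"
```

With this input, `parse_tolerant(source)` keeps the `color` and `margin` declarations and reports the malformed one, while `parse_strict(source)` raises and yields nothing. Which behaviour is preferable depends on whether a dropped declaration is a cosmetic loss or a change of meaning.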

plinss · 2024-03-04T21:29:06Z

From today's discussion:

The starting point should be a well-defined data model; the text format is then defined as a process that takes an arbitrary byte stream and produces a consistent data model. Everything else should refer to the data model, not the text format. This implies a stringent error recovery process. We feel that hard failures, as in JSON and XML, are not the best approach.
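
One way to read "a process that will take an arbitrary byte stream and produce a consistent data model" is as a parser that is a total function: it never raises, and every recovery step is part of its definition. The following is a rough sketch under that reading only. The `Model` class, the `key = value` line syntax, and the recovery rules (replace undecodable bytes, skip bad lines, last duplicate wins) are all assumptions made up for the example, not anything proposed in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical data model: key/value entries plus parse diagnostics."""
    entries: dict[str, str] = field(default_factory=dict)
    diagnostics: list[str] = field(default_factory=list)


def parse(data: bytes) -> Model:
    """Map any byte string to exactly one Model; never raises."""
    model = Model()
    # Undecodable bytes become U+FFFD instead of aborting the parse.
    text = data.decode("utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        key, sep, value = line.partition("=")
        key = key.strip().lower()
        if not sep or not key:
            # Defined recovery: record the problem and skip only this line.
            model.diagnostics.append(f"line {number}: ignored {line!r}")
            continue
        model.entries[key] = value.strip()  # last duplicate wins
    return model
```

Because consumers only ever see `Model`, two authors who differ in spacing, key case, or comments (say `b"Name = Ada\n"` and `b"# note\nname=Ada\n"`) produce the same entries, and tools can be specified against the model rather than against the text.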

martinthomson · 2024-03-05T00:17:01Z

I'm going to suggest "not always the best approach" is perhaps a better framing.

HTML and CSS allow humans to author documents that are presented to humans. A bit of slop and flexibility fits very well with those goals.

However, programming languages occupy a different place in that space. Yes, some allowance for variance is fine, provided that the intent remains unambiguous. So if I say "let x = foo;" or "const x = foo;" or just "x = foo;", there are meaningful differences between those statements that can affect what ultimately happens. This is why "use strict" is a thing in JavaScript: we learned that the loose interpretation rules that work for HTML do not work in a context where a local error can have a global effect. In other words, the idea that accidents have only a localized effect does not hold for software.

That makes "the effect of accidents is localized" a key property for me.

OR13 · 2024-03-07T19:08:09Z

Some references from IETF land that you might consider related to this issue and i18n.

LeaVerou added Topic: Meta Agenda+ labels Sep 11, 2023

torgo added this to the 2023-11-06-week milestone Nov 5, 2023

torgo modified the milestones: 2023-11-06-week, 2023-12-04-week Dec 3, 2023

torgo assigned LeaVerou and torgo Dec 4, 2023

torgo added the "Status: Consensus to write" label (We have TAG consensus about the principle but someone needs to write it; see "To Write" project) Dec 4, 2023

torgo added this to leaverou in To Write Dec 4, 2023

hadleybeeman self-assigned this Dec 4, 2023

LeaVerou added a commit that referenced this issue Feb 1, 2024

First pass at #453 (text-based syntaxes should be designed for humans)

8d2f183

LeaVerou mentioned this issue Feb 1, 2024

[css-shapes] CSS flexibility for path()s (and let’s fix paths while we’re at it?) w3c/csswg-drafts#9889

Open

martinthomson linked a pull request Feb 6, 2024 that will close this issue

First pass at #453 (text-based syntaxes should be designed for humans) #472

Open

torgo modified the milestones: 2023-12-04-week, 2024-03-04-week Mar 3, 2024

martinthomson mentioned this issue Mar 4, 2024

New principle: Discourage polyglot formats #239

Open

OR13 mentioned this issue Mar 25, 2024

Consider: Banning usage of multiple suffixes ietf-wg-mediaman/suffixes#23

Open

torgo modified the milestones: 2024-03-04-week, 2024-04-01-week Mar 31, 2024

torgo modified the milestones: 2024-04-01-week, 2024-06-03-week Jun 2, 2024
