New principle: Text-based syntaxes should be designed to be usable by humans #453

Open
LeaVerou opened this issue Sep 11, 2023 · 6 comments · May be fixed by #472
Labels: Agenda+, Status: Consensus to write, Topic: Meta

@LeaVerou
Member

This came up in my discussion with @eemeli about MessageFormat 2, whose syntax is designed under the assumption that it will primarily be generated by tooling.

IMO any text-based format will inevitably be written and/or edited by humans, so the design of its syntax should always take the human authoring experience into account. We've failed at this before (e.g. SVG path syntax), and the result has been humans editing the format and hating every minute of it. We already have formats that are machine-friendly: binary formats. The primary reason to define text-based formats is to make them editable by humans, so we may as well go all the way.

The tradeoff, of course, is how to balance goals like parsing efficiency and file-size efficiency, both of which tend to pull against human usability. What strategy would be best? Two syntaxes, one machine-friendly and one human-friendly? But then you end up with the same problem, as humans may still need to edit the machine-friendly format generated by a tool. Should we just recommend that cases where efficiency is important should be binary formats? That also seems suboptimal. Maybe we should have a vaguer recommendation like "try to balance these concerns with human usability"? Let's discuss.

@LeaVerou
Member Author

@hober also shared this on Slack https://tess.oconnor.cx/2006/02/argumentum-ad-adminiculum

@aphillips

I disagree that MF2 has an assumption that the syntax will primarily be generated by tooling. Most MF2 participants are aware that MF2 messages are likely to be authored by humans in resource formats and edited by translators in translation tools (or text editors). Many of the most vexing technical issues we have faced as a group hinge entirely on how humans will understand and interact with the message syntax--we have ample negative experience with escaping requirements, embedding, quoting, syntax balancing (i.e. { requires } and the like), and so on. There is a rather hot debate currently about how to handle certain aspects of whitespace--a situation in which machines will make no errors, but it is authors' expectations that need to be served.

This is not to say that I disagree with this "new principle" discussion. On the contrary, I think it is super important.

(FWIW, I am the chair of the Unicode MessageFormat2 Working Group)

@torgo torgo added this to the 2023-11-06-week milestone Nov 5, 2023
@torgo torgo modified the milestones: 2023-11-06-week, 2023-12-04-week Dec 3, 2023
@torgo torgo added the Status: Consensus to write We have TAG consensus about the principle but someone needs to write it (see "To Write" project) label Dec 4, 2023
@torgo torgo added this to leaverou in To Write Dec 4, 2023
@hadleybeeman hadleybeeman self-assigned this Dec 4, 2023
@martinthomson
Contributor

martinthomson commented Feb 6, 2024

Points from discussion:

  1. If a human might produce a format, they probably will.
  2. Text formats are deliberately intended for human production; they are inherently less efficient than binary. Embrace the fact that this is for humans and accept that the format will be inefficient.
  3. Even if tooling will likely be used to produce the format, tooling will also be used to support human authoring.
  4. Humans value flexibility; anticipate the sorts of flexibility that humans might want. Contrast JSON (inflexible, and this is bad) with HTML (too flexible, and this is also bad).
  5. Human-centric formats could have tolerant error handling. CSS and HTML errors tend to have a localized effect; they don't cause the entire document to be invalid. XML documents, by contrast, fail to parse on errors like that.
  6. Error handling is not trivial to retrofit once a format becomes popular. It isn't just browsers, but also the entire ecosystem of tools that need to be updated.
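
Point 5 can be illustrated with a minimal Python sketch. The `name: value;` format and the function names `parse_declarations` and `parse_strict` are hypothetical, made up for this illustration; they only mimic the general idea of CSS-style recovery and are not the actual CSS parsing algorithm.

```python
import json


def parse_declarations(text: str) -> dict[str, str]:
    """Tolerant parser for a hypothetical 'name: value;' format.

    A malformed declaration is dropped; every other declaration still applies.
    """
    result: dict[str, str] = {}
    for chunk in text.split(";"):
        name, sep, value = chunk.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name or not value:
            continue  # the error stays local: skip only this declaration
        result[name] = value
    return result


def parse_strict(text: str):
    """Strict parsing via JSON: a single error invalidates the whole document."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# The same mistake (a missing colon) in both inputs.
tolerant = parse_declarations("color: red; font-size 12px; margin: 0")
strict = parse_strict('{"color": "red", "font-size" "12px", "margin": "0"}')

print(tolerant)  # the two well-formed declarations survive
print(strict)    # None: nothing survives
```

The tolerant parser loses one declaration; the strict one loses the document. Which behaviour is appropriate depends on whether a silently dropped entry is acceptable for the format in question.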

@torgo torgo modified the milestones: 2023-12-04-week, 2024-03-04-week Mar 3, 2024
@plinss
Member

plinss commented Mar 4, 2024

From today's discussion:

The starting point should be a well-defined data model; the text format is then defined as a process that takes an arbitrary byte stream and produces a consistent data model. Everything else should refer to the data model, not the text format. This implies a stringent error recovery process. We feel that hard failures, as in JSON and XML, are not the best approach.
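
A rough Python sketch of that shape, under stated assumptions: the `Message` model, the `key = value` line format, and the recovery rules (undecodable bytes become U+FFFD, unparseable lines are ignored, a repeated key keeps its last value) are all invented for illustration and are not taken from any specification.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical data model: key/value entries plus parse diagnostics."""
    entries: dict[str, str] = field(default_factory=dict)
    diagnostics: list[str] = field(default_factory=list)


def parse(data: bytes) -> Message:
    """Total function: every byte stream maps to exactly one Message."""
    msg = Message()
    # Invalid UTF-8 is replaced with U+FFFD instead of aborting the parse.
    text = data.decode("utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            msg.diagnostics.append(f"line {number}: ignored")
            continue
        # Defined recovery rule: a repeated key keeps the last value.
        msg.entries[key] = value.strip()
    return msg


def serialize(msg: Message) -> bytes:
    """Canonical text form, derived from the data model alone."""
    return "".join(f"{k} = {v}\n" for k, v in msg.entries.items()).encode("utf-8")


model = parse(b"name = Ada\n\xff\xfe garbage\nname=Grace\nrole = admin")
print(model.entries)
print(model.diagnostics)
```

Because `parse` never raises, other tools can be specified purely in terms of `Message`, and `serialize` gives a canonical text form that parses back to the same entries.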

@martinthomson
Contributor

I'm going to suggest "not always the best approach" is perhaps a better framing.

HTML and CSS allow humans to author documents that are presented to humans. A bit of slop and flexibility fits very well with those goals.

However, programming languages occupy a different place in that space. Yes, some amount of allowance for variance is fine, provided that the intent remains unambiguous. So if I say "let x = foo;" or say "const x = foo;" or just "x = foo;", there are meaningful differences between those statements that can have an effect on what ultimately happens. This is why "use strict" is a thing in JavaScript: we learned that the same loose interpretation rules that apply to HTML do not apply in a context where a local error can have a global effect. In other words, the idea that accidents have a localized effect does not hold for software.

That makes "the effect of accidents is localized" a key property for me.

@OR13

OR13 commented Mar 7, 2024

Some references from IETF land that you might consider related to this issue and i18n.
