fix(node): allow all diacritical marks in slugs #2389

Open. hpfr wants to merge 1 commit into main.

@hpfr commented Sep 2, 2023

2d58651 limited the removal of nonspacing characters to combining diacritical marks targeting the Latin alphabet because removing marks for other alphabets results in undesirable changes to the semantics of the filename.

Even for the Latin alphabet, though, removing these accents is not necessarily desirable and can change meaning. Filesystem support for Unicode is widespread. If users can type diacritical marks for their language into their node titles, they can likely type the same marks when managing files at a command line shell. Furthermore, multilingual users will still end up with non-ASCII-compatible filenames if their node titles happen to include letters from other languages.

Therefore, be consistent across alphabets by default, and preserve all alphanumeric Unicode characters in filenames, regardless of whether those characters happen to be Latin-based. `string-glyph-compose` is retained, so NFC normalization still happens. While we’re at it, tidy the function as a whole.
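
To make the difference concrete, here is a rough Python sketch of the two behaviors. It is not the Emacs Lisp in this PR. The function names are hypothetical, the U+0300..U+036F range is an assumed stand-in for the Latin-oriented marks that were being removed, and the "keep letters, digits, and combining marks" rule is my approximation of what counts as alphanumeric here.

```python
import unicodedata

# Assumption: the Combining Diacritical Marks block stands in for the
# Latin-oriented marks that the earlier behavior stripped.
LATIN_COMBINING = range(0x0300, 0x0370)


def _slugify(text: str) -> str:
    # Keep letters, digits, and combining marks; map everything else to "_".
    kept = [
        ch if ch.isalnum() or unicodedata.category(ch).startswith("M") else "_"
        for ch in text
    ]
    # Collapse runs of "_" and trim them from both ends.
    parts = [part for part in "".join(kept).split("_") if part]
    return "_".join(parts).lower()


def slug_strip_latin_marks(title: str) -> str:
    """Sketch of the earlier behavior: decompose, drop Latin marks, recompose."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if ord(ch) not in LATIN_COMBINING)
    return _slugify(unicodedata.normalize("NFC", stripped))


def slug_keep_marks(title: str) -> str:
    """Sketch of the proposed behavior: normalize to NFC and keep every mark."""
    return _slugify(unicodedata.normalize("NFC", title))


print(slug_strip_latin_marks("Schön, schon"))  # schon_schon
print(slug_keep_marks("Schön, schon"))         # schön_schon
```

With the marks stripped, the German words "schön" (beautiful) and "schon" (already) collapse into the same slug. With the marks kept, they stay distinct, and decomposed input still comes out composed.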

Ref: org-roam#1460

Cc @khinsen (#230): Any thoughts on this? The PR referenced above already introduced NFC normalization and kept diacritics, but not for the Latin alphabet. I’m curious what your dual-boot setup looks like where you’re running into normalization issues between macOS and Linux. Are you mounting a filesystem other than HFS or APFS in macOS that doesn’t have an nfc mount option?
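
For anyone following along, a small Python sketch of why normalization matters across systems (illustrative only, with hypothetical helper names): the same visible filename can be stored as two different code point sequences, and a filesystem that compares names byte for byte would treat them as different files.

```python
import unicodedata


def utf8_len(name: str) -> int:
    # Number of bytes the name occupies when encoded as UTF-8.
    return len(name.encode("utf-8"))


def same_after_nfc(a: str, b: str) -> bool:
    # Two names are equivalent if they match once both are normalized to NFC.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "caf\u00e9.org"     # "é" as a single precomposed code point (NFC)
decomposed = "cafe\u0301.org"  # "e" followed by a combining acute accent (NFD)

print(composed == decomposed)                    # False
print(utf8_len(composed), utf8_len(decomposed))  # 9 10
print(same_after_nfc(composed, decomposed))      # True
```

Normalizing slugs to NFC before writing the file gives a single canonical spelling regardless of how the title was typed.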
