Unicode URL slug gotcha #247

Open
toothbrush opened this issue Feb 6, 2019 · 3 comments

@toothbrush
toothbrush commented Feb 6, 2019

I ran into an interesting bug today (not actually a bug with Frog, but a gotcha I thought was worth mentioning here). I have a post whose title is, for example, malgré. There's a bit of code

https://github.com/greghendershott/frog/blob/master/frog/paths.rkt#L308-L311

that normalises slugs. It's quite permissive in what it allows (anything which passes char-alphabetic? is what I care about), which didn't seem to be a problem. Finally, it uses string-normalize-nfd to normalise the string.

The issue arose when I used an online mailing list service to send out an email with a link to my post. My browser pretty-prints the URL to look like http://me.com/2019/02/malgré.html, which is not incorrect, but when I pasted that into the mailing list service, my subscribers got a 404. What had happened is that Frog turns the link into the decomposed (NFD) form, an e followed by the combining acute accent U+0301:

.../malgre%CC%81.html

whereas percent-encoding the UTF-8 bytes of the precomposed (NFC) é gives this (which is what Mailchimp generated from my .../malgré.html input in the body of my newsletter):

.../malgr%C3%A9.html
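
To make the difference between those two URLs concrete, here is a minimal Racket sketch that percent-encodes the same title under both normalisation forms. `percent-encode` is a hypothetical helper written for this illustration; it is not Frog's code and not a library function.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical helper (not part of Frog): percent-encode every UTF-8 byte
;; that is not an ASCII letter, an ASCII digit, or one of - . _ ~
(define (percent-encode s)
  (string-append*
   (for/list ([b (in-bytes (string->bytes/utf-8 s))])
     (define c (integer->char b))
     (cond [(and (< b 128)
                 (or (char-alphabetic? c)
                     (char-numeric? c)
                     (memv c '(#\- #\. #\_ #\~))))
            (string c)]
           [else
            (string-append "%"
                           (if (< b 16) "0" "")
                           (string-upcase (number->string b 16)))]))))

(define title "malgr\u00E9") ; é as the single code point U+00E9

(percent-encode (string-normalize-nfd title)) ; => "malgre%CC%81"
(percent-encode (string-normalize-nfc title)) ; => "malgr%C3%A9"
```

Both strings render as malgré, but they are different byte sequences, so a server that compares filenames byte for byte will only find one of them.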

Of course, my web host treats those two filenames as different, since they are different byte sequences. The answer is probably that I should use a sane browser (Chrome seems to copy correctly; I think my troubles arose from using Safari), but I only felt safe after patching the relevant snippet to read something like the following:

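   ;; Normalise to NFC first so é is one precomposed character, then
   ;; replace anything that is not an ASCII letter or digit with a hyphen.
   (for/list ([c (in-string (string-normalize-nfc s))])
     (cond [(regexp-match? #rx"^[a-zA-Z0-9]$" (~a c)) c]
           [else #\-]))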

This is probably frightfully hacky, and it results in less pretty URLs like .../malgr.html (the é is dropped rather than transliterated), but for now I figured I could live with that. Feel free to close if this is dumb or irrelevant, but at least it's here for posterity. Thanks!
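
For anyone wanting to try the ASCII-only approach outside Frog, here is a self-contained sketch. Every name in it (`ascii-alnum?`, `tidy-hyphens`, `slug/nfc`, `slug/strip-marks`) is hypothetical, and the hyphen tidying is an assumption about what a slug function would want, not a description of Frog's behaviour. `slug/nfc` follows the same idea as the patch above; `slug/strip-marks` is one possible variation that normalises to NFD and drops combining marks so the base letter survives.

```racket
#lang racket/base
(require racket/string)

;; All names here are hypothetical; this is not Frog's code.
(define (ascii-alnum? c)
  (and (< (char->integer c) 128)
       (or (char-alphabetic? c) (char-numeric? c))))

;; Collapse runs of hyphens, then trim a hyphen from either end.
(define (tidy-hyphens s)
  (string-trim (regexp-replace* #px"-{2,}" s "-") "-"))

;; Same idea as the patch: NFC, then every non-ASCII-alphanumeric
;; character becomes a hyphen.
(define (slug/nfc s)
  (tidy-hyphens
   (list->string
    (for/list ([c (in-string (string-normalize-nfc s))])
      (if (ascii-alnum? c) c #\-)))))

;; Variation: NFD, skip combining marks (general category Mn) so that
;; the base letter is kept, then apply the same ASCII filter.
(define (slug/strip-marks s)
  (tidy-hyphens
   (list->string
    (for/list ([c (in-string (string-normalize-nfd s))]
               #:unless (eq? (char-general-category c) 'mn))
      (if (ascii-alnum? c) c #\-)))))

(slug/nfc "malgr\u00E9")         ; => "malgr"
(slug/strip-marks "malgr\u00E9") ; => "malgre"
```

Either way the resulting slug is pure ASCII, so it no longer matters which normalisation form a browser or mailing service uses when it copies the URL. Note that changing how slugs are generated would change the URLs of already-published posts.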

@greghendershott
Owner

Did you see #174, and if so, is it relevant?

(I'm not asking a rhetorical question. It's genuine. That was a couple years ago and the details are long gone from my L1 or L2 cache.)

@toothbrush
Author

I hadn't seen #174, although i did a cursory search before hitting Submit. It looks like it might be the same issue, but it's tired here and i'm late, so i'll think about this in a background process.

(and this is severely off-topic, but i wanted to say: much respect for your work on / ideals behind https://deals.extramaze.com/!)

@greghendershott
Owner

but it's tired here and i'm late

😄
