URL not properly formed with diacritics/accents not encoded #3991
Comments
I think this is a bug in TT-RSS or your browser. I'm not sure.
Sorry, what bug? Per RFC 3986, section 2.3, a URL should comprise only a specific character set, which does not contain non-ASCII characters, period. Any other characters need to be encoded as UTF-8 and then percent-encoded, per RFC 3987. Meanwhile, RSS-Bridge allows those characters to make it into the URL. Sure, modern browsers and some clients will automatically encode such a query before they send it out to web servers, but RSS-Bridge should not rely on that and should instead generate a feed URL that conforms to the standards.
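To illustrate the encoding described above, here is a minimal sketch in Python (RSS-Bridge itself is PHP; Python is used only for illustration). The value `café` is a made-up example, not taken from the issue.

```python
from urllib.parse import quote

# Hypothetical query value containing a diacritic.
value = "café"

# quote() encodes the string as UTF-8 and percent-encodes every byte that
# is not an RFC 3986 "unreserved" character (letters, digits, - . _ ~).
# safe="" means no reserved characters are exempted either.
encoded = quote(value, safe="")

print(encoded)  # caf%C3%A9
```

The `é` becomes the two UTF-8 bytes `C3 A9`, each written as a percent-escape, so the resulting string is pure ASCII.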
are you copy pasting url from browser? are you talking about those urls that are produced inside i was unable to reproduce. using firefox.
okay i get it. it happens when parameters are used in http requests without url encoding them. in the particular case of related: #3091
That means each and every bridge has to handle encoding themselves for each of their arbitrary string inputs, whereas RSS-Bridge could do this itself once by encoding the complete feed URL it generated. There's no harm here: any characters needing encoding will get encoded, and everything else will be left as is. Not to mention that the bridge code should not be concerned with things like that. Its scope is to prepare articles and their content in UTF-8, not to handle the intricacies of HTTP communication between the RSS-Bridge server and an RSS client. No offense, but I think you downplay the seriousness of this issue for any non-ASCII languages.
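The "encode the complete feed URL once" idea could be sketched roughly as follows, again in Python for illustration only. The function name `encode_feed_url` and the example URL are hypothetical, and the sketch assumes the host part is already ASCII and that any `%` in the input is part of an existing escape.

```python
from urllib.parse import quote

# RFC 3986 reserved characters (gen-delims + sub-delims) plus "%", so that
# URL structure and existing percent-escapes are left untouched.
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def encode_feed_url(url: str) -> str:
    """Percent-encode (as UTF-8) only the characters that need it."""
    return quote(url, safe=SAFE_CHARS)


raw = "https://example.com/?action=display&q=café"
print(encode_feed_url(raw))
# https://example.com/?action=display&q=caf%C3%A9
```

A URL that is already pure ASCII passes through unchanged, and running the function twice gives the same result as running it once, because `%` is treated as safe. The trade-off is that a literal `%` that was never meant as an escape cannot be told apart from a real one.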
I like your arguments. Okay, let me dwell a bit on it.
I have discovered that curl will automatically escape the URL if needed, but if curl detects an already escaped URL, it will NOT escape it. So this particular error only happens if a URL is already partially escaped (as was the case with RedditBridge).
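One way to deal with a partially escaped URL is to decode the query parameters and re-encode them all consistently. The sketch below does that in Python. It does not reproduce curl's behavior; the function name `normalize_query` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote


def normalize_query(url: str) -> str:
    """Decode every query parameter, then re-encode them all as UTF-8."""
    parts = urlsplit(url)
    # parse_qsl() undoes existing escapes and leaves raw characters alone.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # quote_via=quote emits %20 for spaces instead of "+".
    query = urlencode(pairs, quote_via=quote)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# One parameter is already escaped, the other is not.
mixed = "https://example.com/search?q=caf%C3%A9&tag=née"
print(normalize_query(mixed))
# https://example.com/search?q=caf%C3%A9&tag=n%C3%A9e
```

Decoding first avoids double-escaping the parameter that was already encoded, while the raw one gets encoded like the rest.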
The problem here is not with how RSS-Bridge handles that internally (i.e. the curl lib that it uses), but on the outside, i.e. with the RSS clients that you pass the unescaped RSS-Bridge URL to. In other words, we need to make sure that the URL generated and returned to the user (opened in a new browser tab) by RSS-Bridge after you click "Generate Feed" is properly formed.
I'm confused now. Can you give an example?
for the record i did some changes related to this issue in 545dc96 but they are a refactor (should not be externally visible changes) |
Describe the bug
If any of the feed query parameters contains diacritic (accented) characters, they are left as is and not encoded, which results in some clients failing to add the RSS feed with a "URL invalid" error. See: https://stackoverflow.com/questions/33211310/convert-french-accent-to-specific-encoding-in-php
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Diacritics/accents should be properly encoded