Some URLs containing an apostrophe (') cause an internal error #745
Comments
If I escape the URL on the command line (adding %20 and %27 in the relevant places), I still get the internal error, so I presume the problem is caused by the fact that the page as served contains an unescaped apostrophe. |
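For readers unfamiliar with the escapes mentioned above, here is a minimal sketch of how %20 and %27 arise, using Python's standard `urllib.parse.quote`. The variable names are illustrative only; this is not how linkchecker itself encodes URLs.

```python
from urllib.parse import quote, unquote

# Path segment from the failing URL in this issue.
raw_segment = "Mark's Gospel"

# quote() leaves letters, digits and "_.-~" alone (plus "/" by default),
# so both the apostrophe and the space are percent-encoded.
escaped = quote(raw_segment)
print(escaped)  # Mark%27s%20Gospel

# Decoding gives back the original text.
assert unquote(escaped) == raw_segment
```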
You could try adding I think any permanent change would need some careful investigation and more tests, not least because |
Thanks very much for the suggestion, which fixes things for me. I sympathise with this being a delicate issue to fix, so if you agree that at least it's not obvious that an unescaped apostrophe is illegal in a URL, perhaps it might be possible to add a (hidden?) configuration option so that it's not necessary to edit the source to work around this problem? |
I just had a quick look into this. Forgive me if I'm telling you things you already know! RFC 3986 seems to be relevant here. There's a nice summary. |
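As a rough illustration of what RFC 3986 allows: the apostrophe is one of the "sub-delims", which may appear unescaped in a path segment, whereas a space may not. The sketch below checks a segment against the literal characters of the RFC's `pchar` rule. The helper name `illegal_path_chars` is made up for this example, and the check is simplified (it does not validate percent-escapes).

```python
import string

# Character classes from RFC 3986, section 2.2, 2.3 and 3.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | set(":@")


def illegal_path_chars(segment):
    """Return the characters that may not appear unescaped in a path segment.

    Simplification: '%' is accepted without checking that two hex digits follow.
    """
    return [ch for ch in segment if ch not in PCHAR_LITERALS and ch != "%"]


# The apostrophe passes; only the space needs escaping.
print(illegal_path_chars("Mark's Gospel"))  # [' ']
```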
Indeed the reason for the contents of I think there is a workaround for now without any changes: in my attempt to recreate the problem, adding a trailing slash to the URL avoids the exception, although now the link is seen as outside the domain filter. Looks like you aren't using
internlinks=REGEX
Regular expression to add more URLs recognized as internal links. Default is that URLs given on the command line are internal.
Command line option: none
https://linkchecker.github.io/linkchecker/man/linkcheckerrc.html#filtering |
Thanks for the workaround. I don't quite understand why it works: I'm indeed not using |
For syntax checking, every link is broken down and put back together. Somehow, because of the apostrophe, the reconstituted link doesn't appear to LinkChecker to be a child of the URL that was passed, i.e. it is not treated as internal. Setting |
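To make the "broken down and put back together" idea concrete, here is a hedged sketch of how re-encoding a link can stop it from looking like a child of the original URL. This is not LinkChecker's actual code: `normalize`, `is_internal` and the `example.org` URLs are hypothetical, and the internal-link test is reduced to a plain prefix match.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def normalize(url):
    """Hypothetical normaliser: split the URL, percent-encode the path, reassemble."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and existing escapes untouched.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


def is_internal(link, base):
    """Naive stand-in for an 'is this link internal?' test: plain prefix match."""
    return link.startswith(base)


base = "https://example.org/Mark's Gospel"
link = "https://example.org/Mark's Gospel/chapter-1"

print(is_internal(link, base))                        # True: raw forms share a prefix
print(is_internal(normalize(link), base))             # False: only the link was re-encoded
print(is_internal(normalize(link), normalize(base)))  # True: both sides normalised
```

If something along these lines is happening, comparing a rebuilt link against the URL exactly as typed on the command line would fail, which is consistent with the link being reported as outside the domain filter.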
Thanks for the explanation! |
I just realised that I hadn't actually tried your workaround on an unpatched version of linkchecker. It doesn't seem to work. I am using the following linkcheckerrc:
Then when I run: I get the same error as before. |
I haven't tried it again, but rereading #745 (comment), there were two parts: internlinks and a trailing slash. Hopefully:
|
Thanks, I hadn't understood that adding the trailing slash was needed on top of the |
Summary
Pointing linkchecker at some URLs containing an ASCII apostrophe causes an internal error.
Steps to reproduce
linkchecker -Dall "https://boyde.ithaky.net/Mark's Gospel"
Actual result
Internal error.
Expected result
No error!
Environment
Configuration
Logs
Other notes
Since this site contains lots of very similar pages, but this is the only one with an apostrophe in its name, that would seem to be the cause of the problem. Apostrophes do not need to be escaped in URIs as far as I can tell, so my CMS doesn't, and indeed, the web server and browsers seem to be quite happy with it.
linkchecker is also happy with other URLs I give it that contain apostrophes, so I'm not exactly sure why it goes wrong in this case, except that it's a directory URL not a file URL.
Thanks for linkchecker!