Support for internal links #45

Open
vsoch opened this issue Jun 24, 2020 · 5 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@vsoch
Collaborator

vsoch commented Jun 24, 2020

hey @SuperKogito - it occurred to me today that we don't support checking internal links, meaning that if we render a Jekyll site, we can't tell whether something that starts with a / in an img src or link href actually works (internally). I think this is needed, and it would also mimic the functionality of html-proofer. What do you think? How would it work?

@SuperKogito
Member

Hello @vsoch, you are right about that. Until now, we have only checked URLs in the standard full format. An internal link check would be a good addition to the lib, but I am not sure how to go about it, tbh.
We can either extract those paths using a custom regex -> build full links from them (using the structure of the project) -> then use the requests lib to test them as usual. Or we can just check whether the file paths exist using the os lib, which should be enough. Do you have any other alternatives to consider? :)
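
The second option above (checking that the file exists on disk) could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `internal_link_exists` and the `site_root` parameter are hypothetical, and the rule that a directory counts as valid only when it contains an `index.html` is an assumption about how the rendered site is served, not something the library does today.

```python
import os


def internal_link_exists(link, site_root):
    """Return True if a root-relative link such as "/assets/logo.png"
    maps to a file (or a directory holding index.html) under site_root.

    Hypothetical helper: the name and the index.html rule are assumptions.
    """
    # Drop any fragment or query string; they do not affect the file on disk.
    path = link.split("#", 1)[0].split("?", 1)[0]
    # Strip the leading slash so os.path.join does not discard site_root.
    candidate = os.path.normpath(os.path.join(site_root, path.lstrip("/")))
    # Refuse paths that escape the site root (e.g. "/../secret").
    root = os.path.abspath(site_root)
    if os.path.commonpath([root, os.path.abspath(candidate)]) != root:
        return False
    # Assumption: a directory link is served from its index.html.
    if os.path.isdir(candidate):
        return os.path.isfile(os.path.join(candidate, "index.html"))
    return os.path.isfile(candidate)
```

Calling `internal_link_exists("/assets/logo.png", "_site")` would then report whether that file is present in the rendered output, without making any HTTP request.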

@vsoch
Collaborator Author

vsoch commented Jun 25, 2020

My thinking is along the same lines - I think there are two cases to consider, and we might start with the easier of the two:

  • render html into a sites folder, meaning that we only need to check img src, and a href links
  • still allow markdown, and also check for the [title](link) format.

I think the first would be easier because we would just parse the html with beautiful soup (or similar) and then ensure that the file exists. That would require installing bs4 and the xml parsing library (it starts with html but I can't remember the name off the bat). The second is harder, but less likely. I'm not actually sure how html-proofer does it, to be honest. What do you think?
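
As a rough sketch of the first case (rendered HTML), the snippet below collects root-relative `img src` and `a href` values. It deliberately swaps in the standard library's `html.parser` instead of Beautiful Soup so that it runs without extra dependencies; the class and function names are hypothetical, and skipping protocol-relative `//host/...` URLs is an assumption about what should count as internal.

```python
from html.parser import HTMLParser


class InternalLinkCollector(HTMLParser):
    """Collect root-relative URLs from <a href> and <img src> attributes."""

    # Which attribute carries the URL for each tag we care about.
    URL_ATTRS = {"a": "href", "img": "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name != wanted or not value:
                continue
            # Keep "/path" but skip protocol-relative "//host/path" URLs.
            if value.startswith("/") and not value.startswith("//"):
                self.links.append(value)


def collect_internal_links(html):
    """Return root-relative link targets in document order."""
    collector = InternalLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Each collected path could then be checked for existence inside the rendered site folder.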

@SuperKogito
Member

I am not that familiar with Jekyll websites, but I think I got your point. The first idea seems feasible: using Beautiful Soup (or similar) is a good way to collect hrefs and img srcs, and combined with path checking it should cover most cases. (I think you mean html5lib; lxml is also an option.)

As for the second part, I am afraid I did not get it :( Why would it be hard to still check for [title](link)? Can you elaborate a bit on it, please?

@vsoch
Collaborator Author

vsoch commented Jun 26, 2020

Sure! So in markdown there are two styles of links: inline (what we discussed above) and reference. A reference link might look like this:

[my title][the-id]

.... lots of text here, go to bottom of file...

[the-id]: http://example.com/  "Optional Title Here"

So it's not hugely terrible, but it would mean we would need to parse for both kinds, and then for the reference links, look for them in that weird format (potentially anywhere else in the markdown). Does that make sense?
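
To make the two cases concrete, here is a minimal regex-based sketch that resolves both inline and reference-style links to their targets. The function name and the patterns are assumptions for illustration only. They ignore edge cases such as links inside code blocks, angle-bracketed targets, and nested brackets, so a real implementation would more likely rely on a proper Markdown parser.

```python
import re

# Inline links: [title](target) or [title](target "Optional Title")
INLINE_LINK = re.compile(r'\[([^\]]*)\]\(\s*([^\s)]+)[^)]*\)')
# Reference uses: [title][the-id] (or the collapsed form [title][])
REFERENCE_USE = re.compile(r'\[([^\]]+)\]\[([^\]]*)\]')
# Reference definitions: [the-id]: target "Optional Title"
REFERENCE_DEF = re.compile(r'^[ ]{0,3}\[([^\]]+)\]:\s*(\S+)', re.MULTILINE)


def extract_markdown_links(text):
    """Return link targets: inline ones first, then resolved references."""
    # Definitions may appear anywhere in the file, so gather them up front.
    definitions = {
        m.group(1).lower(): m.group(2) for m in REFERENCE_DEF.finditer(text)
    }
    links = [m.group(2) for m in INLINE_LINK.finditer(text)]
    for m in REFERENCE_USE.finditer(text):
        # [title][] reuses the title as the reference id.
        label = (m.group(2) or m.group(1)).lower()
        if label in definitions:
            links.append(definitions[label])
    return links
```

Reference ids are compared case-insensitively here, which matches how Markdown treats them; uses whose id has no definition are silently skipped in this sketch.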

@SuperKogito
Member

Apparently, I missed this 2 years ago 😞 hence I am responding now. Well, I get your point, but given the way the library has evolved so far, I think internal links are not a priority. Also, they would give us quite the headache, and so far no one is requesting them. So let's keep this as an improvement for when someone (or one of us) is feeling brave enough to tackle it 😄

SuperKogito added the "enhancement" and "help wanted" labels Apr 3, 2022
2 participants