Automatic check for broken links in documentation #6208

Open
becseya opened this issue May 9, 2024 · 4 comments

Comments

@becseya
Contributor

becseya commented May 9, 2024

Problem to solve

As seen in #6207, some links can become outdated in the docs / repo over time.
The goal is to detect and report them automatically.

Success criteria

CI/CD checks for broken links

Solution outline

  • A Python crawler checks the HTML output of Sphinx (Doxygen too?)
  • A regex matcher finds links in the repo and checks whether they point to a valid website
    • blacklisting/whitelisting of folders might be useful
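
The second bullet could be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the URL pattern, the excluded folder names and the file suffixes are all assumptions chosen for the example, and it only finds absolute http(s) links.

```python
import re
from pathlib import Path

# Hypothetical pattern: absolute http(s) URLs only, stopping at whitespace
# and at common closing delimiters such as quotes, ')' and ']'.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def find_links(text):
    """Return absolute URLs found in text, with trailing punctuation stripped."""
    return [match.rstrip(".,;:") for match in URL_RE.findall(text)]


def iter_repo_links(root, excluded_dirs=("build", ".git"),
                    suffixes=(".md", ".rst", ".h", ".c")):
    """Yield (file, url) pairs for every link in the files under root.

    excluded_dirs acts as the folder blacklist; both it and suffixes are
    illustrative defaults, not values taken from the LVGL repository.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Skip any file that has an excluded name among its path components.
        if any(part in excluded_dirs for part in path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for url in find_links(text):
            yield path, url
```

Reporting the file next to each URL makes it easier to locate a dead link once one is found.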

Rabbit holes

  • finding the proper regex
    • some links might be "relative" (not starting with http(s)) or have an esoteric format due to the "rich-text context" they are later processed in
  • A large number of requests on each PR might trigger some security limits
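
One way to tame both rabbit holes, sketched under assumptions: resolve every link against the page it was found on with the standard library's `urljoin`, drop fragments and non-HTTP schemes, and deduplicate so each target is requested only once. The function names are hypothetical, and this covers only links found in generated HTML pages, not the "rich-text" formats mentioned above.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def normalize_link(page_url, href):
    """Return an absolute http(s) URL for href, or None if it should be skipped."""
    href = href.strip()
    # Empty links and pure in-page anchors never leave the current page.
    if not href or href.startswith("#"):
        return None
    # Resolve relative links against the page, then drop the #fragment.
    absolute, _fragment = urldefrag(urljoin(page_url, href))
    # mailto:, javascript: and similar schemes cannot be checked over HTTP.
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return absolute


def unique_targets(page_url, hrefs):
    """Collapse the links of one page to a sorted list of distinct URLs."""
    targets = {normalize_link(page_url, href) for href in hrefs}
    targets.discard(None)
    return sorted(targets)
```

Deduplicating before any request is sent cuts the request count per PR, which should make it less likely to hit rate or security limits.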

Testing

TODO

Teaching

Internal tool, not needed

@becseya becseya changed the title from "Automatic check for dead link in documentation" to "Automatic check for broken links in documentation" May 9, 2024
@kisvegabor
Member

I ran docs.lvgl.io through a dead link checker website. It stopped after 2000 checks and found 367 dead links. Most of them come from the "Edit on GitHub" button on the API pages, e.g. https://docs.lvgl.io/master/API/layouts/grid/lv_grid.html
@kdschlosser can we do something with that?

Of course there were some good findings too.

With a script we have good control over what to check and what not, e.g. do not go back to release/v7.11.
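
That kind of control could be as small as a skip list applied before any request is made. A minimal sketch, assuming plain substring matching is enough; the constant and function names are hypothetical and the only entry comes from the example above.

```python
# Hypothetical skip list: any URL containing one of these substrings is ignored.
SKIP_SUBSTRINGS = ("release/v7.11",)


def should_check(url, skip_substrings=SKIP_SUBSTRINGS):
    """Return True if url is in scope for the link check."""
    return not any(fragment in url for fragment in skip_substrings)


def in_scope(urls, skip_substrings=SKIP_SUBSTRINGS):
    """Filter a collection of URLs down to the ones worth checking."""
    return [url for url in urls if should_check(url, skip_substrings)]
```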

@kdschlosser
Contributor

They used to all work. I know there were changes made to that part of the documentation generation when the MicroPython stuff was removed. IDK what was done when that happened. I am going to have to go back into the commits and dig out the old code that got changed....

@kdschlosser
Contributor

Checking links is not that hard, and it can be done at the documentation generation level: recursively walk the generated HTML output, use a regex to collect the links, and then use Python requests to see whether each link works. This can be done with threads, each thread checking around 20 links. I can also spread it out across the available cores.
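
A rough sketch of that approach, with some assumptions: it uses the standard library's `urllib` instead of `requests` so the example has no dependencies, a `ThreadPoolExecutor` instead of hand-split batches of 20 links, and a deliberately simple `href` regex. All function names are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from pathlib import Path
from urllib.request import Request, urlopen

# Simplified pattern: double-quoted absolute links only, fragment dropped.
HREF_RE = re.compile(r'href="(https?://[^"#]+)')


def collect_links(html_dir):
    """Recursively gather the distinct absolute links in generated HTML files."""
    links = set()
    for page in Path(html_dir).rglob("*.html"):
        text = page.read_text(encoding="utf-8", errors="ignore")
        links.update(HREF_RE.findall(text))
    return sorted(links)


def probe(url, timeout=10):
    """Return True if a HEAD request for url succeeds with a status below 400."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, HTTPException):
        # HTTP errors, DNS failures, timeouts and malformed URLs all count as dead.
        return False


def find_dead_links(urls, probe=probe, workers=8):
    """Check urls concurrently and return the ones the probe rejects."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(probe, urls))
    return [url for url, ok in zip(urls, results) if not ok]
```

Some servers reject HEAD requests, so a real checker would probably retry with GET before reporting a link as dead. Since the work is I/O-bound, threads alone may be enough without spreading it across cores.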

@kisvegabor
Member

kisvegabor commented May 14, 2024

Both are cool! Thanks! If you already know how to approach it, please open a PR.
