Add CI for checking for broken links manually, weekly and in PRs #1633
base: main
Conversation
…software/bssw.io into mcm86-check-urls-weekly
@markcmiller86, please @mention me and let me know when this is ready to review
Apologies... I converted this to DRAFT for the time being.
Have adjusted it to ignore...
OK, this addresses all the cases in #1632. I am pulling it out of DRAFT mode. It's ready for review.
@rinkug, @bartlettroscoe and @bernhold this is now ready for review. The weekly CI check should result in a small list of either bona fide broken links or false positives (i.e., links reported as broken that actually work). We then check each one and either edit files to fix the link, add it to the ignore patterns, or just ignore it.
Alternatively, we don't do this as CI at all (at least not scheduled, and maybe not even manual... PR checks are still good, though). Instead, we use the Python tools directly: someone on the EB board periodically runs checks with locally available tooling, updates the list of failure cases, and then attempts to fix truly broken links as they are encountered.
I'm happy to fix the bugs, but as I mentioned I can't work on this during work time. You just pinged me 1-2 days ago so you haven't really given me a chance to work on it. I would suggest adding the URLs to the skip list for now. |
@vsoch I mentioned you here in this conversation for a different reason: your thoughts and experience, if any, in dealing with and developing a strategy for the larger issue that, with over 5,500 links (and still growing), there is a high probability some will always be failing for reasons other than the link actually needing to be replaced.
Ah gotcha! If you want to make new additions or changes speedier, then I'd recommend changed-files: https://github.com/marketplace/actions/changed-files. I use that for container matrices so I only build containers with an updated Dockerfile. If you are concerned about existing links breaking (across many files), a dumb thing I do is to segment a list of things (e.g., paths) into equal sublists by matching hashes to calendar-month days, then run just that day's small subset each night: https://github.com/vsoch/split-list-action. You probably don't want to be checking everything on every PR, every time!
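The hash-to-day segmentation idea above can be sketched in plain Python (a hypothetical sketch, not the actual split-list-action implementation; the function names `bucket_for` and `todays_subset` are made up):

```python
import hashlib

def bucket_for(path: str, buckets: int = 28) -> int:
    """Map a path to a stable bucket in 1..buckets by hashing it.

    Using 28 buckets guarantees every bucket fires at least once a
    month, since every month has at least 28 days.
    """
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets + 1

def todays_subset(paths, day_of_month):
    """Return only the paths whose bucket matches today's day of the month."""
    return [p for p in paths if bucket_for(p) == day_of_month]
```

Because the hash is stable, each file lands in the same bucket every month, and over days 1-28 every file gets checked exactly once.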
Thinking about this some more, I hope that will be the case if we set things up well and have well-defined responsibilities (see below).
Retries in the same run of
That will just create a bunch of GH issues that people will ignore, because most of these will be false failures. This will just clutter up the GitHub issue DB. (I have a lot of experience observing human behavior around reacting to GH issues.) You need a process where there are very few false failures. My experience over 20+ years of observing human behavior is that if the rate of false failures is above, say, 30%, most people will just assume any failure could be false and ignore it.
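One common way to cut the false-failure rate, retrying within the same run so transient network hiccups don't get reported, can be sketched like this (a hypothetical helper, not part of any existing tool; the name `check_with_retries` is made up):

```python
import time

def check_with_retries(check, url, attempts=3, delay=1.0):
    """Report a link as broken only if every attempt in the same run fails.

    `check` is any callable (e.g. a thin wrapper around an HTTP HEAD
    request) that returns True when the URL responds. Transient
    failures are absorbed by the retries, so only links that fail on
    every attempt surface as broken.
    """
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # brief pause before retrying
    return False
```

A link that responds on any of the three attempts is treated as fine; only consistently unreachable links would be flagged for a human to look at.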
How do people determine what is "truly broken" without manually clicking on the links in a browser? But I guess if the number of random failures stays low (say, less than 10 total on any run of
So I would suggest something like:
But who will be responsible for looking at the link check results? (If it is everyone's responsibility, then it is no one's responsibility.) The above seems like a reasonable process, as long as someone is willing to spend 10 minutes every month looking at potential broken links.
It seems that the GHA job will only check links in changed *.md files, so I would argue that will not create a lot of false failures (because most *.md files have only a few links). So I would argue we should run this GHA check on all PRs (and add some comments to the GHA output about expecting some random link-check failures).
@vsoch, from the discussion above, I don't think there is anything wrong with
I agree... That said, @vsoch has added a
I just enlisted @vsoch for her thoughts on this broader question to see if she might be able to suggest solutions.
@markcmiller86, when is this going to be released in a new version of
being used by
It looks like there has not been a commit to the
Ping @davidbeckingsale - I have many ideas for how to address the issues here, but I don't have the bandwidth to work on this soon, even in free time - there are just other things I need/want to do first. I think an RSE could help here.
@bartlettroscoe that is my library - the action is just a wrapper for urlchecker-python. And yes, it's been running smoothly for 2 years and hasn't had any feature requests or issues. I'm lead maintainer for these projects, they have about 330 users. Feel free to use something else. |
Hi @bartlettroscoe @markcmiller86, I wanted to follow up here - realizing that I'm probably best oriented to work on this, I put aside the work I wanted to do this afternoon and tackled this issue. I have a new release of urlchecker-python (0.0.35) and a branch of the action that you can test. Importantly:
The main issues had nothing to do with the above, and actually came down to changes in the selenium interface. When I debugged, I found this change, which meant that we were not using a web driver at all and were relying solely on requests. The fact that most were failing is a reflection of a huge change in the web - it used to be the case that most links would work with plain requests, and there was only an odd case or two where you needed the driver. Now more sites are able to detect (what amounts to) scrapers, and they don't allow this. So since our webdriver was failing, we were getting bad results. @markcmiller86 could you please test out the branch, with the added option to not check certificates, and let me know your feedback? If/when it is good I can merge that PR and do another release of urlchecker-action. Thanks!
@vsoch this is great news! 🎉 💪. Thanks so much for tackling this ❤️. I will try to test before end of weekend. |
Looks like we can try to suppress the warnings to clean up the output a bit:
@vsoch I tested as an action using the
I am preparing a branch with tweaks, such as silencing those warnings. Let me know if you want something else tested before I push a test image and we can try again. |
Hmm... I kinda like those. I mean, one issue I have with most actions is that when things go wrong, there is very little information as to why. Maybe tie this output to your
okay, happy not to do any more work then! 😆
Resolves: #1431
Links will get checked automatically every week, at 5:17 AM on Sundays. They can also be checked on demand by manually triggering the workflow.
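For reference, that schedule corresponds to a workflow trigger along these lines (a sketch of the relevant section only; the actual workflow file in this PR may differ, and note GitHub Actions cron times are UTC):

```yaml
# Hypothetical excerpt of the workflow's trigger section.
on:
  schedule:
    - cron: '17 5 * * 0'   # 05:17 every Sunday (minute hour day-of-month month day-of-week)
  workflow_dispatch:        # enables the manual runs mentioned above
```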
We still need to adjust the filters used in the checker, as a lot of broken links are currently being reported (see #1632).
- [ ] View the modified `*.md` files as rendered in GitHub.
- [ ] If changes are to the GitHub pages site under the `docs/` directory, consider viewing locally with Jekyll.