Create dead-links-check.yml #137

Open: dcaldr wants to merge 3 commits into develop


@dcaldr dcaldr commented Jul 1, 2023

This adds a CI workflow that checks for dead links, as suggested in Issue #60. It is built on https://github.com/lycheeverse/lychee-action.
It has some false positives that I have not been able to fix (e.g. www.bitdefender.com is reported with a 403 error), but most of the reported links really are broken. Possible further additions include using a cache or trying to auto-resolve dead links via the Internet Archive, as presented under: more command-line arguments
I could put in more time and effort, but as this is my first pull request, I'm not sure whether it's useful.
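
The caching idea mentioned above could look roughly like the following. This is only a sketch, assuming lychee's `--cache` and `--max-cache-age` options together with the `actions/cache` action; the cache key prefix and the one-day cache age are illustrative choices and not part of this PR.

```yaml
# Sketch only: steps that would sit inside the link-check job.
# The key prefix "cache-lychee-" and the 1d cache age are assumptions.
- name: Restore lychee cache
  uses: actions/cache@v3
  with:
    path: .lycheecache   # file lychee writes its cached results to
    key: cache-lychee-${{ github.sha }}
    restore-keys: cache-lychee-

- name: Link Checker
  uses: lycheeverse/lychee-action@v1.8.0
  with:
    # Reuse results younger than one day instead of re-requesting every URL
    args: "--cache --max-cache-age 1d --no-progress './**/*.md'"
```

With a restored cache, recently checked links would not be requested again, which should also reduce the number of 429 (rate limit) responses.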

CI that checks for dead links
@cat-alyst
Contributor

Thank you for making a PR request!! 🤩

Links are a tricky thing, for sites we own (MITRE & CTID) it makes sense to check and a great call out. For sites we do not own...we will probably always come up with errors.

Here is the reason: vendors (the main supplier of reports) can and do remove published reports 💔 . Annoying, but since it's their report it's also their right. It's not uncommon for us to be using a report during development and suddenly find the report 💨 gone 😿 . Our workaround ❤️‍🩹 has been to download reports earmarked as useful so we do not rely on the online version. This way, if anyone has questions regarding citations, we can promptly provide the documentation even if the links are broken 🔗 . However, GitHub is not the best place for document storage, so we don't upload those here.

Any thoughts on other solutions? I haven't looked too deep in this project yet but it's now on my docket. If there is a way to ignore some links while verifying others, that would be helpful. This is also a good call out for a documentation update. Thank you! 🙏

ci: add: arguments to workflow and  clean workflow test commits
add: accept code 403 as not error

Signed-off-by: dcaldr <22105838+dcaldr@users.noreply.github.com>
@dcaldr
Author

dcaldr commented Jul 7, 2023

I did some trial-and-error testing with the tool. It can also suggest the saved Wayback Machine (Internet Archive) version of each dead link; this is much slower but gets the job done.
Here is an example summary report from my testing (hopefully): summary-106err
Ignoring links can be done via arguments, via a special "config file", or by accepting certain HTTP error codes as good, which is what I have right now. I will try to explain what it does now:

@@ -13,5 +13,5 @@ jobs:
       - name: Link Checker
         uses: lycheeverse/lychee-action@v1.8.0
         with:
-          args: " --suggest --verbose --no-progress './**/*.md' './**/*.html' './**/*.rst' --exclude-mail -a 429 --exclude-path *fin7/Resources/Step7/BOOSTWRITE-src/curl/README.md "
+          args: " --suggest --verbose --no-progress './**/*.md' './**/*.html' './**/*.rst' --exclude-mail -a 403,429 --exclude-path *fin7/Resources/Step7/BOOSTWRITE-src/curl/README.md "

  • --suggest adds Wayback Machine links for dead URLs
  • --verbose --no-progress control the output format
  • *.md etc. target only the selected file types (not .c, for example); --include-verbatim could additionally search inside Markdown code blocks
  • -a treats HTTP codes 403 and 429 as good; Bitdefender and about two other sites return those (due to required cookies and JS). This could maybe be replaced with a specific exclusion via --exclude
  • --exclude-path skips this one file, which is in UTF-16 (a bug?); the link checker crashes on it (very rarely even now, after excluding it)
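
As a rough illustration of the `--exclude` alternative from the `-a` bullet, the workflow could skip the known false-positive domain instead of accepting 403 everywhere. This is a sketch, not the file in this PR: the workflow name, triggers, job id, and the `bitdefender\.com` pattern are assumptions, and the `--exclude-path` argument is left out for brevity.

```yaml
# Sketch only: workflow name, triggers, and job id are assumptions.
name: Dead links check
on:
  pull_request:
  workflow_dispatch:

jobs:
  link-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Link Checker
        uses: lycheeverse/lychee-action@v1.8.0
        with:
          # --accept 429 keeps rate-limit responses from failing the run.
          # --exclude skips any URL matching the regex, so 403 responses
          # from other sites are still reported as errors.
          args: >-
            --suggest --verbose --no-progress
            --exclude-mail
            --accept 429
            --exclude 'bitdefender\.com'
            './**/*.md' './**/*.html' './**/*.rst'
```

The trade-off is that each site that blocks automated requests needs its own `--exclude` pattern, whereas `-a 403,429` covers them all at the cost of hiding genuine 403 failures.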

2 participants