Improve --exclude regular expression examples #1314

Open
nathany opened this issue Dec 3, 2023 · 1 comment
Labels
docs (Improvements or additions to documentation), good first issue (Good for newcomers), help wanted (Extra attention is needed)

nathany commented Dec 3, 2023

This is what the documentation has right now:

--exclude <EXCLUDE>
          Exclude URLs and mail addresses from checking (supports regex)

You can exclude links from getting checked by specifying regex patterns with --exclude (e.g. --exclude example\.(com|org)). If a file named .lycheeignore exists in the current working directory, its contents are excluded as well. The file allows you to list multiple regular expressions for exclusion (one pattern per line).
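A minimal Python sketch of the behavior described above. It assumes that each line of .lycheeignore is compiled as its own pattern, and that a URL is excluded when any pattern matches somewhere in it. The helper names load_patterns and is_excluded are hypothetical, skipping blank lines is an assumption, and Python's re module stands in for lychee's own regex engine. The patterns shown here behave the same way in both.

```python
import re

# Hypothetical .lycheeignore contents: one regular expression per line.
LYCHEEIGNORE = r"""
example\.(com|org)
^https://web\.archive\.org/web/
"""

def load_patterns(text: str) -> list[re.Pattern]:
    """Compile one pattern per non-blank line (skipping blanks is an assumption)."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]

def is_excluded(url: str, patterns: list[re.Pattern]) -> bool:
    """Treat a URL as excluded if any pattern matches anywhere in it (unanchored search)."""
    return any(p.search(url) for p in patterns)

patterns = load_patterns(LYCHEEIGNORE)
for url in [
    "https://example.com/docs",
    "https://example.net/",
    "https://web.archive.org/web/2020/x",
]:
    print(url, "->", "excluded" if is_excluded(url, patterns) else "checked")
```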

The URL I was trying to exclude is an unusual case. It comes from the generated HTML of a book that discusses error handling for invalid web addresses:

Display the error that occurs when <code>url.Parse</code> is used with an invalid web address, 
such as one containing a space: <code>https://a b.com/</code>

The URL appears several times in the file, but never inside an <a href> attribute. Lychee still picks it up from the text, apparently cut off at the space. The result is:

✗ [ERR] https://a/ | Failed: Network error: dns error: no record found for Query { name: Name("a."), query_type: AAAA, query_class: IN }

It took me a while to figure out how to ignore this. --exclude a resulted in 567 excluded links, because an unanchored a matches any URL that contains the letter "a". Variations like ^a or ^a$ (based on the example in #1280) didn't ignore it at all. The documentation didn't make it clear that the solution was to include the scheme in the pattern.

The solution: ^https://a/$
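This is a quick way to see why only the last pattern works. It assumes the string being matched is the full URL as lychee reported it (https://a/) and that each pattern is applied as an unanchored search. Python's re module is used here as a stand-in for lychee's regex engine.

```python
import re

# The URL as lychee reported it in the error above (apparently cut off at the space).
URL = "https://a/"

for pattern in [r"a", r"^a", r"^a$", r"^https://a/$"]:
    matched = re.search(pattern, URL) is not None
    print(f"{pattern:<14} matches {URL}: {matched}")

# An unanchored "a" also matches unrelated URLs, which is why it excluded so many links.
print(re.search(r"a", "https://example.com/about") is not None)

# The anchored, scheme-qualified pattern matches only this exact URL.
print(re.search(r"^https://a/$", "https://a/b") is not None)
```

Because ^ anchors to the start of the full URL, ^a can never match a URL that begins with https://.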

Later I found this example in lychee.example.toml, which includes the scheme in the regex:

# Exclude URLs and mail addresses from checking (supports regex).
exclude = ['^https://www\.linkedin\.com', '^https://web\.archive\.org/web/']
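The sketch below shows how the two lychee.example.toml patterns behave against a few hypothetical URLs. As before, it assumes an unanchored search over the full URL and uses Python's re module as a stand-in.

```python
import re

# Patterns copied from lychee.example.toml.
EXCLUDE = [r"^https://www\.linkedin\.com", r"^https://web\.archive\.org/web/"]

def excluded(url: str) -> bool:
    # Assumes each pattern is searched against the full URL, scheme included.
    return any(re.search(p, url) for p in EXCLUDE)

urls = [
    "https://www.linkedin.com/in/someone",                     # excluded: anchored prefix matches
    "http://www.linkedin.com/in/someone",                      # checked: scheme is http, not https
    "https://example.com/?share=https://www.linkedin.com",     # checked: ^ anchors to the start
    "https://web.archive.org/web/2023/https://example.com/",   # excluded
]
for url in urls:
    print("excluded" if excluded(url) else "checked ", url)
```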

Perhaps an additional or better example in the README would help?

I don't mind opening a PR for that (not with my specific example, but something like the LinkedIn one seems good). Before I do, I wanted to double-check that this is working as intended.

mre (Member) commented Dec 3, 2023

Yes, we can add an additional example. The regex matches against the full URL, including the scheme, in order to support cases like this. Perhaps we can find a better way to describe this behavior. Feel free to open a PR with a suggestion. We could also add more examples to the lychee documentation website, which is also open source.
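Because the scheme is part of the matched string, a pattern can also be written to cover both http and https. The pattern below is a hypothetical example, not one taken from the lychee docs. It is checked here with Python's re module.

```python
import re

# Hypothetical pattern: optional "s" in the scheme, optional "www." subdomain.
LINKEDIN = re.compile(r"^https?://(www\.)?linkedin\.com/")

for url in [
    "https://www.linkedin.com/in/someone",    # matches
    "http://linkedin.com/company/example",    # matches: http and no "www."
    "https://notlinkedin.com/",               # no match: host must start right after the scheme
    "https://linkedin.com.example.net/",      # no match: "/" must follow ".com"
]:
    print(bool(LINKEDIN.search(url)), url)
```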

mre added the docs, good first issue, and help wanted labels on Jan 29, 2024