Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a mode to handle "pretty URLs", i.e. URIs without .html extension #1422

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dscho
Copy link
Contributor

@dscho dscho commented May 11, 2024

In many circumstances (GitHub Pages, Apache configured with MultiViews, etc), web servers process URIs by appending the .html file extension when no file is found at the path specified by the URI but a .html file corresponding to that path is found.

To allow Lychee to use the fast, offline method of checking such files locally via the file:// scheme, let's handle this scenario gracefully by adding the --auto-append-html-fileext option.

This is especially nice to have when migrating sites that have been run on, say, a web app, to a static site generated by Hugo, as I am trying to do with Git's home page.

@dscho dscho force-pushed the add-auto-append-html-fileext-option branch 2 times, most recently from c2b561e to 7fe842d Compare May 11, 2024 22:16
@dscho
Copy link
Contributor Author

dscho commented May 12, 2024

Hmm. Most of the CI failures (all except one) appear outside the code touched by this PR. Help?

@dscho dscho force-pushed the add-auto-append-html-fileext-option branch 2 times, most recently from db77418 to b61ba55 Compare May 12, 2024 23:15
@dscho dscho mentioned this pull request May 12, 2024
@dscho dscho force-pushed the add-auto-append-html-fileext-option branch from b61ba55 to 64acc73 Compare May 12, 2024 23:49
@mre
Copy link
Member

mre commented May 13, 2024

In general, I like the idea. However, I'd love to make it more generic, i.e. cover more use-cases.
For instance, a user might want to try a different extension like .htm or .php instead.
Multiple extensions could be thinkable, too. And the order of extensions might also be nice to be configurable.
Here's an extended proposal.

--check-extensions <EXTS>
      Test the specified file extensions for URIs when checking files locally.
      Multiple extensions can be separated by commas. Extensions will be checked in order of appearance.
      
      Example: --check-extensions html,htm,php,asp,aspx,jsp,cgi

Any thoughts?

@dscho dscho force-pushed the add-auto-append-html-fileext-option branch 2 times, most recently from 1c40f0a to 3b3caa1 Compare May 14, 2024 23:36
@dscho
Copy link
Contributor Author

dscho commented May 14, 2024

Here's an extended proposal.

--check-extensions <EXTS>
      Test the specified file extensions for URIs when checking files locally.
      Multiple extensions can be separated by commas. Extensions will be checked in order of appearance.
      
      Example: --check-extensions html,htm,php,asp,aspx,jsp,cgi

@mre done!

@thomas-zahner
Copy link
Member

Sounds like a useful addition. I agree that we can add this more generic approach to lychee

lychee-lib/src/client.rs Outdated Show resolved Hide resolved
lychee-lib/src/client.rs Outdated Show resolved Hide resolved
// if the path has no file extension, try to append some
let mut path_buf = path.clone();
let mut found = false;
for ext in &self.check_extensions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the found flag and manually looping and breaking is not too nice. Instead try to use .iter and find_map.

Maybe similar to the following (didn't test it):

let check_extension_exists = |ext: &str| { // could also be a function
    let mut path_buf = path.clone();
    path_buf.set_extension(ext);
    if path_buf.exists() {
        Some(path_buf)
    } else {
        None
    }
};

self.check_extensions.iter()
    .find_map(check_extension_exists)
    .map(|path_buf| {
        if self.include_fragments {
            self.check_fragment(&path_buf, uri).await
        } else {
            Status::Ok(StatusCode::OK)
        }
    })
    .next()
    .unwrap_or_else(|| ErrorKind::InvalidFilePath(uri.clone()).into());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomas-zahner correct me if I am wrong, but don't we want to stop at the first found extension? The suggested map() would not stop there, and in the best case do unnecessary work, in the worst case do unwanted work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my proposed code to address your concerns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually iterators are lazy and run only when it's really necessary. This is described in the rust book.

For example we can convert an infinite range of numbers into an iterator, map the numbers and take just a few of the numbers. Then the map only executes as many times as we take a result. In my code snippet above the next() consumes just one element so that the map only runs once.

    let iter = (0..).into_iter().map(|x| x * 2).take(5);
    println!("{:?}", iter.collect::<Vec<_>>());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually iterators are lazy and run only when it's really necessary. This is described in the rust book.

For example we can convert an infinite range of numbers into an iterator, map the numbers and take just a few of the numbers. Then the map only executes as many times as we take a result. In my code snippet above the next() consumes just one element so that the map only runs once.

    let iter = (0..).into_iter().map(|x| x * 2).take(5);
    println!("{:?}", iter.collect::<Vec<_>>());

Okay, I didn't know that :-)

So now I'm torn: the code you showed that uses a map() looks less obvious than I'd like it to be, certainly more complicated than equivalent constructs in languages that I'm more familiar with than Rust. It may very well be more idiomatic Rust code than the for loop code that is in the current version of this PR. However, the latter version is much easier to parse and grok for me. So I am torn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe my proposed iter solution is not perfect but normally iter is to be preferred over for loops. Maybe the iter solution could be simplified further. @mre what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I also prefer iterators, but in this case in particular, I'd lean towards the for-loop because of the early return. Now that the found flag got removed, it's gotten much more readable already. So I'm fine with the current version.

@dscho dscho force-pushed the add-auto-append-html-fileext-option branch 5 times, most recently from a17cb4a to b3742fb Compare May 18, 2024 20:39
In many circumstances (GitHub Pages, Apache configured with MultiViews,
etc), web servers process URIs by appending the `.html` file extension
when no file is found at the path specified by the URI but a `.html`
file corresponding to that path _is_ found.

To allow Lychee to use the fast, offline method of checking such files
locally via the `file://` scheme, let's handle this scenario gracefully
by adding the `--check-extensions=html` option.

Note: This new option can take a list of file extensions to use; The
first one for which a corresponding file is found is then used.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the add-auto-append-html-fileext-option branch from b3742fb to 976ba4b Compare May 18, 2024 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants