Support HTTP request method fallback #1327

Open
sanmai-NL opened this issue Dec 28, 2023 · 3 comments

sanmai-NL commented Dec 28, 2023

Some misconfigured web servers don't accept HEAD requests, even though HEAD is generally the more efficient and appropriate method. As a rough but effective workaround, allow the configuration key method to be a list such as ["head", "get"] instead of a single string, so that Lychee's checker falls back to the next HTTP method in the list whenever the previous one yields an error response. Regrettably, the response codes these misconfigured servers return for HEAD requests vary in practice (404, 403, etc.), so it isn't worth implementing a fallback condition evaluator.
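
To make the proposal concrete, here is a minimal sketch of the fallback loop in Rust. Everything in it is hypothetical: `Method`, `check_with_fallback`, and the `send` closure are illustrative stand-ins and not lychee's actual types or API. The sketch also assumes that any non-2xx response counts as an "error response", which is one possible reading of the proposal.

```rust
/// Hypothetical HTTP method type; lychee's real configuration types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Head,
    Get,
}

/// Try each configured method in order and return the first success.
///
/// `send` stands in for the real request logic (an assumption): it returns
/// the HTTP status code if the server answered, or an error message if the
/// request could not be completed at all.
pub fn check_with_fallback<F>(methods: &[Method], mut send: F) -> Result<(Method, u16), String>
where
    F: FnMut(Method) -> Result<u16, String>,
{
    let mut last_err = String::from("no methods configured");
    for &method in methods {
        match send(method) {
            // Assumption: only 2xx counts as success.
            Ok(code) if (200..300).contains(&code) => return Ok((method, code)),
            // No condition evaluator: any other outcome moves on to the next method.
            Ok(code) => last_err = format!("{:?} returned status {}", method, code),
            Err(e) => last_err = format!("{:?} failed: {}", method, e),
        }
    }
    // Every method failed; report the last failure seen.
    Err(last_err)
}
```

With `methods` set to `[Method::Head, Method::Get]`, a server that rejects HEAD with 403 but answers GET with 200 would be reported as reachable via GET, while a server that fails both yields the error from the last attempt.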

Member

mre commented Jan 4, 2024

That's a great idea! What counts as an error response? For connection errors, neither method should work. Currently, errors are defined as

pub const fn is_failure(&self) -> bool {
    matches!(
        self,
        Status::Error(_) | Status::Cached(CacheStatus::Error(_)) | Status::Timeout(_)
    )
}

where

#[allow(variant_size_differences)]
#[derive(Debug, Hash, PartialEq, Eq)]
pub enum Status {
    /// Request was successful
    Ok(StatusCode),
    /// Failed request
    Error(ErrorKind),
    /// Request timed out
    Timeout(Option<StatusCode>),
    /// Got redirected to different resource
    Redirected(StatusCode),
    /// The given status code is not known by lychee
    UnknownStatusCode(StatusCode),
    /// Resource was excluded from checking
    Excluded,
    /// The request type is currently not supported,
    /// for example when the URL scheme is `slack://`.
    /// See https://github.com/lycheeverse/lychee/issues/199
    Unsupported(ErrorKind),
    /// Cached request status from previous run
    Cached(CacheStatus),
}

and

/// Kinds of status errors
/// Note: The error messages can change over time, so don't match on the output
#[derive(Error, Debug)]
#[non_exhaustive]
pub enum ErrorKind {
    /// Network error while handling request
    #[error("Network error")]
    NetworkRequest(#[source] reqwest::Error),
    /// Cannot read the body of the received response
    #[error("Error reading response body: {0}")]
    ReadResponseBody(#[source] reqwest::Error),
    /// The network client required for making requests cannot be created
    #[error("Error creating request client: {0}")]
    BuildRequestClient(#[source] reqwest::Error),
    /// Network error while using Github API
    #[error("Network error (GitHub client)")]
    GithubRequest(#[from] octocrab::Error),
    /// Error while executing a future on the Tokio runtime
    #[error("Task failed to execute to completion")]
    RuntimeJoin(#[from] JoinError),
    /// Error while converting a file to an input
    #[error("Cannot read input content from file `{1}`")]
    ReadFileInput(#[source] std::io::Error, PathBuf),
    /// Error while reading stdin as input
    #[error("Cannot read input content from stdin")]
    ReadStdinInput(#[from] std::io::Error),
    /// Errors which can occur when attempting to interpret a sequence of u8 as a string
    #[error("Attempted to interpret an invalid sequence of bytes as a string")]
    Utf8(#[from] std::str::Utf8Error),
    /// The Github client required for making requests cannot be created
    #[error("Error creating Github client")]
    BuildGithubClient(#[source] octocrab::Error),
    /// Invalid Github URL
    #[error("Github URL is invalid: {0}")]
    InvalidGithubUrl(String),
    /// The input is empty and not accepted as a valid URL
    #[error("URL cannot be empty")]
    EmptyUrl,
    /// The given string can not be parsed into a valid URL, e-mail address, or file path
    #[error("Cannot parse string `{1}` as website url: {0}")]
    ParseUrl(#[source] url::ParseError, String),
    /// The given URI cannot be converted to a file path
    #[error("Cannot find file")]
    InvalidFilePath(Uri),
    /// The given URI cannot be converted to a file path
    #[error("Cannot find fragment")]
    InvalidFragment(Uri),
    /// The given path cannot be converted to a URI
    #[error("Invalid path to URL conversion: {0}")]
    InvalidUrlFromPath(PathBuf),
    /// The given mail address is unreachable
    #[error("Unreachable mail address: {0}: {1}")]
    UnreachableEmailAddress(Uri, String),
    /// The given header could not be parsed.
    /// A possible error when converting a `HeaderValue` from a string or byte
    /// slice.
    #[error("Header could not be parsed.")]
    InvalidHeader(#[from] http::header::InvalidHeaderValue),
    /// The given string can not be parsed into a valid base URL or base directory
    #[error("Error with base dir `{0}` : {1}")]
    InvalidBase(String, String),
    /// The given input can not be parsed into a valid URI remapping
    #[error("Error remapping URL: `{0}`")]
    InvalidUrlRemap(String),
    /// The given path does not resolve to a valid file
    #[error("Cannot find local file {0}")]
    InvalidFile(PathBuf),
    /// Error while traversing an input directory
    #[error("Cannot traverse input directory: {0}")]
    DirTraversal(#[from] jwalk::Error),
    /// The given glob pattern is not valid
    #[error("UNIX glob pattern is invalid")]
    InvalidGlobPattern(#[from] glob::PatternError),
    /// The Github API could not be called because of a missing Github token.
    #[error("GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.")]
    MissingGitHubToken,
    /// Used an insecure URI where a secure variant was reachable
    #[error("This URI is available in HTTPS protocol, but HTTP is provided, use '{0}' instead")]
    InsecureURL(Uri),
    /// Error while sending/receiving messages from MPSC channel
    #[error("Cannot send/receive message from channel")]
    Channel(#[from] tokio::sync::mpsc::error::SendError<InputContent>),
    /// An URL with an invalid host was found
    #[error("URL is missing a host")]
    InvalidUrlHost,
    /// Cannot parse the given URI
    #[error("The given URI is invalid: {0}")]
    InvalidURI(Uri),
    /// The given status code is invalid (not in the range 100-1000)
    #[error("Invalid status code: {0}")]
    InvalidStatusCode(u16),
    /// Regex error
    #[error("Error when using regex engine: {0}")]
    Regex(#[from] regex::Error),
    /// Too many redirects (HTTP 3xx) were encountered (configurable)
    #[error("Too many redirects")]
    TooManyRedirects(#[source] reqwest::Error),
    /// Basic auth extractor error
    #[error("Basic auth extractor error")]
    BasicAuthExtractorError(#[from] BasicAuthExtractorError),
    /// Cannot load cookies
    #[error("Cannot load cookies")]
    Cookies(String),
    /// Accept selector parse error
    #[error("Accept range error")]
    AcceptSelectorError(#[from] AcceptSelectorError),
}

Does that include all the cases we need to cover? I am wondering specifically about 5XX status codes. These wouldn't be covered by the fallback.
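
One way to frame the open question is as a small predicate that decides whether a failed attempt should trigger the next method. The sketch below is only an illustration under stated assumptions: `Attempt` and `should_fall_back` are hypothetical names, not part of lychee, and `Attempt` is a deliberately simplified stand-in for the `Status` enum above. It assumes that 4XX and 5XX responses both trigger the fallback, and that connection errors and timeouts do not, since a different method is unlikely to help there.

```rust
/// Simplified, hypothetical stand-in for the outcome of one request attempt.
/// This is not lychee's `Status` type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Attempt {
    /// The server answered with this HTTP status code.
    Response(u16),
    /// The request timed out.
    Timeout,
    /// No connection could be established.
    ConnectionError(String),
}

/// Decide whether the next method in the list should be tried.
pub fn should_fall_back(attempt: &Attempt) -> bool {
    match attempt {
        // Assumption: any client or server error status (4XX or 5XX) may be
        // caused by the method itself, so another method is worth trying.
        Attempt::Response(code) => (400..600).contains(code),
        // Assumption: a timeout is not retried with another method.
        Attempt::Timeout => false,
        // A connection error should affect every method equally.
        Attempt::ConnectionError(_) => false,
    }
}
```

Whether timeouts belong in the fallback set is a judgment call; the sketch excludes them only to keep the worst-case check duration bounded.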

mre added the waiting-for-feedback and enhancement (New feature or request) labels on Jan 29, 2024
Member

mre commented May 13, 2024

@sanmai-NL ping, in case you have any thoughts on this.

@sanmai-NL
Author

I think your design makes sense. And if some adjustments turn out to be needed later, I don't expect those to have a high cost of change.
