Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement rewrite API #2789

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Implement rewrite API #2789

wants to merge 7 commits into from

Conversation

the10thWiz
Copy link
Collaborator

The goal of this PR is to implement a Rewrite API, following the basic outline presented in #2716.

The goal is to replace the existing Options on FileServer to instead use a rewrite based API. This will make FileServer pluggable, making it possible to implement things like cached compression implementable outside of Rocket.

The basic usage example is as follows:

FileServer::new("static")
   .rewrite(DotFiles)
   .rewrite(NormalizeDirs)
   .rewrite(Index("index.html"))

There are a couple of major limitations to the rewrite trait, specifically the file to be served must exist on the local file system, and the trait is synchronous, so it should not do any IO when rewriting paths. However, I would argue that if you want to move beyond these, you should just implement your own Handler, and fully replace FileServer.

@the10thWiz
Copy link
Collaborator Author

the10thWiz commented May 12, 2024

I've implemented a basic gzip compression rewriter, which provides an API similar to what was discussed in #2716.

FileServer::new("static", Options::None)
   .rewrite(NormalizeDirs)
   .rewrite(Index("index.txt"))
   .rewrite(CachedCompression::new()),

Full example, with implementation: https://github.com/the10thWiz/rocket-caching-layer

This particular implementation generates the compressed version of the file once it's been requested once, and responds with the uncompressed version in the meantime. It's not too hard to imagine different implementations, with different trade-offs. I built this as a proof-of-concept, to show that the new API actually worked for the intended use-case.

I've also marked most of the Options as deprecated, and implemented a quick check to generate an initial set of rewrites based on the Options provided.

@the10thWiz the10thWiz marked this pull request as ready for review May 12, 2024 04:49
@SergioBenitez
Copy link
Member

I'm about to review this. Note that I tried to implement this API myself some time ago and ran into a bunch of issues. Here's my attempt, in case it helps with anything: https://paste.rs/wEcVu.rs

Hopefully you've manage to overcome the limitations I found!

@the10thWiz
Copy link
Collaborator Author

@SergioBenitez

Hopefully you've manage to overcome the limitations I found!

I'm not sure what the exact limitations you came across, but here are a few things I've done differently:

  • Rewrites can only produce the content of a file on disk. This simplifies the API somewhat, since a rewrite can only take one of a few specific actions. This also allows further modification in pretty much every case, so every rewrite is always applied in every case.
  • I've implemented DotFiles differently. Rather than adding a rewrite that either allows or disallows them, I have set up the FileServer to output a Hidden response, that the DotFiles rewrite converts to a normal File response.
  • Rather than using Cow<Path>, I'm using PathBuf. I'm pretty sure we always need to allocate the path, so we wouldn't gain much from Cow<Path>, but I'm open to switching if we can potentially remove some allocations.

There are a few points we need to decide on before this implementation can be finalized:

  • Is calling .is_dir() (internally std::fs::metadata) acceptable in an async environment? There is an async tokio version, because this is an IO call, but the current implementation uses .is_dir() internally already. My current version also uses .is_dir(), and I'm not sure if this is a big deal or not.
  • How do we want to handle Missing and IndexFile. These are an interesting case, because their primary function is a check on startup. We could extend the Rewrite API to include a startup check (and default to passing), but Missing would actually want us to ignore an error. For now, I've included IndexFile (which does nothing), and excluded Missing because I don't have a way to implement it.
  • How do we want to roll this out? My current implementation is technically backwards compatible, and outputs a deprecation warning. If we intend to introduce this as part of 0.6, we could make this a breaking change, but I'd to know your thoughts before I go in on either option.
  • Should we include a compression rewrite? I've created one (see the link in a previous comment) outside of Rocket, but compression is such a common requirement we should consider including it either in core, or as a contrib crate.

Apparently, no_compile doesn't prevent the doctest from being compiled
@SergioBenitez
Copy link
Member

The entire patch for my rewrite API implementation is here: https://paste.rs/AbR41.patch

I'll be referring to it in my review.

Copy link
Member

@SergioBenitez SergioBenitez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary difference I see between this API and the one I implemented, at a more fundamental level, is the degree to which it seeks to make FileServer completely generic. In my implementation, FileServer becomes purely a sequence of rewrites:

+pub struct FileServer {
+    rewriters: Vec<Arc<dyn Rewriter>>,
+}

Whereas here, we still have the notion of a root and some options. I think to consider this a successful API, we need something that looks more like the former: we want to be able to take a path from the client and progressively apply rewrites until we have the response we're looking for, with those rewrites being controlled completely in a generic manner.

To that effect, I do think we need to make FileServer be:

pub struct FileServer {
    rewriters: Vec<Arc<dyn Rewriter>>,
}

And then be able to recover the existing functionality as a series of rewrites that we provide out of the box.


The second core difference I note is in the treatment of responses, in particular with respect to headers. In my changeset, I introduce changes to NamedFile in which I add a HeaderMap to the struct:

+pub struct NamedFile {
+    file: File,
+    path: PathBuf,
+    headers: HeaderMap<'static>,
+}

(In reality I also added Metadata, but I think that was a mistake.)

I think this makes sense as it creates parity between NamedFile and FileServer. My intent was to allow the rewriter to return a Path and optionally a HeaderMap and then use those to construct a NamedFile that gets used to generate the response. This proved to not be enough as we also need to support rewriting into redirects, and thus returning a Response was added.

In your implementation, you have an enum that effectively replaces the need for Response in my changeset.

I think the right approach is somewhere in-between.


At the end of my experiment, and after taking a look at the code here, I'm left with the impression that the rewrite API I suggested is fundamentally trying to be monadic in Option<FileResponse> of some sort, and that we should really approach that API a little more directly. That is, to boil things down to a sequence of calls of the form Option<FileResponse> -> Option<FileResponse>, where FileResponse looks something like FileServerResponse:

enum FileResponse<'a> {
    File(File<'a>),
    Redirect(Redirect<'a>),
}

struct File<'a> {
    path: Cow<'a, Path>,
    headers: HeaderMap<'a>,
}

Then, FileServer is really just an Option<FileResponse<'a>> that starts as a Some(FileResponse::File(File::from(req.uri()))) (or something morally equivalent) and proceeds through a series of and_then()s, map()s, and filter()s. I think this presentation would also make the ordering argument a bit clearer as we're already used to the ordering consequences between these operations.

If this path is followed, then the Rewrite trait becomes:

pub trait Rewrite: Send + Sync + Any {
    fn rewrite<'r>(&self, response: Option<FileResponse<'r>>) -> Option<FileResponse<'r>>;
}

It can be implemented for all function of the same form (for<'r> F: Fn(Option<FileResponse<'r>>) -> Option<FileResponse<'r>>) as well as the bind form (FileResponse<'r> -> Option<FileResponse<'r>>) and the map form (FileResponse<'r> -> FileResponse<'r>). We can also specialize implementations for common forms such as File<'a> -> File<'a>, File<'a> -> Option<File<'a>>, and File<'a> -> FileResponse<'r>.

This would mean we can implement the basic usage as something like:

FileServer::new()
    .filter_file(|file| !file.path.components().any(|c| c.starts_with('.')))
    .map_file(|file| if file.path.is_dir() && !file.path.ends_with('/') {
        Redirect(Redirect::to(file.path + '/'))
    } else {
        File(file)
    })
   .map_file(|file| if file.path.is_dir() && file.path.join("index.html").exists() {
        File(file.path.join("index.html"))
    } else {
        file
    })

Of course, we'd want to provide these as reusable components, but the point is that the implementation becomes significantly more trivial.


/// Similar to `to_path_buf`, but always allows dotfiles, and reports whether
/// the path contains dotfiles.
pub fn to_path_buf_dotfiles(&self) -> Result<(PathBuf, bool), PathError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid the duplication introduced here. The signature also makes this method difficult to work with.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Something like this was needed to support DotFiles the way I wanted to implement it though.

@the10thWiz
Copy link
Collaborator Author

I think the APIs we have proposed (the two rewrite traits) are to some extent isomorphic. My trait has the request and file server root as additional parameters, but I'm pretty sure we could eliminate that without any issues. (I've already refactored away the need for root, and NormalizedDirs is the only one that still requires req, and this could be removed pretty easily). From there, FileServerResponse::NotFound is sort of equivalent to None.

The reason I opted to go with an explicit NotFound variant was to enable filtering dot files by default, i.e. FileServer::new("static") implies that dot files should not be served. To implement this, the None case needs to carry the path, enabling the DotFiles rewrite to convert it back to a Some case. This is also the reason I added to_path_buf_dotfiles.

I opted to retain Options for two reasons: first, to maintain backwards compatibility (and avoid the need to defer this to 0.6), but it sounds like you want this to be a breaking change. The second reason is that we can't mimic the behavior of Options::Missing and Options::IndexFile cleanly with a rewrite based API, since their primary function is a check during startup. If we are willing to lose this functionality, we can fully eliminate options from the FileServer struct.

With that in mind, FileServer becomes:

pub struct FileServer {
    root: PathBuf,
    rewrites: Vec<Arc<dyn Rewriter>>,
    rank: isize,
}

I think including root and rank is worthwhile. rank should be included, because it can't really be replicated by rewrites. I think we should include root, because we should restrict FileServer to a single root. This means handle becomes (roughly) File(self.root.join(req.uri().path()), a series of rewrites, and converting the final output to a Response.

Overall, I agree with most of the API changes you propose. The only changes I'm not willing to lose is hiding dot files by default, and some kind of startup check for whether the root directory exists.

@SergioBenitez
Copy link
Member

SergioBenitez commented May 29, 2024

The reason I opted to go with an explicit NotFound variant was to enable filtering dot files by default, i.e. FileServer::new("static") implies that dot files should not be served. To implement this, the None case needs to carry the path, enabling the DotFiles rewrite to convert it back to a Some case. This is also the reason I added to_path_buf_dotfiles.

We don't need to force filtering dot files by default: it would suffice to make the default constructor include a rewrite that filters dotfiles. This way we avoid negative logic.

I opted to retain Options for two reasons: first, to maintain backwards compatibility (and avoid the need to defer this to 0.6), but it sounds like you want this to be a breaking change.

0.6 is right around the corner. And in general, we shouldn't compromise on the API to avoid a breaking change unless there's a huge benefit to be had, which I don't feel is the case here.

The second reason is that we can't mimic the behavior of Options::Missing and Options::IndexFile cleanly with a rewrite based API, since their primary function is a check during startup. If we are willing to lose this functionality, we can fully eliminate options from the FileServer struct.

I think Options::Missing should become part of the functionality of specific rewrites. If a rewrite works from some directory, then it can/should be responsible for erroring as needed. This can be part of the constructor for that rewrite.

Options::IndexFile can and should be a rewrite, and it is in my implementation. In fact, it's one of the simplest of rewrites:

|_| File(self.0)

I think including root and rank is worthwhile. rank should be included, because it can't really be replicated by rewrites.

Absolutely.

I think we should include root, because we should restrict FileServer to a single root.

I think this is the case for FileServer today, but nothing should prevent a rewrite that looks in multiple roots, for example. Furthermore, we likely want to eventually extend FileServer to serve files included in the binary directly, and this would prevent that from being possible. In general, I don't think we gain anything from having root be endemic to FileServer but stand to lose quite a bit of flexibility.

Overall, I agree with most of the API changes you propose. The only changes I'm not willing to lose is hiding dot files by default, and some kind of startup check for whether the root directory exists.

I agree, and it doesn't need to be in conflict with the changes I propose. For the former, we can include it as part of the default constructor, not as part of FileServer directly. For the latter, a check can be done by the rewrite instead of by FileServer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants