Skip to content
This repository has been archived by the owner on Dec 31, 2023. It is now read-only.

Enhance documentation on strict_slashes #92

Open
ahopkins opened this issue Jan 20, 2022 · 15 comments
Open

Enhance documentation on strict_slashes #92

ahopkins opened this issue Jan 20, 2022 · 15 comments

Comments

@ahopkins
Copy link
Member

FOR REFERENCE:

That does not seem sensible IMO. It would be a breaking change for a lot of applications. I would think that most people on the Internet would assume https://mysite.com/foo and https://mysite.com/foo/ to be the same thing. This is default in Sanic for as long as I can remember.

Since we corrected the bug in the old router, this is the 3rd time this has come up (I just went back to check). I would not call that frequent. I would call it a failure of the docs. We should fix that.

If we want to do anything, then maybe we discuss whether or not str matching includes and empty string "" (which it does and is the root of this "problem"). That (however) is a distinct issue from strict slashes.

@johndoe46 Here is a more thorough discussion: sanic-org/sanic#2239 (comment)

And, it is also probably worth mentioning the alternative I gave there that <foo:alpha> will not have the empty string matching issues.

Originally posted by @ahopkins in sanic-org/sanic#2384 (comment)

@johndoe46
Copy link
Contributor

I'd argue that the breaking change has already happened (see my case) and that expressions containing "a lot of" and "most" are actually assumptions (but sometimes that's all we can do). But I'm fine with the given solutions, thank you for pointing to the right discussion.

I must admit that I just updated my Sanic dependency without reading the change log, thinking not much could break. I was wrong, and by the time this lesson is learned, my app is already patched, so no big deal really :-)

@Tronic
Copy link
Member

Tronic commented Jan 20, 2022

@ahopkins It has also been a problem as long as I can remember. Got bitten by it in my very first Sanic app, and I still often forget to include that parameter until later at development. NEVER I have become across a case of that handling being of any use (and most URLs don't end with slashes anyway).

But agreed, it will require a smarter deprecation path, such as issuing warnings if the value is not provided explicitly, and after some time changing the default or removing the feature altogether (like I have said before, I don't share the view of people assuming that with or without slash is the same thing).

@Tronic
Copy link
Member

Tronic commented Jan 20, 2022

I do not think this is a documentation issue, but rather just a broken feature. Directories (handler with trailing slash) accessed without trailing slash should cause redirection to the correct URL. The current implementation which directly responds with content on the slashless URL does not match the behaviour of any HTTP servers AFAIK.

@johndoe46
Copy link
Contributor

I also took for granted that redirection was the "normal" and the "right" thing to do, but in order to think out of my own assumptions ;-) I took the time to test various influential frameworks from different platforms. Unfortunately, there is no clear consensus:

Route being "/test/"

Flask:   /test => http 308 to /test/
Bottle:  /test => http 404
Django:  /test => http 301 to /test/
Fastapi: /test => http 307 to /test/
Laravel: /test => http 200
Express: /test => http 200
Sinatra: /test => http 404
Gin:     /test => http 301 to /test/

The lack of an accepted "best practice" is a bit of a bummer and the variety of redirection status codes is hilarious... but that's how it is.

@ahopkins
Copy link
Member Author

I will add some more thoughts later this weekend. I just want to throw out there that I've thought about putting together a survey and trying to get people to let us know more about how they are using Sanic. This might be a good item for that.

@Tronic
Copy link
Member

Tronic commented Jan 21, 2022

@johndoe46 Thanks for the thorough testing of frameworks. Nginx does 301 redirect, which I suspect Apache and Lighttpd probably do as well but given the variety of responses you found I don't take that for granted.

If Sanic could implement this as a fallback that does not affect normal routing but kicks in when no routes match unless a trailing slash is added, I wouldn't have any problem with an automatic 308 redirect and then ditching the strict_slashes setting (again with a deprecation path). 308 (or 307) because it preserves method and request body, where 301 converts everything into GET requests.

Some automated systems (especially curl) might not follow redirects by default, so people need to fix their URLs if this is changed to use redirection.

On the other hand, if a HTML document is served at two URLs one without trailing slash and another with one, any relative paths within that document behave differently. E.g. /invoices/ page that links to invoice123.pdf in that same directory doesn't work if the user types in the original URL without trailing slash. Using redirection instead fixes such issues.

@ahopkins
Copy link
Member Author

ahopkins commented Jan 26, 2022

I did a similar extensive amount of research when redeveloping the router to determine how we should handle /foo and /foo/. Go search "trailing slash" and you will find a ridiculous amount of opinions. The RFC itself mentions that comparison is not an exact standard.

As noted above, false negatives cannot be eliminated. In practice, their probability can be reduced, but this reduction requires more processing and is not cost-effective for all applications.

Debating whether we should or should not add varying redirects is largely an unhelpful discussion at this point when we are dealing with existing applications. Adding functionality like that would fundamentally change how Sanic operates. I am not saying that is a bad thing, but how could you handle that with a deprecation? You cannot since you are largely talking about a completely different style router with no easy backwards compat pattern. Perhaps that is the answer, perhaps we offer various routers. While not trivial, it is not cumbersome to substitute your own router. If we provided several alternatives then it might be a way that we can easily maintain existing support and allow for handling redirects differently.

Sanic does not currently offer a directory view. Although I do have a branch where I have played with allowing this in app.static routes, we generally are dealing with routes define in route definitions. These usually are returning a single resource. If the developer decides to use the trailing slash, that is their decision to make. As shown above, there is little consensus in how that should be done. Which is why strict_slashes was introduced.

TBH, I think much of this confusion comes down to how we handle <foo:str> by allowing empty strings to match. Perhaps that is the problem we should be looking to resolve.

app.route("/one/<two:str>")
app.route("/one/<two:str>/<three:str>")

I think a more sensible first step which is less intrusive would be to not match empty segments, but allow for backwards compat with a new anystr.


To recap, my proposal:

  1. Better documentation is needed... now
  2. We disallow empty strings on str and introduce a new anystr
  3. We create alternative routers that handle 3XX redirects
  4. app.static is given an OPTIONAL directory view, which we could also allow in regular route definitions with a sanic.response.directory convenience

Alternative router implementation would look like this:

from sanic import Sanic
from sanic.router import RedirectRouter

app = Sanic("MyApp", router=RedirectRouter())

@johndoe46
Copy link
Contributor

TBH, I think much of this confusion comes down to how we handle <foo:str> by allowing empty strings to match. Perhaps that is the problem we should be looking to resolve.

I completely agree with this.

@sjsadowski
Copy link
Contributor

@ahopkins We disallow empty strings on str and introduce a new anystr

This has my vote and now I realize I had not been writing tests to catch this problem currently, so I get to go back and do that. I'd rather update str and disallow empties, as logically an empty string is null and in my mind, should not match. "strornull" may be better than anystr, but I don't really care what it's called.

@ahopkins
Copy link
Member Author

Not tied to the name. strornull works too. The point is that there might be a use case for that and I would not want to drop support without providing a backup.

@ashleysommer
Copy link
Member

ashleysommer commented Jan 27, 2022

Every time I try out a new web framework, I test the router behavior by running tests as described above. I know the frameworks will each have their own way of dealing with trailing slashes, and matching routes with/without trailing slash, because each framework or library has different designers and a slightly different goal or use case in mind. Its OK that there is no standard for this, but its very important that documentation is thorough, accurate, and up to date.

TBH, I think much of this confusion comes down to how we handle foo:str by allowing empty strings to match.

I agree. This is the problem we should be looking at fixing, not strict_slashes.
This exact issue bit me back in November, I released a small API with a bug caused by this.

I had two routes:

@app.route("/api/items/<itemname:str>")
async def item(request, itemname):
    ...
    
@app.route("/api/items/")
async def item_index(request):
    ...   

I was surprised that a request to route http://localhost:8080/api/items/ went to the item handler rather than the item_index handler. Because the str route matched first, on the empty string.

Quick fix by manually calling the other handler:

@app.route("/api/items/<itemname:str>")
async def item(request, itemname):
    if not itemname:
        return await item_list(request)
    ...    

We disallow empty strings on str and introduce a new anystr
Not tied to the name. strornull works too.

Good idea. Maybe strorempty would be a suitable name.

Note, I also ran into the same problem with the :path arg type.

@app.route("/download/container/<path:path>")
async def download_path(request, path):
    ... # redirect to static download
    
@app.route("/download/container/")
async def container_index(request):
    ...  # render container index

Requests to http://localhost:8080/download/container/ were counter-intuitively going to the download_path handler, instead of the container_index handler. So question is, should path also match empty string?

@ahopkins
Copy link
Member Author

So question is, should path also match empty string?

I guess that one is a little tricky. Initially I wanted to say no, but that might not be the case. Allowing empty paths here might have more use and make sense because you could use that in crawling the root of a directory. If we forced that to be non-empty, then there really would not be a way to replicate that behavior. I am more inclined to allow that behavior and just document it.

@ahopkins
Copy link
Member Author

Change has been implemented in the referenced PR: sanic-org/sanic-routing#58

@Tronic
Copy link
Member

Tronic commented Jan 27, 2022

I am not saying that is a bad thing, but how could you handle that with a deprecation? You cannot since you are largely talking about a completely different style router with no easy backwards compat pattern. Perhaps that is the answer, perhaps we offer various routers. While not trivial, it is not cumbersome to substitute your own router. If we provided several alternatives then it might be a way that we can easily maintain existing support and allow for handling redirects differently.

I don't really see how that is problem.

  1. Make the only router behave as if strict_slashes was always True
  2. Slashless URLs with no match end up with 404 or 405 errors
  3. Before calling exception handlers, try running the router again with a slash appended and if it succeeds, return a redirect

This also ends up making access log entries on redirects, so people can fix those broken links. And yes, we regularly break some things to allow development. Redirects instead of sending the document break some applications, and for that reason we have the .3 releases that allow for breaking changes.

On redirects, the temporary kind (307) might be better because apparently browsers can cache the result of a permanent redirect and never again request without slash. This could be problematic during development, when one changes route paths. For search engines (once the app is finished) the permanent redirect would be better so that they directly link to the correct URL.

Alternative routers for different modes seems like completely insane complication.

@Tronic
Copy link
Member

Tronic commented Jan 27, 2022

I am again underlining that all resources should have unique URLs so the flexibility with the trailing slash is bad (for similar reasons why it should be prevented www.example.com and example.com both serving the same documents and instead provide a redirect from one to the other). A particularly nasty source of bugs is the one I mentioned where the HTML served uses relative links to other things it expects to find in the same folder, and then failing depending on whether a trailing slash was used or not.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants