Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI with empty path #22

Open
Wouter1 opened this issue Jun 16, 2021 · 9 comments
Open

URI with empty path #22

Wouter1 opened this issue Jun 16, 2021 · 9 comments
Labels

Comments

@Wouter1
Copy link

Wouter1 commented Jun 16, 2021

From the wiki ​https://en.wikipedia.org/wiki/Uniform_Resource_Identifier

If an authority component is present, then the path component must either be empty or begin with a slash (/).

So this should be correct uri
https://localhost

(notice no slash at the end)

However

>>> URI("http://local")
URI('http://local/')
@amcgregor
Copy link
Member

amcgregor commented Jun 16, 2021

Indeed, path-empty is a BNF form not currently accounted for. 🤔

Looking at practical implications, the encoding (REPR, in this case) containing the slash appears to be a result of the attempt to form the hierarchical part by naively combining authority with path, and "resolving" that path. The path of a URI having no path is, in fact, distinct:

>>> URI('http://example.com')
<<< URI('http://example.com/')

>>> URI('http://example.com').path
<<< PurePosixPath('.')

Resolution of . results in / given no other base path. I agree that this is bug-worthy; a special case should be added for this "pathless" capability. (I.e. if the path is exactly ., there is no path.)

Conversely, the paths you have given as examples are the same URI after absolutizing. The first bullet point case below this section covers this situation, I believe.

@Wouter1
Copy link
Author

Wouter1 commented Jun 16, 2021

Yes after absolutizing it might be the same.
I can not exactly predict if and when this difference would play up or how relevant this is for me. It played up in a bit artificial unit test here so I could work around it. But also I prefer to not wait for bug reports from my users.

@amcgregor
Copy link
Member

amcgregor commented Jun 16, 2021

Additional note: the relative reference resolution examples appear to strongly prefer that if a path has ever existed, that a root must be preserved:

"../.."         =  "http://a/"
"../../"        =  "http://a/"

@Wouter1
Copy link
Author

Wouter1 commented Jun 16, 2021

I would not draw conclusions about preferences from examples. Better stick with the specification

@amcgregor
Copy link
Member

amcgregor commented Jun 16, 2021

Examining the code, you can "hotfix" patch this behavior yourself if required:

from uri.part.path import PathPart

PathPart.empty = ""

That will swap out the default representation for an "empty" path globally:

>>> URI('http://example.com')
<<< URI('http://example.com')

>>> URI('http://example.com/foo')
<<< URI('http://example.com/foo')

I may have already thought about this, then forgotten. 😜 Edit to note: my desire has been to follow the specification as closely as is reasonable, while making sensible accomodations for user-expected behavior, i.e. as experienced in a browser user agent. In a majority of the browsers I utilize, if you omit the trailing / representing the path, it's added automatically since you actually are requesting the root of the domain. (path-empty is equivalent to / for practical reasons.)

Better stick with the specification.

Double check where these examples are literally coming from.

@Wouter1
Copy link
Author

Wouter1 commented Jun 16, 2021

Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/ would become http://local ?

Double check where these examples are literally coming from

These are examples from the spec. So ?

@amcgregor
Copy link
Member

amcgregor commented Jun 16, 2021

Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/ would become http://local ?

Nope, but that's a trivial one to try out:

>>> URI('http://example.com/')
<<< URI('http://example.com/')

The adaption is explicit, as mentioned, targeting the PurePosixPath('.') case of a URI omitting any path. Edit to be clear: / is not a path-empty.

@Wouter1
Copy link
Author

Wouter1 commented Jun 16, 2021

Ok good! Sounds like the way to go.
Is this fix going to be released or put into the develop version as well?
Because selling my users a "hot-fixed" build will raise more questions than selling them the develop version.

@amcgregor
Copy link
Member

amcgregor commented Jun 16, 2021

This will require discussion/debate; changing default behaviours is not often an easy decision. Which to target: pure specification, or practical application? The Zen of Python does have something to say about this, and alteration of behavior to match specific desires is demonstrably quite simple. Explicitly why I structured the representation into "parts" the way I have.

Consider it less of a "hotfix" than configuration, if you need to. This configuration also only alters behavior for your own application process; it merely isn't passed in as part of the constructor itself, it's process-global. (It's not like monkeypatching code paths—this is just the alteration of a global constant dedicated for this explicit purpose!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants