Implement RFC 2396 path segments #474

Open
Diaoul opened this issue Jul 16, 2020 · 8 comments

Comments

@Diaoul

Diaoul commented Jul 16, 2020

Currently there is no such thing as segments in yarl. They would be useful when you build a path from a list and don't want to end up chaining the creation of many URL objects with some hacky approach (see #138).

It should also be possible to do:

segments = ["foo", "bar/"]  # with proper quoting here
url = URL("https://example.com") / segments
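
As a rough illustration of the list-based construction above, here is a minimal standard-library sketch. `join_segments` is a hypothetical helper, not part of yarl. It assumes each list item is percent-encoded on its own, so a `/` inside an item becomes `%2F` instead of acting as a separator.

```python
from urllib.parse import quote


def join_segments(base: str, segments: list[str]) -> str:
    """Append percent-encoded path segments to a base URL string.

    Hypothetical helper for illustration only; yarl does not provide it.
    """
    # safe="" makes quote() encode "/" too, so each list item stays one segment.
    encoded = [quote(segment, safe="") for segment in segments]
    return base.rstrip("/") + "/" + "/".join(encoded)


url = join_segments("https://example.com", ["foo", "bar/"])
print(url)  # https://example.com/foo/bar%2F
```

Whether a trailing `/` in a list item should be quoted or kept as a separator is exactly the kind of decision the proposed API would have to make; the sketch simply picks quoting.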
@asvetlov
Member

Would you prepare a pull request?

@Diaoul
Author

Diaoul commented Jul 19, 2020

I can give it a shot. I assume you want immutability also on that part so I guess segments will be a tuple and a cached property. Any recommendation before I jump into it?

@asvetlov
Member

Are you talking about a url.segments property?
url.parts already exists for that. I'm OK with supporting .segments as a documented alias for .parts if you really want it.

My initial reading of the issue was that you were asking for URL("https://example.com") / ["foo", "bar/"] support. To me, that means the right operand of the __truediv__ method should accept typing.Union[str, typing.Sequence[str]] (a runtime check is required, as well as an updated type annotation).
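
The runtime check mentioned above could look roughly like the following. This is only a sketch on a toy class: `MiniURL` is a hypothetical stand-in, not `yarl.URL`, and percent-encoding is left out to keep the type dispatch visible. One detail worth noting is that `str` is itself a `Sequence[str]`, so it has to be tested first.

```python
from collections.abc import Sequence
from typing import Union


class MiniURL:
    """Toy stand-in used for illustration; this is not yarl.URL."""

    def __init__(self, path: str = "/") -> None:
        self.path = path

    def __truediv__(self, other: Union[str, Sequence[str]]) -> "MiniURL":
        # A str is also a Sequence[str], so check for it before Sequence.
        if isinstance(other, str):
            names = [other]
        elif isinstance(other, Sequence):
            names = list(other)
        else:
            return NotImplemented
        # Rejects e.g. bytes (a Sequence of int) or lists with non-str items.
        if not all(isinstance(name, str) for name in names):
            raise TypeError("all path segments must be str")
        path = self.path.rstrip("/")
        for name in names:
            path += "/" + name
        return MiniURL(path)


single = MiniURL() / "foo"
several = MiniURL() / ["foo", "bar"]
print(single.path)   # /foo
print(several.path)  # /foo/bar
```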

@Diaoul
Author

Diaoul commented Jul 19, 2020

OK, I didn't know there was a parts attribute. However, RFC segments seem not to include the / as a valid segment. I didn't even look for that name, TBH. Or maybe I saw it and assumed it was a list of all URL components, from the scheme to the very end.
Also, I see there is a name attribute that is the last of parts. I couldn't find that in the RFC. Maybe it's a convention? I couldn't find evidence of that either.

>>> u = URL("https://example.com/foo/bar/baz.html")
>>> u.parts
('/', 'foo', 'bar', 'baz.html')

For reference, here is a "correct" implementation, minus immutability:

>>> fu = furl("https://example.com/foo/bar/baz.html")
>>> fu.path.segments
['foo', 'bar', 'baz.html']

@asvetlov
Member

parts and name are modeled after pathlib; I had no better idea at the time.
That ship sailed many years ago, and the .parts property is set in stone.
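
For comparison, this is the standard-library pathlib behavior that parts and name mirror. The snippet uses only `pathlib.PurePosixPath` and does not involve yarl.

```python
from pathlib import PurePosixPath

absolute = PurePosixPath("/foo/bar/baz.html")
relative = PurePosixPath("foo/bar")

# The root shows up as its own "/" element in parts.
print(absolute.parts)  # ('/', 'foo', 'bar', 'baz.html')
# name is the final path component.
print(absolute.name)   # baz.html
# A relative path has no root element.
print(relative.parts)  # ('foo', 'bar')
```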

I hear you; segments can behave a little differently from parts.
I agree that / is not an allowed segment name.
Regarding furl's design: yes, I'm aware of the library.
yarl has no public objects other than URL; let's keep this principle. So instead of fu.path.segments we can use just url.segments.

Another question is the root segment. Should we explicitly distinguish it? I think yes.
Instead of

>>> fu = furl("https://example.com/foo/bar/baz.html")
>>> fu.path.segments
['foo', 'bar', 'baz.html']

I suggest the empty string for that (as pathlib does):

>>> url = yarl.URL("https://example.com/foo/bar/baz.html")
>>> url.segments
('', 'foo', 'bar', 'baz.html')

This way, we can handle /foo//bar and /foo/bar/ as well.
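
A minimal sketch of that proposal, assuming segments is nothing more than the path split on `/`. The `segments` function here is hypothetical and only mimics the suggested property; it is not an existing yarl API, and returning an empty tuple for an empty path is an assumption of this sketch.

```python
def segments(path: str) -> tuple[str, ...]:
    """Split a URL path into segments, keeping '' as the root marker.

    Hypothetical illustration of the proposal, not yarl code.
    """
    if not path:
        # Assumption: an empty path has no segments at all.
        return ()
    return tuple(path.split("/"))


print(segments("/foo/bar/baz.html"))  # ('', 'foo', 'bar', 'baz.html')
print(segments("/foo//bar"))          # ('', 'foo', '', 'bar')
print(segments("/foo/bar/"))          # ('', 'foo', 'bar', '')
```

Because every `/` yields a boundary, joining the tuple with `"/"` gives back the original path, which is what lets the doubled and trailing slashes survive.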

What do you think?

@Diaoul
Author

Diaoul commented Jul 19, 2020

If we go the single-object route, I would suggest the name path_segments to make it more explicit. It's also the name used in the RFC (not that it matters much), and since we have no intermediate object, I think it is more obvious this way.

As for the root segment, I don't see the use for it. Is it possible to have path segments and no root segment? Could you elaborate on the use cases?

@asvetlov
Member

A relative URL has no root segment, e.g. blob:path/to.
We use them for our custom schemes.
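
The rootless case can be seen with the standard library alone. The sketch below uses `urllib.parse.urlsplit` rather than yarl, so it only illustrates the shape of such a URL, not how yarl parses it.

```python
from urllib.parse import urlsplit

rootless = urlsplit("blob:path/to")
rooted = urlsplit("https://example.com/foo/bar")

print(rootless.scheme, rootless.path)  # blob path/to
# No leading "/" means there is no root segment to report.
print(rootless.path.startswith("/"))   # False
print(rooted.path.startswith("/"))     # True
```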

@asvetlov
Member

path_segments is quite a long name. Please use just segments. There are no other segments in URL than path parts.


2 participants