Behaviour of joinpath when the second path has protocol #213
Comments
Hello @KuribohG Thank you for opening the issue. Could you explain your initial use case? The example you provide seems a bit unusual regarding two points:
The solution here should be to (1) raise an exception if joinpath is used with paths of different protocols, and (2) raise an exception for S3 URIs with empty buckets (empty netloc).
Andreas
Extra notes:
>>> import fsspec
>>> fsspec.open("s3:///b")
<OpenFile '/b'>
>>> x=_
>>> x.open()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/andreaspoehlmann/Development/loot/venv/lib/python3.11/site-packages/fsspec/core.py", line 135, in open
    return self.__enter__()
           ^^^^^^^^^^^^^^^^
  File "/Users/andreaspoehlmann/Development/loot/venv/lib/python3.11/site-packages/fsspec/core.py", line 103, in __enter__
    f = self.fs.open(self.path, mode=mode)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/andreaspoehlmann/Development/loot/venv/lib/python3.11/site-packages/fsspec/spec.py", line 1293, in open
    f = self._open(
        ^^^^^^^^^^^
  File "/Users/andreaspoehlmann/Development/loot/venv/lib/python3.11/site-packages/s3fs/core.py", line 685, in _open
    return S3File(
           ^^^^^^^
  File "/Users/andreaspoehlmann/Development/loot/venv/lib/python3.11/site-packages/s3fs/core.py", line 2152, in __init__
    raise ValueError("Attempt to open non key-like path: %s" % path)
ValueError: Attempt to open non key-like path: /b
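The two checks proposed above can be sketched with the standard library alone. This is a minimal illustration, not universal_pathlib's actual implementation: `check_join` is a hypothetical helper, and it works on plain strings via `urllib.parse.urlsplit` rather than on UPath objects. It assumes that a second path without any protocol is always joinable.

```python
from urllib.parse import urlsplit


def check_join(base: str, other: str) -> None:
    """Hypothetical helper: raise ValueError for the two cases discussed above."""
    base_parts = urlsplit(base)
    other_parts = urlsplit(other)

    # (2) An S3 URI such as "s3:///b" parses with an empty netloc, i.e. no bucket.
    for raw, parts in ((base, base_parts), (other, other_parts)):
        if parts.scheme == "s3" and not parts.netloc:
            raise ValueError(f"S3 URI has an empty bucket: {raw!r}")

    # (1) Reject the join only when the second path names a different protocol.
    if other_parts.scheme and other_parts.scheme != base_parts.scheme:
        raise ValueError(
            f"cannot join {other!r} ({other_parts.scheme}) "
            f"onto {base!r} ({base_parts.scheme or 'no protocol'})"
        )
```

With this sketch, `check_join("s3://a", "s3:///b")` raises because of the empty bucket, while joining a plain relative path onto an S3 root passes silently.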
Using an empty bucket for the S3 URI was my mistake, but in this issue I am focusing on joining paths with different protocols. In my case, I want to implement a dataset that can be constructed from a parameter that is (1) a relative path, (2) an absolute path, or (3) a path with a protocol. Each dataset has a working directory, so I need to join the working directory path with this path parameter. Are all paths with protocols effectively "absolute" paths? Do we allow a path like
If I understand correctly, you want paths within your dataset to point to relative locations, and to be able to exchange the root of the dataset?
from upath import UPath

rel_a = UPath("path/to/fileA.txt")
rel_b = UPath("path/to/somewhere/else/fileB.txt")
root_x = UPath("s3://mybucket/somepath")
root_y = UPath("file:///opt/someotherpath")
# and now
x_a = root_x.joinpath(rel_a)
x_b = root_x.joinpath(rel_b)
y_a = root_y.joinpath(rel_a)
y_b = root_y.joinpath(rel_b)
You might also want to check out https://intake.readthedocs.io/en/latest/ if you want to describe your datasets declaratively and load from different locations.
Yes. All UPaths (with the exception of PosixUPath and WindowsUPath) are absolute.
I will interpret this as
I still think
Sorry I didn't explain my case clearly enough. We have an argument for specifying the dataset location (like using cmd args). When we run the script, we need to join the workdir and this argument. Say, the workdir is
This may be implemented by
Thank you for the clarification. If the workdir is the current working directory, you can just do:
def cli(arg):
    pth = UPath(arg).absolute()
This will correctly prepend the cwd if the path is a relative local path and, because all other UPaths are absolute, will just return, for example, the S3 path if the user provides one. If the workdir is also configurable by the user, you could do:
def cli(arg, workdir):
    pth = UPath(arg)
    if not pth.is_absolute():
        pth = UPath(workdir).joinpath(pth)
Let me know if that solves your issue.
Thanks, this will solve the problem in my use case. I think
This call will return PosixUPath('/b'), but will return S3Path('s3:///b'). Maybe in joinpath, we should return just the second path when the second path has a protocol (like in joinuri)?
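The suggested behaviour can be sketched on plain strings as follows. This is an illustration under stated assumptions rather than universal_pathlib code: `join_or_replace` is a hypothetical name, protocol detection uses `urllib.parse.urlsplit`, and a second path without a protocol is simply appended with `/`.

```python
from urllib.parse import urlsplit


def join_or_replace(base: str, other: str) -> str:
    """Hypothetical helper: a second path with a protocol replaces the first."""
    if urlsplit(other).scheme:
        # The second path carries its own protocol, so it wins outright.
        return other
    # Otherwise treat it as relative and append it to the base.
    # Leading slashes on `other` are dropped, which is a simplification.
    return base.rstrip("/") + "/" + other.lstrip("/")
```

Under these assumptions, joining `"s3://bucket/b"` onto any base returns `"s3://bucket/b"` unchanged, while a relative segment is appended to the base as usual.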