-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When read_package is supplied a path to a directory, try datapackage.{json|yaml|yml} #158
base: main
Are you sure you want to change the base?
Conversation
…y loading datapackage.json (or .yaml or .yml) (frictionlessdata#157)
Refactor read_descriptor so that it reads easier
I have simplified this PR and added tests.
Alternatives to support |
Hmm, I see how this is simple from an implementation perspective, but I think it's going to potentially confuse users. A trailing / is very subtle, especially for long paths / urls. I think we should think about this as two separate behavior cases, local vs remote: If local, I think we can have an easy implementation with If remote, all bets are off... even a url like "https://example.com/example_package/" could actually point to a datapackage.json file! The only way to know what you get is to make the request... That's why I didn't include completion on urls in my implementation, I thought it was probably better to have all web requests be explicit. Alternatively, if we do want to support behavior like datahub for urls, I would suggest the following:
The downside of this approach is that it always makes a second web request with the first fails. Not pretty, but I don't think there's any other way of knowing what's on the other side without making the request. |
Agree, which is why I'm ambivalent about this PR.
Yeah, that is not great, since the function is downloading something the user didn't specifically asked for, so I agree with your "I thought it was probably better to have all web requests be explicit." I'll put a pin in this PR and not include it in 1.1.0. As you suggested on Slack, there is likely a more prevalent use case in supporting zipped packages. At least those are explicit files again, not directories with potentially a file. |
A possible way to implement #157 . It sounds like you're plan is to let the idea simmer to see how other implementations approach it, but figured I'd submit what I have so you'd have it in your back pocket. Notes:
When the data package path, is an URL, it does not attempt to try the candidate datapackage resource filenames. I figured that with url redirects, etc, a URL without a ".{json|yaml|yml}$" could still exist (no 404) and be pointing to a valid resource file. And we can't really rely on content-type... so maybe better to not mess with. Another reason to not mess with urls is that this matches the behavior of
frictionless-py
. (Although it appears that data-cli from datahub will happily add /package.json to urls, as I mentioned earlier...)In my (brief) testing, the
frictionless-py
cli will only try "datapackage.json" as a candidate file to load when pointed to a directory... it does not discover datapackage.yaml or datapackage.yml. If we wanted to match this behavior, it'd only require editing the autoload candidate list here.