HTTP error code for missing chunks #289

Open
Alexander-Barth opened this issue Mar 15, 2024 · 5 comments

Comments

@Alexander-Barth

Alexander-Barth commented Mar 15, 2024

I have a question about this part of the spec:

There is no need for all chunks to be present within an array store. If a chunk
is not present then it is considered to be in an uninitialized state. An
uninitialized chunk MUST be treated as if it was uniformly filled with the value
of the "fill_value" field in the array metadata. If the "fill_value" field is
null then the contents of the chunk are undefined.
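The quoted rule can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory store (a mapping from zarr v2-style chunk keys such as `"0.0"` to already-decoded element lists) and a hypothetical helper `read_chunk`. It shows only the read semantics: an absent key is never an error and reads as a chunk filled with `fill_value`.

```python
from typing import Any, List, Mapping, Optional, Sequence


def read_chunk(
    store: Mapping[str, Sequence[Any]],
    key: str,
    n_elements: int,
    fill_value: Optional[Any],
) -> List[Any]:
    """Return the decoded chunk stored under `key` (hypothetical helper).

    `store` is a hypothetical in-memory mapping from chunk keys to
    already-decoded element sequences.
    """
    if key in store:
        return list(store[key])
    # Absent chunk: it is "uninitialized" and must read as uniformly fill_value.
    # If fill_value is None (JSON null), the spec leaves the contents undefined;
    # returning None placeholders here is one arbitrary, conforming choice.
    return [fill_value] * n_elements
```

For example, with `store = {"0.0": [1, 2, 3, 4]}`, reading `"0.1"` with `fill_value=0` gives `[0, 0, 0, 0]`. Over HTTP, the open question is which responses count as "key not in store".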

What is the appropriate HTTP error code for telling an HTTP client that a chunk is missing? While most servers seem to return 404 (Not Found), some servers return 403 (Forbidden), which causes problems for some client applications (JuliaIO/Zarr.jl#131).

Would it be possible to clarify which HTTP error codes can be used for missing chunks?

@jbms
Contributor

jbms commented Mar 15, 2024

I think this is definitely outside the domain of the zarr v2 spec since it does not specify anything about the underlying storage, and I don't think we're inclined to spend more effort on the zarr v2 spec at this point.

For zarr v3 there could potentially be some sort of specification about using http access to storage.

As far as your specific question:

Neuroglancer does indeed treat 403 the same as 404 because of this problem -- that when using S3 or similar storage systems where ACLs are specified per-object, a missing object has no ACL and therefore you get 403.

However, it is somewhat unfortunate to treat 403 the same as 404 because it may hide a real permission error, and result in incorrect data being read. Because Neuroglancer is an interactive tool, missing data will generally at least be immediately visually apparent. However, for other use cases, it may not be wise to treat 403 the same as 404. Instead, it may be better to have an option for specifying how to treat 403. I haven't encountered a situation where an error other than 403 should be treated as 404.

I see in your Zarr.jl commit that you indeed expose this as an option --- that looks to be a good approach.

I don't know that a zarr http storage specification, if it existed, could really help much with this issue, other than to suggest adding an option for treating 403 the same as 404.
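The opt-in approach described above could look roughly like the following Python sketch. The names `is_missing_chunk`, `treat_403_as_missing` and `ChunkFetchError` are hypothetical and are not taken from Zarr.jl or Neuroglancer. By default only 404 means "missing". 403 is folded into "missing" only when the caller explicitly asks for it, so a genuine permission error is not silently turned into fill values.

```python
class ChunkFetchError(Exception):
    """An HTTP response that must not be read as "chunk missing"."""


def is_missing_chunk(status: int, treat_403_as_missing: bool = False) -> bool:
    """Interpret the HTTP status code returned for a chunk request.

    Returns False if the chunk was returned, True if the chunk should be
    treated as uninitialized (read as fill_value), and raises otherwise.
    """
    if 200 <= status < 300:
        return False
    if status == 404:
        return True
    if status == 403 and treat_403_as_missing:
        # Opt-in only: S3-like stores may answer 403 for an absent object,
        # but the same code can also signal a genuine permission problem.
        return True
    raise ChunkFetchError(f"unexpected HTTP status {status} for chunk request")
```

Keeping the flag off by default matches the concern that treating 403 as 404 can hide real permission errors in non-interactive use.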

@Alexander-Barth
Author

Alexander-Barth commented Mar 18, 2024

I use Zarr in a generic, non-interactive library, where the user can download datasets from different sources to prepare the initial and boundary conditions for a geophysical ocean model. The fact that some data is stored in Zarr is an implementation detail that users are typically not aware of.

It is good that Julia's Zarr.jl has the option to treat 403 as 404. But I am wondering how a generic client can make this decision without this aspect being standardized in the specification.

@d-v-b
Contributor

d-v-b commented Mar 18, 2024

I'm not sure if the spec can be authoritative on how applications should handle cases outside "chunk exists" and "chunk does not exist".

In the case of missing chunks, the spec should be clear -- fetching a missing chunk should return a chunk-sized array full of fill_value. Although, I checked the v3 spec, and I don't think there's any explicit text covering missing chunks 😨 I hope I'm wrong here!

That being said, the spec could explicitly define "chunk does not exist" to mean "we know the chunk is not there", which differs from "we are not permitted to learn anything about this chunk, which may exist or not" (the case resulting in the 403 errors from s3). The space of errors across all possible storage backends seems large, so it might be best for the spec to leave handling these situations to individual applications.
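The distinction between "known to be absent" and "not permitted to know" can be made explicit with a three-state result. This is a minimal Python sketch under assumed, hypothetical names (`ChunkState`, `classify`, `use_fill_value`). It maps 403 to `UNKNOWN`, following the S3 case above. The spec's fill-value rule would cover only `ABSENT`, and what to do with `UNKNOWN` stays an application policy.

```python
from enum import Enum


class ChunkState(Enum):
    PRESENT = "present"  # the store returned the chunk
    ABSENT = "absent"    # the store says the chunk does not exist
    UNKNOWN = "unknown"  # the store will not say whether it exists


def classify(status: int) -> ChunkState:
    """Map an HTTP status code to a chunk state (hypothetical mapping)."""
    if 200 <= status < 300:
        return ChunkState.PRESENT
    if status == 404:
        return ChunkState.ABSENT
    if status == 403:
        return ChunkState.UNKNOWN
    raise RuntimeError(f"HTTP {status}: not a statement about chunk existence")


def use_fill_value(state: ChunkState, unknown_as_absent: bool) -> bool:
    """ABSENT is covered by the spec; UNKNOWN is left to application policy."""
    if state is ChunkState.UNKNOWN:
        if unknown_as_absent:
            return True
        raise PermissionError("chunk existence could not be determined")
    return state is ChunkState.ABSENT
```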

@jbms
Contributor

jbms commented Mar 18, 2024

I think the geo-zarr community is working on standardizing some http-related things for zarr v3. It could indeed make sense to say something about this issue in a zarr http storage specification.

Ultimately, though, I don't think there is any way for zarr to simply make this a non-issue, and the option will have to be exposed to users somehow.

@guigrpa

guigrpa commented Mar 20, 2024

I think the issue is not so much in Zarr's domain, but rather in S3's domain. In buckets that allow LIST operations, a request for a missing object returns a 404 error. In other buckets, however, LISTs are forbidden, and the same request returns a 403.

Only the 404 error explicitly says "the chunk you requested does not exist". But the 403 can be very prevalent: after all, Zarr stores can contain millions of objects, so LISTs are often disabled. So it would make sense to process 403 in the same way as 404, i.e. treating it as an empty chunk.

Of course, the Zarr standard can (and probably should) determine how the individual error codes are interpreted. But in the meantime, while this is unspecified, I guess 403 and 404 should be treated the same.
