ETag on artifacts seems inconsistent #15387
Comments
Either the blob storage or caching layer for files.pythonhosted.org is serving inconsistent etags for requests before and after the 3MB mark, so disable for now. More context is at pypi/warehouse#15387
I cannot find docs on the expected contents of our B2 backend's ETag, but S3 is pretty explicit.
In this case, I did confirm that the stored objects are identical at least. It is clear in your case that some responses are coming from S3 and some from B2, which leads to the ETag mismatch. Due to how our caches are populated, there is a window during which objects (or segments of objects) may be pulled from S3 before B2 has the object. It doesn't appear that there is any header which carries a common value we could use to ensure consistency of ETag across resources. Should we just stop serving the header to avoid confusion?
I think we can set our own We would just need a value to set that
Or we could add a value here: https://github.com/pypi/warehouse/blob/main/warehouse/forklift/legacy.py#L1455-L1459 which would be available as
I also think it's fine to remove
Describe the bug
I'm trying to issue Range requests for some bytes out of pypi wheels. The ETag response header is not constant, and I think it should be.
The immediate application is some code to extract metadata (yes, I know it's being backfilled), but would like to use this same code as a stopgap to enable targeted malware scanning of large wheels without downloading the whole thing.
I need to figure out if this is something that can be fixed, or if I need to work around changing etags by sniffing the Cache-Control: immutable or something.
Expected behavior
Requesting ranges from the beginning and end of wheels returns the same etag and content-type.
To Reproduce
The last-modified dates also differ by a few seconds, and the server and x-amz-server-side-encryption response headers are only present when requesting the beginning of the file. The cutoff appears to be at 3145728 bytes (3 * 2**20).
I'm assuming this is either bad cached content (if the etag has definitively changed) or a misbehavior in s3 (the docs are unclear on whether it can change over time, and my read of the mozilla writeup on etags is that it shouldn't depend on the range). Happy to work around it if you have ideas for doing so safely. The caching layer doesn't appear to support If-Range requests, so I went down the road of verifying them myself.
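The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the original reproduction script: the helper names (`range_header`, `fetch_headers`, `diff_headers`, `report`) and the 16-byte probe length are assumptions made for this example, and the URL is left for the caller to supply. It uses only the Python standard library and checks the headers the report mentions on either side of the 3145728-byte cutoff.

```python
import urllib.request

# Response headers the report describes as differing or missing between ranges.
COMPARED_HEADERS = (
    "etag",
    "last-modified",
    "content-type",
    "server",
    "x-amz-server-side-encryption",
)

# Boundary reported in the issue: 3 * 2**20 == 3145728 bytes.
CUTOFF = 3 * 2**20


def range_header(start, length):
    """Build a Range header value covering `length` bytes from `start`."""
    return f"bytes={start}-{start + length - 1}"


def fetch_headers(url, start, length=16):
    """Issue a ranged GET and return the response headers, names lowercased."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(start, length)}
    )
    with urllib.request.urlopen(request) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def diff_headers(first, second, names=COMPARED_HEADERS):
    """Return {name: (first_value, second_value)} for headers that differ.

    A header missing from one response shows up with None on that side.
    """
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }


def report(url):
    """Print header differences between a range at byte 0 and one at CUTOFF."""
    before = fetch_headers(url, 0)
    after = fetch_headers(url, CUTOFF)
    for name, (a, b) in diff_headers(before, after).items():
        print(f"{name}: {a!r} != {b!r}")
```

Calling `report(url)` with the URL of a wheel larger than the cutoff prints one line per mismatched header; an empty output would mean both ranges agreed on every header in `COMPARED_HEADERS`.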