GetObject doesn't respect the partNumber Request URI parameter #56

Open
amwolff opened this issue Feb 11, 2022 · 6 comments
Labels
blocked bug Something isn't working edge

Comments

@amwolff
Member

amwolff commented Feb 11, 2022

GetObjectNInfo needs adjustments to handle the partNumber parameter. With this, we will fully support the GetObject action.

Reference: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_RequestSyntax

@amwolff amwolff added the bug Something isn't working label Feb 11, 2022
@wthorp
Contributor

wthorp commented Feb 11, 2022

At the core we'd have to extend ObjectDownloadRequest / Endpoint.DownloadObject() to be capable of returning from only a specific part. This would guarantee that order limits are set correctly, etc.

It does seem that part numbers are currently being persisted. The SegmentPosition struct is serialized as uint64(pos.Part)<<32 | uint64(pos.Index) and saved in metabase DB's [segments].[position].
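The packing scheme described above can be sketched as follows (a self-contained illustration of that serialization, not the actual metabase code):

```go
package main

import "fmt"

// SegmentPosition mirrors how part and index are packed into the single
// uint64 stored in the metabase's [segments].[position] column, per the
// expression quoted above.
type SegmentPosition struct {
	Part  uint32
	Index uint32
}

// Encode packs the part number into the high 32 bits and the segment
// index into the low 32 bits.
func (pos SegmentPosition) Encode() uint64 {
	return uint64(pos.Part)<<32 | uint64(pos.Index)
}

// DecodeSegmentPosition reverses Encode.
func DecodeSegmentPosition(v uint64) SegmentPosition {
	return SegmentPosition{
		Part:  uint32(v >> 32),
		Index: uint32(v), // low 32 bits
	}
}

func main() {
	pos := SegmentPosition{Part: 2, Index: 7}
	encoded := pos.Encode()
	fmt.Println(encoded)                               // (2<<32)|7 = 8589934599
	fmt.Println(DecodeSegmentPosition(encoded) == pos) // true
}
```

Because the part number occupies the high bits, segments sort by part first and index second, which is what makes per-part range lookups feasible.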

@amwolff
Member Author

amwolff commented Feb 12, 2022

Here are my findings from a quick test with the "real" S3:

  1. The partNumber parameter can be specified even for non-multipart-uploaded objects, but then x-amz-mp-parts-count is not present in the response. This scheme aligns well with the limits imposed by AWS S3, i.e., the maximum part size is 5 GB, and the largest object that can be uploaded in a single PUT is 5 GB (so such objects will only have one part);
  2. If partNumber is specified for multipart-uploaded objects, then x-amz-mp-parts-count is present. I think this means that the client can effectively download an object aligned with how it was uploaded, part by part.

In short, for each GetObject request with this mighty parameter, we will need to gather the appropriate range for the specified part and the number of available parts, if any.
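Gathering that range boils down to simple arithmetic once the per-part sizes are known. A sketch, where `partSizes` standing in for sizes recovered from segment metadata is an assumption for illustration:

```go
package main

import "fmt"

// partRange returns the byte range of the requested part and the total
// parts count: the two pieces of information a GetObject handler needs
// to honor partNumber and emit x-amz-mp-parts-count.
func partRange(partSizes []int64, partNumber int) (offset, length int64, partsCount int, ok bool) {
	if partNumber < 1 || partNumber > len(partSizes) {
		return 0, 0, 0, false
	}
	// The part's offset is the sum of all preceding part sizes.
	for _, size := range partSizes[:partNumber-1] {
		offset += size
	}
	return offset, partSizes[partNumber-1], len(partSizes), true
}

func main() {
	sizes := []int64{64 << 20, 64 << 20, 10 << 20} // e.g. 64 MiB, 64 MiB, 10 MiB parts
	off, length, count, ok := partRange(sizes, 2)
	fmt.Println(off, length, count, ok) // 67108864 67108864 3 true
}
```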

@mniewrzal
Contributor

Just an initial question: is this our own finding, or is it a real use case we've hit with one of the external test suites? I'm asking to understand the priority.

Regarding the solution, @wthorp is right: we would need to extend metainfo.DownloadObject to return the exact range for the specified part. That is the relatively easy part; usually, we have more trouble defining how to expose this in the uplink API. There, I think we have 3 options:

  • expose it only for gateway-mt as private code
  • extend DownloadObjectOptions to include partNumber
  • add new method like project.DownloadPart(ctx, bucket, key, partNumber, options)

If we need this fix sooner rather than later, we can also initially fix it only on the uplink side. The DownloadObject response at the moment should contain all the necessary information to figure out the part's range within the object, so we can use this range to download the part without changing the server side. The main issue is that we would allocate bandwidth for the full object, but that would be only a temporary solution.
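Option 3 above could look roughly like the following sketch. The `project` type, its `partSizes` field, and the `DownloadPart` signature are all hypothetical stand-ins (the real uplink.Project would issue a ranged download rather than just return the range):

```go
package main

import "fmt"

// project is a stand-in for uplink.Project. partSizes stands in for the
// part-boundary information that, per the discussion above, should be
// recoverable from the DownloadObject response metadata.
type project struct {
	partSizes []int64
}

// DownloadPart sketches the proposed new method: it translates partNumber
// into the byte range that a ranged object download would then use. This
// is essentially what the temporary client-side fix would do as well.
func (p *project) DownloadPart(partNumber int) (offset, length int64, err error) {
	if partNumber < 1 || partNumber > len(p.partSizes) {
		return 0, 0, fmt.Errorf("part %d out of range", partNumber)
	}
	for _, s := range p.partSizes[:partNumber-1] {
		offset += s
	}
	return offset, p.partSizes[partNumber-1], nil
}

func main() {
	p := &project{partSizes: []int64{5 << 20, 5 << 20, 1 << 20}}
	off, length, _ := p.DownloadPart(3)
	fmt.Println(off, length) // 10485760 1048576
}
```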

If we agree that this is a high priority, we can think about scheduling time for it. As a task fully for the Metainfo team, I don't think we will have the capacity to do it next sprint, but if it were a cross-team effort, I'm sure we can figure something out.

@amwolff
Member Author

amwolff commented Feb 14, 2022

Thanks so much for the thorough response! Our test suite and the external test suites we test against don't cover this case; this is our own finding at the moment. I don't think it's very high priority (certainly good to have, though); this probably needs confirmation, but I believe it could even be scheduled for Q2 (for our compatibility-improvement milestone).

@shaupt131

Moved to backlog until the S3 compatibility improvement milestone work is underway.

@wthorp
Contributor

wthorp commented Jun 3, 2022

I had thought that this feature (beyond being part of the standard) would result in more optimized parallel downloads.

Most parallel download clients will ask for some arbitrarily sized chunk. If these chunk sizes roughly but not exactly align with segment sizes, we'd effectively require two requests to download each segment. The story is worse if downloaders grab small chunks: it could take arbitrarily many metainfo requests to download a segment.
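The misalignment cost can be made concrete with a little arithmetic. A sketch (the 64 MiB segment size is an assumed example, not a statement about any particular configuration):

```go
package main

import "fmt"

// segmentRequests counts how many segments a client-chosen chunk touches.
// Chunks that don't align with segment boundaries straddle two (or more)
// segments, each requiring its own fetch.
func segmentRequests(chunkOffset, chunkSize, segmentSize int64) int64 {
	first := chunkOffset / segmentSize
	last := (chunkOffset + chunkSize - 1) / segmentSize
	return last - first + 1
}

func main() {
	const segment = 64 << 20 // assumed 64 MiB segments, for illustration
	// A 64 MiB chunk starting 1 MiB into the object straddles two segments.
	fmt.Println(segmentRequests(1<<20, 64<<20, segment)) // 2
	// A perfectly aligned chunk touches exactly one segment.
	fmt.Println(segmentRequests(64<<20, 64<<20, segment)) // 1
}
```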

Seeing as how gaining efficiency here would require changing how clients work (which frankly won't happen), this now seems low priority. I can see a cache of some sort being the more effective solution to this problem.

@amwolff amwolff added the edge label Dec 15, 2022