
[SVCS-548] Chunked Uploads for CloudFiles #289

Open

wants to merge 18 commits into CenterForOpenScience:develop from Johnetordoff:cloudfiles-large-uploads
Conversation

Contributor

@Johnetordoff Johnetordoff commented Oct 24, 2017

Note (Added by Longze)

This PR contains code from: #283.

Ticket

https://openscience.atlassian.net/browse/SVCS-548

Purpose

This PR allows one to upload files larger than 5 GB.

Changes

Adds a new chunked-upload method to the Cloud Files provider, with tests, and raises the max body limit to an arbitrarily high 1 terabyte.
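
For reviewers curious how the body-limit change works, a minimal sketch of raising the cap on a tornado-based server (WaterButler runs on tornado); the constant name and wiring here are illustrative assumptions, not the actual diff:

    import tornado.httpserver
    import tornado.web

    MAX_BODY_SIZE = 1024 ** 4  # illustrative: 1 TiB, effectively "no limit"

    app = tornado.web.Application([])  # handlers elided

    # max_body_size is tornado's hard cap on request bodies; without raising
    # it, uploads larger than the default (100 MB) are rejected outright.
    server = tornado.httpserver.HTTPServer(app, max_body_size=MAX_BODY_SIZE)
    server.listen(7777)  # port is illustrative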

Side effects

None that I know of.

QA Notes

It will take a long time to upload a >5 GB file; have something else to do while you wait.

Deployment Notes

None that I know of.

@coveralls

coveralls commented Oct 24, 2017

Coverage Status

Coverage increased (+0.3%) to 89.511% when pulling a13d088 on Johnetordoff:cloudfiles-large-uploads into cc68aca on CenterForOpenScience:develop.

Contributor

@AddisonSchiller AddisonSchiller left a comment

Tried to keep the review mostly to things related to the chunked-upload commit. I think there were maybe one or two style things I commented on that may have been from your original CloudFiles ticket.

Also, maybe add QA notes to the Jira ticket so that whoever is testing knows how to trigger the chunked uploads, etc.

@@ -323,7 +323,6 @@ def request(self, *args, **kwargs):
"""
assert src_path.is_dir, 'src_path must be a directory'
assert asyncio.iscoroutinefunction(func), 'func must be a coroutine'

Contributor

No need to delete this blank line

@pytest.mark.asyncio
@pytest.mark.aiohttpretty
async def test_chunked_upload_segment_size(self,
                                           connected_provider,
Contributor

What is this test testing over the one above it?

Contributor Author

aiohttpretty.register_json_uri('GET', revision_url, body=revision_list)

metadata_url = connected_provider.build_url(path.path)
aiohttpretty.register_uri('HEAD', metadata_url, status=200, headers=file_header_metadata)

Contributor

2 blank lines

"""
if stream.size > self.SEGMENT_SIZE:
Contributor

Question: Does cloudfiles need a call to handle_naming?

async def delete(self, path, **kwargs):
async def chunked_upload(self, stream, path, check_created=True, fetch_metadata=True):

    created = not (await self.exists(path)) if check_created else None
Contributor

in upload, you have it as

    if check_created:
        created = not (await self.exists(path))
    else:
        created = None

which is much more readable. Maybe go back to that version instead of the more confusing one-liner?

Contributor Author

Readability is pretty subjective; if you recognize this as a ternary operator, it's a pretty simple statement. You can read more about ternary operators here: http://book.pythontips.com/en/latest/ternary_operators.html.
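
For anyone following along, a minimal sketch showing the two forms are equivalent (the exists() stub is a hypothetical stand-in for the real provider call):

    import asyncio

    async def exists(path):
        # hypothetical stand-in for the provider's real existence check
        return path == 'already-there'

    async def created_flag(path, check_created=True):
        # one-liner (conditional expression) form from this PR
        created = not (await exists(path)) if check_created else None

        # expanded form suggested in review
        if check_created:
            expanded = not (await exists(path))
        else:
            expanded = None

        assert created == expanded
        return created

    print(asyncio.run(created_flag('new-file')))       # True
    print(asyncio.run(created_flag('already-there')))  # False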

Contributor

Hm, I might agree with Addison here. If it were a simpler ternary I would agree, but self.exists(path) is a little opaque, and as soon as the await is added in it gets a little unclear.

Contributor

created is a value that isn't used until after the requests are made. If the requests fail, this result is thrown away, and awaiting it up front delays the upload. This should go after the requests.
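
One hedged way to act on this (a sketch of the method, not the actual diff): schedule the existence check as a task so it never blocks the upload, and only await it after the requests succeed; this also preserves the before-upload semantics of exists():

    import asyncio

    async def chunked_upload(self, stream, path, check_created=True, fetch_metadata=True):
        # start the check now, but don't wait on it before uploading
        created_task = asyncio.ensure_future(self.exists(path)) if check_created else None

        # ... segment PUTs and manifest PUT happen here; if any of them
        # fail, no time was spent waiting on the existence check ...

        # the task has been racing the upload, so this await is usually free
        created = not (await created_task) if created_task is not None else None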

@coveralls

coveralls commented Nov 6, 2017

Coverage Status

Coverage increased (+0.3%) to 89.429% when pulling a75e70f on Johnetordoff:cloudfiles-large-uploads into ba55731 on CenterForOpenScience:develop.

@coveralls

coveralls commented Nov 6, 2017

Coverage Status

Coverage increased (+0.3%) to 89.401% when pulling 71ddc86 on Johnetordoff:cloudfiles-large-uploads into ba55731 on CenterForOpenScience:develop.

@coveralls

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.3%) to 89.401% when pulling 2bfa324 on Johnetordoff:cloudfiles-large-uploads into ba55731 on CenterForOpenScience:develop.

Contributor

@AddisonSchiller AddisonSchiller left a comment

QA Notes are still needed. Other than that, ready for the next phase.

@coveralls

Coverage Status

Coverage increased (+0.3%) to 89.386% when pulling 680aa72 on Johnetordoff:cloudfiles-large-uploads into 473191c on CenterForOpenScience:develop.

1 similar comment

Contributor

@icereval icereval left a comment

I'd like to review this; I'll try later this week.

@cslzchen
Contributor

cslzchen commented Nov 8, 2017

@icereval thanks. Just FYI, this PR contains code from #283, which adds CloudFiles as a provider. You might want to take a look at that one as well.

@coveralls

coveralls commented Nov 15, 2017

Coverage Status

Coverage increased (+0.2%) to 90.191% when pulling 03587f9 on Johnetordoff:cloudfiles-large-uploads into 26bf209 on CenterForOpenScience:develop.

@coveralls

coveralls commented Nov 16, 2017

Coverage Status

Coverage increased (+0.2%) to 90.191% when pulling 02f4cac on Johnetordoff:cloudfiles-large-uploads into 26bf209 on CenterForOpenScience:develop.

@cslzchen cslzchen changed the title [SVCS-548] Cloudfiles large uploads [SVCS-548] [Blocked] [ChunkedUploads] Update Cloudfiles provider to use chunked-uploads Dec 26, 2017

created = not (await self.exists(path)) if check_created else None

for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
Contributor

All of these happen sequentially; it would be better for them to happen in parallel.
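
A minimal sketch of the parallel alternative, assuming each segment fits in memory and that make_request accepts roughly these parameters (the helper names and URL layout are illustrative):

    import asyncio

    async def _upload_one_segment(self, data, segment_number):
        # hypothetical helper: PUT a single segment object
        resp = await self.make_request(
            'PUT',
            self.build_url('segments', '{:06d}'.format(segment_number)),
            data=data,
            expects=(200, 201),
        )
        await resp.release()

    async def _upload_segments(self, stream):
        tasks = []
        for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
            # reads stay sequential (the stream is consumed in order) ...
            data = await stream.read(self.SEGMENT_SIZE)
            tasks.append(asyncio.ensure_future(self._upload_one_segment(data, i)))
        # ... but the segment PUTs themselves now run concurrently
        await asyncio.gather(*tasks)

The trade-off is memory: every in-flight segment is buffered at once, so a semaphore or bounded queue would be needed before this is safe for very large files.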



for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
    data = await stream.read(self.SEGMENT_SIZE)
    resp = await self.make_request(
Contributor

This needs a comment explaining what the function of this request is in the upload.

)
await resp.release()

resp = await self.make_request(
Contributor

This needs a comment explaining what the function of this request is in the upload.
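
For context, here is roughly what such comments might say, assuming the usual OpenStack Swift / Cloud Files dynamic-large-object flow (segment_url, manifest_url, and segment_prefix are illustrative names, not the actual diff):

    # First request: upload one chunk as its own object in a segments
    # container; Cloud Files stores each segment independently.
    resp = await self.make_request(
        'PUT',
        segment_url,  # e.g. <container>_segments/<path>/<segment number>
        data=data,
        expects=(200, 201),
    )
    await resp.release()

    # Second request (after all segments): PUT a zero-byte manifest object.
    # The X-Object-Manifest header names the segment prefix, telling Cloud
    # Files to serve the concatenated segments when this path is downloaded.
    resp = await self.make_request(
        'PUT',
        manifest_url,
        headers={'X-Object-Manifest': segment_prefix},
        expects=(200, 201),
    )
    await resp.release()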

if data.get('subdir'):
    return CloudFilesFolderMetadata(data)
elif data['content_type'] == 'application/directory':
    return CloudFilesFolderMetadata({'subdir': data['name'] + '/'})
return CloudFilesFileMetadata(data)

@ensure_connection
async def create_folder(self, path, **kwargs):
Contributor

Does this have anything to do with multipart upload, or is there more than one ticket here?

@cslzchen cslzchen changed the title [SVCS-548] [Blocked] [ChunkedUploads] Update Cloudfiles provider to use chunked-uploads [SVCS-548] Chunked-uploads for Cloudiles Jun 19, 2018
@cslzchen cslzchen changed the title [SVCS-548] Chunked-uploads for Cloudiles [SVCS-548] Chunked Uploads for CloudFiles Jun 19, 2018