Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVCS-486] Add Cloudfiles Provider #283

Open
wants to merge 21 commits into
base: develop
Choose a base branch
from

Conversation

Johnetordoff
Copy link
Contributor

Ticket

https://openscience.atlassian.net/browse/SVCS-486

Purpose

Expand the cloudfiles provider to make more like a real provider,

Changes

Adds upload, download, fetch metadata, move, copy, folders creation etc to the Cloudfiles provider, also includes unit tests and moves fixtures to new file. Changes some methods to support unicode file names.

Side effects

None that I know of.

QA Notes

Should include testing for special characters in filenames.

Deployment Notes

This should be deployed be for the corrosponding osf.io PR, CenterForOpenScience/osf.io#7781

@Johnetordoff Johnetordoff changed the title Cloudfiles [WIP][SVCS-486]Add Cloudfiles Provider Oct 12, 2017
@coveralls
Copy link

coveralls commented Oct 13, 2017

Coverage Status

Coverage increased (+0.2%) to 89.217% when pulling 7a759d0 on Johnetordoff:cloudfiles into 481c9d9 on CenterForOpenScience:develop.

@Johnetordoff Johnetordoff changed the title [WIP][SVCS-486]Add Cloudfiles Provider [SVCS-486]Add Cloudfiles Provider Oct 17, 2017
@coveralls
Copy link

coveralls commented Oct 17, 2017

Coverage Status

Coverage increased (+0.2%) to 89.217% when pulling 9fef6c6 on Johnetordoff:cloudfiles into 481c9d9 on CenterForOpenScience:develop.

Copy link
Member

@felliott felliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a light pass, I will do a more thorough one once I've got my local docker set up. The thing about adding links to the API docs in the method docstrings should be done for all methods. It's a huge time-saver when trying to debug. I've started adding them as I go, and this would be a great chance to fill out RsCf.

@@ -32,7 +35,8 @@ def ensure_connection(func):
class CloudFilesProvider(provider.BaseProvider):
"""Provider for Rackspace CloudFiles.

API Docs: https://developer.rackspace.com/docs/cloud-files/v1/developer-guide/#document-developer-guide
API Docs: https://developer.rackspace.com/docs/cloud-files/v1/developer-guide/
#document-developer-guide
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: you don't need to wrap urls in docstrings. If flake complains about them, we can append a # noqa: E501 to it.


@property
def extra(self):
return {
'hashes': {
'md5': self.raw['etag'].replace('"', ''),
'md5': self.raw['ETAG'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did Cloudfiles stop wrapping their etags in double-quotes? If so, they need to be added for the etag property. etag should have them, hashes.md5 shouldn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should quotes be added? They don't appear naturally as part of the metadata response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -211,18 +232,23 @@ def sign_url(self, path, method='GET', endpoint=None, seconds=settings.TEMP_URL_
:param int seconds: Time for the url to live
:rtype str:
"""
from urllib.parse import unquote
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to top of file.


@ensure_connection
async def create_folder(self, path, folder_precheck=None):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring with a link to the API docs for the endpoint this uses.


@ensure_connection
async def revisions(self, path):
version_location = await self._get_version_location()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring w/ url to API docs.

"""Deletes the key at the specified path
:param str path: The path of the key to delete
:rtype ResponseStreamReader:
:param bool confirm_delete: not used in this provider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the user tries to DELETE /resources/mst3k/providers/cloudfiles/? The user should receive a warning that deleting everything in the root folder is only allowed if confirm_delete=1 is given. It also should not attempt to delete the actual root folder. I don't know how that would play out on S3, but on a dropbox-like provider, that would break the addon as the user would no-longer be connected to their storage root

)
json_resp = await resp.json()
current = (await self.metadata(path)).to_revision()
return [current] + [CloudFilesRevisonMetadata(revision_data) for revision_data in json_resp]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this response not include the metadata for the latest revision? If so, that's worthy of a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the comment in the docstring.

import json
import pytest
import aiohttp
import aiohttpretty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: pytest, aiohttp* are third-party and should be separated from the core libs. unittest is a core lib.

@coveralls
Copy link

coveralls commented Oct 20, 2017

Coverage Status

Coverage increased (+0.4%) to 89.402% when pulling 03442e3 on Johnetordoff:cloudfiles into 481c9d9 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Oct 20, 2017

Coverage Status

Coverage increased (+0.4%) to 89.388% when pulling 87e5fca on Johnetordoff:cloudfiles into 481c9d9 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Oct 24, 2017

Coverage Status

Coverage increased (+0.2%) to 89.388% when pulling eaab942 on Johnetordoff:cloudfiles into cc68aca on CenterForOpenScience:develop.

@Johnetordoff Johnetordoff changed the title [SVCS-486]Add Cloudfiles Provider [SVCS-486] Add Cloudfiles Provider Nov 7, 2017
@coveralls
Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.2%) to 89.305% when pulling a48290b on Johnetordoff:cloudfiles into ba55731 on CenterForOpenScience:develop.

Copy link
Contributor

@AddisonSchiller AddisonSchiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st pass, will for sure need to do another large review after these changes.
Lot of style changes and other things, here goes:

  1. Lots of extra blank lines to be fixed, especially in fixtures/tests
  2. other style changes in a few places (import orders etc)
  3. unused variable body in a test
  4. A few exception raising tests have no real tests beyond the exception raising. Suggestion: consider adding a few for robustness
    5.Cloud files api link doesn't work correctly
  5. Throughout the Cloudfiles.provider, update docstrings to reflect changes (I noted a few but there are probably more). Suggestion: Add default variable typing and default return typing where appropriate.
  6. Questions: Why are you removing **kwargs all over the place? Most providers seem to just leave it even if it isn't used.
  7. delete needs changes. It should default to 0, and look for confirm_delete == 1 rather than not confirm_delete also deleting a root should follow special steps over regular deleting. See how other providers handle it. How you have it set up, the actual root will be deleted (I think). It should just delete the contents of the root.
  8. in quite a few places, you call _metadata_<helper_function> over just calling metadata for style and consitency reasons, you should always be calling just metadata
  9. path_str variable is only used once. Just replace it with its value, path.path
  10. Move functions around, specifically revisions and create_folder. They are in the middle of the _<helper_function>'s and should be more near metadata
  11. conflict checking on create_folder. Other providers check for a 409 and respond accordingly
  12. the _get_version_location error has a link in it. Unsure if that is kosher.
  13. get_metadata_revision should be named in the form of _metadata_revision. Similar to the other metadata() helper functions

from waterbutler.providers.cloudfiles import CloudFilesProvider



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line. Should be 2 instead of 3.

return CloudFilesProvider(auth, credentials, settings)



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line. Should be 2 instead of 3.

return streams.FileStreamReader(file_like)



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line. Should be 2 instead of 3.

@pytest.fixture
def folder_root_empty():
return []

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs another blank line. Should be 2 instead of 1.

def container_header_metadata_without_verision_location():
with open(os.path.join(os.path.dirname(__file__), 'fixtures/fixtures.json'), 'r') as fp:
return json.load(fp)['container_header_metadata_without_verision_location']

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs another blank line. Should be 2 instead of 1.

'versions follow the instructions here: '
'https://developer.rackspace.com/docs/cloud-files/v1/'
'use-cases/additional-object-services-information/'
'#object-versioning')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be the best to add links in error messages. They get outdated, etc.
@felliott ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke to @felliott about this. My understanding is that for now, we want to treat this more like other providers. If they don't have versioning set up, let's keep it simple. No error message, let's just not show versions, only current.

@ensure_connection
async def revisions(self, path):
"""Get past versions of the request file from special user designated version container,
if the user hasn't designated a version_location container in raises an infomative error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in raises" -> "it raises"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also revisions should probably be moved next to metadata, and not in the middle of the _<helper_function>'s

the revision list after other revisions are returned. More info about versioning with Cloud
Files here:

https://developer.rackspace.com/docs/cloud-files/v1/use-cases/additional-object-services-information/#object-versioning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line length. Although idk how to fix line length in a docstring

return [current] + [CloudFilesRevisonMetadata(revision_data) for revision_data in json_resp]

@ensure_connection
async def get_metadata_revision(self, path, version=None, revision=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should follow the same naming scheme as the other metadata helper functions:
eg: _metadata_<for some thing>
so this should be called _metadata_revision

if data.get('subdir'):
return CloudFilesFolderMetadata(data)
elif data['content_type'] == 'application/directory':
return CloudFilesFolderMetadata({'subdir': data['name'] + '/'})
return CloudFilesFileMetadata(data)

@ensure_connection
async def create_folder(self, path, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_folder should probably be moved so it is not in the middle of the _<helper_function>'s

Johnetordoff and others added 2 commits November 15, 2017 16:58
…erbutler into cloudfiles

* 'develop' of https://github.com/CenterForOpenScience/waterbutler: (77 commits)
  final cleanups for onedrive
  make provider readonly
  start cleaning up and reorganizing onedrive
  implement Microsoft OneDrive provider using dropbox as base
  added onedrive provider -- copied from Dropbox
  don't assume file metadata has a modified/created date
  add mypy type annotations
  add modified/created dates to file metadata
  expand docstrings
  add ruby serialization workaround to download
  clean up commit_sha vs. branch_name handling
  add tests for revisions and uninit-ed repos
  add artificial test for missing folder
  remove unused and obsolete code
  rewrite test suite for provider changes and coverage
  update some more docstrings on the provider
  GL provider is read-only: folder creation is not allowed
  add workaround for non-existent directories not being reported in GL
  document workarounds in download & _fetch_file_contents
  remove unneeded error wrapping from metadata
  ...
@coveralls
Copy link

coveralls commented Nov 15, 2017

Coverage Status

Coverage increased (+0.2%) to 90.106% when pulling f7a3f60 on Johnetordoff:cloudfiles into 26bf209 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Nov 16, 2017

Coverage Status

Coverage increased (+0.2%) to 90.104% when pulling 3e62ede on Johnetordoff:cloudfiles into 26bf209 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Nov 17, 2017

Coverage Status

Coverage increased (+0.2%) to 90.104% when pulling e4ad0a9 on Johnetordoff:cloudfiles into 26bf209 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Nov 27, 2017

Coverage Status

Coverage increased (+0.2%) to 90.127% when pulling 64c9a23 on Johnetordoff:cloudfiles into 26bf209 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Dec 5, 2017

Coverage Status

Coverage increased (+0.05%) to 89.997% when pulling 9a58bed on Johnetordoff:cloudfiles into 26bf209 on CenterForOpenScience:develop.

Copy link
Member

@TomBaxter TomBaxter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things to look at here. My notes/questions about revisions probably aren't actionable. Made a few comments about _delete_folder_contents . In addition to those comments, I tested deleting root and it returned 204 success, but it didn't actually delete the contents of root. Shaping up nicely, back to you.

"""Deletes the key at the specified path
:param str path: The path of the key to delete
:rtype ResponseStreamReader:
:param int confirm_delete: Must be 1 to confirm root folder delete, this deletes entire
container object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this delete container object? Folder deletes against root folder should delete the contents of the folder/container, but not the folder/container itself.


if path.is_root and not confirm_delete:
raise exceptions.DeleteError(
'query argument confirm_delete=1 is required for deleting the entire container.',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, delete root folder shouldn't actually delete folder, only contents.

)

if path.is_dir:
await self._delete_folder_contents(path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently _delete_folder_contents is being used only for the purpose of deleting the contents of the root folder, but not the folder itself. The if should probably be if path.is_root . Other (non-root) folders should be deleted along with their contents.
The thought behind not deleting root folder, even when specifically requested, is that it busts the osf.io addon config.

for item in metadata
]

delete_files.append(os.path.join('/', self.container, path.path))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_delete_folder_contents shouldn't delete the folder itself.

await resp.release()

@ensure_connection
async def metadata(self, path, recursive=False, version=None, revision=None, **kwargs):
"""Get Metadata about the requested file or folder
:param str path: The path to a key or folder
:rtype dict:
:rtype list:
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update doc strings for new args or maybe switch to type hints.

@@ -339,11 +381,12 @@ def _extract_endpoints(self, data):
)
data = await resp.json()

# no data and the provider path is not root, we are left with either a file or a directory marker
# no data and the provider path is not root, we are left with either a file or a directory
# marker
if not data and not path.is_root:
# Convert the parent path into a directory marker (file) and check for an empty folder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r/(file)/(item)/

if not data and not path.is_root:
# Convert the parent path into a directory marker (file) and check for an empty folder
dir_marker = path.parent.child(path.name, folder=False)
metadata = await self._metadata_file(dir_marker, is_folder=True, **kwargs)
dir_marker = path.parent.child(path.name, folder=path.is_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is path.parent.child(path.name, folder=path.is_dir) different then path? Can't wrap my head around it. How am I not my father's son?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function child is to create a new child path with a new name, so you are your father's child, but your sibling is your father's child with a different name.

resp = await self.make_request(
'PUT',
functools.partial(self.sign_url, path, 'PUT'),
expects=(200, 201),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are both these codes possible, I would think that the response would always be one or the other, hopefully 201, on success.

'versions follow the instructions here: '
'https://developer.rackspace.com/docs/cloud-files/v1/'
'use-cases/additional-object-services-information/'
'#object-versioning')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke to @felliott about this. My understanding is that for now, we want to treat this more like other providers. If they don't have versioning set up, let's keep it simple. No error message, let's just not show versions, only current.


@ensure_connection
async def revisions(self, path):
"""Get past versions of the request file from special user designated version container,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r/request/requested/

@coveralls
Copy link

coveralls commented Jan 10, 2018

Coverage Status

Coverage increased (+0.2%) to 90.177% when pulling 78230fc on Johnetordoff:cloudfiles into 26bf209 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Jan 11, 2018

Coverage Status

Coverage increased (+0.2%) to 90.134% when pulling e63f8bf on Johnetordoff:cloudfiles into 26bf209 on CenterForOpenScience:develop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants