Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --copy-acl option to s3-to-s3 transfers #1535

Closed
wants to merge 13 commits into from

Conversation

nikmolnar
Copy link

This PR addresses issue #901 by adding a --copy-acl option to the s3 sync, s3 cp, and s3 mv commands.

@mtdowling mtdowling added the pr:needs-review This PR needs a review from a Member. label Oct 23, 2015
'--content-language', '--expires', '--grants',
'--only-show-errors', '--expected-size',
'--page-size', '--metadata-directive',
'--ignore-glacier-warnings']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't really looked into the whole PR yet, but are you using TAB here? Using SPACE would save you from the indentation trouble like this. Besides, I would suggest you simply add a new line in the middle with your new addition, and leave all other pre-existing lines intact. Don't bother re-paragraph them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry that last line did end up with tabs somehow (I resolved a merge conflict and I think the merge tool may have been configured for tabs). Everywhere else is spaces.

I'll fix this, and add the new options to the end as you suggest.

@JordonPhillips
Copy link
Member

You'll need to add tests before we can accept this. In addition I'm -1 on adding both --copy-acl and --no-copy-acl. The current and default behavior is not to copy, so a lack of --copy-acl should be sufficient in that case. We don't have a --verify-ssl flag, for instance.

@nikmolnar
Copy link
Author

@JordonPhillips Having both ---copy-acl and --no-copy-acl seemed redundant to me as well, but it seemed to fit with some of the other options (symlinks, and mime type). I'm happy to remove it.

I'll look into adding some tests as well.

@kyleknap kyleknap added the s3 label Nov 19, 2015
@bobbus
Copy link

bobbus commented Nov 26, 2015

👍

@robguilfoyle
Copy link

@nikmolnar any movement on this PR?

@nikmolnar nikmolnar closed this Jan 9, 2016
@nikmolnar nikmolnar reopened this Jan 9, 2016
@nikmolnar
Copy link
Author

@robguilfoyle I've addressed all issues so far... just waiting for further feedback from the committers.

@robguilfoyle
Copy link

right on, thanks for you work on this project

@nikmolnar
Copy link
Author

I haven't heard anything on this for a while. Is there anything I can do to move it forward?

src_acl = self.client.get_object_acl(Bucket=src_bucket, Key=src_key)
acl = {'Grants': src_acl['Grants'], 'Owner': src_acl['Owner']}

# The `get_object_acl` method doesn't return 'Type', which is required
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed now in botocore, you should be able to access Type now.

@jamesls
Copy link
Member

jamesls commented Feb 3, 2016

@kyleknap @mtdowling @rayluo @JordonPhillips Thoughts on this? Code wise I think this looks find (with a few small changes). I think the main hesitation is that this essentially triples the number of API calls you make (copy_object vs. copy_object, get_object_acl, put_object_acl), i.e 3x more costly from a request price standpoint. I see the docs in this PR clearly call this out, but I'd like to see what others think.

We'd probably also want to add an integration test in tests/integration/customizations/s3/test_plugin.py to make sure the copy-acl works as expected.

Probably worth testing that --copy-acl has precedence over --acl.

@jamesls
Copy link
Member

jamesls commented Feb 3, 2016

Two more additional pieces of feedback:

  • It doesn't look like this works for multipart copies (where the file size is over the default 8mb threshold). There's a different code path for multipart copies.
  • We may need to consider using a more general name. If we start copying acls, it seems reasonable to then allow PRs for copying other things (encryption, storage class, etc). If that happens, would we rather have individual args such as --copy-acl, --copy-storage-class, --copy-sse, or a single arg with multiple values --not-sure-the-name acl,storage_class,sse which would also allow something like --not-sure-the-name EVERYTHING which would indicate you want everything about the object copied.

@jamesls jamesls added incorporating-feedback and removed pr:needs-review This PR needs a review from a Member. labels Feb 3, 2016
@nikmolnar
Copy link
Author

@jamesls Thanks for the feedback. I'll look into the problem with multipart copies, and incorporating the recent fixes to botocore with respect to Type. I'm out all next week and probably won't get to it before then.

As for the name, and the possibility for similar flags, it seems reasonable to look at rsync or other similar cli utilities as an example. rsync provides individual flags for copying specific attributes (modify times, permissions, etc.) and all provides flags like --archive which are essentially combinations of other flags. Perhaps we could do something similar here? Have individual flags such as --copy-acl, --copy-storage-class, et. al., and then have something like --copy-all?

@yanowitz
Copy link

I would love to have this functionality. Thank for you for writing it.

@nikmolnar
Copy link
Author

Ok, I've added support for multipart copies, and removed the workaround for grantee Type.

@yanowitz
Copy link

I, for one, welcome our new ACL overlords. Thanks @nikmolnar

@yanowitz
Copy link

yanowitz commented Aug 3, 2016

so what has to happen for this to get merged?

@wysekm
Copy link

wysekm commented Aug 16, 2016

Guys, seriously, why are you holding this back? I have no idea, why would you want to do that. It doesn't break any existing functionality, it doesn't change any core features or behavior. Instead, it adds functionality that is desirable by a lot of people. Yes, it also adds some overhead, but since the S3 is designed in a way, that copying keys with ACLs requires additional requests, that's what we get and what cannot be changed.

@yanowitz
Copy link

@jamesls do you know what needs to be done to get @nikmolnar 's useful work merged?

@kyleknap
Copy link
Member

@wysekm @yanowitz

Yes this is functionality that we do want in the CLI.

So I think the main issue is that the s3 command code base is in a bit of a flux right now, we are currently working to port s3transfer library into AWS CLI in an effort to fix many of the current bugs with the CLI, make the s3 command codebase more manageable, and enable us to implement some of the more heavily requested feature like this one. So a decision on timing needs to be made here.

There are two options:

Merge the current PR as is and do not wait on s3transfer:
Steps required for this:

  • Rebase PR so that it no longer has merge conflicts with the develop branch
  • Make a final decision on whether we want to call it --copy-acl or a more general --copy-extra that will allow you to copy whatever sort of data you requested such as storage class, acls. I am not sold on the idea of adding --copy-x for every possible metadata option because there are a bunch of possible ones (a lot more that other utilities like rsync) and I would like to avoid bloating the sync and cp commands. Any conversation on this idea would be welcomed.

Wait till s3transfer gets merged in:
Steps require for this:

  • Wait till the merging on s3transfer is complete
  • Take logic from this PR and convert it to s3transfer
  • Make a decision on the parameter naming.

In the end, I do not think it matters too much on what path we choose here, but things that must happen are the following:

  1. Make the code mergable with the current state of the develop branch
  2. Make a decision on whether to call it --copy-acl or try to generalize it

Once these are done, I think we will be in a good state to merge it. No matter in the end if the PR is used or not, the functionality will need to be ported to s3transfer, which will actually give users a speed up in using the parameter because the GetObjectAcl call will be happening in separate threads.

I think I would prefer we wait till s3transfer gets ported over as it will require less amount of moving parts when integrating. However if both of the points that are enumerated get complete, I do not see a reason why to not hold back the merging of the PR, especially if porting s3transfer takes longer than expected. It is just important to note that once the s3transfer is integrated a good amount of the code added in this PR will be reworked and I am not sure how much of the original code, other than maybe the tests and the parameter setup that will stay intact.

Let me know what you all think.

@yanowitz
Copy link

Thanks for the detailed response @kyleknap . I would defer to @nikmolnar about the best way to proceed, as I'm not the original author. In the meantime, I'll use this branch, which is, admittedly, a 😿 approach. Cheers.

@nikmolnar
Copy link
Author

@kyleknap What is the current ETA for porting s3transfer?

@kyleknap
Copy link
Member

@nikmolnar while I cannot give an exact date I would say the work is in terms of weeks. The current port branch is integrate-s3transfer. I am currently working on getting cp integrated right now. Once cp is integrated, this parameter can then be ported to the version of the s3 commands that are integrated with s3transfer even though the rest of the commands (such as sync, mv, etc.) may not have been ported yet.

@Blackduke77
Copy link

HI, I am looking for this exact functionality for a while now. Is this still on the road map, and if so any idea when it might be available?
Thanks

@baburdick
Copy link

👍 for merging this. Much needed functionality. Any updates?

@yanowitz
Copy link

Agreed with @baburdick -- it's been 3 more months, what's up?

@wysekm
Copy link

wysekm commented Jan 4, 2017

Is there any progress on this?

@anguo-wenz
Copy link

@nikmolnar if the target S3 not same with source S3, the owner field of acl maybe not work.
I suggest you to change
src_acl = self.client.get_object_acl(Bucket=src_bucket, Key=src_key) acl = {'Grants': src_acl['Grants'], 'Owner': src_acl['Owner']} self.client.put_object_acl( Bucket=bucket, Key=key, AccessControlPolicy=acl )

to
src_acl = self.client.get_object_acl(Bucket=src_bucket, Key=src_key) dest_acl = self.client.get_object_acl(Bucket=bucket, Key=key) acl = {'Grants': src_acl['Grants'], 'Owner': dest_acl['Owner']} self.client.put_object_acl( Bucket=bucket, Key=key, AccessControlPolicy=acl )

please review

@shaunbrady
Copy link

@kyleknap Re: "There are two options:" Must they be mutually exclusive? Or has progress been made in the s3transfer branch such that it's not worth doing the first option? Our use case uses 'sync' for what it's worth.

@nikmolnar Any status on there rebase?

@jsheth16
Copy link

Any status on this pull request? It would be super useful.

@nikmolnar
Copy link
Author

This PR should probably be closed. At this point, I'm guessing that s3transfer has been merged in, which means that these changes are now obsolete and the functionality will need to be re-implemented in s3transfer.

@baburdick
Copy link

@nikmolnar: Are you referring to these?: #2251 , c86e5e2 , #2239

@nikmolnar
Copy link
Author

@baburdick: Yes, and also @kyleknap's comments earlier in this thread:

No matter in the end if the PR is used or not, the functionality will need to be ported to s3transfer. . . .

@nikmolnar
Copy link
Author

This PR is now quite obsolete. Closing.

@nikmolnar nikmolnar closed this Aug 21, 2018
@aaronjensen
Copy link

For anyone else coming here, I was able to use this to copy one bucket to another and keep ACLs: https://github.com/cobbzilla/s3s3mirror/tree/2.1-stable

It's also significantly faster.

@Rovel
Copy link

Rovel commented Jan 2, 2019

Hi, @nikmolnar I searched for the s3transfer and found this repo but no docs or more info, not updated since aug 2018 any chance to have that s3 sync --acl-per-file back?
or maybe some docs in s3transfer, or maybe some official docs on how to transfer a bucket with a per file acl in the same account using the official aws cli?

@nikmolnar
Copy link
Author

@Rovel: unfortunately, this PR is obsolete. Since aws-cli has migrated to using s3transfer, this would need to be implemented using that, which is likely quite different from how it's implemented here. I don't have time for that at the moment.

@Rovel
Copy link

Rovel commented Jan 8, 2019

thanks for the info @nikmolnar

@djonatanb
Copy link

Could we have a comment on the solution to the problem irregardless if this PR is obsolete? maybe from someone in the development team ? I mean, that @nikmolnar PR would have solved a lot of problems. Seems like the problem has been acknowledged and going from "merging in the next weeks" to "sorry won't ever merge" should not be the end. Maybe there's already a solution in place that could be pointed at from this issue.
Cheers and thanks @nikmolnar anyways.

thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this pull request Feb 12, 2022
* feat: colors for deploy command

* lint: fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet