Is it possible to have limited authorisation on Datasets uploaded to Swift/S3 on CESNET? #17

Open
guillaumeeb opened this issue Sep 17, 2022 · 17 comments

@guillaumeeb
Member

Question by @tinaok:

Hi, I have a test Pangeo configuration to run, but I need to make part of the data available to only a few users. Is it possible to create Zarr files on CESNET's Swift with three access levels: 'private' (the owner), 'group' (some of the people who have access to the pangeo-eosc platform) and 'public' (open to the internet)?

@tinaok could you clarify your need a bit?

  • Dataset should be only visible to some users?
  • Dataset should be only writable by some users, but can be viewed and read by anyone?
  • What distinction do you make between owner and group?
@tinaok
Collaborator

tinaok commented Sep 19, 2022

Thank you @guillaumeeb

  • Dataset should be only visible to some users?
    yes.
  • Dataset should be only writable by some users, but can be viewed and read by anyone?
    no.
  • What distinction do you make between owner and group?

On a Linux system in an HPC center, we create a Unix group and add to it the users who want to share data. We then control access with chmod g+r,o-r toto.nc
In our case, 'other' is the internet, and 'group' can be either all the people who have EGI-authorised access to the Pangeo cloud or the Pangeo cloud admin group. My question is how to create a bucket that is accessible only to this group of people, and not from the internet, with our EGI authentication system.
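
For readers less familiar with the Unix model referred to here, this is a minimal Python sketch of what chmod g+r,o-r does to a local file. The helper name `share_with_group_only` is made up for illustration, and nothing here applies to the object store itself.

```python
import os
import stat
import tempfile


def share_with_group_only(path):
    """Mirror `chmod g+r,o-r path`: the group may read, others may not."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    mode |= stat.S_IRGRP    # g+r: grant read permission to the group
    mode &= ~stat.S_IROTH   # o-r: remove read permission from others
    os.chmod(path, mode)
    # Return the permission bits actually stored on the file.
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than on real data.
    with tempfile.NamedTemporaryFile(suffix=".nc") as tmp:
        print(oct(share_with_group_only(tmp.name)))
```

The question in this thread is whether the object store offers an equivalent of the group bit, with 'other' standing for anonymous internet access.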

@guillaumeeb
Member Author

OK, so I think we'll need @sebastian-luna-valero's help on this one, and probably some of CESNET staff also. I can still try to answer some points.

The user:group concept does not exist as such in cloud object stores; things work differently. You have user accounts (EGI here) and projects or tenants (the Pangeo VO, I guess), and you can usually define roles and policies on top of these. These policies are a kind of ACL (Access Control List): they define who can perform which operation on a project or on a bucket/container. I'm not sure how this is implemented at CESNET, but I checked in the docs that something like this is possible with OpenStack.

By default, in the Horizon interface or at bucket/container creation, we can only specify whether a container is public (visible on the internet) or not. So I think the situation is as follows:

  • Pangeo VO admins have read/write access to any buckets.
  • Pangeo VO users have read access only to any bucket (? to be confirmed).
  • Everyone in the world has read access only to any public bucket.

Be careful: if you create an S3 access/secret key pair and give it to another person, it will by default be an admin key pair.

So to find out whether we can set more precise rules, we'll need help from other people to know which OpenStack commands we could use, and whether this works with S3 credentials or only with Swift credentials.

@sebastian-luna-valero
Collaborator

Hello,

Here is the current situation:

  • When you create buckets as private: only Pangeo VO admins have read/write access to any buckets.
  • When you create buckets as public: everyone has read access, and Pangeo VO admins have read/write access too.

Please note that currently:

  • Pangeo VO admins have access to DaskHub and OpenStack.
  • Pangeo VO users have access to DaskHub, but they don't have access to OpenStack. (i.e. they won't have access to private buckets)
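
The situation described in the two lists above can be summarised as a small lookup. This is only an illustrative model of the current state as reported in this comment; the role names and the function `allowed_operations` are hypothetical and do not correspond to any OpenStack API.

```python
# Hypothetical role labels, used only for this illustration.
ADMIN = "pangeo-vo-admin"
USER = "pangeo-vo-user"
ANONYMOUS = "anonymous"


def allowed_operations(role, bucket_is_public):
    """Return the set of operations a role has on a bucket, per this comment."""
    if role == ADMIN:
        # Admins have OpenStack access, so they can read and write any bucket.
        return {"read", "write"}
    if bucket_is_public:
        # Public buckets are readable by everyone, including plain VO users.
        return {"read"}
    # Private buckets are invisible to anyone without OpenStack access.
    return set()


if __name__ == "__main__":
    for role in (ADMIN, USER, ANONYMOUS):
        for public in (False, True):
            print(role, "public" if public else "private",
                  sorted(allowed_operations(role, public)))
```

The missing 'intermediate' case is visible here: a plain VO user gets exactly the same result as an anonymous visitor.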

If we need something intermediate, we will need to explore options in:
https://docs.openstack.org/swift/latest/overview_acl.html
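
As a starting point for that exploration, here is a minimal sketch of the container ACL headers described on the linked page (X-Container-Read and X-Container-Write, with project:user grants under Keystone auth). The helper names and the IDs are placeholders, and whether the CESNET deployment honours these grants for federated EGI identities has not been tested.

```python
def keystone_grant(project_id, user_id="*"):
    """Build a Keystone-style ACL element, '<project-id>:<user-id>'.

    A user_id of '*' means every user of the given project.
    """
    return f"{project_id}:{user_id}"


def container_acl_headers(read_grants, write_grants=()):
    """Build Swift container ACL headers from lists of ACL elements."""
    headers = {"X-Container-Read": ",".join(read_grants)}
    if write_grants:
        headers["X-Container-Write"] = ",".join(write_grants)
    return headers


if __name__ == "__main__":
    # Placeholder IDs; real values would come from the OpenStack project.
    group_read = container_acl_headers(
        read_grants=[keystone_grant("PROJECT_ID")],
        write_grants=[keystone_grant("PROJECT_ID", "OWNER_USER_ID")],
    )
    print(group_read)

    # Public read as documented by Swift: any referrer, plus listings.
    print(container_acl_headers([".r:*", ".rlistings"]))
```

With python-swiftclient, such a dictionary could presumably be passed to `Connection.post_container(container, headers)`, but that requires Swift credentials rather than S3 keys.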

Please let me know your thoughts.

Best regards,
Sebastian

@tinaok
Collaborator

tinaok commented Sep 21, 2022

Hi Sebastian,
The use case I have in mind requires 'something intermediate'.
I'll have some users who do not require OpenStack dashboard access but who do require DaskHub, along with 'private' buckets accessible only to these users. It is OK for Pangeo VO admins to access this data, as they are admins.

@tinaok
Collaborator

tinaok commented Sep 21, 2022

I have a related question for @sebastian-luna-valero. If we use S3 access through the MinIO server offered in the IM Dashboard, do we get different types of user groups? Or, since user control will be backed by EGI Check-in anyway, is it the same as using the OpenStack object storage at CESNET directly?

@sebastian-luna-valero
Collaborator

Hi,

To address this issue I have opened: #23

Here is the status after merging that PR:

  1. Who can create/destroy VMs in the cloud (e.g. to deploy DaskHub)? Members of the pangeo.admins VO group in aai.egi.eu
  2. Who has access to DaskHub? Members of the vo.pangeo.eu VO in aai-dev.egi.eu. Ideally we want this to be moved to aai.egi.eu as well.
  3. Who has read/write access to object storage? Members of the vo.pangeo.eu VO in aai.egi.eu

Now, after following the instructions to configure awscli, users who want private buckets should be able to get them by using --acl private with aws s3 commands.
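
A minimal sketch of what such a command could look like when assembled from Python. The function name, bucket, key and endpoint URL are all placeholders (the real endpoint is the one given in the awscli configuration instructions), and only the --acl private option has been tried against this storage.

```python
import shlex


def private_upload_command(local_path, bucket, key, endpoint_url, recursive=False):
    """Build an `aws s3 cp` command line that uploads with a private ACL."""
    cmd = [
        "aws", "s3", "cp", local_path, f"s3://{bucket}/{key}",
        "--acl", "private",
        "--endpoint-url", endpoint_url,
    ]
    if recursive:
        # A Zarr store is a directory tree, so it needs a recursive copy.
        cmd.append("--recursive")
    return cmd


if __name__ == "__main__":
    cmd = private_upload_command(
        "toto.zarr", "my-bucket", "toto.zarr",
        "https://object-store.example.org",  # placeholder endpoint
        recursive=True,
    )
    # Print the command for review; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the argument list first makes it easy to check the command before running it with real credentials.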

All of the above should address the comments from @tinaok

The use case I have in mind requires 'something intermediate'. I'll have some users who do not require OpenStack dashboard access but who do require DaskHub, along with 'private' buckets accessible only to these users. It is OK for Pangeo VO admins to access this data, as they are admins.

Regarding the question about MinIO: if you deploy it with IM Dashboard, you have full control over it (i.e. you can decide to configure EGI Check-in or any other user accounts/groups). However, please bear in mind that it's not only about deploying and configuring MinIO; it will also be another service to be maintained by us. Therefore, I would leave this as a last resort, and use the object storage at CESNET that is already managed.

@guillaumeeb
Member Author

Now, after following the instructions to configure awscli, users who want private buckets should be able to get them by using --acl private with aws s3 commands.

So what you are saying here is that once we've set up our AWS S3 credentials, we can use aws s3 commands, following https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object-acl.html, to set specific ACLs on any storage bucket/container?

I'll try that later on this week or the next.

However, please bear in mind that it's not only about deploying and configuring MinIO; it will also be another service to be maintained by us. Therefore, I would leave this as a last resort, and use the object storage at CESNET that is already managed.

👍 about this, handling our own object store would certainly be some work. And we'll also probably run into performance concerns.

@sebastian-luna-valero
Collaborator

So what you are saying here is that once we've set up our AWS S3 credentials, we can use aws s3 commands, following https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object-acl.html, to set specific ACLs on any storage bucket/container?

I have only tested the --acl private option. Since it is OpenStack Swift underneath, I am not sure whether all the AWS S3 options will be supported. Please test and let us know.
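
One way to run that test is to try each standard S3 canned ACL on a throwaway bucket and record which ones the endpoint accepts. The sketch below assumes a client object exposing `put_bucket_acl(Bucket=..., ACL=...)`, as a boto3 S3 client does; the function name `probe_canned_acls` is made up, and an 'accepted' result only means the call did not fail, not that the backend enforces the ACL.

```python
# Standard S3 canned ACLs that apply to buckets.
CANNED_ACLS = ("private", "public-read", "public-read-write", "authenticated-read")


def probe_canned_acls(client, bucket, acls=CANNED_ACLS):
    """Try each canned ACL on `bucket` and report whether the call succeeded.

    Use an empty test bucket only: some of these ACLs make it world-readable
    or world-writable for a moment.
    """
    results = {}
    for acl in acls:
        try:
            client.put_bucket_acl(Bucket=bucket, ACL=acl)
            results[acl] = "accepted"
        except Exception as exc:  # e.g. botocore.exceptions.ClientError
            results[acl] = f"rejected: {exc}"
    # Always finish by trying to put the bucket back to private.
    try:
        client.put_bucket_acl(Bucket=bucket, ACL="private")
    except Exception as exc:
        results["reset-to-private"] = f"failed: {exc}"
    return results


# Possible usage with boto3 (third-party) and a placeholder endpoint:
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://object-store.example.org")
#   print(probe_canned_acls(s3, "acl-test-bucket"))
```

Taking the client as a parameter keeps the probe independent of how the credentials and endpoint are configured.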

@guillaumeeb
Member Author

Could you clarify a bit how you see the storage permissions through the S3 interface after #23, that is, with containers/buckets created in another OpenStack project?

  • Do members of the vo.pangeo.eu VO in aai.egi.eu have read/write access to every bucket? Is there an admin group?
  • There is still the public option, I guess: read access for the entire world?
  • So what is a "private" bucket: one accessible only by its owner? (Sorry, I haven't tried it yet.)

@tinaok
Collaborator

tinaok commented Oct 13, 2022

following #39 (comment)

What should we tell students to do so that one student does not delete another student's data?

I'll add all the students as members of the vo.pangeo.eu VO in aai.egi.eu, so that they can read and write in the private bucket that I'll create for each working group.

But if I understood correctly, unlike in HPC centres, if one user creates a Zarr file, another user can delete it by mistake?

Until we find a solution, I'll tell them to 'check the path' so that they do not touch other people's files, but it would be nice to find a better solution.
I wonder how the Pangeo US cloud deployments deal with this...

@sebastian-luna-valero
Collaborator

Hi,

The problem is with the translation of the federated identity from Check-In into the local identity at CESNET. This issue is very specific to the federated AAI infrastructure that we are using for this deployment. If other deployments use other authentication/authorization methods, they won't have the same issue.

Indeed, the recommendation until the issue is solved is to be careful with the path. As long as everybody writes to their own bucket/path, everything should be fine. Maybe they can use their own user ID as a prefix? Hopefully that's unique to everyone.
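
The user-ID prefix idea could look something like the sketch below. It is a naming convention plus a client-side guard, not an access control: anyone with write access can still bypass it. All function names are hypothetical, and the `delete` callable stands for whatever removal function is in use (for example an s3fs filesystem's `rm`).

```python
def user_store_path(bucket, user_id, dataset):
    """Build a per-user object path, e.g. s3://<bucket>/<user_id>/<dataset>."""
    return f"s3://{bucket}/{user_id}/{dataset}"


def is_own_path(user_id, path):
    """Return True only if `path` points to something under the user's prefix."""
    if not path.startswith("s3://"):
        return False
    parts = path[len("s3://"):].split("/")
    # parts[0] is the bucket, parts[1] the user prefix, the rest the object key.
    return (
        len(parts) >= 3
        and parts[1] == user_id   # exact match, so 'alice' does not match 'alice2'
        and parts[2] != ""        # refuse to act on the whole prefix
        and ".." not in parts
    )


def safe_delete(user_id, path, delete):
    """Call `delete(path)` only when the path belongs to `user_id`."""
    if not is_own_path(user_id, path):
        raise PermissionError(f"{path} is outside the prefix of user {user_id!r}")
    return delete(path)


if __name__ == "__main__":
    mine = user_store_path("working-group-1", "alice", "sst.zarr")
    print(mine, is_own_path("alice", mine))
    print(is_own_path("alice", user_store_path("working-group-1", "bob", "sst.zarr")))
```

Routing deletions through a guard like `safe_delete` in shared notebooks would at least catch the accidental case raised above.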

Apologies, CESNET has been looking into the issue, but it's not an easy one to solve.

@sebastian-luna-valero
Collaborator

I believe this has been fixed with MinIO. Do you want to test or should we directly close this?

@tinaok
Collaborator

tinaok commented Jul 17, 2023

Thank you @sebastian-luna-valero. Yes, I would like to test it to understand the procedure. Which documentation should I follow? Thank you for your help.

@sebastian-luna-valero
Collaborator

Hi @tinaok

This is the starting point:
https://github.com/pangeo-data/pangeo-eosc/blob/main/users/users-getting-started.md#access-minio

Please give it a go and let us know how it goes.

Best regards,
Sebastian

@tinaok
Collaborator

tinaok commented Jul 17, 2023

Thank you @sebastian-luna-valero. I couldn't create a bucket, maybe because I'm not connected as an administrator?
Tina

@sebastian-luna-valero
Collaborator

Could you try following these steps?

https://github.com/pangeo-data/pangeo-eosc/blob/main/users/how-to/TestMinIO.ipynb

I think we should link the example from the getting started guide: #56
