feat(backend): isolate artifacts per namespace/profile/user using only one bucket #7725
base: master
Conversation
@chensun @zijianjoy @jacobmalmberg @ca-scribner I finally have something that is worth merging for namespace isolation. It is also quite easy to test on your own instances, because I only changed four configuration files. |
/assign @zijianjoy |
@chensun can you provide some feedback? I will happily implement any changes if needed. |
Works nicely now. |
@Fkawala do you want to help with AWS S3 or GCS? |
@Fkawala @surajkota @zijianjoy @ca-scribner This is what I discussed yesterday with the AWS distribution maintainer (@surajkota). The current implementation here is perfectly suited for MinIO, and MinIO only. We could try to adapt it to GCS and AWS, but the AWS developers have some serious concerns regarding that.
In addition to three different architectures, we also have a three-year-old MinIO without security fixes, as pointed out by @zijianjoy in #7878. So we start with the most important philosophical questions in life: what is the purpose of life, and which S3-compatible storage shall I choose? One of the main requirements from the AWS developers is:
That means we have to keep the storage permission handling inside of Kubeflow and abstract it away, e.g. with Istio AuthorizationPolicies. Of all the S3-compatible storages so far, only Zenko CloudServer and S3Proxy seem simple enough to deploy and have a gateway mode available. Zenko seems a bit more mature (https://s3-server.readthedocs.io/en/latest/CLIENTS.html) and better tested than S3Proxy (https://github.com/gaul/s3proxy#limitations). They also list the GCS S3 backend as supported (https://s3-server.readthedocs.io/en/latest/developers/getting-started.html?highlight=google-cloud-storage#add-support-for-a-new-backend). We can easily deploy a single container for the different backends, similar to the current MinIO gateway, as shown in https://s3-server.readthedocs.io/en/latest/USING_PUBLIC_CLOUDS.html?highlight=backends#run-the-server-as-a-docker-container-with-the-ability-to-write-to-aws-s3. So, to summarize: we try Zenko CloudServer (Apache 2 licensed, actively maintained, and proposed by James Liu) in gateway mode for AWS and GCP, and with the file backend for on-premises deployments. But how do we abstract the permissions now? How can we get rid of a lot of complex configuration in Kubeflow?
This is what we need to implement with the Istio AuthorizationPolicy (or another sidecar such as nginx), so we need to allow exactly
{Allowedpaths} are just the paths from the Resource field in the bucket policy. For example, Google is using the following in KFP:
and we could use something similar to
The kubeflow-pipelines-profile-controller can easily manage the necessary AuthorizationPolicies. I guess @surajkota, as an employee of the S3 protocol company, can provide the community with a beautiful S3 browser similar to https://github.com/nricklin/gbdx-s3-browser that I can directly integrate into the Kubeflow sidebar, as I did with the MinIO UI. Please let it consume the URL parameter ?ns=example-namespace as the other KF components do, such that it displays the right folders (instead of permission errors) without user intervention. A simple list/upload/download functionality should be enough. This solves the long-standing storage backend problems by abstracting, similar to the current GCP and MinIO gateway deployments, and is in line with the current Kubeflow architecture. It uses the namespace/profile to determine what the user can access, as the other Kubeflow components already do. We could also drop mlpipeline-minio-artifact and the ml-pipeline-ui artifact proxy from all namespaces. |
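To make the idea above concrete, here is a minimal AuthorizationPolicy sketch showing how a profile controller could restrict a namespace's workloads to their own artifact prefix. Everything here is an assumption for illustration: the policy and namespace names, the `app: minio` selector label, the `mlpipeline` bucket, and the path layout are hypothetical, not the actual implementation.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  # Hypothetical name; one such policy would be generated per profile.
  name: artifact-access-example-namespace
  # Namespace where the S3-compatible gateway runs (assumed).
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: minio   # assumed label of the S3 gateway pod
  action: ALLOW
  rules:
  - from:
    - source:
        # Only workloads from the owning profile namespace.
        namespaces: ["example-namespace"]
    to:
    - operation:
        # {Allowedpaths}: the namespace's private prefix plus the shared one.
        paths:
        - "/mlpipeline/private-artifacts/example-namespace/*"
        - "/mlpipeline/shared/*"
```

A controller would stamp out one of these per profile, substituting the namespace into both the source and the allowed paths.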
Reading these prefixes allows reading other people's artifacts. The MinIO UI then becomes a bit more cumbersome, but we want security.
must only show your own artifacts. |
/retest |
We also have to think about switching the prefixes to /artifacts, /namespace/artifacts, and /shared only, to make it easier for users and graphical S3 browsers, but that would mean we have to prevent usernames that collide with other top-level folders, e.g. "pipelines". |
@zijianjoy, @chensun this is an important feature especially in enterprise contexts. Can we push this a bit? |
@lehrig feel free to join our security wg meetings and push this in the KFP meetings as well. We have to discuss a lot. |
@lehrig I am thinking of providing this PR as an overlay in the manifests repository. |
Very nice changes @juliusvonkohout. Here are a few of my personal thoughts on the proposed changes: the PR includes many valuable changes; however, it introduces too many changes in a single PR. Due to the breaking changes and some alterations that require further discussion, several small fixes have remained unresolved for over two years, even though they are not blocked for any specific reason.
I might be able to contribute the profile controller we have created based on this PR if you are interested. Please let me know if you prefer it as a PR to https://github.com/juliusvonkohout/pipelines. I will need to discuss this with my organization first, though. |
@AndersBennedsgaard are you in #kubeflow-platform on the new Slack? We are considering Apache Ozone as a MinIO replacement. I think
as done in this PR here is the first step. Please create a separate PR for that, link to this PR and the corresponding issue #4649, and tag @rimolive as well. |
@AndersBennedsgaard i just added you to https://github.com/juliusvonkohout/pipelines as collaborator. |
Yes, I am part of the new Slack channels. We will most likely not adopt Apache Ozone in our project, since we already use MinIO in other projects, so I will probably not be able to contribute much in that regard. But I might take a look at some of the other stuff if I get approval on my end. |
Fixes #4649
Due to newly discovered functionality in Argo, we will not use multiple buckets but just multiple folders and the following IAM policy. Furthermore, AWS, for example, has a limit of 100 buckets, so we should definitely stick with one bucket. This is a follow-up of #7406.
By the way I am open to change the folder structure etc. as you prefer.
One-bucket, multiple-folder namespace isolation
Change the workflow-controller configmap by adding workflow.namespace to the keyFormat:
keyFormat: "private-artifacts/{{workflow.namespace}}/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
We do not need the per namespace artifactrepositories configmaps anymore, but users can still use them to change the default artifact storage defined above.
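The configmap change described above amounts to something along these lines. This is a sketch of the Argo workflow-controller's default artifact repository; the endpoint, bucket, and secret names follow common KFP defaults and may differ in your deployment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      # Assumed KFP defaults; adjust to your deployment.
      endpoint: "minio-service.kubeflow:9000"
      bucket: "mlpipeline"
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey
      # The namespace in the key path is what makes per-namespace
      # isolation with a single bucket possible.
      keyFormat: "private-artifacts/{{workflow.namespace}}/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
```

Because the namespace is baked into every object key, a path-based policy can then distinguish tenants without any per-namespace buckets or artifactRepositories configmaps.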
We still need to create a MinIO user in sync.py and save the individual credentials to the per-namespace secret. But we now need only one policy, although this policy must still be attached to every user. So just some small changes in sync.py. We do not need to create additional buckets anymore.
We exclude
"arn:aws:s3:::%s/pipelines/*" % shared_bucket_name, # shared pipelines
because the apiserver, not the user, should manage pipeline definitions on MinIO. Hopefully @zijianjoy will remove them entirely from MinIO soon. So changing the environment variable
pipelines/backend/src/apiserver/client_manager.go, line 43 (commit 49cdb42)
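The single per-namespace policy described above, with the shared /pipelines prefix deliberately excluded, could be generated roughly like this. This is a hedged sketch, not the actual sync.py code: the bucket name, prefix layout, and action list are assumptions.

```python
import json


def build_namespace_policy(bucket: str, namespace: str) -> dict:
    """Sketch of a single-bucket, per-namespace S3/IAM policy.

    Every profile user gets the same policy shape; only the namespace
    prefix differs. The shared pipelines/ prefix is excluded, since the
    apiserver, not the user, should manage pipeline definitions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    "arn:aws:s3:::%s" % bucket,
                    # Private per-namespace artifacts.
                    "arn:aws:s3:::%s/private-artifacts/%s/*" % (bucket, namespace),
                    # Artifacts shared across namespaces.
                    "arn:aws:s3:::%s/shared/*" % bucket,
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_namespace_policy("mlpipeline", "example-namespace"), indent=2))
```

A profile controller would attach this (or the one shared policy it parameterizes) to each namespace's MinIO user instead of creating a bucket per namespace.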
We can revert to the upstream images for the apiserver, persistenceagent, etc. Only the ml-pipeline-ui changes from "WIP namespace isolation for artifacts and pipeline definitions" (#7406) must be kept for managing namespaced pipeline definitions in the UI, but this can be done in another PR. The pipeline-ui from PR #7406 does not need to be changed, since it uses the apiserver properly according to Gabor's changes in frontend/src/pages/PipelineList.tsx. The proposal from Arrikto, "feat(frontend, sdk): towards namespaced pipelines. Part of #4197" (#7447), could also be used.
I removed the deprecated (https://www.kubeflow.org/docs/components/pipelines/sdk/python-based-visualizations/) visualization server (ml-pipeline-visualizationserver). Please use the supported output "viewers" instead: https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/
I added a network policy that complements https://github.com/kubeflow/manifests/tree/master/contrib/networkpolicies
I added a fix for "[bug] pod is forbidden: failed quota: kf-resource-quota: must specify cpu,memory" (#7699).
I added the PodDefault such that, on JupyterLab creation, one can enable easy authentication to KFP according to https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode.
I added a Kubernetes NetworkAttachmentDefinition for istio-cni. This will be necessary to run Kubeflow completely rootless (as I am already doing with istio-cni and without the NET_ADMIN and NET_RAW capabilities) on OpenShift: https://istio.io/latest/docs/setup/platform-setup/openshift/#additional-requirements-for-the-application-namespace. Vanilla Kubernetes might not need this, but it does no harm.
There is also a security issue in the UI server, because I can read other people's artifacts just by removing
=namespace=xxx
from the end of the artifact URL. Even then, an attacker has to guess the right file names. The lifecycle policy that I have added works wonderfully with the MinIO storage backend and proper cache settings ("feat(frontend): caching may now be disabled when starting pipeline runs" #8177) to delete old cache artifacts after 30 days.
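For reference, a 30-day expiration rule in standard S3 lifecycle-configuration JSON (as accepted by MinIO) could look like the following sketch; the prefix is an assumption and should match wherever the cache artifacts are actually written.

```json
{
  "Rules": [
    {
      "ID": "expire-old-artifacts",
      "Status": "Enabled",
      "Filter": { "Prefix": "private-artifacts/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```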
Checklist: