why in BucketClass CRD, namespacePolicy.multi.writeResource is string #1150

Open
Alansyf opened this issue Jul 4, 2023 · 13 comments
Labels
enhancement New feature or request

Comments

@Alansyf

Alansyf commented Jul 4, 2023

Environment info

  • NooBaa Operator Version: 5.11.0
  • Platform: minikube 1.29.0

Actual behavior

  1. In the BucketClass CRD, namespacePolicy.multi.writeResource is a string.

Expected behavior

  1. Why can't it be an array, like readResources?
    Then I could specify more than one NamespaceStore to allow write access.
spec:
  namespacePolicy:
    type: Multi
    multi:
      writeResource: alansun-servicenow
      readResources:
      - alansun-main
@nimrod-becker
Contributor

At this point, we support a single write target for an NS bucket.
We'll be happy to help review a PR if you have something and want to contribute support for multiple write targets.

@Alansyf
Author

Alansyf commented Jul 4, 2023

With pleasure.
Meanwhile, is there any existing way to enable this?

We are using one object store as a single endpoint serving 200+ data scientists; the data volume might be 10 TB.
Around 100 buckets will be created for hosting, so different data scientists may have different privileges to read from multiple buckets, as well as to write to multiple buckets, depending on the business case.

After reading the NooBaa CRDs and prototyping, it looks like NamespaceStore + BucketClass + OBC fit our needs, except for this multi-write policy.

@Alansyf
Author

Alansyf commented Jul 6, 2023

Hi @nimrod-becker ,
Is there any document that explains in detail how to set up a local development environment?

@nimrod-becker
Contributor

Regarding local env, @dannyzaken @romayalon do we have something updated?

Regarding the original question, you can set up replication between two NS buckets. With a replication rule of bucket1 -> bucket2, objects written to bucket1 will also end up in bucket2, which achieves the same result.
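
A minimal sketch of what such a bucket1 -> bucket2 rule might look like as data, built in Python. The field names (`rule_id`, `destination_bucket`, `filter.prefix`) are assumptions based on NooBaa's bucket replication documentation and should be verified against the operator version in use; the rule id and the helper `replication_rule` are hypothetical.

```python
import json


def replication_rule(rule_id: str, destination_bucket: str, prefix: str = "") -> dict:
    """Build one replication rule.

    Objects whose key starts with `prefix` are copied to `destination_bucket`.
    The field names are assumptions, not a confirmed schema.
    """
    return {
        "rule_id": rule_id,
        "destination_bucket": destination_bucket,
        "filter": {"prefix": prefix},
    }


# Rule intended for bucket1, so that its objects are also copied to bucket2.
rules = [replication_rule("bucket1-to-bucket2", "bucket2")]
replication_policy_json = json.dumps(rules)
print(replication_policy_json)
```

How the resulting JSON is attached to bucket1 depends on the operator version and is not shown here.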

@dannyzaken
Contributor

@Alansyf by local development env, do you mean not in Kubernetes? Let me know if this helps: https://github.com/noobaa/noobaa-core/blob/master/docs/standalone.md

@nimrod-becker
Contributor

Thinking about it further and consulting the team, I think that replication is the better way to go.
Consider a temporary (or not-so-temporary) access issue with one of the write resources: would you want the write to fail, or would you want the object to be guaranteed to be copied from the other write resources eventually? It sounds a lot like bucket replication...

nimrod-becker added the enhancement label on Jul 17, 2023
@dannyzaken
Contributor

dannyzaken commented Jul 17, 2023

@Alansyf I have a few questions regarding the multiple write targets:

  • What is the use case for that?
  • What is the expected behavior?
  • Assuming the expectation is that when writing to the NS_Bucket, the object will be written to all write targets, what is the expected behavior when one of the writes fails?

@Alansyf
Author

Alansyf commented Jul 17, 2023

Hi @dannyzaken ,

Our understanding is that when I write

multi:
      writeResource: ns1
      readResources:
      - ns2
      - ns3

this means that the bucket policy allows the user to:

 write to ns1
 read from ns2 ns3

Now I want to allow the user to:

write to ns1, ns4 
read from ns2, ns3

I do NOT mean that writeResources: ns1, ns4 should write to both buckets;
we are just looking for a place to define the allowed access.

Our use case is that we are building a data lake platform that hosts around 8 TB of customer data. This data is separated by business unit.
To keep it simple, let's say the 8 TB comes from the marketing team, the sales team, the supply chain team, etc.

Around 500 data scientists will use this data lake, but they come from different teams.
Someone may ingest data into the supply chain team's bucket but need to read data from the marketing and sales teams. And vice versa: someone might want to write to the marketing and sales team buckets but only read data from the supply chain bucket.

So we need somewhere to define the access. We chose to define the policy at the gateway because of technical difficulties and limitations in our internal S3-compatible storage.

You can imagine that what we need is a way to define something like users / groups / roles (READ/WRITE) in a database.

@Alansyf
Author

Alansyf commented Jul 18, 2023

Hi @dannyzaken ,

We also tried the new way but ran into a problem there as well, as reported in #1150.

Please let us know how we can move forward.

Our plan to use NooBaa is completely blocked because we have not yet found a solution for access control.

@Alansyf
Author

Alansyf commented Jul 18, 2023

Hi @dannyzaken ,
Sorry for chasing. We really want to find a way to define access control similar to the "user / group / roles" concept in a database: allow a user to read from some tables, write to others, and so on.

Can you please share anything?

@dannyzaken
Contributor

@Alansyf, namespace buckets are not a mechanism for providing access control over external data sources. Their main purpose (at least for multi-NS) is to aggregate multiple data sources and present them as a single readable/writeable bucket.

so, in your example:

multi:
      writeResource: ns1
      readResources:
      - ns2
      - ns3

you get an S3 bucket (let's call it foo) that "contains" all the objects in ns1, ns2, and ns3, and the user can read them by performing getObject on foo. The user can also write new objects to foo, and they will actually be written to ns1 (the data is not stored in NooBaa).
By default, access to foo is given to the bucket owner (creator). You can provide access to other accounts (which have their own access/secret keys) by setting a bucket policy on the bucket (see the AWS docs; NooBaa has partial support for the different bucket policy options, and we keep adding support over time).
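
A hedged sketch of what setting such a bucket policy could look like from Python. The policy follows the standard AWS bucket policy JSON layout, and `put_bucket_policy` is the regular S3 client call (for example on a boto3 S3 client pointed at the NooBaa endpoint). The account name `analyst-account`, the principal format, and the choice of actions are assumptions; since NooBaa supports only part of the policy options, each action should be checked against the version in use.

```python
import json


def read_only_policy(bucket: str, principals: list) -> dict:
    """Return an S3-style bucket policy document granting read access on `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRead",
                "Effect": "Allow",
                # Assumption: accounts are referenced by name in the principal list.
                "Principal": {"AWS": principals},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def apply_policy(s3_client, bucket: str, policy: dict) -> None:
    """Attach the policy with the standard S3 PutBucketPolicy call."""
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))


# "analyst-account" is a hypothetical account that should only read from foo.
policy = read_only_policy("foo", ["analyst-account"])
print(json.dumps(policy, indent=2))
```

`apply_policy` expects a client created by the bucket owner; the policy is passed as a JSON string, which is what the S3 API requires.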

you can also create a namespace bucket (bar), with the following setup:

multi:
      writeResource: ""
      readResources:
      - ns1
      - ns2
      - ns3

bar will "contain" the same objects as foo, but it is not writeable, so you can grant access to bar to users who need only read access. Does that make sense?
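
The behavior of foo and bar described above can be sketched as a toy in-memory model. This is not NooBaa code: the class `MultiNamespaceBucket` is hypothetical, and the lookup order (write resource first, then read resources in listed order) is an assumption made only so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class MultiNamespaceBucket:
    """Toy model of a Multi namespace policy over shared backing stores."""
    write_resource: str          # "" means the bucket is read-only
    read_resources: list
    stores: dict = field(default_factory=dict)  # store name -> {key: data}

    def _read_order(self) -> list:
        # Assumption: the write resource is also readable and is checked first.
        order = [self.write_resource] if self.write_resource else []
        return order + [r for r in self.read_resources if r not in order]

    def get_object(self, key: str) -> bytes:
        for name in self._read_order():
            if key in self.stores.get(name, {}):
                return self.stores[name][key]
        raise KeyError(key)

    def put_object(self, key: str, data: bytes) -> None:
        if not self.write_resource:
            raise PermissionError("bucket has no write resource")
        # Writes always land in the single write resource.
        self.stores.setdefault(self.write_resource, {})[key] = data


stores = {"ns1": {}, "ns2": {"a.csv": b"from ns2"}, "ns3": {"b.csv": b"from ns3"}}
foo = MultiNamespaceBucket("ns1", ["ns2", "ns3"], stores)
bar = MultiNamespaceBucket("", ["ns1", "ns2", "ns3"], stores)

foo.put_object("c.csv", b"written via foo")  # stored in ns1
print(bar.get_object("c.csv"))               # bar reads the same object
```

In this model both buckets are views over the same stores: anything written through foo becomes readable through bar, while a write through bar is rejected.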

as for the second example

write to ns1, ns4 
read from ns2, ns3

I'm trying to understand what you need. The bucket in this example will "contain" the objects in ns1, ns2, ns3, and ns4, so a client can read through it. But what should happen on writes?

@Alansyf
Author

Alansyf commented Jul 18, 2023

@dannyzaken, I drew a diagram that I hope shows what I need.

[image]

As the diagram shows, our Python code will connect to your endpoint using an access/secret key pair.
The question is how I can achieve the above: how can I create an account that allows me to
write: coll-1-ns, coll-2-ns
read: coll-1-ns, coll-2-ns, coll-4-ns
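
One way to reason about this request, following the earlier bucket-policy suggestion, is to invert the per-account access list into per-bucket grants. The sketch below does only that bookkeeping; it assumes each namespace store (coll-1-ns and so on) is exposed as its own bucket, and the mapping from read/write to S3 actions is an assumption, not something NooBaa is confirmed to enforce.

```python
from collections import defaultdict

# Desired access for one account, taken from the question above.
desired = {
    "write": ["coll-1-ns", "coll-2-ns"],
    "read": ["coll-1-ns", "coll-2-ns", "coll-4-ns"],
}

# Assumed mapping from access mode to S3 actions.
ACTIONS = {
    "read": ["s3:GetObject", "s3:ListBucket"],
    "write": ["s3:PutObject"],
}


def grants_per_bucket(access: dict) -> dict:
    """Invert {mode: [buckets]} into {bucket: sorted list of actions}."""
    grants = defaultdict(set)
    for mode, buckets in access.items():
        for bucket in buckets:
            grants[bucket].update(ACTIONS[mode])
    return {bucket: sorted(actions) for bucket, actions in grants.items()}


for bucket, actions in sorted(grants_per_bucket(desired).items()):
    print(bucket, actions)
```

Each resulting entry corresponds to one statement that would have to be placed in that bucket's policy for the account.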

@Alansyf
Author

Alansyf commented Jul 20, 2023

Hi @dannyzaken, have you had a chance to take a look at this?
