Can we use MinIO as S3-compatible storage for Apache Iceberg? #6

Open
zainal-abidin-assegaf opened this issue May 17, 2021 · 5 comments
Labels
enhancement New feature or request
Projects

Comments

@zainal-abidin-assegaf

Is your feature request related to a problem? Please describe.
Can we use MinIO as S3-compatible storage for Apache Iceberg?

Describe the solution you'd like
Can we use MinIO as S3-compatible storage for Apache Iceberg?

Describe alternatives you've considered
If we can use MinIO, we need the steps to configure MinIO with CueLake.

Additional context
Can we use MinIO as S3-compatible storage for Apache Iceberg?

@vikrantcue
Contributor

We have not used MinIO yet, but the MinIO documentation says it is compatible with the S3 APIs and can be configured for Spark applications, so it should work with CueLake as well.

Steps for custom configurations will be added soon; we are still figuring out the best way to support them.

We will keep this issue open until the documentation covers custom configurations like this one.

@sachinkbansal sachinkbansal added the enhancement New feature or request label May 17, 2021
@zainal-abidin-assegaf
Author

zainal-abidin-assegaf commented May 17, 2021

If we can use MinIO with CueLake, what about AWS Glue? Can we use Hive Metastore?

@zainal-abidin-assegaf
Author

zainal-abidin-assegaf commented May 17, 2021

We are still confused about how Zeppelin connects to the Spark cluster.
Is it enough to just deploy Spark in the cuelake namespace?
Or could we predefine the following in the ConfigMap?

  • Spark master endpoint
  • Redis endpoint
  • MinIO endpoint

@vikrantcue
Contributor

If we can use MinIO with CueLake, what about AWS Glue? Can we use Hive Metastore?

Yes, you can use either AWS Glue or Hive as the metastore for Iceberg.

CueLake's default configuration uses Hive Metastore with Postgres as the backend database.
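
For readers wondering what the two metastore options look like on the Spark side, here is a minimal sketch that builds the Iceberg catalog properties as a plain Python dict. The property keys and class names come from Iceberg's Spark catalog configuration; the catalog name `lake`, the warehouse paths, and the metastore URI are hypothetical placeholders, and none of this has been checked against CueLake's own defaults.

```python
def iceberg_catalog_conf(name, kind, warehouse, metastore_uri=None):
    """Build Spark properties that register an Iceberg catalog called `name`.

    kind is "hive" (Hive Metastore) or "glue" (AWS Glue Data Catalog).
    """
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        # Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Root location for table data and metadata.
        f"{prefix}.warehouse": warehouse,
    }
    if kind == "hive":
        conf[f"{prefix}.type"] = "hive"
        if metastore_uri is not None:
            # Thrift endpoint of the Hive Metastore service.
            conf[f"{prefix}.uri"] = metastore_uri
    elif kind == "glue":
        conf[f"{prefix}.catalog-impl"] = "org.apache.iceberg.aws.glue.GlueCatalog"
        conf[f"{prefix}.io-impl"] = "org.apache.iceberg.aws.s3.S3FileIO"
    else:
        raise ValueError(f"unsupported catalog kind: {kind!r}")
    return conf


# Hypothetical values, for illustration only.
hive_conf = iceberg_catalog_conf(
    "lake", "hive", "s3a://my-bucket/warehouse", "thrift://hive-metastore:9083"
)
glue_conf = iceberg_catalog_conf("lake", "glue", "s3://my-bucket/warehouse")
```

Each key/value pair would then be set as a Spark property, for example in the Zeppelin interpreter settings.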

We are still confused about how Zeppelin connects to the Spark cluster.
Is it enough to just deploy Spark in the cuelake namespace?
Or could we predefine the following in the ConfigMap?

  • Spark master endpoint
  • Redis endpoint
  • MinIO endpoint

  1. The Spark driver and executors are created by Zeppelin whenever a notebook is run, so the Spark master endpoint is set to k8s://https://kubernetes.default.svc
  2. Redis is used in CueLake to maintain the Celery job queue, and its endpoint is set by default to http://redis:6379
  3. The MinIO endpoint can be passed as Spark config. Spark config can be set either in the Zeppelin interpreter settings or in a notebook before the interpreter starts. We are not sure about this, as we haven't tested MinIO yet.
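
As a rough illustration of point 3, the sketch below assembles the Hadoop S3A properties that are commonly used to point Spark at an S3-compatible endpoint and renders them as a Zeppelin `%spark.conf` paragraph, which has to run before the Spark interpreter starts. The endpoint `http://minio:9000` and the credentials are hypothetical placeholders, the helper names are invented for this example, and, like the answer above, this is untested with CueLake.

```python
def minio_spark_conf(endpoint, access_key, secret_key):
    """Build S3A properties that point Spark at an S3-compatible endpoint."""
    use_ssl = endpoint.startswith("https://")
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed as host/bucket rather than bucket.host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


def to_zeppelin_conf_paragraph(conf):
    """Render properties as a %spark.conf paragraph, one 'key value' per line."""
    lines = ["%spark.conf"]
    lines.extend(f"{key} {value}" for key, value in conf.items())
    return "\n".join(lines)


# Hypothetical endpoint and credentials, for illustration only.
conf = minio_spark_conf("http://minio:9000", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")
print(to_zeppelin_conf_paragraph(conf))
```

The same key/value pairs could instead be entered in the Zeppelin interpreter settings, which keeps credentials out of notebooks.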

@zainal-abidin-assegaf
Author

@vikrantcue, thank you for the confirmation.

We look forward to hearing that MinIO has been successfully integrated and tested with CueLake.

If possible, please expose the MinIO endpoint in the ConfigMap for a better user experience.

CueLake could be one of the fastest ETL/ELT tools, thanks to its Spark cluster and Iceberg on object storage.

We look forward to the MinIO update. Thank you.

vikrantcue added a commit that referenced this issue May 25, 2021
Notebook Objects and edit functionality
@vikrantcue vikrantcue added this to To do in CueLake Jul 6, 2021