OpenHouse on DeltaLake Table Format [Feature] #39

Open
2 of 8 tasks
HotSushi opened this issue Mar 3, 2024 · 0 comments
Labels
feat New feature or request

Comments

@HotSushi
Collaborator

HotSushi commented Mar 3, 2024

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the OpenHouse community.

Feature Request Proposal

OpenHouse has a versatile design that can support different table formats. It already works with Apache Iceberg and can be extended to others such as DeltaLake. A proof-of-concept draft PR demonstrates this capability.

This feature request proposes adding DeltaLake support to OpenHouse. This would make OpenHouse features such as Declarative Table Management, Autonomous Data Services, and Secure Table Sharing available for the open-source Delta format.

Motivation

What is the use case for this feature?

Enriching Delta tables with the OH control plane unlocks a comprehensive set of features that can
elevate open-source Delta Lake to a fully managed data lake solution. The feature set includes:

  • Declarative table management: Declaratively specify the table policies (retention,
    replication, sharing) using SQL APIs.
  • Automatic data services: Keep tables in a managed (e.g., retention, replication),
    optimal (e.g., storage compaction, sorting, clustering), and compliant (e.g., GDPR,
    DMA) state.
  • Secure table sharing: Provides a way to securely share tables, with built-in role-based
    access control for table operations.
  • Data quality enforcement: Provides a gateway to enforce data quality constraints,
    governance rules, and data modeling standards.
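
As a rough illustration of the declarative idea behind the first two bullets, the sketch below models a retention policy as data and derives which partitions a maintenance job would act on. It is not OpenHouse code: `RetentionPolicy`, `expired_partitions`, and the date-partitioned layout are hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical declarative policy: keep only the last `days` days of data."""
    days: int


def expired_partitions(partition_dates, policy, today):
    """Return, in ascending order, the partition dates older than the retention window."""
    cutoff = today - timedelta(days=policy.days)
    return sorted(d for d in partition_dates if d < cutoff)


# The user declares the desired state; a data service works out what to delete.
policy = RetentionPolicy(days=30)
partitions = [date(2024, 1, 15), date(2024, 2, 20), date(2024, 3, 1)]
to_drop = expired_partitions(partitions, policy, today=date(2024, 3, 3))
print(to_drop)
```

The point of the sketch is the split of responsibilities: the table owner states only the policy, and a background service decides which data falls outside it, regardless of whether the underlying table format is Iceberg or Delta.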

What component(s) does this feature affect?

  • Table Service: This is the RESTful catalog service that stores table metadata. :services:tables
  • Jobs Service: This is the job orchestrator that submits data services for table maintenance. :services:jobs
  • Data Services: These are the jobs that perform table maintenance. apps:spark
  • Iceberg internal catalog: This is the internal Iceberg catalog for OpenHouse Catalog Service. :iceberg:openhouse
  • Spark Client Integration: This is the Apache Spark integration for OpenHouse catalog. :integration:spark
  • Documentation: This is the documentation for OpenHouse. docs
  • Local Docker: This is the local Docker environment for OpenHouse. infra/recipes/docker-compose
  • Other: Please specify the component.

Details

To learn more about this feature request, please read the following whitepaper: OpenHouse On Delta: Control Plane Unlocked!

A draft PR has been filed with the title: Adding Delta Table Format to OpenHouse, a proof-of-concept.

HotSushi added the feat (New feature or request) label on Mar 3, 2024