Analyse the impact of Delete operation in Qbeast Index #327

Open
4 tasks
osopardo1 opened this issue May 7, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@osopardo1
Member

This issue clarifies the current status of the DELETE operation in the Qbeast Spark library and outlines the next steps on the roadmap.

DELETE is a basic data management operation supported by all open table formats (Delta, Iceberg, and Hudi). It removes specific rows from a table and is usually implemented with one of two strategies:

  • Merge On Read. The rows are marked as deleted and are discarded at read time.
  • Copy on Write. The files that contain the matching records are removed, and their data is rewritten to new files without the deleted records.
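
The difference between the two strategies can be sketched with a toy in-memory model. This is an illustration only and not how Delta, Iceberg, or Hudi implement deletes; `DataFile`, `merge_on_read_delete`, `copy_on_write_delete`, and `read` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy data file: a name, its rows, and the row positions marked as deleted."""
    name: str
    rows: list
    deleted_positions: set = field(default_factory=set)


def merge_on_read_delete(files, predicate):
    """Merge On Read: mark matching rows as deleted; no file is rewritten."""
    for f in files:
        for pos, row in enumerate(f.rows):
            if predicate(row):
                f.deleted_positions.add(pos)
    return files


def copy_on_write_delete(files, predicate):
    """Copy on Write: replace every file holding matching rows with a rewritten copy."""
    result = []
    for f in files:
        if any(predicate(row) for row in f.rows):
            kept = [row for row in f.rows if not predicate(row)]
            if kept:
                # The rewritten file is a brand-new file with a new name.
                result.append(DataFile(name=f.name + "-rewritten", rows=kept))
        else:
            result.append(f)
    return result


def read(files):
    """Return the live rows, discarding any row marked as deleted."""
    return [
        row
        for f in files
        for pos, row in enumerate(f.rows)
        if pos not in f.deleted_positions
    ]


def too_old(row):
    return row["age"] > 75


mor = merge_on_read_delete([DataFile("a", [{"age": 80}, {"age": 30}])], too_old)
cow = copy_on_write_delete([DataFile("a", [{"age": 80}, {"age": 30}])], too_old)

print([f.name for f in mor], read(mor))
print([f.name for f in cow], read(cow))
```

Both strategies return the same rows at read time, but only Copy on Write replaces the original file with a new one, so anything attached to the original file does not survive unless the writer copies it over.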

Because Qbeast is interoperable with these formats, the operation can be executed through Delta's interface.

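import delta
from pyspark.sql import functions as F
dt = delta.DeltaTable.forPath(spark, "tmp/qbeast-table")  # assumes an active SparkSession `spark`
dt.delete(F.col("age") > 75)  # delete every row with age > 75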

By default, Delta uses the Copy on Write mechanism: it removes the affected files and adds new ones. Removing a file means that its AddFile entry, along with the corresponding Qbeast metadata, is no longer available in the Snapshot, and the newly written files do not contain the tags needed to rebuild the OTree either.

In other words, the operation could damage the index structure.
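
One way to observe this effect is to scan a commit of the Delta transaction log for `add` actions that carry no `tags`. The sketch below assumes the commit is available as newline-delimited JSON lines (the format of the JSON commit files in `_delta_log`); it does not read checkpoint files. The function name `adds_without_tags` and the sample file paths are hypothetical, and because this issue does not list the tag keys Qbeast writes, the check only tests whether any tags are present.

```python
import json


def adds_without_tags(log_lines):
    """Return the paths of `add` actions whose `tags` field is missing or empty."""
    untagged = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        add = action.get("add")
        if add is not None and not add.get("tags"):
            untagged.append(add["path"])
    return untagged


# Hypothetical commit produced by a Copy on Write delete:
# one file is removed and one rewritten file is added without tags.
commit = [
    '{"remove": {"path": "part-0000.parquet", "dataChange": true}}',
    '{"add": {"path": "part-0001.parquet", "size": 1024, "dataChange": true}}',
]

print(adds_without_tags(commit))
```

Any path reported this way is a file that the index cannot place in the OTree from its metadata alone.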

Things to do:

  • Add an entry to the documentation that describes the current limitations.
  • Analyze the impact of missing blocks.
  • Analyze the impact of missing cubes.
  • Develop a mechanism that keeps the index structure correct even if some files are missing, OR a mechanism that ensures deletes leave the index in a correct shape.
@osopardo1 added the "enhancement" (New feature or request) label on May 7, 2024
@osopardo1 changed the title from "Analyse the impact of Delete operation in Qbeast OTree Index" to "Analyse the impact of Delete operation in Qbeast Index" on May 7, 2024