feat: add deltaOps set metadata operation #2474

Open
wants to merge 1 commit into
base: main
Conversation

HawaiianSpork

Description

Allow the metadata of a Delta table to be changed explicitly. This enables simple schema migrations, such as changing the metadata of a column or adding new nullable columns. The code does not currently check that the table is still readable after the metadata change. The setMetadata operation is similar to mergeSchema but does not require a write at the same time, so it can be run and tested as part of a deployment instead of on the next write of data.

Note: this used to be possible by calling DeltaOps::create again with overwrite on an existing table, but since that was recently fixed to delete old data, this operation restores the original behavior.

@github-actions github-actions bot added the binding/rust (Issues for the Rust crate) label on May 2, 2024

github-actions bot commented May 2, 2024

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@HawaiianSpork HawaiianSpork changed the title from "feat: Add deltaOps set metadata operation" to "feat: add deltaOps set metadata operation" on May 2, 2024
@ion-elgreco
Collaborator

Unfortunately it isn't that simple. Done this way, you could put the table in an invalid state, because the metadata contains schema, partitionColumns and configuration, and each of them needs many checks before it can be changed.

For the configuration part I have 2 PRs open: #2264 #2075

You can't change partitionColumns; at this point we don't allow evolving the partition columns of a table. Schema evolution and other schema changes all need to go into dedicated operations such as ALTER TABLE DROP COLUMN and ALTER TABLE ADD COLUMN.

@HawaiianSpork
Author

Thank you @ion-elgreco, I was not aware that you had added support for setting table properties with #2264. Would this operation be acceptable if it checked more thoroughly that the old and new metadata are compatible? An ADD COLUMN feature would be great, but it is missing the ability to modify existing columns (to add nested fields to structs), which I would like to use.
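As a rough illustration of the kind of compatibility check being discussed, here is a minimal sketch. It assumes a simplified, hypothetical metadata model: `DataType`, `Field`, `TableMetadata`, `field`, `check_fields` and `check_compatible` are invented for this example and are not delta-rs types or APIs. The rules it applies are taken from this thread only: partition columns may not change, existing fields may not be dropped, retyped or made non-nullable, and newly added fields (including fields nested inside a struct) must be nullable. A real check would also have to cover configuration and other cases not modeled here.

```rust
// Hypothetical, simplified model of table metadata. These are NOT delta-rs types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Primitive(String),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct TableMetadata {
    fields: Vec<Field>,
    partition_columns: Vec<String>,
}

// Small helper to keep field construction short in examples.
fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

// Recursively compare two field lists; `path` is the dotted prefix of the parent struct.
fn check_fields(old: &[Field], new: &[Field], path: &str) -> Result<(), String> {
    for o in old {
        let full = format!("{}{}", path, o.name);
        // Every existing field must still be present.
        let n = new
            .iter()
            .find(|f| f.name == o.name)
            .ok_or_else(|| format!("field `{}` was dropped", full))?;
        // Tightening nullability could invalidate existing data; relaxing it is allowed here.
        if o.nullable && !n.nullable {
            return Err(format!("field `{}` changed from nullable to non-nullable", full));
        }
        match (&o.data_type, &n.data_type) {
            // Structs may evolve, as long as their nested fields follow the same rules.
            (DataType::Struct(of), DataType::Struct(nf)) => {
                check_fields(of, nf, &format!("{}.", full))?
            }
            (a, b) if a == b => {}
            _ => return Err(format!("field `{}` changed type", full)),
        }
    }
    // Fields that exist only in the new schema must be nullable,
    // because existing data files contain no values for them.
    for n in new {
        if !old.iter().any(|f| f.name == n.name) && !n.nullable {
            return Err(format!("new field `{}{}` must be nullable", path, n.name));
        }
    }
    Ok(())
}

// Entry point of the sketch: reject partition changes, then compare schemas.
fn check_compatible(old: &TableMetadata, new: &TableMetadata) -> Result<(), String> {
    if old.partition_columns != new.partition_columns {
        return Err("partition columns cannot change".to_string());
    }
    check_fields(&old.fields, &new.fields, "")
}
```

Under these assumptions, adding a nullable `zip` field inside an existing `address` struct would pass, while dropping a column, changing the partition columns, or adding a non-nullable column would be rejected with an error message.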

@ion-elgreco
Collaborator

@HawaiianSpork I don't see why you wouldn't be able to add a nested field to a struct column with ADD COLUMN.

I think it's still safe, since you are only adding something. But it is probably good to verify what happens when you read two Parquet files with partially different struct schemas.

@HawaiianSpork
Author

Good point. I had assumed ADD COLUMN only worked on top-level columns, but at least in the Spark world nested columns are supported. So I guess I have to add ADD COLUMN support to delta-rs...

Labels
binding/rust Issues for the Rust crate