Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust,python): add unset tblproperties operation #2075

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

ion-elgreco
Copy link
Collaborator

@ion-elgreco ion-elgreco commented Jan 13, 2024

Description

It adds the unset tblproperties operation.

I'm currently parsing the string value as DeltaConfigKey to check if the properties that are going to be unset are valid. The reason for that is to prevent unsetting of other configurations like delta.constraints etc.

The downside here is that our ConfigKey enum should be up to date, on the other hand we could also check each property against a "protected" list of properties that can only be changed by their respective operation. If a user tries to unset a property that is in this protected list it will raise.

While writing this, I noticed that ChangeDataFeed and AppendOnly are WriterFeatures and part of the DeltaConfigKey, not sure if that can be problematic

@roeap can you give some feedback here on the approach?

I will add the rust tests once it's a bit clear which way want to go :)

Related Issue(s)

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate crate/core labels Jan 13, 2024
@roeap
Copy link
Collaborator

roeap commented Jan 28, 2024

While writing this, I noticed that ChangeDataFeed and AppendOnly are WriterFeatures and part of the DeltaConfigKey, not sure if that can be problematic

@ion-elgreco, not entirely sure if this is what you are referring to, but generally speaking we have to distinguish two things.

  1. If a table supports a feature, which means readers / writers have to adhere to the specs of that feature in the protocol. This is what is reflected via readerFeatures and writerFeatures.

  2. If a feature is enabled - this is what is in some cases reflected via table properties - like CDC or deletion vectors.

The downside here is that our ConfigKey enum should be up to date

I expect that the configuration keys are eventually up-streamed to kernel, where they always have to be valid anyhow. A related question would be if we want to allw setting things that delta-rs does not yet support. E.g. setting delta.deletionVecotrsEnabled to true would make the table un-readable for us right now. (though hopefully not for too long :))

@ion-elgreco
Copy link
Collaborator Author

@roeap on your last question I think we should allow that because I can think of usecases where users quickly adjust table features and properties with delta-rs to then just write with spark-delta.

Back on the main point of my note, what order is done to check for example whether CDC is enabled below table feature protocol and above?

Let's take this example:

If we are below V7 protocol we only check the config so we can safely unset the property.

But what if you have a table at V6 with enableChangeDataFeed=true, then we upgrade to V7. Will that config setting be translated into a CDC writer feature enabled?

Also what if we have a table with table feature CDC enabled, then I assume any setting or unsetting the table property: enableChangeDataFeed will not affect it

Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, left a few small comments.

crates/core/src/operations/unset_tbl_properties.rs Outdated Show resolved Hide resolved
crates/core/src/operations/unset_tbl_properties.rs Outdated Show resolved Hide resolved
crates/core/src/protocol/mod.rs Outdated Show resolved Hide resolved
crates/core/src/operations/unset_tbl_properties.rs Outdated Show resolved Hide resolved
Comment on lines 46 to 49
pub fn with_property(mut self, name: String) -> Self {
match self.properties {
Some(mut vec_name) => {
vec_name.push(name);
self.properties = Some(vec_name);
}
None => self.properties = Some(vec![name]),
}
self
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is was a little bit unsure what behaviour I would expect. From the name, my first instinct was that this would set the property, not append it to any existing ones.

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I was thinking people could either chain multiple .with_property().with_property() to add the configkeys or they use the with_properties to pass a Vec.

crates/core/src/operations/unset_tbl_properties.rs Outdated Show resolved Hide resolved
@ion-elgreco ion-elgreco marked this pull request as draft March 5, 2024 19:53
@ion-elgreco
Copy link
Collaborator Author

Putting this in draft until I address some properties that are apparently not allowed to be unset :)

@ion-elgreco ion-elgreco marked this pull request as ready for review March 5, 2024 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add UNSET TBLPROPERTIES
2 participants