Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adopt kernel schema types #2495

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented May 9, 2024

Description

First pass adopting delta_kernel in delta-rs.

depends on delta-incubator/delta-kernel-rs#189 being merged and released.

This PR focusses on the schema types. Adopting the action types would be a follow up, but might have a even greater blast radius then this one.

Related Issue(s)

part of #2489

Documentation

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels May 9, 2024
ArrowDataType::Decimal128(p, s) => {
Ok(DataType::Primitive(PrimitiveType::Decimal(*p, *s)))
}
ArrowDataType::Decimal256(p, s) => DataType::decimal(*p, *s).map_err(|_| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This arrow type is missing in kernel, I think that should be upstreamed there as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, given that precision/scale are limited to 38, does that make sense? not entirely sure anymore but I think the 256 bit type only makes sense for larger p/s values.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly for convenience. Users might have their source data using decimal256, but still precision/scale below 38.

@@ -1458,7 +1458,7 @@ def test_invalid_decimals(tmp_path: pathlib.Path, engine):

with pytest.raises(
SchemaMismatchError,
match=re.escape("Invalid data type for Delta Lake: decimal(39,1)"),
match=re.escape("Invalid data type for Delta Lake: Decimal256(39, 1)"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is a bit more significant then meets the eye - i.e. we are no longer trying to convert 256 bit decimals to a compliant type in delta. main reason being, for precisions that fit into 128 bit decimals, the user should really be using these, if the larger type is required, we cannot store it in the table anyhow.

Comment on lines +187 to +211
Struct(fields) => {
let struct_fields = fields
.iter()
.flat_map(|f| TryFrom::try_from(f.as_ref()))
.collect::<Vec<_>>();
let values = arr
.as_any()
.downcast_ref::<StructArray>()
.and_then(|struct_arr| {
struct_fields
.iter()
.map(|f: &StructField| {
struct_arr
.column_by_name(f.name())
.and_then(|c| Self::from_array(c.as_ref(), index))
})
.collect::<Option<Vec<_>>>()
})?;
if struct_fields.len() != values.len() {
return None;
}
Some(Self::Struct(
StructData::try_new(struct_fields, values).ok()?,
))
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scovich - here is an example of creating a struct scalar in the wild.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants