feat: adopt kernel schema types #2495

roeap · 2024-05-09T14:11:01Z

Description

First pass adopting delta_kernel in delta-rs.

Depends on delta-incubator/delta-kernel-rs#189 being merged and released.

This PR focuses on the schema types. Adopting the action types would be a follow-up, but might have an even greater blast radius than this one.

Related Issue(s)

part of #2489

Documentation

ion-elgreco · 2024-05-09T14:17:50Z

crates/core/src/kernel/arrow/mod.rs

-            ArrowDataType::Decimal128(p, s) => {
-                Ok(DataType::Primitive(PrimitiveType::Decimal(*p, *s)))
-            }
-            ArrowDataType::Decimal256(p, s) => DataType::decimal(*p, *s).map_err(|_| {


This arrow type is missing in kernel, I think that should be upstreamed there as well.

Hmm, given that precision and scale are limited to 38, does that make sense? I'm not entirely sure anymore, but I think the 256-bit type only makes sense for larger precision/scale values.

It's mostly for convenience. Users might have source data typed as decimal256 while its precision/scale is still below 38.
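The 38-digit bound in this exchange follows from the size of a 128-bit integer: every unscaled value of a decimal with precision up to 38 is below 10^38, which fits in an `i128`, while 39 digits can overflow it. A minimal, std-only sketch of the convenience conversion suggested above, assuming hypothetical `ArrowDecimal` and `DeltaDecimal` types in place of the real arrow-rs and delta_kernel ones (their actual names and signatures are not reproduced here). The validity rule `1 <= p <= 38, 0 <= s <= p` is also an assumption of the sketch.

```rust
/// Largest precision Delta Lake allows for `decimal(p, s)`.
const MAX_DELTA_PRECISION: u8 = 38;

/// Hypothetical stand-in for the two Arrow decimal variants (precision, scale).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowDecimal {
    Decimal128(u8, i8),
    Decimal256(u8, i8),
}

/// Hypothetical stand-in for Delta's `decimal(precision, scale)` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DeltaDecimal {
    precision: u8,
    scale: u8,
}

/// Largest unscaled magnitude a decimal with `precision` digits can hold,
/// or `None` if that magnitude overflows an `i128`.
fn max_unscaled_i128(precision: u32) -> Option<i128> {
    10i128.checked_pow(precision).map(|v| v - 1)
}

/// Lenient mapping: either Arrow width is accepted as long as (p, s) is a
/// valid Delta decimal. A Decimal256 with p <= 38 only needs values below
/// 10^38, which fit in an i128, so it can be treated like Decimal128.
fn to_delta_lenient(dt: ArrowDecimal) -> Result<DeltaDecimal, String> {
    let (p, s) = match dt {
        ArrowDecimal::Decimal128(p, s) | ArrowDecimal::Decimal256(p, s) => (p, s),
    };
    // Assumed validity rule: 1 <= p <= 38 and 0 <= s <= p.
    if p == 0 || p > MAX_DELTA_PRECISION || s < 0 || (s as u8) > p {
        return Err(format!("Invalid data type for Delta Lake: {:?}", dt));
    }
    Ok(DeltaDecimal { precision: p, scale: s as u8 })
}
```

Under this rule `to_delta_lenient(ArrowDecimal::Decimal256(10, 2))` succeeds with `DeltaDecimal { precision: 10, scale: 2 }`, while `Decimal256(39, 1)` is still rejected because Delta cannot store it.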

roeap · 2024-05-24T14:01:33Z

python/tests/test_writer.py

@@ -1458,7 +1458,7 @@ def test_invalid_decimals(tmp_path: pathlib.Path, engine):

    with pytest.raises(
        SchemaMismatchError,
-        match=re.escape("Invalid data type for Delta Lake: decimal(39,1)"),
+        match=re.escape("Invalid data type for Delta Lake: Decimal256(39, 1)"),


This change is a bit more significant than meets the eye: we no longer try to convert 256-bit decimals to a compliant Delta type. The main reason is that for precisions that fit into 128-bit decimals, users should really be using those, and if the larger type is actually required, we cannot store it in the table anyway.
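The new expected message, `Decimal256(39, 1)` instead of `decimal(39,1)`, is what one would expect if the Arrow type is now reported through its derived `Debug` formatting rather than first being mapped to a Delta decimal. A minimal sketch of the stricter rule, again with hypothetical std-only stand-ins (`ArrowDecimal`, `DeltaDecimal`, and a `SchemaMismatch` error standing in for whatever surfaces as `SchemaMismatchError` in Python): valid 128-bit decimals convert, and every 256-bit decimal is rejected regardless of its precision.

```rust
/// Hypothetical stand-in for the Arrow decimal variants (precision, scale).
/// The derived `Debug` renders e.g. `Decimal256(39, 1)`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowDecimal {
    Decimal128(u8, i8),
    Decimal256(u8, i8),
}

/// Hypothetical stand-in for Delta's `decimal(precision, scale)` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DeltaDecimal {
    precision: u8,
    scale: u8,
}

/// Hypothetical error type standing in for the binding's schema mismatch error.
#[derive(Debug, PartialEq)]
struct SchemaMismatch(String);

/// Strict mapping: only a valid 128-bit decimal converts; every Decimal256
/// is rejected, whatever its precision.
fn to_delta_strict(dt: ArrowDecimal) -> Result<DeltaDecimal, SchemaMismatch> {
    match dt {
        ArrowDecimal::Decimal128(p, s) if (1..=38).contains(&p) && s >= 0 && (s as u8) <= p => {
            Ok(DeltaDecimal { precision: p, scale: s as u8 })
        }
        // Small 256-bit decimals should be cast to Decimal128 by the caller;
        // large ones cannot be stored in a Delta table at all.
        _ => Err(SchemaMismatch(format!(
            "Invalid data type for Delta Lake: {:?}",
            dt
        ))),
    }
}
```

Note that `to_delta_strict(ArrowDecimal::Decimal256(10, 2))` is an error under this rule even though `decimal(10,2)` is a valid Delta type; pushing that cast onto the user is the trade-off described above.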

roeap · 2024-05-24T19:42:38Z

crates/core/src/kernel/scalars.rs

+            Struct(fields) => {
+                let struct_fields = fields
+                    .iter()
+                    .flat_map(|f| TryFrom::try_from(f.as_ref()))
+                    .collect::<Vec<_>>();
+                let values = arr
+                    .as_any()
+                    .downcast_ref::<StructArray>()
+                    .and_then(|struct_arr| {
+                        struct_fields
+                            .iter()
+                            .map(|f: &StructField| {
+                                struct_arr
+                                    .column_by_name(f.name())
+                                    .and_then(|c| Self::from_array(c.as_ref(), index))
+                            })
+                            .collect::<Option<Vec<_>>>()
+                    })?;
+                if struct_fields.len() != values.len() {
+                    return None;
+                }
+                Some(Self::Struct(
+                    StructData::try_new(struct_fields, values).ok()?,
+                ))
+            }


@scovich - here is an example of creating a struct scalar in the wild.
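The shape of that diff, building one struct scalar from row `index` of a struct array by looking up each field's child column by name, can be sketched without arrow-rs. Every type below (`Scalar`, `Column`, `StructColumn`) is a hypothetical, std-only stand-in, not the delta_kernel or arrow-rs API, and nulls are simplified: a null child value makes the whole struct `None`, rather than becoming a null field.

```rust
use std::collections::HashMap;

/// Hypothetical scalar type (not the delta_kernel `Scalar`).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int(i64),
    Str(String),
    Struct(Vec<(String, Scalar)>),
}

/// Hypothetical columnar storage: one optional value per row.
#[derive(Debug)]
enum Column {
    Int(Vec<Option<i64>>),
    Str(Vec<Option<String>>),
    Struct(StructColumn),
}

/// Hypothetical struct column: declared field order plus child columns by name.
#[derive(Debug)]
struct StructColumn {
    field_names: Vec<String>,
    children: HashMap<String, Column>,
}

impl Scalar {
    /// Reads the value at `index`; returns `None` for nulls, out-of-range
    /// indices, or a declared field with no matching child column.
    fn from_column(col: &Column, index: usize) -> Option<Scalar> {
        match col {
            Column::Int(v) => v.get(index).cloned().flatten().map(Scalar::Int),
            Column::Str(v) => v.get(index).cloned().flatten().map(Scalar::Str),
            Column::Struct(s) => {
                // Collecting into Option<Vec<_>> makes a single missing field
                // reject the whole struct instead of silently dropping it.
                let values = s
                    .field_names
                    .iter()
                    .map(|name| {
                        let child = s.children.get(name)?;
                        Some((name.clone(), Scalar::from_column(child, index)?))
                    })
                    .collect::<Option<Vec<_>>>()?;
                Some(Scalar::Struct(values))
            }
        }
    }
}
```

One design note on the diff itself: `flat_map` over the `TryFrom` results skips any field whose conversion fails instead of reporting it, and because `values` is built from the already-filtered `struct_fields`, the later `struct_fields.len() != values.len()` check cannot detect that. Comparing against `fields.len()`, or collecting the conversions into a `Result<Vec<_>, _>`, may be closer to the intent.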

github-actions bot added the binding/python and binding/rust labels May 9, 2024

ion-elgreco reviewed May 9, 2024

roeap added 3 commits May 24, 2024 01:52

feat: adopt kernel schema types

6d08ba3

fix: remove tests upstreamed to kernel

2124e82

fix: test cleanup

13e3ac8

roeap force-pushed the feature/kernelize branch from cd589f9 to 13e3ac8 Compare May 24, 2024 07:37

roeap added 3 commits May 24, 2024 13:29

feat: adopt more kernel

0d7c1c5

fix: bing back python expresssions

29d1aec

fix: python tests

3853eaa

roeap commented May 24, 2024

fix: update to ScalarData

022f95c

roeap commented May 24, 2024

fix: convert tests

4ab555d