Add support for GENERATED ALWAYS AS IDENTITY in DeltaTableBuilder #1072

norbitek · 2022-04-15T07:15:58Z

The latest version of Databricks added support for identity columns in Delta tables.
It is possible to define GENERATED ALWAYS AS IDENTITY in a column specification.

It would be nice to do the same using DeltaTableBuilder, for example:

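```python
from delta.tables import DeltaTable
from pyspark.sql.types import DateType

# Assumes an active SparkSession named `spark`.
# Proposed usage: the "IDENTITY(...)" value for the `id` column is the requested
# feature and is not part of the current DeltaTableBuilder API.
DeltaTable.create(spark) \
    .tableName("default.people10m") \
    .addColumn("id", "BIGINT", generatedAlwaysAs="IDENTITY(START WITH 10 INCREMENT BY 10)") \
    .addColumn("firstName", "STRING") \
    .addColumn("middleName", "STRING") \
    .addColumn("lastName", "STRING", comment="surname") \
    .addColumn("gender", "STRING") \
    .addColumn("birthDate", "TIMESTAMP") \
    .addColumn("dateOfBirth", DateType(), generatedAlwaysAs="CAST(birthDate AS DATE)") \
    .addColumn("ssn", "STRING") \
    .addColumn("salary", "INT") \
    .partitionedBy("gender") \
    .execute()
```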
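For comparison, the same table can be written as SQL DDL using the identity-column form that Databricks accepts, `GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY (START WITH n INCREMENT BY m)`, which open-source Spark did not support when this issue was opened. The sketch below is a hypothetical pure-Python helper that only assembles that statement as a string; `identity_clause`, `column_ddl`, and `people_ddl` are illustrative names, not part of the Delta Lake API.

```python
from typing import List, Optional


def identity_clause(start: int = 1, step: int = 1, always: bool = True) -> str:
    """Build an identity-column clause in the Databricks SQL form (illustrative helper)."""
    if step == 0:
        # A zero increment could never yield distinct values.
        raise ValueError("INCREMENT BY must be non-zero")
    mode = "ALWAYS" if always else "BY DEFAULT"
    return f"GENERATED {mode} AS IDENTITY (START WITH {start} INCREMENT BY {step})"


def column_ddl(name: str, data_type: str, generated: Optional[str] = None,
               comment: Optional[str] = None) -> str:
    """Render one column: name, type, optional generation clause, optional comment."""
    parts = [name, data_type]
    if generated:
        parts.append(generated)
    if comment is not None:
        # Escape single quotes with a backslash (Spark SQL string-literal escaping).
        escaped = comment.replace("'", "\\'")
        parts.append(f"COMMENT '{escaped}'")
    return " ".join(parts)


def people_ddl() -> str:
    """CREATE TABLE statement mirroring the proposed builder example above."""
    columns: List[str] = [
        column_ddl("id", "BIGINT", identity_clause(start=10, step=10)),
        column_ddl("firstName", "STRING"),
        column_ddl("middleName", "STRING"),
        column_ddl("lastName", "STRING", comment="surname"),
        column_ddl("gender", "STRING"),
        column_ddl("birthDate", "TIMESTAMP"),
        column_ddl("dateOfBirth", "DATE",
                   "GENERATED ALWAYS AS (CAST(birthDate AS DATE))"),
        column_ddl("ssn", "STRING"),
        column_ddl("salary", "INT"),
    ]
    return ("CREATE TABLE IF NOT EXISTS default.people10m (\n  "
            + ",\n  ".join(columns)
            + "\n) USING DELTA\nPARTITIONED BY (gender)")
```

On a runtime that supports identity columns, the resulting string could be passed to `spark.sql(people_ddl())`; the builder API requested in this issue would make the string assembly unnecessary.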

allisonport-db · 2022-04-15T20:51:45Z

Hi @norbitek, thanks for opening this issue. This is definitely in the plan for Delta Lake, but we're currently prioritizing other features on the roadmap (#920), such as OPTIMIZE ZORDER and CDF.

keen85 · 2022-08-12T08:53:17Z

@norbitek, it's on the roadmap for 2022 H2 🥳
#1307

wedesoft · 2022-09-30T15:05:41Z

I tried to add a generated column using SQL. Do I understand correctly that it is not supported yet in PySpark?

zsxwing · 2022-09-30T15:18:45Z

@wedesoft Spark doesn't support it yet. Support for the SQL syntax for generated columns is tracked by #1100.

jasperp97 · 2023-05-15T16:06:14Z

Is this still on the roadmap?

thebaz73 · 2023-10-10T13:44:51Z

Any news on the status of this issue?

shahkalpan07 · 2023-11-05T04:10:09Z

Any update on the release date?

bart-samwel · 2023-11-10T14:39:44Z

This is definitely still on the roadmap! However, at the moment all the focus is on completing Deletion Vectors, which is in high demand. We will only get to this item after that work is complete.

keen85 · 2024-02-07T16:17:14Z

Since Delta Lake 3.1.0 (with deletion vectors) is out now, would you consider working on it for 3.2, @bart-samwel? 😇

bart-samwel · 2024-02-08T09:16:22Z

@keen85

> Since Delta Lake 3.1.0 (with deletion vectors) is out now, would you consider working on it for 3.2

Thank you for the reminder! It is near the top of our list now. I can't make any hard guarantees, but I'm hopeful that we'll get to this pretty soon.

norbitek · 2024-02-08T09:23:01Z

@bart-samwel
What is the reason that features in the Standalone version are implemented with such high latency?
Does it mean that for every new feature (for example, liquid clustering) we will have to wait about 2 years?

bart-samwel · 2024-02-08T10:12:58Z

@norbitek

> What is the reason that features in the Standalone version are implemented with such high latency?

Just to make sure there's no confusion here: Delta Standalone is different from the Spark connector for Delta Lake. Standalone is a library that can be used to implement connectors for non-Spark systems, and it is not really getting new features anymore; its design is not well suited to supporting many of them easily. All of the new effort is going into Delta Kernel, the new library for building connectors. It makes it much easier to keep up with new features, and we intend to keep it up to date.

Identity columns are a feature where we have unfortunately dropped the ball, even for the Spark connector. It's the exception, though, not the rule!

> Does it mean that for every new feature (for example, liquid clustering) we will have to wait about 2 years?

Certainly not! Like I said, identity columns are an exception. Liquid clustering was actually released in Delta Lake 3.1, which came out last week! https://github.com/delta-io/delta/releases

SYOGESH045 · 2024-05-26T14:15:11Z

Hi, at my company we're not using Spark SQL anywhere, so I'd like to use the DeltaTableBuilder API instead. Is this resolved yet? If not, when can we expect this update?

Many thanks,
Yogesh S

norbitek added the enhancement (New feature or request) label Apr 15, 2022

nkarpov mentioned this issue Aug 16, 2022: Roadmap 2022 H2 (discussion) #1307 (Open)

zsxwing mentioned this issue Sep 26, 2022: The schema of your delta table has changed in an incompatible way since your dataframe or deltatable object was created. please redefine your dataframe or deltatable object. #689 (Closed)

allisonport-db mentioned this issue Nov 8, 2022: [Feature Request] Support additional generation expressions for automatic data skipping #1442 (Open)

felipepessoto mentioned this issue Apr 6, 2023: [Feature Request] SQL syntax for GENERATED columns in OSS #1100 (Open)

keen85 mentioned this issue Feb 8, 2024: [Feature Request] Identity Column #1959 (Open)


c27kwan linked a pull request May 3, 2024 that will close this issue: [WIP][Spark] Python DeltaTableBuilder API for Identity Columns #3044 (Open)

