[BUG] [SPARK] Unintended Rewrite of Other Partitions During Partition-Level Delta Table Update #3054
Bug
Which Delta project/connector is this regarding?
Spark
Describe the problem
We have a Delta table partitioned by the country column. When we run a merge job that updates and inserts data for a single partition value (e.g., country = 'c1'), the data in the other partitions is also rewritten. In effect the entire table is rewritten, including partitions that should remain untouched.
Previously (Delta 2.1.0) we used the same DeltaMergeBuilder construct without the whenNotMatchedBySource clause, and it worked as expected.
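One semantic point may help narrow this down. Under standard MERGE semantics, a WHEN NOT MATCHED BY SOURCE clause applies to every target row that has no match in the source. If the source covers only country = 'c1', that includes every row in every other partition unless the clause carries its own condition. The toy, pure-Python model below illustrates this. It is not Delta's implementation. The merge key (country, id), the row shape, and the function name are all hypothetical. The model only reports which partitions contain at least one existing row that an UPDATE or DELETE action would apply to. It cannot show whether Delta 2.3.0's file-level rewriting is itself correct.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical row shape: {"country": "c1", "id": 1}; (country, id) is the merge key.
Row = Dict[str, object]


def touched_partitions(
    target: List[Row],
    source: List[Row],
    use_not_matched_by_source: bool = True,
    not_matched_by_source_cond: Optional[Callable[[Row], bool]] = None,
) -> Set[str]:
    """Partitions holding at least one existing target row that a toy MERGE
    would UPDATE (matched) or DELETE (not matched by source).

    WHEN NOT MATCHED THEN INSERT only adds new rows, so it is left out here.
    """
    source_keys = {(r["country"], r["id"]) for r in source}
    touched: Set[str] = set()
    for row in target:
        if (row["country"], row["id"]) in source_keys:
            # WHEN MATCHED THEN UPDATE: only rows that also appear in the source.
            touched.add(row["country"])
        elif use_not_matched_by_source and (
            not_matched_by_source_cond is None or not_matched_by_source_cond(row)
        ):
            # WHEN NOT MATCHED BY SOURCE THEN DELETE: every unmatched target row,
            # in any partition, unless a clause condition filters it out.
            touched.add(row["country"])
    return touched


target = [
    {"country": "c1", "id": 1},
    {"country": "c1", "id": 2},
    {"country": "c2", "id": 3},
    {"country": "c3", "id": 4},
]
# Like the updates DataFrame in this issue, the source covers only country = 'c1'.
source = [{"country": "c1", "id": 1}, {"country": "c1", "id": 5}]

print(sorted(touched_partitions(target, source, use_not_matched_by_source=False)))  # ['c1']
print(sorted(touched_partitions(target, source)))  # ['c1', 'c2', 'c3']
print(sorted(touched_partitions(
    target, source, not_matched_by_source_cond=lambda r: r["country"] == "c1"
)))  # ['c1']
```

In DeltaMergeBuilder terms, the clause-level condition in this model corresponds roughly to passing a condition string to whenNotMatchedBySource, such as "t.country = 'c1'" (the alias t is hypothetical). Whether doing so also avoids the file rewrites reported here is not something the model can establish.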
(originally reported on: https://delta-users.slack.com/archives/CJ70UCSHM/p1714990170284469)
Steps to reproduce
This is the DeltaMergeBuilder I am using. For context, my updates DataFrame contains data for only the single country partition being processed.
Observed results
Delta log JSON when whenNotMatchedBySourceDelete is used:
Delta log JSON when whenNotMatchedBySourceDelete is not used (commented out):
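To compare the two commits by partition, you can tally the add and remove actions in each _delta_log commit file (newline-delimited JSON, one action per line) by partition value. The following is a minimal sketch. It assumes a table partitioned by country with Hive-style country=<value>/ data paths. The helper names are hypothetical, and the sample commit lines are made up for illustration rather than taken from this issue's logs. The partitionValues map is always present on add actions but may be missing on remove actions, so the sketch falls back to parsing the file path.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Optional
from urllib.parse import unquote


def partition_from_path(path: str, column: str) -> Optional[str]:
    """Fallback: read a Hive-style `column=value` segment from a data file path."""
    for segment in unquote(path).split("/"):
        key, sep, value = segment.partition("=")
        if sep and key == column:
            return value
    return None


def summarize_commit(lines: Iterable[str], column: str = "country") -> Dict[str, Counter]:
    """Count add/remove actions per partition value in one Delta commit file."""
    summary = {"add": Counter(), "remove": Counter()}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        for kind in ("add", "remove"):
            if kind in action:
                entry = action[kind]
                value = (entry.get("partitionValues") or {}).get(column)
                if value is None:
                    value = partition_from_path(entry["path"], column)
                summary[kind][value] += 1
    return summary


# Hypothetical commit contents; in practice, read _delta_log/<version>.json line by line.
sample_commit = [
    '{"commitInfo": {"operation": "MERGE"}}',
    '{"remove": {"path": "country=c1/part-0000.parquet", "dataChange": true}}',
    '{"remove": {"path": "country=c2/part-0001.parquet", "dataChange": true}}',
    '{"add": {"path": "country=c1/part-0002.parquet", "partitionValues": {"country": "c1"}, "size": 1024, "modificationTime": 0, "dataChange": true}}',
    '{"add": {"path": "country=c2/part-0003.parquet", "partitionValues": {"country": "c2"}, "size": 1024, "modificationTime": 0, "dataChange": true}}',
]

result = summarize_commit(sample_commit)
# Partitions other than the one being merged that still had files removed:
print(sorted(set(result["remove"]) - {"c1"}))  # ['c2']
```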
Expected results
Only the partition being updated should be rewritten, not the whole table.
Environment information
Delta Lake version: io.delta:delta-core_2.12:2.3.0
Spark version: 3.3.1
Scala version: 2.12
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?