{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":182849188,"defaultBranch":"master","name":"delta","ownerLogin":"delta-io","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2019-04-22T18:56:51.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/49767398?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1715901877.0","currentOid":""},"activityList":{"items":[{"before":"ff5b36fbcc3bb894b9a885eaa05338460c8173d6","after":"3cd9529b6a9fcc1fd6d72e2574760b1c622e12bb","ref":"refs/heads/master","pushedAt":"2024-05-24T18:53:35.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Add CREATE TABLE LIKE compatibility with user-provided table properties (#3138)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\n\r\n\r\nUser provided properties when performing CREATE LIKE commands were being\r\nignored and only the properties from source table were being added. This\r\nPR adds/overwrites any applicable properties with the user provided\r\nones.\r\n\r\n## How was this patch tested?\r\n\r\n\r\n\r\nUnit tests were created replicating the customer issue for CREATE LIKE\r\ncommands both originating in Delta tables and other formats.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\n\r\n\r\nNo","shortMessageHtmlLink":"[Spark] Add CREATE TABLE LIKE compatibility with user-provided table …"}},{"before":"bfb5c94aa818495f35ed007bd4566cd2f7fecf42","after":"ff5b36fbcc3bb894b9a885eaa05338460c8173d6","ref":"refs/heads/master","pushedAt":"2024-05-24T16:34:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Allow type widening for all supported type changes with Spark 4.0 (#3024)\n\nThis PR adds shims to ungate the remaining type changes that only work\r\nwith Spark 4.0 / master. Spark 4.0 contains the required changes to\r\nParquet readers to be able to read the data after applying the type\r\nchanges.\r\n\r\n## Description\r\nExtend the list of supported type changes for type widening to include\r\nchanges that can be supported with Spark 4.0:\r\n- (byte, short, int) -> long\r\n- float -> double\r\n- date -> timestampNTZ\r\n- (byte, short, int) -> double\r\n- decimal -> decimal (with increased precision/scale that doesn't cause\r\nprecision loss)\r\n- (byte, short, int, long) -> decimal\r\n\r\nShims are added to support these changes when compiling against Spark\r\n4.0/master and to only allow `byte` -> `short` - > `int` when compiling\r\nagainst Spark 3.5.\r\n\r\n## How was this patch tested?\r\nAdding test cases for the new type changes in the existing type widening\r\ntest suites. The list of supported / unsupported changes covered in\r\ntests differs between Spark 3.5 and Spark 4.0, shims are also provided\r\nto handle this.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\nYes: allow using the listed type changes with type widening, either via\r\n`ALTER TABLE CHANGE COLUMN TYPE` or during schema evolution in MERGE and\r\nINSERT.","shortMessageHtmlLink":"[Spark] Allow type widening for all supported type changes with Spark…"}},{"before":"039a29abb4abc72ac5912651679233dc983398d6","after":"bfb5c94aa818495f35ed007bd4566cd2f7fecf42","ref":"refs/heads/master","pushedAt":"2024-05-24T16:33:34.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Validate the expression in AlterTableAddConstraintDeltaCommand (#3143)\n\n#### Which Delta project/connector is this regarding?\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\nThis PR fixes an internal error thrown from\r\n`AlterTableAddConstraintDeltaCommand`. This error is thrown when adding\r\na CHECK constraint with a non-existent column in the expression. The\r\nerror is thrown when we check if the expressions returns a boolean. This\r\nworks correctly for most expressions, but will result in an exception if\r\nthe data type of the unresolved column is checked.\r\n\r\nThis PR fixes this issue by making the analyzer responsible for checking\r\nwhether the expression returns a boolean by wrapping the expression with\r\na `Filter` node.\r\n\r\n## How was this patch tested?\r\n\r\nAdded a test\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\nYes, `ALTER TABLE ... ADD CONSTRAINT ... CHECK` will now throw a\r\n`UNRESOLVED_COLUMN` error instead of a `INTERNAL_ERROR` in the case\r\ndescribed above.","shortMessageHtmlLink":"[Spark] Validate the expression in AlterTableAddConstraintDeltaCommand ("}},{"before":"1609c38782f21c06fe3b24b32d0f4c97c3c2c755","after":"039a29abb4abc72ac5912651679233dc983398d6","ref":"refs/heads/master","pushedAt":"2024-05-23T22:54:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tdas","name":"Tathagata Das","path":"/tdas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/663212?s=80&v=4"},"commit":{"message":"[Spark] Test type widening compatibility with other Delta features (#3053)\n\n## Description\r\nAdditional tests covering type widening and:\r\n- Reading CDF\r\n- Column mapping\r\n- Time travel\r\n- RESTORE\r\n- CLONE\r\n\r\n## How was this patch tested?\r\nTest only","shortMessageHtmlLink":"[Spark] Test type widening compatibility with other Delta features (#…"}},{"before":"b043f5d7c2655c27866d4c33e2255e076f7598a2","after":"1609c38782f21c06fe3b24b32d0f4c97c3c2c755","ref":"refs/heads/master","pushedAt":"2024-05-23T16:06:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] Update OptimizeGeneratedSuite to apply constant folding (#3141)\n\n## Description\r\nThe following change in Spark master broke tests in\r\n`OptimizeGeneratedColumnSuite`:\r\nhttps://github.com/apache/spark/commit/7974811218c9fb52ac9d07f8983475a885ada81b\r\n\r\nIt added an execution of the `ConstantFolding` rule after\r\n`PrepareDeltaScan`, causing constant expressions in filters on generated\r\ncolumns to be simplified, which `OptimizeGeneratedColumnSuite` heavily\r\nused.\r\n\r\nThis change:\r\n- updates the expected results in `OptimizeGeneratedColumnSuite` to\r\nsimplify constant expressions\r\n- adds a pass of `ConstantFolding` after `PrepareDeltaScan` so that\r\nDelta on spark 3.5 behaves the same as Delta on spark master.\r\n\r\n## How was this patch tested?\r\nUpdated tests","shortMessageHtmlLink":"[Spark] Update OptimizeGeneratedSuite to apply constant folding (#3141)"}},{"before":"0deef042b18689fd4b73f4b252700dd2f1ab94f8","after":"b043f5d7c2655c27866d4c33e2255e076f7598a2","ref":"refs/heads/master","pushedAt":"2024-05-23T15:57:04.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] Make ManagedCommit a preview feature (#3137)\n\n## Description\r\nRenames the ManagedCommit feature and config names by replacing -dev\r\nwith -preview to indicate that it is in preview phase.\r\n\r\n## How was this patch tested?\r\nNo new tests.","shortMessageHtmlLink":"[Spark] Make ManagedCommit a preview feature (#3137)"}},{"before":"35c7536a70c2d0ba57e140704d3e213e4e75a516","after":"0deef042b18689fd4b73f4b252700dd2f1ab94f8","ref":"refs/heads/master","pushedAt":"2024-05-23T15:45:39.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Kernel][Expressions] Add support for LIKE expression (#3103)\n\n## Description\r\nAdd SQL `LIKE` expression support in Kernel list of supported expressions and a default implementation.\r\n\r\nAddresses part of https://github.com/delta-io/delta/issues/2539 (where `STARTS_WITH` as `LIKE 'str%'`)\r\n\r\n## How was this patch tested?\r\nadded unit tests\r\n\r\nSigned-off-by: Krishnan Paranji Ravi ","shortMessageHtmlLink":"[Kernel][Expressions] Add support for LIKE expression (#3103)"}},{"before":"420d9e059db18845a49b85cb1571752667d39dc6","after":"35c7536a70c2d0ba57e140704d3e213e4e75a516","ref":"refs/heads/master","pushedAt":"2024-05-22T21:58:33.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[INFRA] Improve the java style checks log the errors to sbt console (#3115)\n\n## Description\r\nResolves #3067.\r\n\r\n## How was this patch tested?\r\nOn local machine, intentionally create checkstyle errors in module\r\n`kernelDefaults` (for experimental), then run the `build/sbt compile`\r\nand `build/sbt kernelDefaults/test`.\r\n\r\nSigned-off-by: Tai Le Manh ","shortMessageHtmlLink":"[INFRA] Improve the java style checks log the errors to sbt console (#…"}},{"before":"a5263cc02b450b16d80b65f358dbf5ff092355bf","after":"420d9e059db18845a49b85cb1571752667d39dc6","ref":"refs/heads/master","pushedAt":"2024-05-22T16:43:30.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Standalone] Introduce FileAction.tagsOrEmpty (#3132)\n\n## Description\r\nThis PR introduces Introduce `FileAction.tagsOrEmpty` to factor out the\r\ncommon pattern `Option(tags).getOrElse(Map.empty)`.\r\n\r\n## How was this patch tested?\r\nExisting unit tests.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\nNo\r\n\r\nSigned-off-by: Sergiu Pocol ","shortMessageHtmlLink":"[Standalone] Introduce FileAction.tagsOrEmpty (#3132)"}},{"before":"d5e9a26195742728ea4693a6abca493c5e6e2241","after":"a5263cc02b450b16d80b65f358dbf5ff092355bf","ref":"refs/heads/master","pushedAt":"2024-05-22T16:42:20.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Standalone] AddFile Long Tags Accessor + Memory Optimization (#3131)\n\n## Description\r\nThis PR introduces AddFile.longTag which factors out the pattern\r\n`tag(...).map(_.toLong)` and also converts the insertion time tag lazy\r\nval to a method in order to save memory.\r\n\r\n## How was this patch tested?\r\nExisting unit tests.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\nNo\r\n\r\nSigned-off-by: Sergiu Pocol ","shortMessageHtmlLink":"[Standalone] AddFile Long Tags Accessor + Memory Optimization (#3131)"}},{"before":"0c35eea4100d83040a11417f66016b48c246c466","after":"d5e9a26195742728ea4693a6abca493c5e6e2241","ref":"refs/heads/master","pushedAt":"2024-05-22T16:40:32.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] DynamoDBCommitOwner: add logging, get dynamic confs from sparkSession (#3130)\n\n## Description\r\nUpdates DynamoDBCommitOwner:\r\n\r\n - Added logging around table creation flow\r\n - Get wcu, rcu, and awsCredentialsProvider from SparkSession\r\n- Return -1 as the table version if registerTable has already been\r\ncalled but no actual commits have gone through the owner. This is done\r\nby tracking an extra flag in DynamoDB.\r\n\r\n## How was this patch tested?\r\nExisting tests\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\nYes, introduces new configs (see DeltaSQLConf changes) which can be used\r\nto configure the DynamoDBCommitOwner.","shortMessageHtmlLink":"[Spark] DynamoDBCommitOwner: add logging, get dynamic confs from spar…"}},{"before":"529717bb6f171ff5252e3a913dd3667d53a2095c","after":"0c35eea4100d83040a11417f66016b48c246c466","ref":"refs/heads/master","pushedAt":"2024-05-22T13:07:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tdas","name":"Tathagata Das","path":"/tdas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/663212?s=80&v=4"},"commit":{"message":"[Spark] Column Mapping DROP FEATURE (#3124)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\nAllow column mapping feature to be dropped.\r\n\r\n```\r\nALTER TABLE DROP FEATURE columnMapping\r\n```\r\nFeature is hidden behind a flag.\r\n\r\n\r\n## How was this patch tested?\r\nnew unit tests\r\n\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\n\r\nNo","shortMessageHtmlLink":"[Spark] Column Mapping DROP FEATURE (#3124)"}},{"before":"f2d6c8b4e1ccdd1fcdffc44f87536f1f56408d31","after":"529717bb6f171ff5252e3a913dd3667d53a2095c","ref":"refs/heads/master","pushedAt":"2024-05-22T05:46:56.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Metadata Cleanup for Unbackfilled Delta Files (#3094)\n\n#### Which Delta project/connector is this regarding?\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\nUnbackfilled deltas eligible for deletion:\r\n- Version <= max(backfilled-delta-deleted-versions)\r\n\r\n## How was this patch tested?\r\n\r\nUnit tests\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\nNo","shortMessageHtmlLink":"[Spark] Metadata Cleanup for Unbackfilled Delta Files (#3094)"}},{"before":"0ee9fd0996e2d34630cd094123f54562211570af","after":"f2d6c8b4e1ccdd1fcdffc44f87536f1f56408d31","ref":"refs/heads/master","pushedAt":"2024-05-22T05:46:24.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Apply filters pushed down into DeltaCDFRelation (#3127)\n\n#### Which Delta project/connector is this regarding?\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\nThis PR modifies `DeltaCDFRelation` to apply the filters that are pushed\r\ndown into this. This enables both partition pruning and row group\r\nskipping to happen when reading the Change Data Feed.\r\n\r\n## How was this patch tested?\r\n\r\nUnit tests\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\nNo","shortMessageHtmlLink":"[Spark] Apply filters pushed down into DeltaCDFRelation (#3127)"}},{"before":"699df388f977f936a0b2ecc5462a5e811dafb09b","after":"0ee9fd0996e2d34630cd094123f54562211570af","ref":"refs/heads/master","pushedAt":"2024-05-22T01:10:21.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Skip reading log entries beyond endOffset, if specified while getting file changes for CDC in streaming queries (#3110)\n\n#### Which Delta project/connector is this regarding?\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\nSkip reading log entries beyond endOffset, if specified while getting\r\nfile changes for CDC in streaming queries\r\n\r\n## How was this patch tested?\r\nExisting unit tests\r\n\r\nAlso verified using logs to ensure that additional Delta logs are not\r\nread\r\n\r\n```\r\n\r\n24/05/16 01:21:01 INFO StateStore: StateStore stopped\r\nRun completed in 54 seconds, 237 milliseconds.\r\nTotal number of tests run: 1\r\nSuites: completed 1, aborted 0\r\nTests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0\r\nAll tests passed.\r\n```\r\n\r\nBefore:\r\n```\r\n10457:24/05/16 01:38:37 INFO DeltaSource: [queryId = 199ce] [batchId = 0] Getting CDC dataFrame for delta_log_path=file:/tmp/spark-c3309e79-80cf-4819-8b03-2fc607cc2679/_delta_log with startVersion=0, startIndex=-100, isInitialSnapshot=false, endOffset={\"sourceVersion\":1,\"reservoirId\":\"75194ee3-4ff6-431e-8e7c-32006684d5ad\",\"reservoirVersion\":1,\"index\":-1,\"isStartingVersion\":false} took timeMs=52 ms\r\n11114:24/05/16 01:38:39 INFO DeltaSource: [queryId = 199ce] [batchId = 1] Getting CDC dataFrame for delta_log_path=file:/tmp/spark-c3309e79-80cf-4819-8b03-2fc607cc2679/_delta_log with startVersion=1, startIndex=-100, isInitialSnapshot=false, endOffset={\"sourceVersion\":1,\"reservoirId\":\"75194ee3-4ff6-431e-8e7c-32006684d5ad\",\"reservoirVersion\":2,\"index\":-1,\"isStartingVersion\":false} took timeMs=25 ms\r\n11518:24/05/16 01:38:39 INFO DeltaSource: [queryId = 199ce] [batchId = 2] Getting CDC dataFrame for delta_log_path=file:/tmp/spark-c3309e79-80cf-4819-8b03-2fc607cc2679/_delta_log with startVersion=2, startIndex=-100, isInitialSnapshot=false, endOffset={\"sourceVersion\":1,\"reservoirId\":\"75194ee3-4ff6-431e-8e7c-32006684d5ad\",\"reservoirVersion\":3,\"index\":-1,\"isStartingVersion\":false} took timeMs=24 ms\r\n```\r\n\r\nAfter:\r\n```\r\n10498:24/05/16 01:32:10 INFO DeltaSource: [queryId = ede3f] [batchId = 0] Getting CDC dataFrame for delta_log_path=file:/tmp/spark-270c3d6e-40df-4e6f-b1da-c293af5d6741/_delta_log with startVersion=0, startIndex=-100, isInitialSnapshot=false, endOffset={\"sourceVersion\":1,\"reservoirId\":\"516bafe0-e0ea-4380-afcb-44e416302a07\",\"reservoirVersion\":1,\"index\":-1,\"isStartingVersion\":false} took timeMs=39 ms\r\n11155:24/05/16 01:32:11 INFO DeltaSource: [queryId = ede3f] [batchId = 1] Getting CDC dataFrame for delta_log_path=file:/tmp/spark-270c3d6e-40df-4e6f-b1da-c293af5d6741/_delta_log with startVersion=1, startIndex=-100, isInitialSnapshot=false, endOffset={\"sourceVersion\":1,\"reservoirId\":\"516bafe0-e0ea-4380-afcb-44e416302a07\",\"reservoirVersion\":2,\"index\":-1,\"isStartingVersion\":false} took timeMs=14 ms\r\n11579:24/05/16 01:32:12 INFO DeltaSource: [queryId = ede3f] [batchId = 2] Getting CDC dataFrame for delta_log_path=file:/tmp/spark-270c3d6e-40df-4e6f-b1da-c293af5d6741/_delta_log with startVersion=2, startIndex=-100, isInitialSnapshot=false, endOffset={\"sourceVersion\":1,\"reservoirId\":\"516bafe0-e0ea-4380-afcb-44e416302a07\",\"reservoirVersion\":3,\"index\":-1,\"isStartingVersion\":false} took timeMs=13 ms\r\n```\r\n\r\nDifference is even more if we are processing/reading through large\r\nnumber of backlog versions.\r\n\r\nIn Cx setup, before the change - batches are taking > 300s. After the\r\nchange, batches complete is < 15s.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\nNo","shortMessageHtmlLink":"[Spark] Skip reading log entries beyond endOffset, if specified while…"}},{"before":"9a349e47f93c6a1e80d5cf49d977b89e836b07ff","after":"d62f6b2f789df17f0ae9d1a517652969b0dbbd7d","ref":"refs/heads/branch-3.2","pushedAt":"2024-05-21T22:20:27.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[DOCS] Add page on BigQuery Delta Lake integration (#3123)\n\n\n\n#### Which Delta project/connector is this regarding?\n\n\n- [ ] Spark\n- [ ] Standalone\n- [ ] Flink\n- [ ] Kernel\n- [X] Other (fill in here)\n\n## Description\n\nAdds a page on BigQuery Delta Lake integration to the \"integrations\"\npage.\n\nPage is like this:\n\"Screenshot\n\nAnd it is indexed as part of the integrations page here:\n\"Screenshot\n\n## How was this patch tested?\n\nLocal build.\n\n## Does this PR introduce _any_ user-facing changes?\n\nNo.\n\n---------\n\nCo-authored-by: Tathagata Das \n(cherry picked from commit 699df388f977f936a0b2ecc5462a5e811dafb09b)","shortMessageHtmlLink":"[DOCS] Add page on BigQuery Delta Lake integration (#3123)"}},{"before":"9a0a2826f46418d1a97b289f6c9a756bc42621d3","after":"699df388f977f936a0b2ecc5462a5e811dafb09b","ref":"refs/heads/master","pushedAt":"2024-05-21T20:25:21.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tdas","name":"Tathagata Das","path":"/tdas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/663212?s=80&v=4"},"commit":{"message":"[DOCS] Add page on BigQuery Delta Lake integration (#3123)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [ ] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [X] Other (fill in here)\r\n\r\n## Description\r\n\r\nAdds a page on BigQuery Delta Lake integration to the \"integrations\"\r\npage.\r\n\r\nPage is like this:\r\n\"Screenshot\r\n\r\nAnd it is indexed as part of the integrations page here:\r\n\"Screenshot\r\n\r\n## How was this patch tested?\r\n\r\nLocal build.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\nNo.\r\n\r\n---------\r\n\r\nCo-authored-by: Tathagata Das ","shortMessageHtmlLink":"[DOCS] Add page on BigQuery Delta Lake integration (#3123)"}},{"before":"d22a4da23970e68def84226b745a7c2310f532b5","after":"9a0a2826f46418d1a97b289f6c9a756bc42621d3","ref":"refs/heads/master","pushedAt":"2024-05-21T20:16:25.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Enable column mapping removal feature flag (#3114)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\nEnable column mapping removal feature flag to allow user to run \r\n```\r\nALTER TABLE table_name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'none')\r\n```\r\nor\r\n```\r\nALTER TABLE table_name UNSET TBLPROPERTIES ('delta.columnMapping.mode')\r\n```\r\nto remove column mapping property from their table and rewrite physical\r\nfiles to match the logical column names.\r\n\r\nAlso allows column mapping feature to be dropped with\r\n\r\n```\r\nALTER TABLE DROP FEATURE columnMapping\r\n```\r\n\r\n\r\n\r\n## How was this patch tested?\r\nexisting tests\r\n\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\n\r\nYes\r\n\r\n Allows user to run \r\n```\r\nALTER TABLE table_name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'none')\r\n```\r\nor\r\n```\r\nALTER TABLE table_name UNSET TBLPROPERTIES ('delta.columnMapping.mode')\r\n```\r\nto remove column mapping from a Delta table.\r\nPreviously, this commands would not run successfully and would return an\r\nexception stating such an operation is prohibited.\r\n\r\nAlso allows column mapping feature to be dropped with\r\n\r\n```\r\nALTER TABLE DROP FEATURE columnMapping\r\n```","shortMessageHtmlLink":"[Spark] Enable column mapping removal feature flag (#3114)"}},{"before":"063c71d99ece90081caf20d10cdb5ccd63f3c27c","after":"d22a4da23970e68def84226b745a7c2310f532b5","ref":"refs/heads/master","pushedAt":"2024-05-21T18:56:18.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Kernel] Return `tags` in `Scan.getScanFiles()` output (#3119)\n\nNow, the file scan result didn't contain a tags field. In our case, we\r\ndefine some custom properties in the tags filed, so we want to filter\r\nthe `AddFile` according to the custom tags.","shortMessageHtmlLink":"[Kernel] Return tags in Scan.getScanFiles() output (#3119)"}},{"before":"611ad13c88833332d78e624ec2ecec15b77f36b3","after":"063c71d99ece90081caf20d10cdb5ccd63f3c27c","ref":"refs/heads/master","pushedAt":"2024-05-21T17:51:03.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[Spark] Integrate ICT into Managed Commits (#3108)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [X] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\n\r\n\r\nResolves TODOs around ICT <-> MC integration.\r\n\r\n## How was this patch tested?\r\n\r\n\r\nUpdated existing tests.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\n\r\nNo","shortMessageHtmlLink":"[Spark] Integrate ICT into Managed Commits (#3108)"}},{"before":"25a42df8b68a944490a38dcd838acff0c438d517","after":"611ad13c88833332d78e624ec2ecec15b77f36b3","ref":"refs/heads/master","pushedAt":"2024-05-21T15:59:02.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] Support predicates for stats that are not at the top level (#3117)\n\n## Description\r\n\r\nThis refactoring adds support for nested statistics columns. So far, all\r\nstatistics are keys in the stats struct in AddFiles. This PR adds\r\nsupport for statistics that are part of nested structs. This is a\r\nprerequisite for file skipping on collated string columns ([Protocol\r\nRFC](https://github.com/delta-io/delta/pull/3068)). Statistics for\r\ncollated string columns will be wrapped in a struct keyed by the\r\nversioned collation that was used to generate them. For example:\r\n\r\n```\r\n\"stats\": { \"statsWithCollation\": { \"icu.en_US.72\": { \"minValues\": { ...} } } }\r\n```\r\n\r\nThis PR replaces statType in StatsColumn with pathToStatType, which can\r\nbe used to represent a path. This way we can re-use all of the existing\r\ndata skipping code without changes.\r\n\r\n## How was this patch tested?\r\nIt is not possible to test this change without altering\r\n[statsSchema](https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/stats/StatisticsCollection.scala#L285).\r\nI would still like to ship this PR separately because the change is big\r\nenough in itself. There is existing test coverage for stats parsing and\r\nfile skipping, but none of them uses nested statistics yet.\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\nNo","shortMessageHtmlLink":"[Spark] Support predicates for stats that are not at the top level (#…"}},{"before":"2c444a2071161f208db64a716be97d6f381822af","after":"9a349e47f93c6a1e80d5cf49d977b89e836b07ff","ref":"refs/heads/branch-3.2","pushedAt":"2024-05-20T21:55:40.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tdas","name":"Tathagata Das","path":"/tdas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/663212?s=80&v=4"},"commit":{"message":"[Spark][3.2] Fall back to zordering when clustering on a single column (#3121)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\n\r\nFall back to zorder when clustering on a single column, because hilbert\r\nclustering doesn't support 1 column.\r\n\r\nResolves #3087 \r\n## How was this patch tested?\r\n\r\n\r\nNew unit test.\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\n\r\nNo","shortMessageHtmlLink":"[Spark][3.2] Fall back to zordering when clustering on a single column ("}},{"before":"7b4ee63ac02f9a11664689c664116f11c29bed86","after":"25a42df8b68a944490a38dcd838acff0c438d517","ref":"refs/heads/master","pushedAt":"2024-05-20T17:18:20.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"allisonport-db","name":"Allison Portis","path":"/allisonport-db","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/89107911?s=80&v=4"},"commit":{"message":"[SPARK] Add more testing for variant + delta features (#3102)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [x] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\n\r\n\r\nadds testing for auto compaction and deletion vectors\r\n\r\n## How was this patch tested?\r\n\r\ntest only change\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\n","shortMessageHtmlLink":"[SPARK] Add more testing for variant + delta features (#3102)"}},{"before":"57df2c0c5d1e25e70aac1d5ce5c6ac7dba54d0f9","after":"7b4ee63ac02f9a11664689c664116f11c29bed86","ref":"refs/heads/master","pushedAt":"2024-05-20T16:40:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] Managed Commits: add a DynamoDB-based commit owner (#3107)\n\n## Description\r\nTaking inspiration from https://github.com/delta-io/delta/pull/339, this\r\nPR adds a Commit Owner Client which uses DynamoDB as the backend. Each\r\nDelta table managed by a DynamoDB instance will have one corresponding\r\nentry in a DynamoDB table. The table schema is as follows:\r\n\r\n* tableId: String --- The unique identifier for the entry. This is a\r\nUUID.\r\n* path: String --- The fully qualified path of the table in the file\r\nsystem. e.g. s3://bucket/path.\r\n* acceptingCommits: Boolean --- Whether the commit owner is accepting\r\nnew commits. This will only\r\n* be set to false when the table is converted from managed commits to\r\nfile system commits.\r\n* tableVersion: Number --- The version of the latest commit.\r\n* tableTimestamp: Number --- The inCommitTimestamp of the latest commit.\r\n* schemaVersion: Number --- The version of the schema used to store the\r\ndata.\r\n* commits: --- The list of unbackfilled commits.\r\n - version: Number --- The version of the commit.\r\n - inCommitTimestamp: Number --- The inCommitTimestamp of the commit.\r\n - fsName: String --- The name of the unbackfilled file.\r\n - fsLength: Number --- The length of the unbackfilled file.\r\n- fsTimestamp: Number --- The modification time of the unbackfilled\r\nfile.\r\n\r\nFor a table to be managed by DynamoDB, `registerTable` must be called\r\nfor that Delta table. This will create a new entry in the db for this\r\nDelta table. Every `commit` invocation appends the UUID delta file\r\nstatus to the `commits` list in the table entry. `commit` is performed\r\nthrough a conditional write in DynamoDB.\r\n\r\n## How was this patch tested?\r\nAdded a new suite called `DynamoDBCommitOwnerClient5BackfillSuite` which\r\nuses a mock DynamoDB client. + plus manual testing against a DynamoDB\r\ninstance.","shortMessageHtmlLink":"[Spark] Managed Commits: add a DynamoDB-based commit owner (#3107)"}},{"before":"3af433517bf5a42b1774cb63a8bd1d262e7d933d","after":"57df2c0c5d1e25e70aac1d5ce5c6ac7dba54d0f9","ref":"refs/heads/master","pushedAt":"2024-05-20T04:04:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tdas","name":"Tathagata Das","path":"/tdas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/663212?s=80&v=4"},"commit":{"message":"[Spark] Handle case when Checkpoints.findLastCompleteCheckpoint is passed MAX_VALUE (#3105)\n\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [X] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\nFixes an issue where `Checkpoints.findLastCompleteCheckpoint` goes into\r\nan almost infinite loop if it is passed a Checkpoint.MAX_VALUE.\r\n\r\n## How was this patch tested?\r\n\r\nUT\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\nNo","shortMessageHtmlLink":"[Spark] Handle case when Checkpoints.findLastCompleteCheckpoint is pa…"}},{"before":"eca5a7f439f8039207705ee03b66916c6a987b79","after":"3af433517bf5a42b1774cb63a8bd1d262e7d933d","ref":"refs/heads/master","pushedAt":"2024-05-17T23:15:34.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tdas","name":"Tathagata Das","path":"/tdas","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/663212?s=80&v=4"},"commit":{"message":"[Spark] Pass sparkSession to commitOwnerBuilder (#3112)\n\n\r\n\r\n#### Which Delta project/connector is this regarding?\r\n\r\n\r\n- [X] Spark\r\n- [ ] Standalone\r\n- [ ] Flink\r\n- [ ] Kernel\r\n- [ ] Other (fill in here)\r\n\r\n## Description\r\n\r\n\r\nUpdates CommitOwnerBuilder.build so that it can take in a sparkSession\r\nobject. This allows it to read CommitOwner-related dynamic confs from\r\nthe sparkSession while building it.\r\n\r\n\r\n## Does this PR introduce _any_ user-facing changes?\r\n\r\n\r\nNo","shortMessageHtmlLink":"[Spark] Pass sparkSession to commitOwnerBuilder (#3112)"}},{"before":"e15132b021f3d4ec5b2f359aaf682fa29fadbb46","after":"eca5a7f439f8039207705ee03b66916c6a987b79","ref":"refs/heads/master","pushedAt":"2024-05-17T17:51:49.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] InCommitTimestamp: Use clock.currentTimeMillis() instead of nanoTime() in commitLarge (#3111)\n\n## Description\r\nWe currently use NANOSECONDS.toMillis(System.nanoTime()) for generating\r\nthe ICT when `commitLarge` is called. However, this usage of\r\nSystem.nanoTime() is not correct as it should only be used for measuring\r\ntime difference, not to get an approximate wall clock time. This leads\r\nto scenarios where the ICT becomes very small (e.g. 1 Jan 1970)\r\nsometimes because some systems return a very small number when\r\nSystem.nanoTime() is called. This PR changes this so that\r\nclock.getCurrentTimeMillis() is used instead.\r\n\r\n## How was this patch tested?\r\nAdded a test case to ensure that `clock.getCurrentTimeMillis()` is being\r\nused.","shortMessageHtmlLink":"[Spark] InCommitTimestamp: Use clock.currentTimeMillis() instead of n…"}},{"before":"8a8e757eba08abe83a4ddbb328442a9c6125ce03","after":"e15132b021f3d4ec5b2f359aaf682fa29fadbb46","ref":"refs/heads/master","pushedAt":"2024-05-17T14:55:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Spark] Fall back to zordering when clustering on a single column (#3109)\n\n## Description\r\nFall back to zorder when clustering on a single column, because hilbert\r\nclustering doesn't support 1 column.\r\n\r\nResolves #3087 \r\n\r\n## How was this patch tested?\r\nNew unit test.","shortMessageHtmlLink":"[Spark] Fall back to zordering when clustering on a single column (#3109"}},{"before":null,"after":"069a1f9f4c3e2558b2eeac68b917f07ee7f6f60f","ref":"refs/heads/branch-3.2-crc-optimization","pushedAt":"2024-05-16T23:24:37.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[Kernel] Support for loading protocol and metadata from checksum file in DeltaLog\n\nInitial read current version CRC w test to verify\n\nwip\n\nend-2-end working + tests\n\nCo-authored-by: Allison Portis \nCo-authored-by: Venki Korukanti ","shortMessageHtmlLink":"[Kernel] Support for loading protocol and metadata from checksum file…"}},{"before":"a7e65c9b5cc289e1468e403619116d4335177a33","after":"2c444a2071161f208db64a716be97d6f381822af","ref":"refs/heads/branch-3.2","pushedAt":"2024-05-16T18:33:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vkorukanti","name":"Venki Korukanti","path":"/vkorukanti","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1719945?s=80&v=4"},"commit":{"message":"[3.2][Kernel] Handle `KernelEngineException` when reading the `_last_checkpoint` file (#3086)\n\nThere is an issue with the `CloseableIterator` interface that Kernel is\nusing. Currently, it extends Java's `iterator`, which doesn't throw any\nexceptions. We use `CloseableIterator` when returning data read from a\nfile or any incremental data access. Any `IOException` in `hasNext` or\n`next` is wrapped in a `UncheckedIOException` or `RuntimeException`.\nUsers of the `CloseableIterator` need to catch for\n`UncheckedIOException` or `RuntimeException` explicitly and look at the\ncause if they are interested in the `IOException`. This is not\nconsistent and causes problems for the code that want to handle\nexceptions like `FileNotFoundException` (subclass of `IOException`) and\ntake further actions.\n\n* Change the `CloseableIterator.{next, hasNext}` contract to expect\n`KernelEngineException` for any exceptions that occur while executing in\nthe `Engine`.\n* Update the `DefaultParquetHandler` and `DefaultJsonHandler` to throw\n`KernelEngineException` instead of `UncheckedIOException` or\n`RuntimeException`.\n* In the checkpoint metadata loading method, catch\n`KernelEngineException` and see if the cause is `FileNotFoundException.`\nIf yes, don't retry loading.","shortMessageHtmlLink":"[3.2][Kernel] Handle KernelEngineException when reading the `_last_…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEU1PQ3QA","startCursor":null,"endCursor":null}},"title":"Activity · delta-io/delta"}