{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1713428330.0","currentOid":""},"activityList":{"items":[{"before":"bf2e25459fe46ca2b1d26e1c98c873923fc135e1","after":"32ba5c1db62caaaa2674e8acced56f89ed840bf9","ref":"refs/heads/master","pushedAt":"2024-05-05T20:19:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs\n\n### What changes were proposed in this pull request?\n\nThis PR aims to run `sparkr` only in PR builder and Daily Python CIs. In other words, only the commit builder will skip it by default.\n\n### Why are the changes needed?\n\nTo reduce GitHub Action usage to meet ASF INFRA policy.\n- https://infra.apache.org/github-actions-policy.html\n\n > All workflows MUST have a job concurrency level less than or equal to 20. 
This means a workflow cannot have more than 20 jobs running at the same time across all matrices.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nManual review.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46389 from dongjoon-hyun/SPARK-48133.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48133][INFRA] Run sparkr only in PR builders and Daily CIs"}},{"before":"2f2347f3b74f1478fb583de9378427b3e45bd980","after":"45befc07d2a064ab2a279a113489ed5c66f7a69d","ref":"refs/heads/branch-3.5","pushedAt":"2024-05-05T13:50:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist\n\n### What changes were proposed in this pull request?\n\nThis is a followup to https://github.com/apache/spark/pull/46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. 
This PR sets the null bits appropriately for the primitive array copy.\n\nPrimitive arrays are faster than object arrays and won't create unnecessary objects.\n\n### Why are the changes needed?\n\nThis will improve performance and memory usage, when nulls are present in the `ColumnarArray`.\n\n### Does this PR introduce _any_ user-facing change?\n\nThis is expected to be faster when copying `ColumnarArray`.\n\n### How was this patch tested?\n\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46372 from gene-db/primitive-nulls.\n\nAuthored-by: Gene Pang \nSigned-off-by: Wenchen Fan \n(cherry picked from commit bf2e25459fe46ca2b1d26e1c98c873923fc135e1)\nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays …"}},{"before":"a0f62393d69a40ddd49b034b3ce332e6fa6bfb13","after":"bf2e25459fe46ca2b1d26e1c98c873923fc135e1","ref":"refs/heads/master","pushedAt":"2024-05-05T13:50:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist\n\n### What changes were proposed in this pull request?\n\nThis is a followup to https://github.com/apache/spark/pull/46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. 
This PR sets the null bits appropriately for the primitive array copy.\n\nPrimitive arrays are faster than object arrays and won't create unnecessary objects.\n\n### Why are the changes needed?\n\nThis will improve performance and memory usage, when nulls are present in the `ColumnarArray`.\n\n### Does this PR introduce _any_ user-facing change?\n\nThis is expected to be faster when copying `ColumnarArray`.\n\n### How was this patch tested?\n\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46372 from gene-db/primitive-nulls.\n\nAuthored-by: Gene Pang \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays …"}},{"before":"b924e689942d735f165d31660d26efad057f4827","after":"a0f62393d69a40ddd49b034b3ce332e6fa6bfb13","ref":"refs/heads/master","pushedAt":"2024-05-05T05:55:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48132][INFRA] Run `k8s-integration-tests` only in PR builder and Daily CIs\n\n### What changes were proposed in this pull request?\n\nThis PR aims to run `k8s-integration-tests` only in PR builder and Daily Python CIs. In other words, only the commit builder will skip it by default.\n\nPlease note that\n- K8s unit tests will be covered by the commit builder still.\n- All PR builders are not consuming ASF resources and they provide lots of test coverage everyday also.\n\n### Why are the changes needed?\n\nTo reduce GitHub Action usage to meet ASF INFRA policy.\n- https://infra.apache.org/github-actions-policy.html\n\n > All workflows MUST have a job concurrency level less than or equal to 20. 
This means a workflow cannot have more than 20 jobs running at the same time across all matrices.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nManual review.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46388 from dongjoon-hyun/SPARK-48132.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48132][INFRA] Run k8s-integration-tests only in PR builder a…"}},{"before":"8443672b1ab1195278a73a9ec487af8e02e3a8de","after":"b924e689942d735f165d31660d26efad057f4827","ref":"refs/heads/master","pushedAt":"2024-05-05T05:47:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"gengliangwang","name":"Gengliang Wang","path":"/gengliangwang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1097932?s=80&v=4"},"commit":{"message":"[SPARK-48124][CORE] Disable structured logging for Connect-Repl by default\n\n### What changes were proposed in this pull request?\nThis PR is a followup of https://github.com/apache/spark/pull/46383, to disable structured logging for `Connect-Repl` by default.\n\n### Why are the changes needed?\nBefore:\n\"image\"\n\nAfter:\n\"image\"\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nManual test.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46387 from panbingkun/SPARK-48124_FOLLOWUP.\n\nAuthored-by: panbingkun \nSigned-off-by: Gengliang Wang ","shortMessageHtmlLink":"[SPARK-48124][CORE] Disable structured logging for Connect-Repl by de…"}},{"before":"9a45da21dd1c7dd93152f7126c8c611b8ba031e7","after":"8443672b1ab1195278a73a9ec487af8e02e3a8de","ref":"refs/heads/master","pushedAt":"2024-05-05T00:33:05.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon 
Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48131][CORE] Unify MDC key `mdc.taskName` and `task_name`\n\n### What changes were proposed in this pull request?\n\nCurrently there are two MDC keys for task name:\n* `mdc.taskName`, which was introduced in https://github.com/apache/spark/pull/28801. Before the change, it was `taskName`.\n* `task_name`: introduced by the structured logging framework project.\n\nTo make the MDC keys unified, this PR renames `mdc.taskName` to `task_name`. This MDC appears frequently in logs when running a Spark application.\nBefore change:\n```\n\"context\":{\"mdc.taskName\":\"task 19.0 in stage 0.0 (TID 19)\"}\n```\nAfter change:\n```\n\"context\":{\"task_name\":\"task 19.0 in stage 0.0 (TID 19)\"}\n```\n\n### Why are the changes needed?\n\n1. Make the MDC names consistent\n2. Minor upside: this will allow users to query task names with `SELECT * FROM logs where context.task_name = ...`. Otherwise, querying with `context.mdc.task_name` will result in an analysis exception. Users will have to query with `context['mdc.task_name']`\n\n### Does this PR introduce _any_ user-facing change?\n\nNot really. 
The MDC key is used by developers for debugging purpose.\n\n### How was this patch tested?\n\nManual test\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46386 from gengliangwang/unify.\n\nAuthored-by: Gengliang Wang \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48131][CORE] Unify MDC key mdc.taskName and task_name"}},{"before":"356aca5af5b88570d43d1c0f2b417aa87b86d323","after":"9a45da21dd1c7dd93152f7126c8c611b8ba031e7","ref":"refs/heads/master","pushedAt":"2024-05-04T18:54:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48129][PYTHON] Provide a constant table schema in PySpark for querying structured logs\n\n### What changes were proposed in this pull request?\n\nSimilar to https://github.com/apache/spark/pull/46375/, this PR provides a constant table schema in PySpark for querying structured logs.\nThe doc of logging configuration is also updated.\n\n### Why are the changes needed?\n\nProvide a convenient way to query Spark logs using PySpark.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nManual test\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46384 from gengliangwang/pythonLog.\n\nAuthored-by: Gengliang Wang \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48129][PYTHON] Provide a constant table schema in PySpark for …"}},{"before":"96f65c950064d330245dc53fcd50cf6d9753afc8","after":"356aca5af5b88570d43d1c0f2b417aa87b86d323","ref":"refs/heads/master","pushedAt":"2024-05-04T18:51:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon 
Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-46009][SQL][FOLLOWUP] Remove unused golden file\n\n### What changes were proposed in this pull request?\nThis PR proposes to remove an unused golden file.\n\n### Why are the changes needed?\nhttps://github.com/apache/spark/pull/46272 removed the unused `PERCENTILE_CONT` and `PERCENTILE_DISC` in g4.\nBut I made a mistake and submitted my local test code.\n\n### Does this PR introduce _any_ user-facing change?\n'No'.\n\n### How was this patch tested?\nGA\n\n### Was this patch authored or co-authored using generative AI tooling?\n'No'.\n\nCloses #46385 from beliefer/SPARK-46009_followup3.\n\nAuthored-by: beliefer \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-46009][SQL][FOLLOWUP] Remove unused golden file"}},{"before":"681a1de72bdf749e0a0782dde9bddfcbb3248d99","after":"2974e625aae16e8711a1d115731fdfe516752899","ref":"refs/heads/branch-3.4","pushedAt":"2024-05-04T18:49:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48128][SQL] For BitwiseCount / bit_count expression, fix codegen syntax error for boolean type inputs\n\n### What changes were proposed in this pull request?\n\nThis PR fixes an issue where `BitwiseCount` / `bit_count` of boolean inputs would cause codegen to generate syntactically invalid Java code that does not compile, triggering errors like\n\n```\n java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Unexpected token \"if\" in primary\n```\n\nEven though this code has test cases in `bitwise.sql` via the query test framework, those existing 
test cases were insufficient to find this problem: I believe that is because the example queries were constant-folded using the interpreted path, leaving the codegen path without test coverage.\n\nThis PR fixes the codegen issue and adds explicit expression tests to ensure that the same tests run on both the codegen and interpreted paths.\n\n### Why are the changes needed?\n\nFix a rare codegen to interpreted fallback issue, which may harm query performance.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nAdded new test cases to BitwiseExpressionsSuite.scala, copied from the existing `bitwise.sql` query test case file.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46382 from JoshRosen/SPARK-48128-bit_count_codegen.\n\nAuthored-by: Josh Rosen \nSigned-off-by: Dongjoon Hyun \n(cherry picked from commit 96f65c950064d330245dc53fcd50cf6d9753afc8)\nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48128][SQL] For BitwiseCount / bit_count expression, fix codeg…"}},{"before":"71cb9306085b07b63f2474e05144334cb7e4109d","after":"2f2347f3b74f1478fb583de9378427b3e45bd980","ref":"refs/heads/branch-3.5","pushedAt":"2024-05-04T18:49:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48128][SQL] For BitwiseCount / bit_count expression, fix codegen syntax error for boolean type inputs\n\n### What changes were proposed in this pull request?\n\nThis PR fixes an issue where `BitwiseCount` / `bit_count` of boolean inputs would cause codegen to generate syntactically invalid Java code that does not compile, triggering errors like\n\n```\n java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Unexpected token \"if\" in primary\n```\n\nEven though this code has test cases in `bitwise.sql` via the query test framework, those existing test cases were insufficient to find this problem: I believe that is because the example queries were constant-folded using the interpreted path, leaving the codegen path without test coverage.\n\nThis PR fixes the codegen issue and adds explicit expression tests to ensure that the same tests run on both the codegen and interpreted paths.\n\n### Why are the changes needed?\n\nFix a rare codegen to interpreted fallback issue, which may harm query performance.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nAdded new test cases to BitwiseExpressionsSuite.scala, copied from the existing `bitwise.sql` query test case file.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46382 from JoshRosen/SPARK-48128-bit_count_codegen.\n\nAuthored-by: Josh Rosen \nSigned-off-by: Dongjoon Hyun \n(cherry picked from commit 96f65c950064d330245dc53fcd50cf6d9753afc8)\nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48128][SQL] For BitwiseCount / bit_count expression, fix codeg…"}},{"before":"ca8c269a15037ce716449b5bba581e46aa8d7fea","after":"96f65c950064d330245dc53fcd50cf6d9753afc8","ref":"refs/heads/master","pushedAt":"2024-05-04T18:49:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48128][SQL] For BitwiseCount / bit_count expression, fix codegen syntax error for boolean type inputs\n\n### What changes were proposed in this pull request?\n\nThis PR fixes an issue where `BitwiseCount` / `bit_count` of boolean inputs would cause codegen to generate syntactically invalid 
Java code that does not compile, triggering errors like\n\n```\n java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 11: Unexpected token \"if\" in primary\n```\n\nEven though this code has test cases in `bitwise.sql` via the query test framework, those existing test cases were insufficient to find this problem: I believe that is because the example queries were constant-folded using the interpreted path, leaving the codegen path without test coverage.\n\nThis PR fixes the codegen issue and adds explicit expression tests to ensure that the same tests run on both the codegen and interpreted paths.\n\n### Why are the changes needed?\n\nFix a rare codegen to interpreted fallback issue, which may harm query performance.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nAdded new test cases to BitwiseExpressionsSuite.scala, copied from the existing `bitwise.sql` query test case file.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46382 from JoshRosen/SPARK-48128-bit_count_codegen.\n\nAuthored-by: Josh Rosen \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48128][SQL] For BitwiseCount / bit_count expression, fix codeg…"}},{"before":"3b8c0049a5b58f26eb16c2d42070aea31e37a6c3","after":"ca8c269a15037ce716449b5bba581e46aa8d7fea","ref":"refs/heads/master","pushedAt":"2024-05-04T18:48:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48124][CORE] Disable structured logging for Interpreters by default\n\n### What changes were proposed in this pull request?\n\nFor interpreters, structured logging should be 
disabled by default to avoid generating mixed plain text and structured logs on the same console.\n\nspark-shell output with mixed plain text and structured logs:\n```\nUsing Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 17.0.9)\n\nType in expressions to have them evaluated.\n\nType :help for more information.\n\n{\"ts\":\"2024-05-04T01:11:03.797Z\",\"level\":\"WARN\",\"msg\":\"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\",\"logger\":\"NativeCodeLoader\"} {\"ts\":\"2024-05-04T01:11:04.104Z\",\"level\":\"WARN\",\"msg\":\"Service 'SparkUI' could not bind on port 4040. Attempting port 4041.\",\"logger\":\"Utils\"}\nSpark context Web UI available at http://10.10.114.155:4041/\n\nSpark context available as 'sc' (master = local[*], app id = local-1714785064155).\n\nSpark session available as 'spark'.\n```\n\nAfter the changes, all the output is plain text:\n```\nType :help for more information.\n\n24/05/03 18:11:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n\n24/05/03 18:11:35 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
Attempting port 4041.\n\nSpark context Web UI available at http://10.10.114.155:4041/\n\nSpark context available as 'sc' (master = local[*], app id = local-1714785095892).\n\nSpark session available as 'spark'.\n```\n\nNote that submitting a Spark application using `spark-submit` will still generate structured logs.\n\n### Why are the changes needed?\n\nTo avoid generating mixed plain text and structured logs on the same console when using the Interpreters.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, this reverts to the behavior of Spark 3.5.\n\n### How was this patch tested?\n\nManual test\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46383 from gengliangwang/disableStructuredLogInRepl.\n\nAuthored-by: Gengliang Wang \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48124][CORE] Disable structured logging for Interpreters by de…"}},{"before":"e4453b480f988bf6683930ae14b7043a2cecffc4","after":"3b8c0049a5b58f26eb16c2d42070aea31e37a6c3","ref":"refs/heads/master","pushedAt":"2024-05-04T10:51:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48116][INFRA][FOLLOW-UP] Deduplicate pyspark.pandas skipping logic\n\n### What changes were proposed in this pull request?\n\nThis PR is another try of https://github.com/apache/spark/pull/46380 that is a followup of https://github.com/apache/spark/pull/46367 that simplifies the build and deduplicates them.\n\n### Why are the changes needed?\n\nTo fix the condition, and make it deduplicated.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, dev-only.\n\n### How was this patch tested?\n\nWill test in my own fork: https://github.com/HyukjinKwon/spark/actions/runs/8948215777\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46381 
from HyukjinKwon/SPARK-48116-followup2.\n\nAuthored-by: Hyukjin Kwon \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48116][INFRA][FOLLOW-UP] Deduplicate pyspark.pandas skipping l…"}},{"before":"111529e0dea68bf5343cc6aabba53b59e5d21830","after":"e4453b480f988bf6683930ae14b7043a2cecffc4","ref":"refs/heads/master","pushedAt":"2024-05-04T07:18:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"gengliangwang","name":"Gengliang Wang","path":"/gengliangwang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1097932?s=80&v=4"},"commit":{"message":"[SPARK-48123][CORE] Provide a constant table schema for querying structured logs\n\n### What changes were proposed in this pull request?\n\nProviding a table schema LOG_SCHEMA, so that users can load structured logs with the following code:\n\n```\nimport org.apache.spark.util.LogUtils.LOG_SCHEMA\n\nval logDf = spark.read.schema(LOG_SCHEMA).json(\"path/to/logs\")\n```\n\n### Why are the changes needed?\n\nProvide a convenient way to query Spark logs using Spark SQL.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nNew UT\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46375 from gengliangwang/logSchema.\n\nAuthored-by: Gengliang Wang \nSigned-off-by: Gengliang Wang ","shortMessageHtmlLink":"[SPARK-48123][CORE] Provide a constant table schema for querying stru…"}},{"before":"454fcbb277b932427f12ae2ffa2e22894619600e","after":"111529e0dea68bf5343cc6aabba53b59e5d21830","ref":"refs/heads/master","pushedAt":"2024-05-04T06:33:49.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"gengliangwang","name":"Gengliang Wang","path":"/gengliangwang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1097932?s=80&v=4"},"commit":{"message":"[SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework\n\n### What changes were proposed in this pull request?\n\nMigrate 
logWarning with variables of the Spark Core module to structured logging framework. This transforms the logWarning entries of the following API\n```\ndef logWarning(msg: => String): Unit\n```\nto\n```\ndef logWarning(entry: LogEntry): Unit\n```\n\n### Why are the changes needed?\n\nTo enhance Apache Spark's logging system by implementing structured logging.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, Spark core logs will contain additional MDC\n\n### How was this patch tested?\n\nCompiler and scala style checks, as well as code review.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46309 from dtenedor/spark-core-log-warn.\n\nAuthored-by: Daniel Tenedorio \nSigned-off-by: Gengliang Wang ","shortMessageHtmlLink":"[SPARK-47578][CORE] Migrate logWarning with variables to structured l…"}},{"before":"2be447f89ea846c10dcd993de74d06f87e61c1f3","after":"454fcbb277b932427f12ae2ffa2e22894619600e","ref":"refs/heads/master","pushedAt":"2024-05-04T05:07:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48116][INFRA][FOLLOWUP] Fix `if` statement to check repository","shortMessageHtmlLink":"[SPARK-48116][INFRA][FOLLOWUP] Fix if statement to check repository"}},{"before":"2cb6ea721fe0c649d70f82d28a5058ae93c20831","after":"2be447f89ea846c10dcd993de74d06f87e61c1f3","ref":"refs/heads/master","pushedAt":"2024-05-04T05:06:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing the if condition\"\n\nThis reverts commit 2cb6ea721fe0c649d70f82d28a5058ae93c20831.","shortMessageHtmlLink":"Revert 
\"[SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing…"}},{"before":"c0ef52760de90a5d843e40c2fa990599d01bc798","after":"2cb6ea721fe0c649d70f82d28a5058ae93c20831","ref":"refs/heads/master","pushedAt":"2024-05-04T05:04:04.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing the if condition\n\n### What changes were proposed in this pull request?\n\nThis PR is a followup of https://github.com/apache/spark/pull/46367 that simplifies the build and deduplicates them.\n\n### Why are the changes needed?\n\nTo fix the condition, and make it deduplicated.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, dev-only.\n\n### How was this patch tested?\n\nManually.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46380 from HyukjinKwon/SPARK-48116-followup.\n\nAuthored-by: Hyukjin Kwon \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing the if …"}},{"before":"1904dee475d735533ff5d0d2d3580e4e83b7520b","after":"c0ef52760de90a5d843e40c2fa990599d01bc798","ref":"refs/heads/master","pushedAt":"2024-05-04T05:01:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules\n\n### What changes were proposed in this pull request?\n\nThis PR aims to fix `dev/scalastyle` to check the `hadoop-cloud` and `jvm-profiler` modules.\nAlso, the detected scalastyle issues are fixed.\n\n### Why are the changes needed?\n\nTo prevent future scalastyle issues.\n\nA Scala style violation was introduced here, 
but we missed it because we didn't check all optional modules.\n- https://github.com/apache/spark/pull/46022\n\nThe `jvm-profiler` module was newly added in Apache Spark 4.0.0, but we missed adding it to `dev/scalastyle`. Note that there were no scalastyle issues in that module at that time.\n- #44021\n\nThe `hadoop-cloud` module was added in Apache Spark 2.3.0.\n- #17834\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass the CIs with newly revised `dev/scalastyle`.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46376 from dongjoon-hyun/SPARK-48127.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48127][INFRA] Fix dev/scalastyle to check hadoop-cloud and…"}},{"before":"be998221fc934ba7de9b6233f315f3f59bbc8435","after":"1904dee475d735533ff5d0d2d3580e4e83b7520b","ref":"refs/heads/master","pushedAt":"2024-05-04T04:15:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs\n\n### What changes were proposed in this pull request?\n\nThis PR aims to run `pyspark-pandas*` only in PR builders and Daily Python CIs. In other words, only the commit builder will skip it by default. Please note that PR builders do not consume ASF resources and they provide lots of test coverage every day.\n- https://github.com/apache/spark/actions/workflows/build_python.yml\n\n### Why are the changes needed?\n\nTo reduce GitHub Action usage to meet ASF INFRA policy.\n- https://infra.apache.org/github-actions-policy.html\n\n > All workflows MUST have a job concurrency level less than or equal to 20. 
This means a workflow cannot have more than 20 jobs running at the same time across all matrices.\n\nAlthough `pandas` is an **optional** package in PySpark, it is essential for PySpark users and we have **6 test pipelines** which require lots of resources. We need to optimize the job concurrency level to `less than or equal to 20` while keeping as much test capability as possible.\n\nhttps://github.com/apache/spark/blob/f450272a9aac812d735eb5f741eec1f6cf1c837c/dev/requirements.txt#L4-L8\n\n- pyspark-pandas\n- pyspark-pandas-slow\n- pyspark-pandas-connect-part0\n- pyspark-pandas-connect-part1\n- pyspark-pandas-connect-part2\n- pyspark-pandas-connect-part3\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nManual review.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46367 from dongjoon-hyun/SPARK-48116.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48116][INFRA] Run pyspark-pandas* only in PR builder and Dai…"}},{"before":"5d1f976f85fe1ee39ca3cc4f0f2e6afa8b43e5ea","after":"be998221fc934ba7de9b6233f315f3f59bbc8435","ref":"refs/heads/master","pushedAt":"2024-05-04T03:47:04.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48056][PYTHON][CONNECT][FOLLOW-UP] Use `assertEqual` instead of `assertEquals` for Python 3.12\n\n### What changes were proposed in this pull request?\n\nThis is a follow-up of\n- #46297\n\nThis PR aims to use `assertEqual` instead of `assertEquals` for Python 3.12.\n\n### Why are the changes needed?\n\nTo recover Python CI,\n- https://github.com/apache/spark/actions/workflows/build_python.yml\n\nFrom Python 3.12, `assertEquals` doesn't exist.\n\nhttps://docs.python.org/3/library/unittest.html#assert-methods\n\n### Does this 
PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass the CIs.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46377 from dongjoon-hyun/SPARK-48056.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48056][PYTHON][CONNECT][FOLLOW-UP] Use assertEqual instead o…"}},{"before":"7f08df4af95d20f3fd056588b5a3cfa5f5c57654","after":"5d1f976f85fe1ee39ca3cc4f0f2e6afa8b43e5ea","ref":"refs/heads/master","pushedAt":"2024-05-04T03:42:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47969][PYTHON][TESTS][FOLLOWUP] Make Test `test_creation_index` deterministic\n\n### What changes were proposed in this pull request?\nFollowup of https://github.com/apache/spark/pull/46200\n\n### Why are the changes needed?\nThere is still non-deterministic code in this test:\n```\nTraceback (most recent call last):\n File \"/home/jenkins/python/pyspark/testing/pandasutils.py\", line 91, in _assert_pandas_equal\n assert_frame_equal(\n File \"/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py\", line 1257, in assert_frame_equal\n assert_index_equal(\n File \"/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py\", line 407, in assert_index_equal\n raise_assert_detail(obj, msg, left, right)\n File \"/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py\", line 665, in raise_assert_detail\n raise AssertionError(msg)\nAssertionError: DataFrame.index are different\nDataFrame.index values are different (75.0 %)\n[left]: DatetimeIndex(['2022-09-02', '2022-09-03', '2022-08-31', '2022-09-05'], dtype='datetime64[ns]', freq=None)\n[right]: DatetimeIndex(['2022-08-31', '2022-09-02', '2022-09-03', '2022-09-05'], dtype='datetime64[ns]', 
freq=None)\n\n```\n\n### Does this PR introduce _any_ user-facing change?\nno, test only\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46378 from zhengruifeng/ps_test_create_index.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47969][PYTHON][TESTS][FOLLOWUP] Make Test `test_creation_index…"}},{"before":"5c01f196afc3ba75f10c4aedf2c8405b6f59336a","after":"7f08df4af95d20f3fd056588b5a3cfa5f5c57654","ref":"refs/heads/master","pushedAt":"2024-05-03T23:54:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to `1 minute` for `interrupt tag` test\n\n### What changes were proposed in this pull request?\n\nThis is a follow-up to increase `timeout` from `30s` to `1 minute` like the other timeouts of the same test case.\n- #45173\n\n### Why are the changes needed?\n\nTo reduce the flakiness more. 
The following is the recent failure on `master` branch.\n- https://github.com/apache/spark/actions/runs/8944948827/job/24572965877\n- https://github.com/apache/spark/actions/runs/8945375279/job/24574263993\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass the CIs.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46374 from dongjoon-hyun/SPARK-47097.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to `1 minute…"}},{"before":"c3a462ce2966d42a3cebf238b809e2c2e2631c08","after":"5c01f196afc3ba75f10c4aedf2c8405b6f59336a","ref":"refs/heads/master","pushedAt":"2024-05-03T23:30:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"gengliangwang","name":"Gengliang Wang","path":"/gengliangwang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1097932?s=80&v=4"},"commit":{"message":"[SPARK-48059][CORE] Implement the structured log framework on the java side\n\n### What changes were proposed in this pull request?\nThe pr aims to implement the structured log framework on the `java side`.\n\n### Why are the changes needed?\nCurrently, the structured log framework on the `scala side` is basically available, but the`Spark Core` code also includes some `Java code`, which also needs to be connected to the structured log framework.\n\n### Does this PR introduce _any_ user-facing change?\nYes, only for developers.\n\n### How was this patch tested?\n- Add some new UT.\n- Pass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46301 from panbingkun/structured_logger_java.\n\nAuthored-by: panbingkun \nSigned-off-by: Gengliang Wang ","shortMessageHtmlLink":"[SPARK-48059][CORE] Implement the structured log framework on the 
jav…"}},{"before":"b42d235c29302b9faa4254d07db1282207345f69","after":"c3a462ce2966d42a3cebf238b809e2c2e2631c08","ref":"refs/heads/master","pushedAt":"2024-05-03T23:25:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48121][K8S] Promote `KubernetesDriverConf` to `DeveloperApi`\n\n### What changes were proposed in this pull request?\n\nThis PR aims to promote `KubernetesDriverConf` to `DeveloperApi`\n\n### Why are the changes needed?\n\nSince Apache Spark Kubernetes Operator requires this, we had better maintain it as a developer API officially from Apache Spark 4.0.0.\n\nhttps://github.com/apache/spark-kubernetes-operator/pull/10\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nPass the CIs\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46373 from jiangzho/driver_conf.\n\nAuthored-by: zhou-jiang \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48121][K8S] Promote KubernetesDriverConf to DeveloperApi"}},{"before":"85902880d709a66ef89bd6a5e0e7f1233f4d4fec","after":"b42d235c29302b9faa4254d07db1282207345f69","ref":"refs/heads/master","pushedAt":"2024-05-03T22:15:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48114][CORE] Precompile template regex to avoid unnecessary work\n\n### What changes were proposed in this pull request?\nError message template regex is now precompiled to avoid unnecessary work\n\n### Why are the changes needed?\n`SparkRuntimeException` uses `SparkThrowableHelper`, which uses `ErrorClassesJsonReader` to create error message string from templates in `error-conditions.json`, but 
template regex is compiled on every `SparkRuntimeException` constructor invocation. This slows down error construction, in particular `UnivocityParser` + `FailureSafeParser`, where it's a hot path.\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\n- `testOnly org.apache.spark.sql.errors.QueryExecutionErrorsSuite`\n- Manually checked csv parsing error\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46365 from vladimirg-db/vladimirg-db/precompile-regexes-in-error-classes-json-reader.\n\nAuthored-by: Vladimir Golubev \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48114][CORE] Precompile template regex to avoid unnecessary work"}},{"before":"d6ca2c5c3c4b42d8ddfd2bc50057c9b14ef7ae1e","after":"85902880d709a66ef89bd6a5e0e7f1233f4d4fec","ref":"refs/heads/master","pushedAt":"2024-05-03T22:02:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48119][K8S] Promote `KubernetesDriverSpec` to `DeveloperApi`\n\n### What changes were proposed in this pull request?\n\nThis PR aims to promote ` KubernetesDriverSpec` to `DeveloperApi`\n\n### Why are the changes needed?\n\nSince Apache Spark Kubernetes Operator requires this, we had better maintain it as a developer API officially from Apache Spark 4.0.0.\n\nhttps://github.com/apache/spark-kubernetes-operator/pull/10\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nPass the CIs\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46371 from jiangzho/k8s_dev_apis.\n\nAuthored-by: zhou-jiang \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48119][K8S] Promote KubernetesDriverSpec to 
DeveloperApi"}},{"before":"aa00b00c18e6a714dc02e9444576e063c8e49db7","after":"d6ca2c5c3c4b42d8ddfd2bc50057c9b14ef7ae1e","ref":"refs/heads/master","pushedAt":"2024-05-03T22:00:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48118][SQL] Support `SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env variable\n\n### What changes were proposed in this pull request?\n\nThis PR aims to support `SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env variable to provide users an easier migration path.\n\n### Why are the changes needed?\n\nLike `SPARK_ANSI_SQL_MODE` for `spark.sql.ansi.enabled`, the platform providers can control the default value of `spark.sql.legacy.createHiveTableByDefault` configuration.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass the CIs.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46369 from dongjoon-hyun/SPARK-48118.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48118][SQL] Support SPARK_SQL_LEGACY_CREATE_HIVE_TABLE env v…"}},{"before":"f450272a9aac812d735eb5f741eec1f6cf1c837c","after":"aa00b00c18e6a714dc02e9444576e063c8e49db7","ref":"refs/heads/master","pushedAt":"2024-05-03T21:10:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48115][INFRA] Remove `Python 3.11` from `build_python.yml`\n\n### What changes were proposed in this pull request?\n\nThis PR aims to remove `Python 3.11` from `build_python.yml` Daily CI because `Python 3.11` is the main python version in the PR and commit build.\n- https://github.com/apache/spark/actions/workflows/build_python.yml\n\n### 
Why are the changes needed?\n\nTo reduce GitHub Action usage to meet ASF INFRA policy.\n- https://infra.apache.org/github-actions-policy.html\n\n > The average number of minutes a project uses in any consecutive five-day period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, or 3,600 hours).\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nManual review.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46366 from dongjoon-hyun/SPARK-48115.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48115][INFRA] Remove Python 3.11 from build_python.yml"}},{"before":"cd789acb5e51172e43052b59c4b610e64f380a16","after":"f450272a9aac812d735eb5f741eec1f6cf1c837c","ref":"refs/heads/master","pushedAt":"2024-05-03T10:23:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-43046][FOLLOWUP][SS][CONNECT] Remove not used line in deduplicateWithinWatermark\n\n### What changes were proposed in this pull request?\n\nAn extra assignment was added when we first introduce `dropDuplicatesWithinWatermark` in https://github.com/apache/spark/commit/4d765114d6e5dd1a78a7ad798750e7bc400a72a6. 
We don't need this line.\n\n### Why are the changes needed?\n\nCode cleanup\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nExisting CI\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46345 from WweiL/deduplicate-within-wm-connect.\n\nAuthored-by: Wei Liu \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-43046][FOLLOWUP][SS][CONNECT] Remove not used line in deduplic…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEQgBk4QA","startCursor":null,"endCursor":null}},"title":"Activity · apache/spark"}
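The SPARK-48056 entry above is a one-word rename, but the reason matters: `assertEquals` was a deprecated alias that Python 3.12 finally removed, so test suites using it fail to run at all. A minimal self-contained sketch (not Spark's actual test code) of the supported spelling:

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # `assertEquals` was a long-deprecated alias removed in Python 3.12;
        # `assertEqual` is the supported spelling on all versions.
        self.assertEqual(sum([1, 2, 3]), 6)

# Run the test case programmatically so the example is self-contained.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On 3.12+, keeping `assertEquals` would raise `AttributeError` at test time, which is why the rename recovers the CI.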
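The SPARK-48114 entry describes moving regex compilation out of the error-construction hot path. A generic sketch of the pattern in Python — the `<placeholder>` template syntax and function names here are illustrative assumptions, not Spark's `ErrorClassesJsonReader` internals:

```python
import re

# Compiled once at module load. Compiling inside render_template() would
# redo the pattern-parsing work on every call, which is wasted effort when
# errors are constructed in a hot loop (e.g. a failure-tolerant parser).
_TEMPLATE_PATTERN = re.compile(r"<([A-Za-z0-9_-]+)>")

def render_template(template: str, params: dict) -> str:
    """Replace each <name> placeholder with its value from params."""
    return _TEMPLATE_PATTERN.sub(lambda m: str(params[m.group(1)]), template)

message = render_template(
    "Invalid value <value> for field <field>.",
    {"value": "abc", "field": "age"},
)
```

The design point is the same as in the commit: hoist invariant work (pattern compilation) out of the per-exception path so only the cheap substitution runs each time.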
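The SPARK-48118 entry lets an environment variable steer a configuration default. A hedged sketch of that general pattern — the built-in default and precedence shown here are assumptions for illustration, not Spark's actual resolution logic:

```python
import os

def legacy_create_hive_table_default(env=None) -> bool:
    """Resolve a boolean default, letting an env variable override it.

    Mirrors the shape of the commit: a platform provider can export the
    variable to change the default, while an explicitly set Spark conf
    value (not modeled here) would still take precedence.
    """
    env = os.environ if env is None else env
    raw = env.get("SPARK_SQL_LEGACY_CREATE_HIVE_TABLE")
    if raw is None:
        return True  # assumed built-in default when the variable is unset
    return raw.strip().lower() == "true"

unset = legacy_create_hive_table_default({})
overridden = legacy_create_hive_table_default(
    {"SPARK_SQL_LEGACY_CREATE_HIVE_TABLE": "false"}
)
```

Passing the environment as a parameter (defaulting to `os.environ`) keeps the resolution logic testable without mutating the real process environment.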