{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1717372277.0","currentOid":""},"activityList":{"items":[{"before":"cf3051b181318bb05486fbe37c94c0ef70e355be","after":"9de0a2eca9ce0965350d19d207f6db01b7081b27","ref":"refs/heads/master","pushedAt":"2024-06-07T00:15:00.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-42944][FOLLOWUP][SS][CONNECT] Reenable ApplyInPandasWithState tests\n\n### What changes were proposed in this pull request?\n\nThe tests for ApplyInPandasWithState was skipped in connect before. This was because the tests uses foreachBatch, which was not ready when the development is done. So they were skipped. This PR reenables them.\n\n### Why are the changes needed?\n\nNecessary tests\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nTest only addition.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46853 from WweiL/apply-in-pandas-with-state-test.\n\nAuthored-by: Wei Liu <wei.liu@databricks.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-42944\">SPARK-42944</a>][FOLLOWUP][SS][CONNECT] Reenable ApplyInPandasWithState …"}},{"before":"3c4cb407b5ca6fa9ec960cf4a0667ea39ff59393","after":"cf3051b181318bb05486fbe37c94c0ef70e355be","ref":"refs/heads/master","pushedAt":"2024-06-07T00:10:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[MINOR][PYTHON] add export class and fix doc typo for python streaming data source\n\n### What changes were proposed in this pull request?\nAdd SimpleDataSourceStreamReader to default export class.\nFix the typo in python_data_source.dst\n\n### Why are the changes needed?\nTo improve user experience.\n\n### Does this PR introduce _any_ user-facing change?\nno.\n\n### How was this patch tested?\nCovered by existing test.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46903 from chaoqin-li1123/fix_export.\n\nAuthored-by: Chaoqin Li <chaoqin.li@databricks.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[MINOR][PYTHON] add export class and fix doc typo for python streamin…"}},{"before":"edb9236ea688ca0627e3c8a68f4a87e5689e2f9a","after":"3c4cb407b5ca6fa9ec960cf4a0667ea39ff59393","ref":"refs/heads/master","pushedAt":"2024-06-07T00:02:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn\n\n### What changes were proposed in this pull request?\n1. Add configuration `spark.connect.grpc.port.maxRetries` (default 0, no retries).\n\n   [Before this PR]: The SparkConnectService would fail to start in case of port conflicts on Yarn.\n   [After this PR]: Allow the internal GRPC server to retry new ports until it finds an available port before reaching the maxRetries.\n\n2. Post SparkListenerEvent containing the location of the remote SparkConnectService on Yarn.\n\n   [Before this PR]: We needed to manually find the final location (host and port) of the SparkConnectService on Yarn and then use the SparkConnect Client to connect.\n   [After this PR]: The location will be posted via SparkListenerEvent\n                                                  `SparkListenerConnectServiceStarted`\n                                                  `SparkListenerConnectServiceEnd`\n Allowing users to register a listener to receive this event and expose it by some way like sending it to a third coordinator server.\n\n3. Shutdown SparkPlugins before stopping the ListenerBus.\n\n   [Before this PR]: If the SparkConnectService was launched in the SparkConnectPlugin way, currently the SparkPlugins would be shutdown after the stop of ListenerBus, causing events posted during the shutdown to not be delivered to the listener.\n   [After this PR]: The SparkPlugins will be shutdown before the stop of ListenerBus, ensuring that the ListenerBus remains active during the shutdown and the listener can receive the SparkConnectService stop event.\n\n4. Minor method refactoring for 1~3.\n\n### Why are the changes needed?\n#User Story:\nOur data analysts and data scientists use Jupyter notebooks provisioned on Kubernetes (k8s) with limited CPU/memory resources to run Spark-shell/pyspark for interactively development via terminal under Yarn Client mode.\n\nHowever, Yarn Client mode consumes significant local memory if the job is heavy, and the total resource pool of k8s for notebooks is limited.\n\nTo leverage the abundant resources of our Hadoop cluster for scalability purposes, we aim to utilize SparkConnect.\nThis allows the driver on Yarn with SparkConnectService started and uses SparkConnect client to connect to the remote driver.\n\nTo provide a seamless experience with one command startup for both server and client, we've wrapped the following processes in one script:\n\n1) Start a local coordinator server (implemented by us internally, not in this PR) in the host of jupyter notebook.\n2) Start SparkConnectServer by spark-submit via Yarn Cluster mode with user-input Spark configurations and the local coordinator server's address and port.\n   Append an additional listener class in the configuration for SparkConnectService callback with the actual address and port on Yarn to the coordinator server.\n3) Wait for the coordinator server to receive the address callback from the SparkConnectService on Yarn and export the real address.\n4) Start the client (pyspark --remote $callback_address) with the remote address.\n\nFinally, a remote SparkConnect Server is started on Yarn with a local SparkConnect client connected. Users no longer need to start the server beforehand and connect to the remote server after they manually explore the address on Yarn.\n\n#Problem statement of this change:\n1) The specified port for the SparkConnectService GRPC server might be occupied on the node of the Hadoop Cluster.\n   To increase the success rate of startup, it needs to retry on conflicts rather than fail directly.\n2) Because the final binding port could be uncertain based on 1) when retry and the remote address is also unpredictable on Yarn, we need to retrieve the address and port programmatically and inject it automatically on the start of 'pyspark --remote'. To get the address of SparkConnectService on Yarn programmatically, the SparkConnectService needs to communicate its location back to the launcher side.\n\n### Does this PR introduce _any_ user-facing change?\n1. Add configuration `spark.connect.grpc.port.maxRetries` to enable port retries until an available port is found before reaching the maximum number of retries.\n\n3. The start and stop events of the SparkConnectService are observable through the SparkListener.\n   Two new events have been introduced:\n   - SparkListenerConnectServiceStarted: the SparkConnectService(with address and port) tis online for serving\n   - SparkListenerConnectServiceEnd: the SparkConnectService(with address and port) is offline\n\n### How was this patch tested?\nBy UT and verified the feature in our production environment by our binary build\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46182 from TakawaAkirayo/SPARK-47952.\n\nLead-authored-by: tatian <tatian@ebay.com>\nCo-authored-by: TakawaAkirayo <153728772+TakawaAkirayo@users.noreply.github.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-47952\">SPARK-47952</a>][CORE][CONNECT] Support retrieving the real SparkConnect…"}},{"before":"ce1b08f6e30b1c6fba364a41a75b979017ffe7ef","after":"edb9236ea688ca0627e3c8a68f4a87e5689e2f9a","ref":"refs/heads/master","pushedAt":"2024-06-06T23:30:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-48504][PYTHON][CONNECT][FOLLOW-UP] Code clean up\n\n### What changes were proposed in this pull request?\nCode clean up\n\n### Why are the changes needed?\nCode clean up\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46898 from zhengruifeng/win_refactor.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48504\">SPARK-48504</a>][PYTHON][CONNECT][FOLLOW-UP] Code clean up"}},{"before":"0f21df0b29cc18f0e0c7b12543f3a037e4032e65","after":"ce1b08f6e30b1c6fba364a41a75b979017ffe7ef","ref":"refs/heads/master","pushedAt":"2024-06-06T23:15:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-48553][PYTHON][CONNECT] Cache more properties\n\n### What changes were proposed in this pull request?\nCache more properties:\n\n- Dataframe.isEmpty\n- Dataframe.isLocal\n- Dataframe.inputFiles\n- Dataframe.semanticHash\n- Dataframe.explain\n- SparkSession.version\n\n### Why are the changes needed?\nto avoid unnecessary RPCs\n\n### Does this PR introduce _any_ user-facing change?\nno\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46896 from zhengruifeng/df_cache_more.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48553\">SPARK-48553</a>][PYTHON][CONNECT] Cache more properties"}},{"before":"d3a324d63f82ffc4a4818bb1bfe7485d12f1dada","after":"a00c11546273089dbfa993fa4c170eb70beecbc3","ref":"refs/heads/branch-3.5","pushedAt":"2024-06-06T20:10:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48286] Fix analysis of column with exists default expression - Add user facing error\n\nFIRST CHANGE\n\nPass correct parameter list to `org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze` when it is invoked from `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column`.\n\n`org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze` method accepts 3 parameter\n\n1) Field to analyze\n2) Statement type - String\n3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT\n\nMethod `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column`\npass `fieldToAnalyze` and `EXISTS_DEFAULT` as second parameter, so it is not metadata key, instead of that, it is statement type, so different expression is analyzed.\n\nPull requests where original change was introduced\nhttps://github.com/apache/spark/pull/40049 - Initial commit\nhttps://github.com/apache/spark/pull/44876 - Refactor that did not touch the issue\nhttps://github.com/apache/spark/pull/44935 - Another refactor that did not touch the issue\n\nSECOND CHANGE\nAdd user facing exception when default value is not foldable or resolved. Otherwise, user would see message \"You hit a bug in Spark ...\".\nIt is needed to pass correct value to `Column` object\n\nYes, this is a bug fix, existence default value has now proper expression, but before this change, existence default value was actually current default value of column.\n\nUnit test\n\nNo\n\nCloses #46594 from urosstan-db/SPARK-48286-Analyze-exists-default-expression-instead-of-current-default-expression.\n\nLead-authored-by: Uros Stankovic <uros.stankovic@databricks.com>\nCo-authored-by: Uros Stankovic <155642965+urosstan-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>\n(cherry picked from commit 0f21df0b29cc18f0e0c7b12543f3a037e4032e65)\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48286\">SPARK-48286</a>] Fix analysis of column with exists default expression -…"}},{"before":"84fa0527834b947ad12e4a6398512c75929cc99b","after":"0f21df0b29cc18f0e0c7b12543f3a037e4032e65","ref":"refs/heads/master","pushedAt":"2024-06-06T20:08:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48286] Fix analysis of column with exists default expression - Add user facing error\n\n### What changes were proposed in this pull request?\n\nFIRST CHANGE\n\nPass correct parameter list to `org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze` when it is invoked from `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column`.\n\n`org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze` method accepts 3 parameter\n\n1) Field to analyze\n2) Statement type - String\n3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT\n\nMethod `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column`\npass `fieldToAnalyze` and `EXISTS_DEFAULT` as second parameter, so it is not metadata key, instead of that, it is statement type, so different expression is analyzed.\n\nPull requests where original change was introduced\nhttps://github.com/apache/spark/pull/40049 - Initial commit\nhttps://github.com/apache/spark/pull/44876 - Refactor that did not touch the issue\nhttps://github.com/apache/spark/pull/44935 - Another refactor that did not touch the issue\n\nSECOND CHANGE\nAdd user facing exception when default value is not foldable or resolved. Otherwise, user would see message \"You hit a bug in Spark ...\".\n### Why are the changes needed?\nIt is needed to pass correct value to `Column` object\n\n### Does this PR introduce _any_ user-facing change?\nYes, this is a bug fix, existence default value has now proper expression, but before this change, existence default value was actually current default value of column.\n\n### How was this patch tested?\nUnit test\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46594 from urosstan-db/SPARK-48286-Analyze-exists-default-expression-instead-of-current-default-expression.\n\nLead-authored-by: Uros Stankovic <uros.stankovic@databricks.com>\nCo-authored-by: Uros Stankovic <155642965+urosstan-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48286\">SPARK-48286</a>] Fix analysis of column with exists default expression -…"}},{"before":"b5a4b32003624261c4a9175b2f5eed748f948cbf","after":"84fa0527834b947ad12e4a6398512c75929cc99b","ref":"refs/heads/master","pushedAt":"2024-06-06T19:05:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE\n\n### What changes were proposed in this pull request?\nString comparison and hashing in UTF8_BINARY_LCASE is now context-unaware, and uses ICU root locale rules to convert string to lowercase at code point level, taking into consideration special cases for one-to-many case mapping. For example: comparing \"ΘΑΛΑΣΣΙΝΟΣ\" and \"θαλασσινοσ\" under UTF8_BINARY_LCASE now returns true, because Greek final sigma is special-cased in the new comparison implementation.\n\n### Why are the changes needed?\n1. UTF8_BINARY_LCASE should use ICU root locale rules (instead of JVM)\n2. comparing strings under UTF8_BINARY_LCASE should be context-insensitive\n\n### Does this PR introduce _any_ user-facing change?\nYes, comparing strings under UTF8_BINARY_LCASE will now give different results in two kinds of special cases (Turkish dotted letter \"i\" and Greek final letter \"sigma\").\n\n### How was this patch tested?\nUnit tests in `CollationSupportSuite`.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46700 from uros-db/lcase-casing.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48283\">SPARK-48283</a>][SQL] Modify string comparison for UTF8_BINARY_LCASE"}},{"before":"3878b57e6e88631826c1c8690eb9052e5efa5aa1","after":"b5a4b32003624261c4a9175b2f5eed748f948cbf","ref":"refs/heads/master","pushedAt":"2024-06-06T19:02:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48435][SQL] UNICODE collation should not support binary equality\n\n### What changes were proposed in this pull request?\nCollationFactory has been updated to no longer mark UNICODE as collation that supportsBinaryCollation. To reflect these changes, various tests have been updated.\n\nHowever, some tests have been (temporarily) removed because StringTrim no longer supports UNICODE collation given the new UNICODE definition in CollationFactory. At this time, StringTrim expression only supports UTF8_BINARY & UTF8_BINARY_LCASE, but not ICU collations. This work is in progress (https://github.com/apache/spark/pull/46762), so we'll ensure appropriate test coverage with those changes.\n\n### Why are the changes needed?\nUNICODE collation should not support binary collation. Note: in the future, we may want to consider a collation such as UNICODE_BINARY, which will support binary equality, but also maintain UNICODE ordering.\n\n### Does this PR introduce _any_ user-facing change?\nYes, UNICODE is no longer treated as a binary collation. This affects how equality works for UNICODE, and also which codepath is taken for various collation-aware string expression given UNICODE collated string arguments.\n\n### How was this patch tested?\nUpdated existing unit and e2e sql test for UNICODE collation.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46772 from uros-db/fix-unicode.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48435\">SPARK-48435</a>][SQL] UNICODE collation should not support binary equality"}},{"before":"9f4007f3d89eb442df63a3b8bd9fd510bf7e2edd","after":"3878b57e6e88631826c1c8690eb9052e5efa5aa1","ref":"refs/heads/master","pushedAt":"2024-06-06T18:19:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48526][SS] Allow passing custom sink to testStream()\n\n### What changes were proposed in this pull request?\nUpdate `StreamTest:testStream()` to allow passing a custom sink. This allows writing better tests covering streaming sinks, in particular:\n- reusing a sink across calls to testStream.\n- passing a custom sink implementation.\n\n### Why are the changes needed?\nBetter testing infrastructure.\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nN/A\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46866 from johanl-db/allow-passing-custom-sink-stream-test.\n\nAuthored-by: Johan Lasperas <johan.lasperas@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48526\">SPARK-48526</a>][SS] Allow passing custom sink to testStream()"}},{"before":"7cba1ab4d6acef4e9d73a8e6018b0902aac3a18d","after":"9f4007f3d89eb442df63a3b8bd9fd510bf7e2edd","ref":"refs/heads/master","pushedAt":"2024-06-06T18:18:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48546][SQL] Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression\n\n### What changes were proposed in this pull request?\n\nIn https://github.com/apache/spark/pull/46793, we replaced NullPointerExceptions with proper error classes in AssertNotNull expression. However, that PR forgot to update the `ExpressionEncoder` to check for these new error classes. This PR fixes it to make sure we use the new error classes in all cases.\n\n### Why are the changes needed?\n\nSee above\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, see above\n\n### How was this patch tested?\n\nThis PR updates tests with the new error classes\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46888 from dtenedor/fix-expr-encoder.\n\nAuthored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48546\">SPARK-48546</a>][SQL] Fix ExpressionEncoder after replacing NullPointerE…"}},{"before":"65db87697949ec247bcd38a38369c31bc2cdf3f1","after":"7cba1ab4d6acef4e9d73a8e6018b0902aac3a18d","ref":"refs/heads/master","pushedAt":"2024-06-06T16:42:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48554][INFRA] Use R 4.4.0 in `windows` R GitHub Action Window job\n\n### What changes were proposed in this pull request?\nThis PR aims to use R 4.4.0 in `windows` R GitHub Action job.\n\n### Why are the changes needed?\nR 4.4.0 is the latest release which is released on 2024-04-24.\nhttps://www.r-project.org/\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46897 from panbingkun/SPARK-48554.\n\nAuthored-by: panbingkun <panbingkun@baidu.com>\nSigned-off-by: Dongjoon Hyun <dhyun@apple.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48554\">SPARK-48554</a>][INFRA] Use R 4.4.0 in <code>windows</code> R GitHub Action Window job"}},{"before":"ab00533221e27bf5b1082baca33a3a64426dee8a","after":"65db87697949ec247bcd38a38369c31bc2cdf3f1","ref":"refs/heads/master","pushedAt":"2024-06-06T12:52:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HeartSaVioR","name":"Jungtaek Lim","path":"/HeartSaVioR","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1317309?s=80&v=4"},"commit":{"message":"[SPARK-48513][SS] Add error class for state schema compatibility and minor refactoring\n\n### What changes were proposed in this pull request?\nAdd error class for state schema compatibility and minor refactoring\n\n### Why are the changes needed?\nAdd error class for state schema compatibility and minor refactoring so that these errors can be tracked using the NERF framework\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nAdded new unit tests\n\n```\n[info] Run completed in 8 seconds, 250 milliseconds.\n[info] Total number of tests run: 29\n[info] Suites: completed 1, aborted 0\n[info] Tests: succeeded 29, failed 0, canceled 0, ignored 0, pending 0\n[info] All tests passed.\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46856 from anishshri-db/task/SPARK-48513.\n\nAuthored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>\nSigned-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48513\">SPARK-48513</a>][SS] Add error class for state schema compatibility and …"}},{"before":"8cb78a7811f337309a72ff0ef29eefbb31d8bdc4","after":"ab00533221e27bf5b1082baca33a3a64426dee8a","ref":"refs/heads/master","pushedAt":"2024-06-06T11:51:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-47933][PYTHON][TESTS][FOLLOW-UP] Enable doctest `pyspark.sql.connect.column`\n\n### What changes were proposed in this pull request?\nEnable doctest `pyspark.sql.connect.column`\n\n### Why are the changes needed?\ntest coverage\n\n### Does this PR introduce _any_ user-facing change?\nno, test only\n\n### How was this patch tested?\nmanually check:\n\nI manually broke some doctests in `Column`, then found `pyspark.sql.connect.column` didn't fail:\n```\n(spark_dev_312) ➜  spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.classic.column'\nRunning PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log\nWill test against the following Python executables: ['python3']\nWill test the following Python tests: ['pyspark.sql.classic.column']\npython3 python_implementation is CPython\npython3 version is: Python 3.12.2\nStarting test(python3): pyspark.sql.classic.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/4bdd14b8-92ba-43ba-a7fb-655e6769aeb9/python3__pyspark.sql.classic.column__i2_c1zct.log)\nWARNING: Using incubator modules: jdk.incubator.vector\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n**********************************************************************\nFile \"/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/column.py\", line 385, in pyspark.sql.column.Column.contains\nFailed example:\n    df.filter(df.name.contains('o')).collect()\nDifferences (ndiff with -expected +actual):\n    - [Row(age=5, name='Bobx')]\n    ?                      -\n    + [Row(age=5, name='Bob')]\n**********************************************************************\n   1 of   2 in pyspark.sql.column.Column.contains\n***Test Failed*** 1 failures.\n\nHad test failures in pyspark.sql.classic.column with python3; see logs.\n(spark_dev_312) ➜  spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.connect.column'\nRunning PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log\nWill test against the following Python executables: ['python3']\nWill test the following Python tests: ['pyspark.sql.connect.column']\npython3 python_implementation is CPython\npython3 version is: Python 3.12.2\nStarting test(python3): pyspark.sql.connect.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/2acaff3c-ef1d-41eb-b63e-509f3e0192c0/python3__pyspark.sql.connect.column__66td62h9.log)\nFinished test(python3): pyspark.sql.connect.column (3s)\nTests passed in 3 seconds\n```\n\nafter this PR, it fails as expected:\n```\n(spark_dev_312) ➜  spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.connect.column'\nRunning PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log\nWill test against the following Python executables: ['python3']\nWill test the following Python tests: ['pyspark.sql.connect.column']\npython3 python_implementation is CPython\npython3 version is: Python 3.12.2\nStarting test(python3): pyspark.sql.connect.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/390ff7ae-7683-425c-b0d2-ee336e1ad452/python3__pyspark.sql.connect.column__f69b3smc.log)\nWARNING: Using incubator modules: jdk.incubator.vector\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\norg.apache.spark.SparkSQLException: [INVALID_CURSOR.DISCONNECTED] The cursor is invalid. The cursor has been disconnected by the server. SQLSTATE: HY109\n\tat org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender.execute(ExecuteGrpcResponseSender.scala:281)\n\tat org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender$$anon$1.run(ExecuteGrpcResponseSender.scala:101)\n**********************************************************************\nFile \"/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/column.py\", line 385, in pyspark.sql.column.Column.contains\nFailed example:\n    df.filter(df.name.contains('o')).collect()\nExpected:\n    [Row(age=5, name='Bobx')]\nGot:\n    [Row(age=5, name='Bob')]\n**********************************************************************\n   1 of   2 in pyspark.sql.column.Column.contains\n***Test Failed*** 1 failures.\n\nHad test failures in pyspark.sql.connect.column with python3; see logs.\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46895 from zhengruifeng/fix_connect_column_doc_test.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-47933\">SPARK-47933</a>][PYTHON][TESTS][FOLLOW-UP] Enable doctest `pyspark.sql.c…"}},{"before":"f4434c36cc4f7b0147e0e8fe26ac0f177a5199cd","after":"8cb78a7811f337309a72ff0ef29eefbb31d8bdc4","ref":"refs/heads/master","pushedAt":"2024-06-06T11:16:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48550][PS] Directly use the parent Window class\n\n### What changes were proposed in this pull request?\nDirectly use the parent Window class\n\n### Why are the changes needed?\nthe `get_window_class` method is no longer needed\n\n### Does this PR introduce _any_ user-facing change?\nno, refactor only\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46892 from zhengruifeng/del_get_win_class.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48550\">SPARK-48550</a>][PS] Directly use the parent Window class"}},{"before":"b3700ac09861cf436bb5c5424d55ce70288dd921","after":"f4434c36cc4f7b0147e0e8fe26ac0f177a5199cd","ref":"refs/heads/master","pushedAt":"2024-06-06T06:35:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48540][CORE] Avoid ivy output loading settings to stdout\n\n### What changes were proposed in this pull request?\nThis PR aims to avoid ivy output loading settings to stdout.\n\n### Why are the changes needed?\nNow `org.apache.spark.util.MavenUtils#getModuleDescriptor` will output the following string to stdout.\n\nThis is due to the modified code order in SPARK-32596 .\n\n```\n:: loading settings :: url = jar:file:xxxx/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml\n```\n\nStack trace\n```java\n\tat org.apache.ivy.core.settings.IvySettings.load(IvySettings.java:404)\n\tat org.apache.ivy.core.settings.IvySettings.loadDefault(IvySettings.java:443)\n\tat org.apache.ivy.Ivy.configureDefault(Ivy.java:435)\n\tat org.apache.ivy.core.IvyContext.getDefaultIvy(IvyContext.java:201)\n\tat org.apache.ivy.core.IvyContext.getIvy(IvyContext.java:180)\n\tat org.apache.ivy.core.IvyContext.getSettings(IvyContext.java:216)\n\tat org.apache.ivy.core.module.status.StatusManager.getCurrent(StatusManager.java:40)\n\tat org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.<init>(DefaultModuleDescriptor.java:206)\n\tat org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.newDefaultInstance(DefaultModuleDescriptor.java:107)\n\tat org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.newDefaultInstance(DefaultModuleDescriptor.java:66)\n\tat org.apache.spark.deploy.SparkSubmitUtils$.getModuleDescriptor(SparkSubmit.scala:1413)\n\tat org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1460)\n\tat org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)\n\tat org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:327)\n\tat org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:942)\n\tat org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:181)\n```\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nlocal test\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46882 from cxzl25/SPARK-48540.\n\nAuthored-by: sychen <sychen@ctrip.com>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48540\">SPARK-48540</a>][CORE] Avoid ivy output loading settings to stdout"}},{"before":"966c3d9ef1edc8b2f7d53b8a592ff4e2a2f9b80b","after":"b3700ac09861cf436bb5c5424d55ce70288dd921","ref":"refs/heads/master","pushedAt":"2024-06-06T06:16:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48539][BUILD][TESTS] Upgrade docker-java to 3.3.6\n\n### What changes were proposed in this pull request?\n\nUpgrades docker-java to 3.3.6\n\n### Why are the changes needed?\n\nThere are some bug fixes and enhancements:\nhttps://github.com/docker-java/docker-java/releases/tag/3.3.6\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPassed GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46881 from wayneguow/docker_upgrade.\n\nAuthored-by: Wei Guo <guow93@gmail.com>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48539\">SPARK-48539</a>][BUILD][TESTS] Upgrade docker-java to 3.3.6"}},{"before":"31ce2db6d20828844d0acab464346d7e3a4206e8","after":"966c3d9ef1edc8b2f7d53b8a592ff4e2a2f9b80b","ref":"refs/heads/master","pushedAt":"2024-06-06T03:49:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric\n\n### What changes were proposed in this pull request?\n\nThis is a followup of https://github.com/apache/spark/pull/45710 . Some custom `FileSystem` implementations read the `hadoop.fs.s3a.connection.establish.timeout` config as numeric, and do not support the `30s` syntax. To make it safe, this PR proposes to set this conf to `30000` instead of `30s`. I checked the doc page and this config is milliseconds.\n\n### Why are the changes needed?\n\nmore compatible with custom `FileSystem` implementations.\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nmanual\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46874 from cloud-fan/follow.\n\nAuthored-by: Wenchen Fan <wenchen@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-47552\">SPARK-47552</a>][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.esta…"}},{"before":"d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32","after":"31ce2db6d20828844d0acab464346d7e3a4206e8","ref":"refs/heads/master","pushedAt":"2024-06-06T02:22:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp\n\n### What changes were proposed in this pull request?\n\nAs described in [HIVE-15551](https://issues.apache.org/jira/browse/HIVE-15551), HMS will memory leak when directsql is enabled for MySQL metastore DB.\n\nAlthough HIVE-15551 has been resolved already, the bug can still occur on our side as we have multiple hive version supported.\n\nConsidering bonecp has been removed from hive since 4.0.0 and HikariCP is not supported by all hive versions we support, we replace bonecp with `DBCP` to avoid memory leak\n\n### Why are the changes needed?\n\nfix memory leak of HMS\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nRun `org.apache.spark.sql.hive.execution.SQLQuerySuite` and pass without linkage errors\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46879 from yaooqinn/SPARK-48538.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48538\">SPARK-48538</a>][SQL] Avoid HMS memory leak casued by bonecp"}},{"before":"490a4b3b1fdf47991b5a6588df14e63c3dd8b211","after":"d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32","ref":"refs/heads/master","pushedAt":"2024-06-05T21:38:48.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations\n\n### What changes were proposed in this pull request?\n\nThis is a followup of https://github.com/apache/spark/pull/46617 .  Subquery expression has a bunch of correlation checks which need to match certain plan shapes. We broke this by leaving `WithCTE` in the plan for un-referenced CTE relations. This PR fixes the issue by skipping CTE plan nodes in correlated subquery expression checks.\n\n### Why are the changes needed?\n\nbug fix\n### Does this PR introduce _any_ user-facing change?\n\nno bug is not released yet.\n\n### How was this patch tested?\n\nnew tests\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46869 from cloud-fan/check.\n\nAuthored-by: Wenchen Fan <wenchen@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48307\">SPARK-48307</a>][SQL][FOLLOWUP] Allow outer references in un-referenced …"}},{"before":"34ac7de897115caada7330aed32f03aca4796299","after":"490a4b3b1fdf47991b5a6588df14e63c3dd8b211","ref":"refs/heads/master","pushedAt":"2024-06-05T20:01:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48498][SQL] Always do char padding in predicates\n\n### What changes were proposed in this pull request?\n\nFor some data sources, CHAR type padding is not applied on both the write and read sides (by disabling `spark.sql.readSideCharPadding`), as a different SQL flavor, which is similar to MySQL: https://dev.mysql.com/doc/refman/8.0/en/char.html\n\nHowever, there is a bug in Spark that we always pad the string literal when comparing CHAR type and STRING literals, which assumes the CHAR type columns are always padded, either on the write side or read side. This is not always true.\n\nThis PR makes Spark always pad the CHAR type columns when comparing with string literals, to satisfy the CHAR type semantic.\n\n### Why are the changes needed?\n\nbug fix if people disable read side char padding\n\n### Does this PR introduce _any_ user-facing change?\n\nYes. After this PR, comparing CHAR type with STRING literals follows the CHAR semantic, while before it mostly returns false.\n\n### How was this patch tested?\n\nnew tests\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46832 from cloud-fan/char.\n\nAuthored-by: Wenchen Fan <wenchen@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48498\">SPARK-48498</a>][SQL] Always do char padding in predicates"}},{"before":"88b8dc29e100a51501701ffdffbcd0eff1f97c98","after":"34ac7de897115caada7330aed32f03aca4796299","ref":"refs/heads/master","pushedAt":"2024-06-05T12:42:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-48536][PYTHON][CONNECT] Cache user specified schema in applyInPandas and applyInArrow\n\n### What changes were proposed in this pull request?\nCache user specified schema in applyInPandas and applyInArrow\n\n### Why are the changes needed?\nto avoid extra RPCs\n\n### Does this PR introduce _any_ user-facing change?\nno\n\n### How was this patch tested?\nadded tests\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46877 from zhengruifeng/cache_schema_apply_in_x.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48536\">SPARK-48536</a>][PYTHON][CONNECT] Cache user specified schema in applyIn…"}},{"before":"c4f720dfb41919dade7002b49462b3ff6b91eb22","after":"88b8dc29e100a51501701ffdffbcd0eff1f97c98","ref":"refs/heads/master","pushedAt":"2024-06-05T09:41:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement\n\n### What changes were proposed in this pull request?\n\nA followup of https://github.com/apache/spark/pull/44976 . `ConcurrentHashMap#put` has a different semantic than the scala map, and it returns null if the key is new. We should update the checking code accordingly.\n\n### Why are the changes needed?\n\navoid wrong warning messages\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nmanual\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46876 from cloud-fan/log.\n\nAuthored-by: Wenchen Fan <wenchen@databricks.com>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-46937\">SPARK-46937</a>][SQL][FOLLOWUP] Properly check registered function repla…"}},{"before":"7f99f2cbd7d2d637f15b8444aebae3f9630ed3ab","after":"d3a324d63f82ffc4a4818bb1bfe7485d12f1dada","ref":"refs/heads/branch-3.5","pushedAt":"2024-06-05T08:35:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### What changes were proposed in this pull request?\nUpdate config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### Why are the changes needed?\nClarifying the implications of turning off this config after a certain Spark version\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nN/A - config doc only change\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46875 from anishshri-db/task/SPARK-48535.\n\nAuthored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>\nSigned-off-by: Kent Yao <yao@apache.org>\n(cherry picked from commit c4f720dfb41919dade7002b49462b3ff6b91eb22)\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48535\">SPARK-48535</a>][SS] Update config docs to indicate possibility of data …"}},{"before":"db527ac346f2f6f6dbddefe292a24848d1120172","after":"c4f720dfb41919dade7002b49462b3ff6b91eb22","ref":"refs/heads/master","pushedAt":"2024-06-05T08:34:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### What changes were proposed in this pull request?\nUpdate config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### Why are the changes needed?\nClarifying the implications of turning off this config after a certain Spark version\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nN/A - config doc only change\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46875 from anishshri-db/task/SPARK-48535.\n\nAuthored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48535\">SPARK-48535</a>][SS] Update config docs to indicate possibility of data …"}},{"before":"adbfd17318bf50b34d03f62ccd04219b18a41103","after":"db527ac346f2f6f6dbddefe292a24848d1120172","ref":"refs/heads/master","pushedAt":"2024-06-05T05:20:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`\"\n\nThis reverts commit abbe301d7645217f22641cf3a5c41502680e65be.","shortMessageHtmlLink":"Revert \"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48505\">SPARK-48505</a>][CORE] Simplify the implementation of `Utils#isG…"}},{"before":"4075ce6771206ac8957029566c8d4196bcb8a87b","after":"adbfd17318bf50b34d03f62ccd04219b18a41103","ref":"refs/heads/master","pushedAt":"2024-06-05T05:11:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48533][CONNECT][PYTHON][TESTS] Add test for cached schema\n\n### What changes were proposed in this pull request?\nAdd test for cached schema, to make Spark Classic's mapInXXX also works within `SparkConnectSQLTestCase`, also add a new `contextmanager` for `os.environ`\n\n### Why are the changes needed?\ntest coverage\n\n### Does this PR introduce _any_ user-facing change?\nno, test only\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46871 from zhengruifeng/test_cached_schema.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48533\">SPARK-48533</a>][CONNECT][PYTHON][TESTS] Add test for cached schema"}},{"before":"a17ab572cfdaefdb4a988908aa923c33f3ed58e1","after":"4075ce6771206ac8957029566c8d4196bcb8a87b","ref":"refs/heads/master","pushedAt":"2024-06-05T03:12:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48374][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode for non-ANSI build\n\n### What changes were proposed in this pull request?\n\nThis PR proposes to explicitly set ANSI mode in `test_toArrow_error` test.\n\n### Why are the changes needed?\n\nTo make non-ANSI build passing https://github.com/apache/spark/actions/runs/9342888897/job/25711689943:\n\n```\n\n======================================================================\nFAIL [0.180s]: test_toArrow_error (pyspark.sql.tests.connect.test_parity_arrow.ArrowParityTests.test_toArrow_error)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n  File \"/__w/spark/spark/python/pyspark/sql/tests/test_arrow.py\", line 1207, in test_toArrow_error\n    with self.assertRaises(ArithmeticException):\nAssertionError: ArithmeticException not raised\n\n----------------------------------------------------------------------\nRan 88 tests in 17.797s\n```\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, test-only.\n\n### How was this patch tested?\n\nManually.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46872 from HyukjinKwon/SPARK-48374-followup.\n\nAuthored-by: Hyukjin Kwon <gurwls223@apache.org>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48374\">SPARK-48374</a>][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode f…"}},{"before":"33aa467f75824ed8460d514ca1e37f559d3cc405","after":"a17ab572cfdaefdb4a988908aa923c33f3ed58e1","ref":"refs/heads/master","pushedAt":"2024-06-05T00:07:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[MINOR][DOCS] Fix a typo in core-migration-guide.md\n\n### What changes were proposed in this pull request?\n\n Fix a typo in core-migration-guide.md:\n\n- agressively -> aggressively\n\n### Why are the changes needed?\n\nFix mistakes.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPassed GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46864 from wayneguow/typo.\n\nAuthored-by: Wei Guo <guow93@gmail.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[MINOR][DOCS] Fix a typo in core-migration-guide.md"}},{"before":"e47ce476b9ac962d24fabfbe1b344d074403d45b","after":"33aa467f75824ed8460d514ca1e37f559d3cc405","ref":"refs/heads/master","pushedAt":"2024-06-04T23:55:07.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48523][DOCS] Add `grpc_max_message_size ` description to `client-connection-string.md`\n\n### What changes were proposed in this pull request?\nThe pr aims to\n- add `grpc_max_message_size` description to `client-connection-string.md`\n- rename `hostname` to `host`.\n- fix some typo.\n\n### Why are the changes needed?\n- In PR https://github.com/apache/spark/pull/45842, we extract a `constant` as a `parameter` for the connect client, and we need to update the related doc.\n- Make the parameter names in our doc consistent with those in the code,\n  In the doc, it is called `hostname`, but in the code, it is called `host`\nhttps://github.com/apache/spark/blob/d273fdf37bc291aadf8677305bda2a91b593219f/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L36\n\n### Does this PR introduce _any_ user-facing change?\nYes, only for doc `client-connection-string.md`.\n\n### How was this patch tested?\nManually test.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46862 from panbingkun/SPARK-48523.\n\nAuthored-by: panbingkun <panbingkun@baidu.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48523\">SPARK-48523</a>][DOCS] Add <code>grpc_max_message_size </code> description to `clie…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEXrepjwA","startCursor":null,"endCursor":null}},"title":"Activity · apache/spark"}