
[SPARK-44838][SQL][FOLLOW-UP] Fix the test for raise_error by using default type for strings #46649

Closed
wants to merge 4 commits into from

Conversation

HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented May 18, 2024

What changes were proposed in this pull request?

This PR is a follow-up of #46461 that fixes the CI failure when ANSI is off:

[info] - Support RaiseError misc expression with collation *** FAILED *** (21 milliseconds)
[info]   Expected exception org.apache.spark.SparkRuntimeException to be thrown, but org.apache.spark.sql.catalyst.ExtendedAnalysisException was thrown (CollationSQLExpressionsSuite.scala:991)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Assertions.intercept(Assertions.scala:756)
[info]   at org.scalatest.Assertions.intercept$(Assertions.scala:746)
[info]   at org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1564)
[info]   at org.apache.spark.sql.CollationSQLExpressionsSuite.$anonfun$new$124(CollationSQLExpressionsSuite.scala:991)
[info]   at org.apache.spark.sql.catalyst.SQLConfHelper.withSQLConf(SQLConfHelper.scala:56)
[info]   at org.apache.spark.sql.catalyst.SQLConfHelper.withSQLConf$(SQLConfHelper.scala:38)
[info]   at org.apache.spark.sql.CollationSQLExpressionsSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(CollationSQLExpressionsSuite.scala:30)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:248)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:246)
[info]   at org.apache.spark.sql.CollationSQLExpressionsSuite.withSQLConf(CollationSQLExpressionsSuite.scala:30)
[info]   at org.apache.spark.sql.CollationSQLExpressionsSuite.$anonfun$new$123(CollationSQLExpressionsSuite.scala:988)
[info]   at scala.collection.immutable.List.foreach(List.scala:334)
[info]   at org.apache.spark.sql.CollationSQLExpressionsSuite.$anonfun$new$122(CollationSQLExpressionsSuite.scala:987)

Why are the changes needed?

The CI is broken: https://github.com/apache/spark/actions/runs/9136253329

Does this PR introduce any user-facing change?

No, the main change has not been released yet.

How was this patch tested?

Manually ran the test with ANSI disabled.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 18, 2024
@HyukjinKwon
Member Author

cc @cloud-fan and @uros-db

@HyukjinKwon
Member Author

For a bit more context, the test fails as below:

org.apache.spark.sql.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "raise_error(USER_RAISED_EXCEPTION, map(errorMessage, 'aa' collate UTF8_BINARY_LCASE))" due to data type mismatch: The second parameter requires the "MAP<STRING, STRING>" type, however "map(errorMessage, 'aa' collate UTF8_BINARY_LCASE)" has the type "MAP<STRING, STRING COLLATE UTF8_BINARY_LCASE>". SQLSTATE: 42K09; line 1 pos 7;
'Project [unresolvedalias(raise_error(cast(USER_RAISED_EXCEPTION as string collate UTF8_BINARY_LCASE), map(errorMessage, aa), NullType))]
+- OneRowRelation

	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7(CheckAnalysis.scala:315)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7$adapted(CheckAnalysis.scala:302)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
	at scala.collection.immutable.Vector.foreach(Vector.scala:2124)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$6(CheckAnalysis.scala:302)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$6$adapted(CheckAnalysis.scala:302)
	at scala.collection.immutable.List.foreach(List.scala:334)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:302)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:216)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:216)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:198)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:192)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:190)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:161)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:192)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:214)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:393)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:212)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:92)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:225)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:599)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:225)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:923)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:89)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:73)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$3(Dataset.scala:118)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:923)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:115)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:660)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:923)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:681)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:840)

because the analyzer fails to add a cast from MapType(StringType, collated StringType) to MapType(StringType, StringType). With ANSI enabled, the cast is added here: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala#L206-L211
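To illustrate the missing coercion, here is a minimal, self-contained Scala sketch. The simplified DataType hierarchy and the differsOnlyByCollation helper are hypothetical stand-ins (not Spark's real classes) for the kind of check a coercion rule needs: a cast between two types is safe when they differ only by string collation.

```scala
// Hypothetical, simplified model -- not Spark's real DataType hierarchy.
sealed trait DataType
case class StringType(collation: String = "UTF8_BINARY") extends DataType
case class MapType(key: DataType, value: DataType) extends DataType
case object IntegerType extends DataType

// True when `from` can reach `to` by changing string collations alone --
// the collation-only cast the non-ANSI coercion rule failed to insert.
def differsOnlyByCollation(from: DataType, to: DataType): Boolean =
  (from, to) match {
    case (StringType(_), StringType(_)) => true
    case (MapType(fk, fv), MapType(tk, tv)) =>
      differsOnlyByCollation(fk, tk) && differsOnlyByCollation(fv, tv)
    case _ => from == to
  }

// The failing case from the trace above: the argument is
// MAP<STRING, STRING COLLATE UTF8_BINARY_LCASE> but the expected type is
// MAP<STRING, STRING>, so a collation-only cast would resolve it.
val actual   = MapType(StringType(), StringType("UTF8_BINARY_LCASE"))
val expected = MapType(StringType(), StringType())
println(differsOnlyByCollation(actual, expected))   // true
// A genuinely different value type is not a collation-only difference:
println(differsOnlyByCollation(MapType(StringType(), IntegerType), expected)) // false
```

In this model the first call returns true (only the value collation differs), while the second returns false (INT vs STRING), matching the two analysis errors quoted in this thread.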

cc @gengliangwang too

@uros-db
Contributor

uros-db commented May 18, 2024

It was my understanding that this wouldn't be a problem, since the second parameter (MapType) is only used internally in Spark to raise errors.

@uros-db
Contributor

uros-db commented May 18, 2024

@HyukjinKwon I believe you could just use: AbstractMapType(StringTypeAnyCollation, StringTypeAnyCollation) for inputTypes in RaiseError (misc.scala) instead of MapType(StringType, StringType)

this would perhaps be a more uniform approach - it's what we usually do for various expressions (we offer support for AbstractMapType as well as AbstractArrayType)
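As a rough sketch of that suggestion (comment-only pseudocode, not the actual patch; the surrounding class in misc.scala is elided and may differ):

```scala
// Pseudocode sketch of the suggested change to RaiseError's inputTypes
// in misc.scala -- not the actual patch.
//
// Before: the second argument must be exactly MAP<STRING, STRING>:
//   override def inputTypes: Seq[AbstractDataType] =
//     Seq(..., MapType(StringType, StringType))
//
// After: accept any string collation for both keys and values:
//   override def inputTypes: Seq[AbstractDataType] =
//     Seq(..., AbstractMapType(StringTypeAnyCollation, StringTypeAnyCollation))
```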

@HyukjinKwon
Member Author

Sure, that sounds like a more localized fix.

@HyukjinKwon
Member Author

Seems like that still doesn't work with ANSI disabled:

org.apache.spark.sql.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "raise_error(USER_RAISED_EXCEPTION, map(errorMessage, 1))" due to data type mismatch: The second parameter requires the "MAP<STRING, STRING>" type, however "map(errorMessage, 1)" has the type "MAP<STRING, INT>". SQLSTATE: 42K09; line 1 pos 7;
'Project [unresolvedalias(raise_error(USER_RAISED_EXCEPTION, map(errorMessage, 1), NullType))]
+- OneRowRelation

@HyukjinKwon
Member Author

I'm going to switch back to just fixing the type coercion rule for the ANSI-disabled case, but I think we should revisit this whole thing.

@@ -1048,6 +1048,9 @@ object TypeCoercion extends TypeCoercionBase {
}
}

// Allows the cast between different collated strings to match with ANSI behavior.
Contributor
This is a behavior change.

@uros-db I looked at the code base; we have implicit cast rules for AbstractArrayType, but not AbstractMapType. Seems like a miss?

Member Author

AbstractMapType(StringTypeAnyCollation, StringTypeAnyCollation) does not work with the non-ANSI type coercion here. We should fix that in any event.

Member Author

@uros-db please open a separate PR - I can close mine.

Contributor

on it

@dongjoon-hyun
Member

Gentle ping, @uros-db . The CI is still broken.

@uros-db
Contributor

uros-db commented May 20, 2024

The fix should be ready: #46661
Please review, @cloud-fan @HyukjinKwon @dongjoon-hyun
