Gluten 1.1.1 does not work on Spark 3.3.1 on EMR #5798

Open
astarrr opened this issue May 17, 2024 · 2 comments
Labels
bug (Something isn't working), triage

astarrr commented May 17, 2024

Backend

VL (Velox)

Bug description

When running a simple query such as df.count(), the Spark job fails with NoSuchMethodError: org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.isSubquery

I traced the error to the implementation of AdaptiveSparkPlanExec in spark-sql_2.12-3.3.1-amzn-0.1.jar, which does not provide the isSubquery() method/field (I decompiled the JAR to check).

The method is called at GlutenExplainUtils.scala:93.
After switching to Gluten 1.1.0, there is no call to AdaptiveSparkPlanExec.isSubquery, and Gluten works on EMR as well.
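
For anyone who wants to confirm the missing accessor without decompiling the JAR, a reflection probe is one option. The following is a minimal sketch, not part of Gluten or Spark: the class name `MethodProbe` and the helper `hasNoArgMethod` are hypothetical names chosen for illustration. It assumes the Spark JARs of the cluster under test are on the classpath. The `()Z` suffix in the error message is the JVM descriptor for a method that takes no arguments and returns a boolean, so the probe looks for a public no-argument method with that name.

```java
import java.lang.reflect.Method;

// Hypothetical helper for illustration; not part of Gluten or Spark.
final class MethodProbe {
    private MethodProbe() {}

    /**
     * Returns true if the named class can be loaded and exposes a public
     * no-argument method with the given name; returns false if the class
     * or the method is missing.
     */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, MethodProbe.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName) && m.getParameterCount() == 0) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class absent or not loadable on this classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec";
        System.out.println(cls + ".isSubquery present: " + hasNoArgMethod(cls, "isSubquery"));
    }
}
```

Running this once against the EMR Spark JARs and once against a stock Spark 3.3.1 distribution should show whether the two builds differ on this method. Note that it also prints false when the class itself is not on the classpath.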

Spark version

Spark-3.3.x

Spark configurations

No response

System information

No response

Relevant logs

java.lang.NoSuchMethodError: org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.isSubquery()Z
  at org.apache.spark.sql.execution.GlutenExplainUtils$.$anonfun$collectFallbackNodes$1(GlutenExplainUtils.scala:93)
  at org.apache.spark.sql.execution.GlutenExplainUtils$.$anonfun$collectFallbackNodes$1$adapted(GlutenExplainUtils.scala:80)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:410)
  at org.apache.spark.sql.execution.GlutenExplainUtils$.collect$1(GlutenExplainUtils.scala:80)
  at org.apache.spark.sql.execution.GlutenExplainUtils$.collectFallbackNodes(GlutenExplainUtils.scala:112)
  at org.apache.spark.sql.execution.GlutenExplainUtils$.$anonfun$processPlan$8(GlutenExplainUtils.scala:223)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.execution.GlutenExplainUtils$.processPlan(GlutenExplainUtils.scala:220)
  at org.apache.spark.sql.execution.GlutenFallbackReporter.postFallbackReason(GlutenFallbackReporter.scala:108)
  at org.apache.spark.sql.execution.GlutenFallbackReporter.apply(GlutenFallbackReporter.scala:45)
  at org.apache.spark.sql.execution.GlutenFallbackReporter.apply(GlutenFallbackReporter.scala:36)
  at io.glutenproject.extension.ColumnarOverrideRules.$anonfun$transformPlan$3(ColumnarOverrides.scala:725)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at io.glutenproject.extension.ColumnarOverrideRules.$anonfun$transformPlan$1(ColumnarOverrides.scala:723)
  at io.glutenproject.metrics.GlutenTimeMetric$.withNanoTime(GlutenTimeMetric.scala:41)
  at io.glutenproject.metrics.GlutenTimeMetric$.withMillisTime(GlutenTimeMetric.scala:46)
  at io.glutenproject.extension.ColumnarOverrideRules.transformPlan(ColumnarOverrides.scala:733)
  at io.glutenproject.extension.ColumnarOverrideRules.$anonfun$postColumnarTransitions$2(ColumnarOverrides.scala:712)
  at io.glutenproject.utils.QueryPlanSelector.maybe(QueryPlanSelector.scala:74)
  at io.glutenproject.extension.ColumnarOverrideRules.io$glutenproject$extension$ColumnarOverrideRules$$$anonfun$postColumnarTransitions$1(ColumnarOverrides.scala:699)
astarrr added the bug and triage labels May 17, 2024

astarrr commented May 17, 2024

I have uploaded some screenshots.
With the Amazon JAR, there is no isSubquery method:
image

Here it is with stock (upstream) Spark:
image

FelixYBW (Contributor) commented

Do you mean AWS EMR? I don't think Gluten can work with AWS EMR. Gluten relies on many Spark APIs and has hacked several Spark files. AWS EMR applies many optimizations to Spark, which may change those APIs and conflict with the files we hacked. That would need to be solved by the AWS EMR team, and as far as I know, there is no such effort at AWS.
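
Since a vendor build can differ from upstream Spark in ways like the one reported here, a deployment script could check the Spark version string before enabling Gluten. The sketch below is only an illustration under stated assumptions: the class `SparkBuildCheck` and the method `isPlainUpstreamVersion` are hypothetical, and the sample string `3.3.1-amzn-0.1` is taken from the JAR file name in this issue. Whether the EMR runtime reports its version in exactly that form is not confirmed in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Gluten or Spark.
final class SparkBuildCheck {
    // A plain upstream release looks like "3.3.1": three dot-separated numbers, no suffix.
    private static final Pattern PLAIN_RELEASE = Pattern.compile("\\d+\\.\\d+\\.\\d+");

    private SparkBuildCheck() {}

    /** Returns true only for version strings with no vendor or build suffix. */
    public static boolean isPlainUpstreamVersion(String version) {
        return version != null && PLAIN_RELEASE.matcher(version).matches();
    }

    public static void main(String[] args) {
        // Sample values; the second is assumed from the JAR name in this issue.
        for (String v : new String[] {"3.3.1", "3.3.1-amzn-0.1"}) {
            System.out.println(v + " plain upstream: " + isPlainUpstreamVersion(v));
        }
    }
}
```

A check like this only flags a suffixed version; it cannot tell which APIs actually changed, so the reflection probe on the specific method remains the more direct test.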
