[BUG]: Hive incompatibility when using microsoft-spark-3-1_2.12-2.1.1.jar #1146

Open
dcosta91 opened this issue Mar 3, 2023 · 1 comment
Labels
bug Something isn't working

Comments


dcosta91 commented Mar 3, 2023

Describe the bug
We are trying to upgrade some Spark 2.4 jobs to 3.1, and everything goes well until a job tries to write into a Hive partition.
It fails, complaining that it cannot find the get_partition_names_ps_req method.

Just as a note: if I use microsoft-spark-2-4_2.11-2.1.1.jar, it works. So there must be some incompatibility between microsoft-spark-3-1_2.12-2.1.1.jar and the cluster I'm using.
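
For reference, here is a minimal sketch of the kind of .NET write that triggers the error. The database, table, and app names are illustrative, not our real ones; our actual job differs only in schema:

    using Microsoft.Spark.Sql;

    namespace HivePartitionWrite
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Hive support is needed so InsertInto goes through the Hive
                // external catalog (HiveExternalCatalog in the stack trace below).
                SparkSession spark = SparkSession
                    .Builder()
                    .AppName("hive-partition-write")
                    .EnableHiveSupport()
                    .GetOrCreate();

                // Allow dynamic partition inserts.
                spark.Conf().Set("hive.exec.dynamic.partition.mode", "nonstrict");

                // Illustrative source table.
                DataFrame df = spark.Sql("SELECT * FROM staging_db.events");

                // This call ends in HiveClientImpl.loadDynamicPartitions and fails
                // with "Invalid method name: 'get_partition_names_ps_req'".
                df.Write()
                  .Mode(SaveMode.Overwrite)
                  .InsertInto("target_db.events_partitioned");
            }
        }
    }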

Some questions:

  • Where can I find the source code for microsoft-spark-3-1_2.12-2.1.1.jar?
  • Which version of Hive does this jar use?
  • Does anybody have ideas on how to get this working?

Versions used:

  • microsoft-spark-3-1_2.12-2.1.1.jar
  • Cluster versions:
    • CDH-7.1.7
    • Spark: 3.1.1
    • Hive: 3.1.3

Error message:

[2023-03-03T09:18:44.7972462Z] [ldchio1091] [Error] [JvmBridge] org.apache.spark.sql.AnalysisException: org.apache.thrift.TApplicationException: Invalid method name: 'get_partition_names_ps_req'
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadDynamicPartitions$1(HiveClientImpl.scala:969)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:316)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:245)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:244)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:298)
    at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:956)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$loadDynamicPartitions$1(HiveExternalCatalog.scala:945)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
    at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:925)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:221)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:103)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:133)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:132)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:997)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:997)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:542)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:496)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.api.dotnet.DotnetBackendHandler.handleMethodCall(DotnetBackendHandler.scala:165)
    at org.apache.spark.api.dotnet.DotnetBackendHandler.$anonfun$handleBackendRequest$2(DotnetBackendHandler.scala:105)
    at org.apache.spark.api.dotnet.ThreadPool$$anon$1.run(ThreadPool.scala:34)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)


dcosta91 commented Mar 6, 2023

I have now found out that the get_partition_names_ps_req method does not exist in Hive 3.1.3; it was only added in Hive 4.0.0 (which is not released yet...)

Therefore my question is: why is the Microsoft Spark worker not compatible with Hive 3.1.3?
If I run Java/Scala on the same Spark version, it works with Hive. What am I missing here?
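
For completeness, here is a minimal sketch of the workaround I am planning to try, assuming the root cause is a mismatch between the Hive client bundled with Spark and the cluster's metastore: pin the metastore client through the standard spark.sql.hive.metastore.* configs. The app name and jar path are placeholders for our CDH layout, and I have not yet confirmed that this helps with the microsoft-spark jar:

    using Microsoft.Spark.Sql;

    // Sketch: point Spark at the cluster's Hive 3.1 client jars instead of
    // the built-in Hive client. Spark 3.1 accepts metastore versions up to
    // 3.1.2, and the "path" jar mode plus jars.path were added in Spark 3.1.
    SparkSession spark = SparkSession
        .Builder()
        .AppName("hive-metastore-pin")
        .Config("spark.sql.hive.metastore.version", "3.1.2")
        .Config("spark.sql.hive.metastore.jars", "path")
        // Placeholder path; must point at the cluster's Hive client jars.
        .Config("spark.sql.hive.metastore.jars.path",
                "file:///opt/cloudera/parcels/CDH/lib/hive/lib/*.jar")
        .EnableHiveSupport()
        .GetOrCreate();

These configs must be set before the first Hive catalog access, which is why they go on the session builder rather than on spark.Conf() afterwards.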
