[BUG] synapse.ml.cognitive Detect transform error #2166

Open
3 of 19 tasks
GISJohnSourcemap opened this issue Feb 6, 2024 · 0 comments

@GISJohnSourcemap

SynapseML version

synapseml_2.12:0.10.0

System information

  • Scala 2.12
  • Spark 3.5.0
  • Databricks
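This report pairs SynapseML 0.10.0 with Spark 3.5.0. SynapseML is a compiled JVM library, so if the Spark line a release was built against differs from the Spark line on the cluster, JVM errors such as `NoSuchMethodError` can follow. Below is a minimal sketch of a major.minor comparison. Both helper names are hypothetical. The built-against version is not stated in this issue and would have to come from the SynapseML release notes.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '3.5.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_spark_line(built_against: str, runtime: str) -> bool:
    """True when both versions share the same major.minor Spark line.

    Patch versions are ignored on purpose: binary incompatibilities in
    Spark internals usually show up across minor releases.
    """
    return major_minor(built_against) == major_minor(runtime)
```

In a notebook, the runtime value would typically come from `spark.version`, for example `same_spark_line(built_against, spark.version)`.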

Describe the problem

When I use `Detect` from `synapse.ml.cognitive` with the Azure AI Services Language service to detect the language of a text column in a PySpark DataFrame, the call to `.transform` throws a `Py4JJavaError`.

Code to reproduce issue

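from synapse.ml.cognitive import Detect
from pyspark.sql.functions import col

# spark and display are provided by the Databricks notebook;
# cognitive_services_key and cognitive_services_region are assumed
# to be defined in an earlier cell.

# Create a DataFrame to detect the language of (and later translate)
df_sentences = spark.createDataFrame([
  ("ヒョンデ", "ja")
], ["text", "expected_lang"])

# Configure the Detect transformer against the Azure resource
detect = (Detect()
    .setSubscriptionKey(cognitive_services_key)
    .setLocation(cognitive_services_region)
    .setTextCol("text")
    .setOutputCol("result"))

# Run detection and pull the detected language out of the result column
display(detect
    .transform(df_sentences)
    .withColumn("language", col("result.language"))
    .select("language"))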
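The repro depends on notebook globals. `spark` and `display` come from Databricks, but `cognitive_services_key` and `cognitive_services_region` have to be defined by the user. Below is a minimal sketch that reads them from environment variables. The variable names `COGNITIVE_SERVICES_KEY` and `COGNITIVE_SERVICES_REGION` and the helper `load_cognitive_settings` are hypothetical. On Databricks, a secret scope is another option.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical variable names; adapt them to your own configuration.
KEY_VAR = "COGNITIVE_SERVICES_KEY"
REGION_VAR = "COGNITIVE_SERVICES_REGION"


def load_cognitive_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (key, region), failing loudly if either setting is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in (KEY_VAR, REGION_VAR) if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return env[KEY_VAR], env[REGION_VAR]
```

Usage: `cognitive_services_key, cognitive_services_region = load_cognitive_settings()`. Because the environment is passed in as a mapping, the helper can be checked without touching the real process environment.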

Other info / logs

Py4JJavaError: An error occurred while calling o435.transform.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
	at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.rowEnc$lzycompute(SparkBindings.scala:17)
	at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.rowEnc(SparkBindings.scala:17)
	at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.makeFromRowConverter(SparkBindings.scala:26)
	at com.microsoft.azure.synapse.ml.io.http.ErrorUtils$.addErrorUDF(SimpleHTTPTransformer.scala:57)
	at com.microsoft.azure.synapse.ml.io.http.SimpleHTTPTransformer.$anonfun$makePipeline$1(SimpleHTTPTransformer.scala:135)
	at org.apache.spark.injections.UDFUtils$$anon$1.call(UDFUtils.scala:23)
	at org.apache.spark.sql.functions$.$anonfun$udf$91(functions.scala:8103)
	at com.microsoft.azure.synapse.ml.stages.Lambda.$anonfun$transform$1(Lambda.scala:55)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logVerb(BasicLogging.scala:62)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logVerb$(BasicLogging.scala:59)
	at com.microsoft.azure.synapse.ml.stages.Lambda.logVerb(Lambda.scala:24)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logTransform(BasicLogging.scala:52)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logTransform$(BasicLogging.scala:51)
	at com.microsoft.azure.synapse.ml.stages.Lambda.logTransform(Lambda.scala:24)
	at com.microsoft.azure.synapse.ml.stages.Lambda.transform(Lambda.scala:55)
	at com.microsoft.azure.synapse.ml.stages.Lambda.transformSchema(Lambda.scala:63)
	at org.apache.spark.ml.PipelineModel.$anonfun$transformSchema$5(Pipeline.scala:317)
	at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
	at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
	at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:317)
	at com.microsoft.azure.synapse.ml.io.http.SimpleHTTPTransformer.transformSchema(SimpleHTTPTransformer.scala:169)
	at org.apache.spark.ml.PipelineModel.$anonfun$transformSchema$5(Pipeline.scala:317)
	at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
	at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
	at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:317)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:72)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$2(Pipeline.scala:310)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:148)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:141)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:45)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$1(Pipeline.scala:309)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:289)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:289)
	at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:308)
	at com.microsoft.azure.synapse.ml.cognitive.CognitiveServicesBaseNoHandler.$anonfun$transform$1(CognitiveServiceBase.scala:358)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logVerb(BasicLogging.scala:62)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logVerb$(BasicLogging.scala:59)
	at com.microsoft.azure.synapse.ml.cognitive.CognitiveServicesBaseNoHandler.logVerb(CognitiveServiceBase.scala:306)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logTransform(BasicLogging.scala:52)
	at com.microsoft.azure.synapse.ml.logging.BasicLogging.logTransform$(BasicLogging.scala:51)
	at com.microsoft.azure.synapse.ml.cognitive.CognitiveServicesBaseNoHandler.logTransform(CognitiveServiceBase.scala:306)
	at com.microsoft.azure.synapse.ml.cognitive.CognitiveServicesBaseNoHandler.transform(CognitiveServiceBase.scala:358)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
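The key line in this log is the `NoSuchMethodError`. The JVM looked up `RowEncoder$.apply` with the descriptor `(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;` and did not find it in the Spark runtime on the cluster. In general, this error means that code compiled against one version of a library is running against another version that no longer has a method with that exact signature. Here, the call comes from SynapseML's `SparkBindings.rowEnc` (SparkBindings.scala:17).

The sketch below decodes such a descriptor into readable Java types, following the descriptor grammar in the JVM specification (section 4.3). The function names are hypothetical helpers, not part of SynapseML or Spark.

```python
from typing import List, Tuple

# Base-type characters from the JVM descriptor grammar ('V' is return-only).
_PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc: str, i: int) -> Tuple[str, int]:
    """Parse one type starting at index i; return (java_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<binary/class/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:
        name = _PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_method_descriptor(desc: str) -> Tuple[List[str], str]:
    """Split a '(params)return' descriptor into parameter types and return type."""
    if not desc.startswith("("):
        raise ValueError("method descriptors start with '('")
    params: List[str] = []
    i = 1
    while desc[i] != ")":
        name, i = _parse_type(desc, i)
        params.append(name)
    ret, _ = _parse_type(desc, i + 1)
    return params, ret
```

Applied to the descriptor in the log, this yields one parameter, `org.apache.spark.sql.types.StructType`, and the return type `org.apache.spark.sql.catalyst.encoders.ExpressionEncoder`. In other words, the missing method is `RowEncoder.apply(StructType): ExpressionEncoder`.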

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations
@github-actions github-actions bot added the triage label Feb 6, 2024
@GISJohnSourcemap GISJohnSourcemap changed the title from "[BUG from synapse.ml.cognitive import Detect" to "[BUG] synapse.ml.cognitive Detect transform error" Feb 6, 2024