When SynapseML is used to load data into an index that has a custom analyzer or tokenizer (and possibly other custom objects, though those haven't been tested), the write fails with the following error:
Py4JJavaError: An error occurred while calling z:com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter.write.
: spray.json.DeserializationException: Expected String as JsString, but got {"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer","charFilters":[],"name":"keyword_analyzer","tokenFilters":["lowercase"],"tokenizer":"keyword_v2"}
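The object in the error is the custom analyzer from the index definition. This suggests the failure comes from the index schema, not from the documents being written. A quick way to see whether another index is likely to be affected is to look for custom analysis objects in its definition. The sketch below assumes you already have the index JSON as a Python dict, for example from GET https://{{service-name}}.search.windows.net/indexes/{{index-name}}?api-version={{api-version}}. Several names are my own assumptions: the helper `custom_analysis_objects`, the list of sections it checks, and the index name `my-index`. Only analyzers and tokenizers have actually been seen to fail.

```python
from typing import Any, Dict, List, Tuple

# Index-level sections that can hold custom analysis objects (assumed list;
# only analyzers and tokenizers are confirmed to trigger the error)
CUSTOM_SECTIONS = ("analyzers", "tokenizers", "tokenFilters", "charFilters", "normalizers")


def custom_analysis_objects(index_def: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (section, name) pairs for every custom object in an index definition."""
    found = []
    for section in CUSTOM_SECTIONS:
        for obj in index_def.get(section) or []:
            # Custom objects are full JSON objects, not plain strings
            if isinstance(obj, dict):
                found.append((section, obj.get("name", "<unnamed>")))
    return found


# Hypothetical index definition using the analyzer reported in the error message
index_def = {
    "name": "my-index",
    "fields": [{"name": "Id", "type": "Edm.String", "key": True}],
    "analyzers": [{
        "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
        "charFilters": [],
        "name": "keyword_analyzer",
        "tokenFilters": ["lowercase"],
        "tokenizer": "keyword_v2",
    }],
}

print(custom_analysis_objects(index_def))  # [('analyzers', 'keyword_analyzer')]
```

An empty list suggests the index has no custom analysis objects. It does not prove that the write will succeed.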
This happens with every apiVersion I set and, seemingly, with any SynapseML version from 0.11.0 onward. The same index works correctly from a Spark 3.2 Azure Synapse cluster, which uses SynapseML 0.10.2.
Code to reproduce issue
Create an index with a custom analyzer
This has to be done through the REST API:
POST https://{{service-name}}.search.windows.net/indexes?api-version={{api-version}}
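The request body isn't included in the report. Below is a minimal sketch of a body that should reproduce the setup. It uses a string key field `Id` to match `keyCol="Id"` in the repro, and it reuses the analyzer shown in the error message. These parts are assumptions: the extra `text` field, the function names, and sending the request with the standard-library `urllib`.

```python
import json
import urllib.request


def build_index_definition(index_name: str) -> dict:
    """Index with a string key "Id" and one field that uses a custom analyzer."""
    return {
        "name": index_name,
        "fields": [
            {"name": "Id", "type": "Edm.String", "key": True},
            # Hypothetical searchable field wired to the custom analyzer below
            {"name": "text", "type": "Edm.String", "searchable": True,
             "analyzer": "keyword_analyzer"},
        ],
        "analyzers": [{
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "keyword_analyzer",
            "tokenizer": "keyword_v2",
            "tokenFilters": ["lowercase"],
            "charFilters": [],
        }],
    }


def create_index(service_name: str, api_version: str, admin_key: str, index_name: str) -> int:
    """POST the definition to the indexes endpoint and return the HTTP status code."""
    url = f"https://{service_name}.search.windows.net/indexes?api-version={api_version}"
    body = json.dumps(build_index_definition(index_name)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `create_index(search_service, api_version, admin_key, search_index)` with real values should create the index. A 201 status means it was created.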
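Try to load the index
Run the following PySpark code on Spark 3.4:
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, col

# Placeholders: replace with your own search service, index and admin key
search_service = "<search-service-name>"
search_index = "<index-name>"
admin_key = "<admin-api-key>"

spark = SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.2") \
    .getOrCreate()

# The synapse.ml Python package ships inside the jar, so import it after the session starts
from synapse.ml.services import writeToAzureSearch

# Ten rows with a string key column "Id" and an "upload" action per row
df = spark.range(10) \
    .withColumn("Id", col("id").cast("string")) \
    .withColumn("action", lit("upload"))

# Fails with the DeserializationException when the index has a custom analyzer
x = writeToAzureSearch(df,
                       subscriptionKey=admin_key,
                       actionCol="action",
                       serviceName=search_service,
                       indexName=search_index,
                       keyCol="Id")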
Running the same code (without the SparkSession creation) on Azure Synapse Spark 3.3 gives the same result. I expect the same on Databricks and on Synapse Spark 3.4, but I haven't tested either.
Other info / logs
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
c:\Users\dbundred\Documents\Projects\OSM\OtherStuff\synapseMlTest.ipynb Cell 2 line 1
11 from pyspark.sql.functions import lit, col
14 df = spark.range(10) \
15 .withColumn("Id", col("id").cast("string")) \
16 .withColumn("action", lit("upload"))
---> 18 x = writeToAzureSearch(df,
19 subscriptionKey=admin_key,
20 actionCol="action",
21 serviceName=search_service,
22 indexName=search_index,
23 keyCol="Id")
File ~\AppData\Local\Temp\spark-c09947f2-255d-45b5-a241-6a7165bbac06\userFiles-b3991b65-4cea-4012-980d-108b02406dcc\com.microsoft.azure_synapseml-cognitive_2.12-1.0.2.jar\synapse\ml\services\search\AzureSearchWriter.py:28, in writeToAzureSearch(df, **options)
26 jvm = SparkContext.getOrCreate()._jvm
27 writer = jvm.com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter
---> 28 writer.write(df._jdf, options)
File c:\Users\dbundred\AppData\Local\Programs\Python\Python311\Lib\site-packages\py4j\java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File c:\Users\dbundred\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyspark\errors\exceptions\captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
167 def deco(*a: Any, **kw: Any) -> Any:
168 try:
--> 169 return f(*a, **kw)
170 except Py4JJavaError as e:
171 converted = convert_exception(e.java_exception)
File c:\Users\dbundred\AppData\Local\Programs\Python\Python311\Lib\site-packages\py4j\protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling z:com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter.write.
: spray.json.DeserializationException: Expected String as JsString, but got {"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer","charFilters":[],"name":"keyword_analyzer","tokenFilters":["lowercase"],"tokenizer":"keyword_v2"}
at spray.json.package$.deserializationError(package.scala:23)
at spray.json.ProductFormats.fromField(ProductFormats.scala:63)
at spray.json.ProductFormats.fromField$(ProductFormats.scala:51)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchProtocol$.fromField(AzureSearchSchemas.scala:67)
at spray.json.ProductFormatsInstances$$anon$11.read(ProductFormatsInstances.scala:341)
at spray.json.ProductFormatsInstances$$anon$11.read(ProductFormatsInstances.scala:319)
at spray.json.JsValue.convertTo(JsValue.scala:33)
at com.microsoft.azure.synapse.ml.services.search.IndexParser.parseIndexJson(AzureSearchAPI.scala:25)
at com.microsoft.azure.synapse.ml.services.search.IndexParser.parseIndexJson$(AzureSearchAPI.scala:24)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.parseIndexJson(AzureSearch.scala:147)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.getVectorColConf(AzureSearch.scala:325)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.prepareDF(AzureSearch.scala:269)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.write(AzureSearch.scala:432)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.write(AzureSearch.scala:440)
at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter.write(AzureSearch.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
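The exception is raised in IndexParser.parseIndexJson, which is called from AzureSearchWriter.getVectorColConf. The writer therefore seems to fail while parsing the index definition it fetches, before any documents are sent. The message suggests that the schema it deserializes into types some property as a plain string, while the service returns a full object for custom analyzers. The trace doesn't say which property, so that part is an assumption. The Python sketch below only illustrates this kind of mismatch. It is not SynapseML's Scala code, and both function names are hypothetical.

```python
from typing import Any, Dict, List

# Illustrative index definition containing the analyzer from the error message
index_def: Dict[str, Any] = {
    "name": "my-index",  # hypothetical index name
    "fields": [{"name": "Id", "type": "Edm.String", "key": True}],
    "analyzers": [{
        "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
        "charFilters": [],
        "name": "keyword_analyzer",
        "tokenFilters": ["lowercase"],
        "tokenizer": "keyword_v2",
    }],
}


def parse_analyzers_strict(index_def: Dict[str, Any]) -> List[str]:
    """Mimics a schema that types each analyzer entry as a plain string."""
    names = []
    for entry in index_def.get("analyzers", []):
        if not isinstance(entry, str):
            raise ValueError(f"Expected String, but got {entry!r}")
        names.append(entry)
    return names


def parse_analyzers_lenient(index_def: Dict[str, Any]) -> List[str]:
    """Accepts either analyzer names or full analyzer objects."""
    return [e if isinstance(e, str) else e["name"]
            for e in index_def.get("analyzers", [])]


try:
    parse_analyzers_strict(index_def)
except ValueError as err:
    print(err)  # same shape of failure as the DeserializationException above

print(parse_analyzers_lenient(index_def))  # ['keyword_analyzer']
```

The lenient version shows one possible direction for a fix: accept the object form, or skip parts of the index definition that the writer doesn't need.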
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations
Hey @DBundred-cfc 👋!
Thank you so much for reporting the issue/feature request 🚨.
Someone from SynapseML Team will be looking to triage this issue soon.
We appreciate your patience.
SynapseML version
0.11.0-1.0.2