
[FEAT] Scala 2.13 support? #2132

Open
kg005 opened this issue Apr 4, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

kg005 commented Apr 4, 2024

Is your proposal related to a problem?

I am getting the following error:

24/04/04 14:26:47 WARN TaskSetManager: Lost task 4.0 in stage 538.0 (TID 7052) (10.132.0.177 executor 1): org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (UDFRegistration$Lambda$4595/0x00007f30033f42d8: (string, string) => double).
	at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:217)
	at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.project_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.hashAgg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:576)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.NoSuchMethodError: 'scala.collection.GenMap scala.collection.mutable.Map$.apply(scala.collection.Seq)'
	at uk.gov.moj.dash.linkage.LevDamerauDistance.call(Similarity.scala:265)
	at uk.gov.moj.dash.linkage.LevDamerauDistance.call(Similarity.scala:254)
	at org.apache.spark.sql.UDFRegistration.$anonfun$register$354(UDFRegistration.scala:767)
	... 18 more

With no prior knowledge of Scala, after some exploration of:

  • https://github.com/moj-analytical-services/splink_scalaudfs
  • the environment I am using (Spark 3.4.0, Scala 2.13)
  • the differences between Scala 2.13 and 2.12

I assume the error comes from a Scala version mismatch: the Splink jars are built with Scala 2.12, while my environment uses 2.13.
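For context, the mismatch is visible in the stack trace itself: `scala.collection.GenMap` and the `mutable.Map$.apply(scala.collection.Seq)` signature exist only in the 2.12 collections library, and both changed in the 2.13 collections redesign, so bytecode compiled against 2.12 fails to link at runtime on 2.13. A minimal sketch of the kind of call that breaks (this is an illustration, not the actual line from Similarity.scala):

```scala
import scala.collection.mutable

// When compiled on Scala 2.12, this call is emitted against
//   mutable.Map$.apply(scala.collection.Seq): scala.collection.GenMap
// GenMap was removed in Scala 2.13 and the apply signature changed,
// so running the 2.12 bytecode on a 2.13 runtime throws
// java.lang.NoSuchMethodError. Recompiling the same source against
// 2.13 links against the new signature and works fine.
val counts = mutable.Map("a" -> 1, "b" -> 2)
```

This is why the fix is to recompile the jars per Scala version rather than to change any source code.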

Describe the solution you'd like

Build the .jar files from https://github.com/moj-analytical-services/splink_scalaudfs for Scala 2.13.
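If the upstream project builds with sbt, cross-publishing for both Scala versions is mostly a one-line change. A sketch, assuming an sbt build (the actual build definition of splink_scalaudfs, and the exact patch versions, may differ):

```scala
// build.sbt (hypothetical sketch; version numbers are assumptions)
ThisBuild / scalaVersion       := "2.13.12"
ThisBuild / crossScalaVersions := Seq("2.12.18", "2.13.12")
```

Running `sbt +package` would then produce separate jars under `target/scala-2.12/` and `target/scala-2.13/`, so users can pick the one matching their Spark distribution's Scala version.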

Describe alternatives you've considered

Changing my environment to use Scala 2.12, but I am currently not in a position to change the environment I run Splink on.

kg005 added the enhancement label Apr 4, 2024
RobinL (Member) commented Apr 17, 2024

Thanks for the request. We're pretty stretched at the moment so we're unlikely to be able to get round to this soon. If you're willing/able, feel free to do a PR, which would be gratefully accepted!

kg005 (Author) commented Apr 18, 2024

Hi @RobinL, here is a PR for the changes needed to build the splink_scalaudfs for Scala 2.13. As I am new to Scala, I would be happy to have it reviewed so I can adjust it as needed.

RobinL (Member) commented Apr 19, 2024

@kg005 Thanks very much. Just to say we're taking a look at this. I'm also not a Scala person myself, but the code looks ok to me at least.

One thing we need to be careful with is accepting an external PR that includes the jar, since we have no easy way of knowing whether it contains malicious code. (The diff looks ok, and the code you wrote looks fine btw, so this is no reflection on you, just security policy!)

I'm going to try and get a colleague to build it on their machine. But if you happen to work somewhere 'trusted' (e.g. UK gov), let me know and it'll make it a little easier - robinlinacre@hotmail.com!

kg005 (Author) commented Apr 19, 2024

Thanks for the heads up @RobinL. I understand the policies. Feel free to override the jar with a new version that you manage to build using your infrastructure.
