
Recent versions of Spark not supported? (ClassCastException error) #1334

Open

AWebNagra opened this issue Mar 27, 2024 · 1 comment


AWebNagra commented Mar 27, 2024

Hello,

We recently tried using almond as a Scala kernel within our JupyterLab environment, but we are encountering errors when trying to use recent versions of Spark.

Spark version tested: 3.5.0
Scala version: 2.13.8 (the Scala version used by Spark 3.5.0)
Java version: 17.0.2
Almond versions tried: 0.13.14 and 0.14.0-RC13

The errors arise when sending code to the executors, i.e. anything that requires serializing and deserializing a closure. Other operations work fine (count, show, etc.).

Here's a minimal example causing the error:

import org.apache.spark.rdd.RDD

// the closure passed to `map` must be serialized and shipped to the executors
val rdd: RDD[Int] = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5))
val multipliedRDD = rdd.map(_ * 2)
println(multipliedRDD.collect().mkString(", "))

and the error is:

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

Note that running the exact same code in a spark-shell on the JupyterLab instance works fine, so the problem seems to come from almond.
Our best guess is that the classpath used by almond pulls in mismatched versions of some libraries, but we have no proof that this is the issue.

Second note: we tried both using our own Spark installed in the JupyterLab image AND installing Spark directly with ivy from a Scala notebook; both produce the same error.
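For reference, the ivy-based setup was roughly along these lines (a minimal sketch following the almond docs; the almond-spark coordinate and the local[*] master are illustrative assumptions):

// Scala 2.13 build of Spark, plus almond's Spark integration
import $ivy.`org.apache.spark::spark-sql:3.5.0`
import $ivy.`sh.almond::almond-spark:0.14.0-RC13` // assumed to match the kernel version

import org.apache.spark.sql.NotebookSparkSession

// NotebookSparkSession is almond-spark's SparkSession builder; unlike a plain
// SparkSession.builder(), it passes the notebook session's classpath
// (including compiled cell classes) on to the executors
val spark = NotebookSparkSession.builder()
  .master("local[*]") // master shown as local[*] for illustration
  .getOrCreate()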

Does anyone have any idea what could be causing this issue?

coreyoconnor (Contributor) commented

Did you make sure the Spark version is the Scala 2.13 build?

I can confirm Spark 3.5.1 and almond 0.14.0-RC14 work fine:

[screenshot: the example running successfully in an almond notebook]

Specific setup: https://github.com/coreyoconnor/nix_configs/blob/dev/modules/ufo-k8s/almond-2/Dockerfile
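If you're loading Spark with ivy, note that the double colon in the coordinate is what selects the kernel's Scala binary version (a quick illustration, not specific to that Dockerfile):

// `::` appends the kernel's Scala binary version (_2.13 on a 2.13 kernel):
import $ivy.`org.apache.spark::spark-sql:3.5.1`
// equivalent to the explicit cross-versioned coordinate:
import $ivy.`org.apache.spark:spark-sql_2.13:3.5.1`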
