[ERROR] Error initializing SparkContext #2013

Open
alstjs129 opened this issue May 7, 2024 · 0 comments
Labels
question Further information is requested

Comments

@alstjs129

Hi there.
I'm running into a serious problem.

I installed the operator with the following commands:

helm repo add spark-operator https://kubeflow.github.io/spark-operator

helm repo update

helm install sparkoperator spark-operator/spark-operator --namespace spark-operator --create-namespace --set sparkJobNamespace=spark-operator
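
For reference, a quick check along these lines (assuming the release name sparkoperator and namespace spark-operator from the commands above) confirms the operator pod came up:

# operator pod should be Running, release should show as deployed
kubectl get pods -n spark-operator
helm status sparkoperator -n spark-operator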

I also built my own Docker image from the files bundled with the Spark distribution I downloaded,
using the following commands:

sudo ./bin/docker-image-tool.sh -r [dockerhubID] -t [tag] -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
sudo ./bin/docker-image-tool.sh -r [dockerhubID] -t [tag] -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile push
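
Not sure if it matters, but to rule out a version mismatch I checked the Spark version baked into the image with something like this (image name assumed: docker-image-tool.sh tags the PySpark image as [dockerhubID]/spark-py:[tag]):

# should print 3.5.1, matching sparkVersion in the spec below
docker run --rm [dockerhubID]/spark-py:[tag] /opt/spark/bin/spark-submit --version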

Then I applied the YAML to the spark operator, based on the spark-py-pi example that kubeflow provides.
The only thing I changed in the YAML was the Docker image.

I used the serviceAccount sparkoperator-spark that was created when the spark-operator was installed by Helm.

The YAML file I'm using:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "[myOwnDockerimage]"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "5120m"
    labels:
      version: 3.5.1
    serviceAccount: sparkoperator-spark
  executor:
    cores: 1
    instances: 1
    memory: "5120m"
    labels:
      version: 3.5.1
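
For completeness, this is roughly how I submitted the application and pulled the driver log (file name and namespace assumed from my setup; the driver pod name follows the operator's default <app-name>-driver pattern):

kubectl apply -f pyspark-pi.yaml -n spark-operator
kubectl get sparkapplication pyspark-pi -n spark-operator
kubectl logs pyspark-pi-driver -n spark-operator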

But an error occurred.
The error log is below:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3043)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:568)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
	at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.makeExecutorPodsAllocator(KubernetesClusterManager.scala:185)
	at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:139)
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3037)
	... 15 more

I think the main problem is the "ERROR SparkContext: Error initializing SparkContext." line here.

Has anyone experienced a related error and solved it?

I don't know what I did wrong!!

P.S.
I already tried changing the serviceAccount to sparkoperator-spark-operator and to my own service account, so I don't think this is an RBAC problem.
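
For reference, this is roughly how I checked the permissions (service account and namespace names assumed from the Helm install above):

# all of these should return "yes" if RBAC is not the issue
kubectl auth can-i create pods --as=system:serviceaccount:spark-operator:sparkoperator-spark -n spark-operator
kubectl auth can-i create configmaps --as=system:serviceaccount:spark-operator:sparkoperator-spark -n spark-operator
kubectl auth can-i list pods --as=system:serviceaccount:spark-operator:sparkoperator-spark -n spark-operator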

I also already tried changing the image to the official Spark image.

Please help me find a solution.

@alstjs129 added the question (Further information is requested) label on May 7, 2024