[BUG] Spark-operator 1.12.3 crashes because of missing image #2004

Open
fjammes opened this issue Apr 25, 2024 · 5 comments

@fjammes

fjammes commented Apr 25, 2024

Description

The Spark operator pod never starts: the image pull fails with ErrImagePull / ImagePullBackOff.

  • [X] ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Steps to reproduce the behavior:

$ helm repo add spark-operator https://kubeflow.github.io/spark-operator
"spark-operator" has been added to your repositories
$ helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace
NAME: my-release
LAST DEPLOYED: Thu Apr 25 10:13:49 2024
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ helm list -A
NAME      	NAMESPACE     	REVISION	UPDATED                                 	STATUS  	CHART                	APP VERSION        
my-release	spark-operator	1       	2024-04-25 10:24:26.291749107 +0200 CEST	deployed	spark-operator-1.2.13	v1beta2-1.4.4-3.5.0
$ kubectl get pods -n spark-operator 
NAME                                         READY   STATUS         RESTARTS   AGE
my-release-spark-operator-5cbd8bb556-nr5nb   0/1     ErrImagePull   0          30s
$ kubectl describe pods -n spark-operator my-release-spark-operator-5cbd8bb556-nr5nb
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  43s                default-scheduler  Successfully assigned spark-operator/my-release-spark-operator-5cbd8bb556-nr5nb to kind-control-plane
  Normal   BackOff    18s (x2 over 41s)  kubelet            Back-off pulling image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0"
  Warning  Failed     18s (x2 over 41s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    5s (x3 over 43s)   kubelet            Pulling image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0"
  Warning  Failed     4s (x3 over 41s)   kubelet            Failed to pull image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0": failed to resolve reference "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  Warning  Failed     4s (x3 over 41s)   kubelet            Error: ErrImagePull
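
The image reference in these events splits into the two values the chart exposes as image.repository and image.tag (both names appear in the overrides later in this thread). A minimal sketch of that split in POSIX shell, assuming the reference carries a tag and the registry host has no port:

```shell
# Image reference copied from the kubelet events above.
image="docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0"

# Assumption: exactly one colon separates repository and tag
# (a registry with a port, e.g. host:5000/name, would need extra handling).
repository="${image%:*}"   # everything before the last colon
tag="${image##*:}"         # everything after the last colon

echo "image.repository=$repository"   # prints image.repository=docker.io/kubeflow/spark-operator
echo "image.tag=$tag"                 # prints image.tag=v1beta2-1.4.4-3.5.0
```

Seeing the two parts separately makes it easier to tell whether a fix needs a different registry/repository, a different tag, or both.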

Expected behavior

The Spark operator pod should start successfully.

Actual behavior

The Spark operator pod never starts: the image pull fails with ErrImagePull / ImagePullBackOff.

Environment & Versions

  • Spark Operator App version: v1beta2-1.4.4-3.5.0
  • Helm Chart Version: 1.2.13 (as reported by helm list above)
  • Kubernetes Version: 1.27.3
  • Apache Spark version: Not applicable

@xiphl

xiphl commented Apr 25, 2024

Try the fixes suggested in the earlier issue #1991.

@fjammes
Author

fjammes commented Apr 26, 2024

I tried --set 'image.repository=ghcr.io/googlecloudplatform/spark-operator' as proposed in #1991, but it did not solve the current issue at all.

@xiphl

xiphl commented Apr 26, 2024

The suggestion in the last message of that thread works for me: --set image.repository=ghcr.io/kubeflow/spark-operator --set image.tag=v1beta2-1.4.3-3.5.0
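
Combined with the install command from the original report, those overrides might look like the sketch below. The release name and namespace are taken from the report; the run helper is a hypothetical dry-run wrapper (not part of helm) that only prints the command unless RUN=1 is set. Whether this tag is still published on ghcr.io is not verified here.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

# Install the chart while overriding the image repository and tag.
# Repository and tag values are copied from the comment above.
run helm install my-release spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set image.repository=ghcr.io/kubeflow/spark-operator \
  --set image.tag=v1beta2-1.4.3-3.5.0
```

Setting RUN=1 in the environment executes the command instead of printing it.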

@vara-bonthu
Contributor

We just released a new image update with important registry fixes. Check it out:

Image tag: https://github.com/kubeflow/spark-operator/tree/v1beta2-1.4.5-3.5.0
Helm chart: https://github.com/kubeflow/spark-operator/releases/tag/spark-operator-chart-1.2.14

Please give it a try and let us know if you encounter any issues. We're working on a new Kubeflow Spark Operator release, and your testing will help make it stable! Feel free to share feedback on the Kubeflow Spark Operator channel.

@fjammes
Author

fjammes commented Apr 29, 2024

Thanks!
This command works fine: helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --version 1.2.14
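
If the failing release from the original report is still installed, an in-place upgrade may be an alternative to a fresh install. A sketch, assuming the same release name and namespace; helm repo update refreshes the locally cached chart index so that chart 1.2.14 becomes visible. The run helper is a hypothetical dry-run wrapper that only prints the commands unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints each command unless RUN=1 is set.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

# Refresh the cached chart index so the new chart version can be found.
run helm repo update

# Upgrade the existing release to the fixed chart version from this thread.
run helm upgrade my-release spark-operator/spark-operator \
  --namespace spark-operator --version 1.2.14

# Check that the operator pod leaves ErrImagePull / ImagePullBackOff.
run kubectl get pods -n spark-operator
```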
