Search before asking
I searched the issues and found no similar issues.
Component
Library/kfp
What happened + What you expected to happen
In this release, we moved from the CodeFlare SDK to the RayAPIServer.
Since then, I observe various error and warning messages in the Ray logs (see below).
They may stem from wrong API parameters or from the internal RayAPIServer implementation.
From the RayAPIServer Pod logs:

```
W0504 07:30:13.041268 1 interceptor.go:17] Get compute template failure: NotFoundError: Compute template noop-kfp--78783-head-template not found: configmaps "noop-kfp--78783-head-template" not found.
```

It looks like the server tries to access the template before it was created.

```
W0504 07:56:26.660498 1 warnings.go:70] unknown field "spec.headGroupSpec.template.metadata.creationTimestamp"
W0504 07:56:26.660565 1 warnings.go:70] unknown field "spec.workerGroupSpecs[0].template.metadata.creationTimestamp"
W0504 07:56:26.660585 1 warnings.go:70] unknown field "status.desiredCPU"
W0504 07:56:26.660599 1 warnings.go:70] unknown field "status.desiredGPU"
W0504 07:56:26.660630 1 warnings.go:70] unknown field "status.desiredMemory"
W0504 07:56:26.660648 1 warnings.go:70] unknown field "status.desiredTPU"
W0504 07:56:26.680745 1 cluster_server.go:43] Failed to get cluster's event, cluster: kubeflow/noop-kfp--1d2d3, err: No Event with RayCluster name noop-kfp--1d2d3
I0504 07:57:47.189239 1 interceptor.go:14] /proto.RayJobSubmissionService/SubmitRayJob handler starting
{"level":"info","v":0,"logger":"jobsubmissionservice","message":"RayJobSubmissionService submit job"}
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
```
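If the compute-template warning is indeed a race (the template being looked up before it is visible), one client-side mitigation is to poll for the template after creating it and only then request the cluster. The following is a minimal sketch under that assumption: `wait_until_available` and `fake_get_template` are hypothetical helpers, and no real RayAPIServer client call is used; `fetch` stands in for whatever wraps the "get compute template" request and returns `None` on NotFound.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until_available(
    fetch: Callable[[], Optional[T]],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> T:
    """Call `fetch` until it returns something other than None.

    Raises TimeoutError if the resource is still missing after `timeout_s`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("resource did not become available in time")
        time.sleep(interval_s)


# Hypothetical stand-in for a "get compute template" call: it reports
# NotFound (None) twice and then returns the template.
attempts = {"n": 0}


def fake_get_template() -> Optional[dict]:
    attempts["n"] += 1
    if attempts["n"] < 3:
        return None
    return {"name": "noop-kfp--78783-head-template"}


template = wait_until_available(fake_get_template, timeout_s=5.0, interval_s=0.01)
print(template["name"], attempts["n"])  # noop-kfp--78783-head-template 3
```

Polling with a deadline keeps the cluster request from racing ahead of the template without blocking forever if the template is never created.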
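The `unknown field` warnings are returned by the Kubernetes API server and logged by client-go's warning handler (`warnings.go`): the RayCluster object sent by the RayAPIServer contains field paths that the RayCluster CRD installed in the cluster does not declare, so the server drops them and warns about each one. A version mismatch between the RayAPIServer and the installed KubeRay CRDs would fit the `status.desired*` warnings, but that is an assumption the logs alone do not confirm. The toy sketch below only illustrates the comparison being made: it enumerates dotted field paths in a manifest and reports those missing from a set of known paths. `field_paths`, `unknown_fields` and `KNOWN_PATHS` are hypothetical, and `KNOWN_PATHS` is not the real CRD schema.

```python
import re
from typing import Any, Iterator


def field_paths(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield a dotted path for every key in a nested manifest.

    List elements are rendered as name[i], the same format the warnings use.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from field_paths(value, path)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from field_paths(item, f"{prefix}[{i}]")


def unknown_fields(manifest: dict, known: set[str]) -> list[str]:
    """Report paths absent from `known`, skipping children of reported paths.

    List indices are normalized to [*] before the lookup so one entry in
    `known` covers every element of a list.
    """
    reported: list[str] = []
    for path in field_paths(manifest):
        if any(path.startswith(r + ".") or path.startswith(r + "[") for r in reported):
            continue  # parent already reported; skip its subtree
        if re.sub(r"\[\d+\]", "[*]", path) not in known:
            reported.append(path)
    return reported


# Hypothetical, heavily abbreviated schema: NOT the real RayCluster CRD.
KNOWN_PATHS = {
    "spec",
    "spec.headGroupSpec",
    "spec.headGroupSpec.template",
    "spec.headGroupSpec.template.metadata",
    "spec.workerGroupSpecs",
    "spec.workerGroupSpecs[*].template",
    "spec.workerGroupSpecs[*].template.metadata",
    "status",
}

# Minimal manifest containing the kinds of fields named in the warnings.
manifest = {
    "spec": {
        "headGroupSpec": {"template": {"metadata": {"creationTimestamp": None}}},
        "workerGroupSpecs": [{"template": {"metadata": {"creationTimestamp": None}}}],
    },
    "status": {"desiredCPU": "1"},
}

result = unknown_fields(manifest, KNOWN_PATHS)
print(result)
```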
A Ray job that finished successfully returns:

```
00:59:16 INFO - Exception running ray remote orchestration
Initialization failure from server:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/util/client/server/proxier.py", line 711, in Datapath
    raise RuntimeError(
RuntimeError: Starting Ray client server failed. See ray_client_server_23000.err for detailed logs.
```
There are no errors in `ray_client_server_23000.err`, but `ray_client_server.err` contains some information; see the attached ray_client_server.err.zip.
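To compare the two files mentioned above (and any other `ray_client_server*` logs), a small helper can list the lines that look like errors in each file. This is a sketch: `scan_logs` is a hypothetical helper, and both the regex and the file-name pattern are assumptions to adjust. By default Ray writes these logs under `/tmp/ray/session_latest/logs` on the head node; the demo below uses synthetic files instead.

```python
import re
import tempfile
from pathlib import Path

# Lines matching this pattern are treated as error-like (an assumption).
ERROR_PATTERN = re.compile(r"ERROR|Error|Exception|Traceback")


def scan_logs(log_dir: str, pattern: str = "ray_client_server*") -> dict[str, list[str]]:
    """Map each file in `log_dir` matching `pattern` to its error-like lines."""
    hits: dict[str, list[str]] = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.is_file():
            text = path.read_text(errors="replace")
            hits[path.name] = [line for line in text.splitlines() if ERROR_PATTERN.search(line)]
    return hits


# Demo on synthetic files; on a real head node, pass the Ray logs directory instead.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ray_client_server_23000.err").write_text("starting\nok\n")
    Path(tmp, "ray_client_server.err").write_text("INFO start\nERROR example failure\n")
    result = scan_logs(tmp)

print(result)
```

An empty list for a file means no line matched the pattern, which is a quick way to confirm an observation like "there are no errors in `ray_client_server_23000.err`".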
Reproduction script
Run the noop pipeline and check the Ray server logs
Anything else
No response
OS
Other
Python
3.11
Are you willing to submit a PR?