Ray error during noop pipeline run - local KFP and Kind #27

Closed
ykoyfman opened this issue Apr 29, 2024 · 18 comments
Labels: bug (Something isn't working), fixed (Marks an issue as fixed in the dev branch)

Comments

@ykoyfman
Collaborator

Environment: KFP + Kind on a Mac M1 Max with Docker and Colima. Colima VM settings: profile default, status Running, arch aarch64, 8 CPUs, 32GiB memory, 1000GiB disk, runtime docker.

During noop pipeline execution (all default values), the "execute ray" step fails with:

image
@roytman
Member

roytman commented Apr 30, 2024

Tried to reproduce it on a Mac M1 with Podman. VM settings: 9 CPUs, 67 GB RAM.
image

@roytman
Member

roytman commented May 1, 2024

@Mohammad-nassar10, can you try the test on your Intel-based laptop?

@Mohammad-nassar10
Collaborator

I tried it with the workflow's default values; it passed and generated the output file.
image

@roytman
Member

roytman commented May 1, 2024

I hit the issue where podman inspect or podman stop returns "json: cannot unmarshal array into Go struct field InspectContainerConfig.Config.Entrypoint of type string". We have observed it with Hajar.
Therefore, I updated Podman to the latest version, v5.0.2. It started successfully, but the Ray execution failed again with a different error:
image

We decided we don't support Mac with Apple silicon, didn't we?

@blublinsky
Collaborator

Works for me as well. Just retested.

Screenshot 2024-05-01 at 9 04 14 AM Screenshot 2024-05-01 at 9 05 26 AM

@blublinsky
Collaborator

I will work with Alexey later today to try it on his Mac.

@roytman
Member

roytman commented May 1, 2024

Restarted my laptop and tried it again with Podman.

It showed success, but we can see errors in the logs.
image
And actually, it submitted two jobs: one Failed, and the other has been Running forever.

image

@blublinsky
Collaborator

So I think the bottom line is that M1 with Podman is not a supported configuration for this release.

@blublinsky
Collaborator

@roytman, @ykoyfman, @shahrokhDaijavad, I think we can close this one.

@shahrokhDaijavad
Member

@blublinsky I am OK with saying that a Mac M1 with Podman is not supported for kfp in this release, though it is still fine for running individual transforms and using an S3-compatible object store locally. However, before closing this, I want confirmation from @ykoyfman that he was able to run a full kfp test in a virtual x86 environment.

@ykoyfman
Collaborator Author

ykoyfman commented May 2, 2024

@shahrokhDaijavad - Agreed. I'm continuing to test on an x86 VM. I initially created a 32GB/8xCPU/100GB RHEL9 VM, but Kubeflow could not fully start under Kind on this VM.

@blublinsky
Collaborator

Do not use RHEL9; use Ubuntu.

@roytman
Member

roytman commented May 3, 2024

Please do not close the issue. Given that we have more time, I want to try to resolve the M1 limitations.

@shahrokhDaijavad
Member

Sure, Alexey.

@blublinsky
Collaborator

@roytman, @ykoyfman, @shahrokhDaijavad, I think we tested this to death and have most of the answers. Can we please close it?

@roytman
Member

roytman commented May 14, 2024

No. As we discussed, it still doesn't work on Mac M1. We should resolve it somehow.

@daw3rd added the bug label May 14, 2024

@shahrokhDaijavad
Member

shahrokhDaijavad commented May 14, 2024

@blublinsky: As you have seen in the tech Slack channel, @ykoyfman is running this today on a RHEL VM. Once he is successful, we can close this, apart from the Mac M1 problem that @roytman points to, which may not have a short-term solution.

@shahrokhDaijavad
Member

@ykoyfman succeeded today in running noop with kfp on a RHEL VM! I am closing this issue for now, because I think a fix for Mac M1 may not be short-term.

@shahrokhDaijavad added the fixed label May 17, 2024
6 participants