Modin fail on CORE (GEN13 i9) #1957

Open
weiseng-yeap opened this issue Oct 10, 2023 · 0 comments
Labels: bug

Summary

After installing the oneAPI Base Toolkit, I tried to run the Modin getting-started sample:
https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

The run fails with the following error:
(raylet) [2023-10-10 22:04:54,885 E 21639 21688] (raylet) agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. Agent can fail when
(raylet) - The version of grpcio doesn't follow Ray's requirement. Agent can segfault with the incorrect grpcio version. Check the grpcio version pip freeze | grep grpcio.
(raylet) - The agent failed to start because of unexpected error or port conflict. Read the log cat /tmp/ray/session_latest/dashboard_agent.log. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure.
(raylet) - The agent is killed by the OS (e.g., out of memory).

Version

oneAPI toolkit version: 2023.2.0

Environment

OS: Ubuntu 22.04.2 LTS (Linux)
CPU: 13th Gen Intel(R) Core(TM) i9-13900
RAM: 32 GB

Steps to reproduce

In a conda environment, run the Modin sample from the oneAPI samples repository:
https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

Observed behavior

The raylet fails and prints the following log:

(raylet) [2023-10-10 22:04:54,885 E 21639 21688] (raylet) agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. Agent can fail when
(raylet) - The version of grpcio doesn't follow Ray's requirement. Agent can segfault with the incorrect grpcio version. Check the grpcio version pip freeze | grep grpcio.
(raylet) - The agent failed to start because of unexpected error or port conflict. Read the log cat /tmp/ray/session_latest/dashboard_agent.log. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure.
(raylet) - The agent is killed by the OS (e.g., out of memory).
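
The raylet message points at two things to check: the installed grpcio version and the contents of the dashboard agent log. A minimal sketch that gathers both in one step is shown here. It uses only the Python standard library; the helper names `installed_version` and `tail` are hypothetical, and the log path is the one quoted in the message above. It only collects information and does not diagnose the failure by itself.

```python
from importlib import metadata
from pathlib import Path

# Log path quoted in the raylet message; it may differ on other setups.
AGENT_LOG = Path("/tmp/ray/session_latest/dashboard_agent.log")


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def tail(path, max_lines=40):
    """Return the last max_lines lines of a text file, or [] if it cannot be read."""
    try:
        text = Path(path).read_text(errors="replace")
    except OSError:
        return []
    return text.splitlines()[-max_lines:]


if __name__ == "__main__":
    # Package versions relevant to the first bullet of the raylet message.
    for name in ("grpcio", "ray", "modin"):
        print(f"{name}: {installed_version(name) or 'not installed'}")

    # End of the agent log, relevant to the second bullet.
    for line in tail(AGENT_LOG):
        print(line)
```

Running this inside the same conda environment as the sample, on both the Xeon and the Core machine, would make it easier to compare the two setups.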

Expected behavior

The sample should run without the raylet failing. With the same setup it works on a Xeon system, but it does not work on this Core system.

weiseng-yeap added the bug label on Oct 10, 2023