
nccl:send not found #37

Open
qyysjtu opened this issue Apr 9, 2024 · 2 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments


qyysjtu commented Apr 9, 2024

Describe the Bug

When I run the PyTorch converter, it reports that the `nccl:send` comm_type is not supported. Is there any plan to support it, or is this comm_type not expected in the trace?

```
admin@admin: ~/llm/chakra(main)$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename et_plus/profile_et_rank_0_plus.json --output_filename et_plus/profile_chakra.0.et
Traceback (most recent call last):
  File "/home/admin/miniconda3/lib/python3.12/site-packages/chakra/et_converter/et_converter.py", line 89, in main
    converter.convert()
  File "/home/admin/miniconda3/lib/python3.12/site-packages/chakra/et_converter/pytorch2chakra_converter.py", line 169, in convert
    collective_comm_type = self.get_collective_comm_type(pytorch_node.name)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/miniconda3/lib/python3.12/site-packages/chakra/et_converter/pytorch2chakra_converter.py", line 395, in get_collective_comm_type
    raise ValueError(f"'{name}' not found in collective communication mapping. "
ValueError: 'nccl:send' not found in collective communication mapping. Please add this collective communication name to the mapping.
```

TaekyungHeo (Contributor) commented May 9, 2024

Supported collective communication types are listed here. Currently, Chakra does not recognize nccl:send as a collective communication type. The Chakra working group must decide whether to add SEND and RECV as new collective types. We understand that these appear in the collected traces, but currently, we do not have a working solution. You can make local changes to support SEND and RECV types on your own. If this works and makes sense, you can create a PR.
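For anyone attempting a local change, the error in the traceback comes from a name-to-type lookup that fails when no mapping entry matches the node name. The sketch below is a hypothetical, simplified version of such a lookup: the `CollectiveType` enum, the `COMM_TYPE_MAPPING` keys, and the substring matching are assumptions for illustration, not the actual code in `pytorch2chakra_converter.py`.

```python
from enum import Enum, auto


class CollectiveType(Enum):
    # Hypothetical subset of collective types; not Chakra's real enum.
    ALL_REDUCE = auto()
    ALL_GATHER = auto()
    REDUCE_SCATTER = auto()
    BROADCAST = auto()
    ALL_TO_ALL = auto()


# Hypothetical mapping from normalized name fragments to collective types.
COMM_TYPE_MAPPING = {
    "allreduce": CollectiveType.ALL_REDUCE,
    "allgather": CollectiveType.ALL_GATHER,
    "reducescatter": CollectiveType.REDUCE_SCATTER,
    "broadcast": CollectiveType.BROADCAST,
    "alltoall": CollectiveType.ALL_TO_ALL,
}


def get_collective_comm_type(name: str) -> CollectiveType:
    """Return the collective type whose key appears in the (normalized) node name."""
    # Lowercase and drop underscores so "nccl:all_reduce" matches "allreduce".
    normalized = name.lower().replace("_", "")
    for key, comm_type in COMM_TYPE_MAPPING.items():
        if key in normalized:
            return comm_type
    # No key matched: this is the situation hit by 'nccl:send' in the report.
    raise ValueError(
        f"'{name}' not found in collective communication mapping. "
        "Please add this collective communication name to the mapping."
    )
```

With this sketch, `get_collective_comm_type("nccl:all_reduce")` returns `CollectiveType.ALL_REDUCE`, while `get_collective_comm_type("nccl:send")` raises `ValueError`. Simply adding a `"send"` key would silence the error, but it would also label a point-to-point operation as a collective, which is the concern raised later in this thread.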

srinivas212 (Contributor) commented

Thanks for reporting this issue.

@TaekyungHeo, we probably need to handle this as a COMM_SEND_NODE, right? What do you think? This cannot be a collective operation.
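To make the suggestion concrete, here is a minimal sketch of routing `nccl:send` and `nccl:recv` to point-to-point node types before any collective lookup runs. The `NodeType` enum is a stand-in that reuses the `COMM_SEND_NODE` naming from the comment; `COMM_RECV_NODE`, `COMM_COLL_NODE`, `COMP_NODE`, the `P2P_NODE_TYPES` table, and the rule that any other `nccl:` name is a collective are assumptions for illustration, not the converter's actual logic.

```python
from enum import Enum, auto


class NodeType(Enum):
    # Hypothetical stand-in enum; names follow the COMM_SEND_NODE idea above.
    COMP_NODE = auto()
    COMM_COLL_NODE = auto()
    COMM_SEND_NODE = auto()
    COMM_RECV_NODE = auto()


# Hypothetical point-to-point op names; real traces may spell these differently.
P2P_NODE_TYPES = {
    "nccl:send": NodeType.COMM_SEND_NODE,
    "nccl:recv": NodeType.COMM_RECV_NODE,
}


def classify_node(name: str) -> NodeType:
    """Classify a trace node, checking point-to-point ops before collectives."""
    lowered = name.lower()
    # Send/recv are point-to-point, so they never reach the collective mapping.
    p2p_type = P2P_NODE_TYPES.get(lowered)
    if p2p_type is not None:
        return p2p_type
    # Simplifying assumption: every remaining "nccl:" op is a collective.
    if lowered.startswith("nccl:"):
        return NodeType.COMM_COLL_NODE
    return NodeType.COMP_NODE
```

The design point is ordering: because send and recv are checked first, only genuine collectives would be passed on to a collective-type lookup, so that lookup never needs a `send` entry.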

srinivas212 added the bug, good first issue, and help wanted labels on May 18, 2024.