Could you please add more details why the current model doesn't work for comm ops?
Decouple output_bytes_accessed and bytes_accessed: bytes_accessed should be the operand size in bytes, not the two lumped together.
There is also operand_bytes_accessed. And bytes_accessed is just sum(operand_bytes_accessed) + output_bytes_accessed.
We don't use bytes_accessed in gpu_performance_model.cc, but I see uses in other parts of the codebase and I can't predict all the implications if we change semantics of the field.
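For illustration, here is a minimal self-contained sketch of that relationship. This is plain C++ with assumed names that mirror the fields discussed above, not the actual HloCostAnalysis implementation:

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical per-instruction byte counts mirroring the fields discussed
// above; the real values live inside XLA's cost analysis, this only shows
// how they relate to each other.
struct InstructionByteCounts {
  std::vector<int64_t> operand_bytes_accessed;  // one entry per operand
  int64_t output_bytes_accessed = 0;

  // bytes_accessed is simply the sum over all operands plus the output.
  int64_t bytes_accessed() const {
    return std::accumulate(operand_bytes_accessed.begin(),
                           operand_bytes_accessed.end(), int64_t{0}) +
           output_bytes_accessed;
  }
};
```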
@olegshyshkov OK, I understand.
So for the second feature, do we need more op support, like allgather/reducescatter/alltoall cost analysis? I have finished a PR that does this.
Yes, having more support for those ops will be good if you can add it.
Such as allgather/reducescatter/alltoall, which are commonly used in FSDP/MoE models.

My suggestions:
- Decouple output_bytes_accessed and bytes_accessed; bytes_accessed should be the operand size in bytes, not the two lumped together.
- In gpu_collective_performance_model.cc, we can make a table showing how many bytes are sent inter-node and intra-node (see the sketch after this list).
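To make the inter-node vs. intra-node split concrete, here is a rough, self-contained sketch. The names and the ring-style formula are illustrative assumptions, not what gpu_collective_performance_model.cc actually does:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: rough ring-algorithm byte counts for an all-gather,
// split into intra-node and inter-node traffic. Assumes ranks are placed
// contiguously per node; this is only an illustrative estimate.
struct CollectiveBytes {
  int64_t intra_node_bytes;
  int64_t inter_node_bytes;
};

CollectiveBytes EstimateAllGatherBytes(int64_t shard_bytes, int num_ranks,
                                       int ranks_per_node) {
  const int num_nodes = num_ranks / ranks_per_node;
  // In a ring over num_ranks ranks, every link carries (num_ranks - 1) shards.
  const int64_t per_link_bytes = shard_bytes * (num_ranks - 1);
  // With contiguous placement, num_nodes links cross a node boundary and the
  // remaining (num_ranks - num_nodes) links stay inside a node.
  return CollectiveBytes{
      /*intra_node_bytes=*/per_link_bytes * (num_ranks - num_nodes),
      /*inter_node_bytes=*/per_link_bytes * num_nodes,
  };
}

int main() {
  // Example: 16 ranks, 8 GPUs per node, 64 MiB shard per rank.
  const CollectiveBytes b = EstimateAllGatherBytes(64LL << 20, 16, 8);
  std::printf("intra-node: %lld bytes, inter-node: %lld bytes\n",
              static_cast<long long>(b.intra_node_bytes),
              static_cast<long long>(b.inter_node_bytes));
  return 0;
}
```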
I have prepared a PR for this. Is this feature request wanted?
Additional question:
How can I find out which GPUs are on the same node (intra-node)? Should I use the NVML API?
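One observation: NVML only enumerates the GPUs attached to the local node, so anything it lists is by definition intra-node, and anything reached over the network is inter-node. A minimal sketch of that enumeration (error handling kept to a bare check):

```cpp
#include <cstdio>
#include <nvml.h>  // link with -lnvidia-ml

int main() {
  if (nvmlInit() != NVML_SUCCESS) return 1;

  // Every device NVML can see is on this node, i.e. intra-node.
  unsigned int count = 0;
  if (nvmlDeviceGetCount(&count) == NVML_SUCCESS) {
    for (unsigned int i = 0; i < count; ++i) {
      nvmlDevice_t dev;
      char uuid[96] = {0};
      if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
          nvmlDeviceGetUUID(dev, uuid, sizeof(uuid)) == NVML_SUCCESS) {
        std::printf("local (intra-node) GPU %u: %s\n", i, uuid);
      }
    }
  }

  nvmlShutdown();
  return 0;
}
```

Comparing these locally visible devices against the participants of a collective would tell you which ranks share a node; whether NVML is the right mechanism to use inside the compiler is something the maintainers would need to confirm.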