Did BytePS Support multiple NICs now? #408

Open
wuyujiji opened this issue Aug 6, 2021 · 13 comments

Comments

@wuyujiji

wuyujiji commented Aug 6, 2021

Hello! Did BytePS implement multiple NICs internally?

@wuyujiji
Author

wuyujiji commented Aug 9, 2021

Hello! Did BytePS implement multiple NICs internally?

@ymjiang @bobzhuyb

@bobzhuyb
Member

bobzhuyb commented Aug 9, 2021

> Hello! Did BytePS implement multiple NICs internally?

You can use UCX van and enable multi-rail. @eric-haibin-lin Do you have a brief instruction?

@wuyujiji
Author

> > Hello! Did BytePS implement multiple NICs internally?
>
> You can use UCX van and enable multi-rail. @eric-haibin-lin Do you have a brief instruction?

@eric-haibin-lin Could you please share the documentation for multi-NIC support?

@wuyujiji
Author

> > Hello! Did BytePS implement multiple NICs internally?
>
> You can use UCX van and enable multi-rail. @eric-haibin-lin Do you have a brief instruction?

@bobzhuyb By the way, why did you choose UCX to implement multi-NIC support?

@bobzhuyb
Member

> > > Hello! Did BytePS implement multiple NICs internally?
> >
> > You can use UCX van and enable multi-rail. @eric-haibin-lin Do you have a brief instruction?
>
> @bobzhuyb By the way, why did you choose UCX to implement multi-NIC support?

Because UCX has native multi-NIC support. It also comes from Mellanox (an RDMA NIC vendor), which is now part of NVIDIA.

@wuyujiji
Author

> > > > Hello! Did BytePS implement multiple NICs internally?
> > >
> > > You can use UCX van and enable multi-rail. @eric-haibin-lin Do you have a brief instruction?
> >
> > @bobzhuyb By the way, why did you choose UCX to implement multi-NIC support?
>
> Because UCX has native multi-NIC support. It also comes from Mellanox (an RDMA NIC vendor), which is now part of NVIDIA.

Thanks a lot. I look forward to the multi-NIC documentation.

@eric-haibin-lin
Collaborator

Hi @wuyujiji, you need to update the ps-lite commit to the latest one (49e4582), which contains several important fixes for UCXVan. There's a small patch for BytePS to use the latest ps-lite; @pleasantrabbit can share/PR the patch.

To use multiple NICs, you need to build UCX with RDMA support (https://github.com/bytedance/ps-lite#build) and specify the env var UCX_TLS=rc_x,tcp,sm when launching the job. You also need to set UCX_MAX_RNDV_RAILS to at least 4.

For serious performance benchmarks, I suggest you run the UCXVan performance test in your cluster first. Once you have upgraded to the latest UCXVan, you can run this test https://github.com/bytedance/ps-lite#3-other-benchmarks to get an idea of push-pull speed with multiple NICs.
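
The two UCX settings above could be prepared in Python before launching a job, roughly as in the sketch below. `multi_nic_env` is a hypothetical helper name, not part of BytePS or ps-lite, and the sketch covers only the two variables named in this comment; whatever else is needed to select the UCX van is not shown here.

```python
import os


def multi_nic_env(base_env=None, max_rndv_rails=4):
    """Return a copy of base_env with the UCX settings from this thread added.

    multi_nic_env is a hypothetical helper, not part of BytePS or ps-lite.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Transport list quoted in the comment above.
    env["UCX_TLS"] = "rc_x,tcp,sm"
    # The comment above recommends a value of at least 4.
    env["UCX_MAX_RNDV_RAILS"] = str(max_rndv_rails)
    return env


if __name__ == "__main__":
    env = multi_nic_env()
    print(env["UCX_TLS"], env["UCX_MAX_RNDV_RAILS"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` together with your usual launch command.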

@pleasantrabbit
Collaborator

@wuyujiji this patch updates ps-lite to the latest commit: #409

@wuyujiji
Author

wuyujiji commented Aug 12, 2021

@eric-haibin-lin @pleasantrabbit Thanks for your wonderful work! I will test it in my cluster. By the way, did you test and compare the performance (push-pull speed and end-to-end model training) between a single NIC and multiple NICs?

@wuyujiji
Author

wuyujiji commented Aug 16, 2021

@bobzhuyb Hello, I have a more detailed question. With multiple NICs, does BytePS assign each 4M tensor to a different NIC, or does it split a 4M tensor into sub-tensors and distribute them across different NICs?

@bobzhuyb
Member

> @bobzhuyb Hello, I have a more detailed question. With multiple NICs, does BytePS assign each 4M tensor to a different NIC, or does it split a 4M tensor into sub-tensors and distribute them across different NICs?

We give the whole 4M tensor to UCX. UCX then splits it across multiple NICs by itself.
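
To picture what "splitting across NICs" means, here is a purely illustrative sketch of striping one buffer into contiguous chunks, one per rail. It is not UCX's actual algorithm, which this thread does not describe; `split_across_rails` is a hypothetical name, and reading "4M" as 4 MiB is an assumption.

```python
def split_across_rails(nbytes, num_rails):
    """Split nbytes into num_rails contiguous (rail, offset, length) chunks.

    Illustrative only: real UCX multi-rail scheduling may differ.
    """
    if num_rails < 1:
        raise ValueError("num_rails must be at least 1")
    base, extra = divmod(nbytes, num_rails)
    chunks = []
    offset = 0
    for rail in range(num_rails):
        # The first `extra` rails carry one additional byte each.
        length = base + (1 if rail < extra else 0)
        chunks.append((rail, offset, length))
        offset += length
    return chunks


if __name__ == "__main__":
    tensor_bytes = 4 * 1024 * 1024  # assumed size: "4M" read as 4 MiB
    for rail, offset, length in split_across_rails(tensor_bytes, 2):
        print(rail, offset, length)
```

The point of the sketch is only that the application hands over one buffer and the chunking happens below it, as described in the reply above.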

@eric-haibin-lin
Collaborator

> @eric-haibin-lin @pleasantrabbit Thanks for your wonderful work! I will test it in my cluster. By the way, did you test and compare the performance (push-pull speed and end-to-end model training) between a single NIC and multiple NICs?

If I remember correctly, with one ps-lite worker per node it reaches about 300 Gb/s with two 200 Gb/s NICs. In the internal version, we create multiple ps-lite worker instances per node to increase the goodput.
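
As a quick check of what those figures imply, the arithmetic below compares the reported throughput with the combined line rate of the two NICs. `link_utilization` is a hypothetical helper; the inputs are the approximate numbers recalled in this comment, not new measurements.

```python
def link_utilization(measured_gbps, nic_gbps, num_nics):
    """Fraction of the aggregate NIC line rate that the measured rate reaches."""
    return measured_gbps / (nic_gbps * num_nics)


if __name__ == "__main__":
    # About 300 Gb/s measured over two 200 Gb/s NICs (figures from the comment above).
    print(f"{link_utilization(300, 200, 2):.0%}")  # prints 75%
```

So a single worker per node reaches roughly three quarters of the 400 Gb/s aggregate, which is consistent with using multiple workers per node to raise goodput further.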

@wuyujiji
Author

@bobzhuyb @eric-haibin-lin OK, thanks for your reply!
