
RDMA TLB benchmarks under different feature dimensions #19

Open
eedalong opened this issue May 24, 2022 · 3 comments

eedalong commented May 24, 2022

RDMA TLB Results

call for help: @Aiemu
https://github.com/quiver-team/quiver-feature/blob/main/tests/python/test_MultiMachineDistTensorClientServer.py

IB Params:

 POST_LIST_SIZE = 128
 CQ_MOD = 1
 QP_NUM = 8
 TX_DEPTH = 2048
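For context, these knobs follow common RDMA micro-benchmark (perftest-style) conventions; the interpretations in the comments below are assumptions, not taken from the test script:

```python
# IB parameters used for the runs in this issue, with assumed meanings
# (perftest-style conventions; not verified against the quiver-feature code).
IB_PARAMS = {
    "POST_LIST_SIZE": 128,  # work requests batched per ibv_post_send call
    "CQ_MOD": 1,            # request a completion every CQ_MOD sends
    "QP_NUM": 8,            # number of queue pairs used in parallel
    "TX_DEPTH": 2048,       # send-queue depth per QP
}
```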

FeatureDim = 128, Tensor Size: 228.8818359375 GB, Sample Size = 250000

W/O TLB

2 machines × 2 GPUs: 8488.63404334975 MB/s
2 machines × 4 GPUs:
2 machines × 6 GPUs:

W/ TLB

2 machines × 2 GPUs:
2 machines × 4 GPUs:
2 machines × 6 GPUs:
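As a sanity check, the reported tensor size is consistent with 4-byte features. Assuming float32 elements (an assumption; the test script is not quoted here), the row count works out to exactly 480 million:

```python
# Back out the row count from the reported tensor size,
# assuming 4-byte float32 feature elements.
FEATURE_DIM = 128
BYTES_PER_ELEM = 4  # assumption: float32

total_bytes = 228.8818359375 * 2**30  # reported size, interpreted as GiB
num_rows = total_bytes / (FEATURE_DIM * BYTES_PER_ELEM)
print(int(num_rows))  # 480000000
```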


eedalong commented May 25, 2022

Re-analyzed this today. The most likely core reason that the cost of TLB misses depends on the feature dimension is the ratio of page-table-walk (PTW) time to feature-read time. When features are large, the PTW overhead is relatively insignificant; when features are small, PTW accounts for a much larger share of the total cost.

So this really breaks down into two experiment dimensions:

  1. Fix FeatureDim and keep increasing NUM_ELEMENT
  2. Fix NUM_ELEMENT and keep increasing FeatureDim
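The two sweeps could be scripted roughly as follows. `BASE_NUM` here assumes the ~229 GB / dim-128 / float32 setup above; the benchmark launcher itself is only sketched, since in practice it would wrap tests/python/test_MultiMachineDistTensorClientServer.py:

```python
# Sketch of the two sweep dimensions described above (illustrative only).
BASE_DIM = 128
BASE_NUM = 480_000_000  # rows at dim 128 for the ~229 GB tensor, assuming float32

# 1. Fix FeatureDim, grow NUM_ELEMENT (more pages -> more PTW pressure).
sweep_num_element = [(BASE_DIM, BASE_NUM // f) for f in (8, 4, 2, 1)]

# 2. Fix NUM_ELEMENT, grow FeatureDim (larger reads dilute PTW cost).
sweep_feature_dim = [(d, BASE_NUM) for d in (32, 64, 128, 256)]

for dim, num in sweep_num_element + sweep_feature_dim:
    size_gib = dim * num * 4 / 2**30  # assumption: 4-byte float32 elements
    # A real runner would launch the test script with these settings
    # and record the reported MB/s.
    print(f"dim={dim} num_element={num} tensor={size_gib:.1f} GiB")
```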

@eedalong eedalong added the Doing label May 25, 2022

Aiemu commented May 26, 2022

IB Params

POST_LIST_SIZE = 128
CQ_MOD = 1
QP_NUM = 8
TX_DEPTH = 2048

W/O TLB

2 machines × 2 GPUs

| Run | Server1 | Server2 |
| --- | --- | --- |
| 0 | 8798.309074974653 MB/s | 8925.925280242674 MB/s |
| 1 | 8776.74163466813 MB/s | 8940.264366411147 MB/s |
| 2 | 8864.57287302192 MB/s | 8876.406442329364 MB/s |
| Avg | 8813.207860888235 MB/s | 8914.198696327729 MB/s |

2 machines × 4 GPUs

| Run | Server1-GPU1 | Server1-GPU2 | Server2-GPU1 | Server2-GPU2 |
| --- | --- | --- | --- | --- |
| 0 | 8592.910848549946 MB/s | 8788.04002677606 MB/s | 8784.270665339876 MB/s | 8780.655119190533 MB/s |
| 1 | 8797.553180521667 MB/s | 8774.936587372318 MB/s | 8914.114595121611 MB/s | 8973.797213215319 MB/s |
| 2 | 8524.098892866063 MB/s | 8900.942248183304 MB/s | 8922.503180384434 MB/s | 8851.85249217683 MB/s |
| Avg | 8638.187640645892 MB/s | 8821.306287443893 MB/s | 8873.629480281974 MB/s | 8868.768274860895 MB/s |

2 machines × 6 GPUs

| Run | Server1-GPU1 | Server1-GPU2 | Server1-GPU3 | Server2-GPU1 | Server2-GPU2 | Server2-GPU3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 8482.438701126574 MB/s | 8943.231441048036 MB/s | 8717.267681411107 MB/s | 8778.8484619869 MB/s | 8799.670012374536 MB/s | 8948.702263392466 MB/s |
| 1 | 8652.1562795728 MB/s | 8897.38465548701 MB/s | 8966.253962138591 MB/s | 8694.766158339844 MB/s | 8954.179783140959 MB/s | 8694.766158339844 MB/s |
| 2 | 8745.708282800677 MB/s | 8748.099167905411 MB/s | 8819.982773471145 MB/s | 8723.803032884649 MB/s | 8905.586864259376 MB/s | 8735.561584003002 MB/s |
| Avg | 8626.767754500017 MB/s | 8862.90508814682 MB/s | 8834.50147234028 MB/s | 8732.472551070465 MB/s | 8886.478886591625 MB/s | 8793.010001911769 MB/s |

W/ TLB

2 machines × 2 GPUs

| Run | Server1 | Server2 |
| --- | --- | --- |
| 0 | 8894.293407452446 MB/s | 9021.231609549819 MB/s |
| 1 | 9041.782926570833 MB/s | 9033.646805582512 MB/s |
| 2 | 8788.643424824484 MB/s | 8908.06597536363 MB/s |
| Avg | 8908.239919615922 MB/s | 8987.64813016532 MB/s |

2 machines × 4 GPUs

| Run | Server1-GPU1 | Server1-GPU2 | Server2-GPU1 | Server2-GPU2 |
| --- | --- | --- | --- | --- |
| 0 | 8828.347271316492 MB/s | 8765.472256937906 MB/s | 8852.311629032816 MB/s | 9036.197737420802 MB/s |
| 1 | 8821.958405844547 MB/s | 8898.31244894767 MB/s | 9007.107170501724 MB/s | 8746.75413420801 MB/s |
| 2 | 8805.723720418271 MB/s | 8874.560171944604 MB/s | 8978.203307205358 MB/s | 8830.022075055187 MB/s |
| Avg | 8818.67646585977 MB/s | 8846.114959276727 MB/s | 8945.874035579967 MB/s | 8870.991315561332 MB/s |

2 machines × 6 GPUs

| Run | Server1-GPU1 | Server1-GPU2 | Server1-GPU3 | Server2-GPU1 | Server2-GPU2 | Server2-GPU3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 8924.525013073035 MB/s | 8956.999405199258 MB/s | 8989.553156000351 MB/s | 8889.66056081257 MB/s | 8843.596165471976 MB/s | 8831.39284174213 MB/s |
| 1 | 8607.790723088045 MB/s | 8871.484760799127 MB/s | 9010.594488050403 MB/s | 8880.255307340087 MB/s | 8852.15857812203 MB/s | 7898.187427689934 MB/s |
| 2 | 8737.94692379896 MB/s | 8720.385604550951 MB/s | 8884.570000694108 MB/s | 8930.59601262842 MB/s | 8805.723720418271 MB/s | 8772.831637024092 MB/s |
| Avg | 8756.75421998668 MB/s | 8849.62325684978 MB/s | 8961.572548248287 MB/s | 8900.170626927027 MB/s | 8833.82615467076 MB/s | 8500.80396881872 MB/s |
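Reading the Avg rows together, the W/ TLB runs come out slightly ahead of W/O TLB in every configuration, with gains of roughly 1% or less at this feature dimension. A quick summary over the reported averages:

```python
# Mean of the per-GPU "Avg" throughputs (MB/s) from the tables above.
wo_tlb = {
    "2x2": [8813.207860888235, 8914.198696327729],
    "2x4": [8638.187640645892, 8821.306287443893,
            8873.629480281974, 8868.768274860895],
    "2x6": [8626.767754500017, 8862.90508814682, 8834.50147234028,
            8732.472551070465, 8886.478886591625, 8793.010001911769],
}
w_tlb = {
    "2x2": [8908.239919615922, 8987.64813016532],
    "2x4": [8818.67646585977, 8846.114959276727,
            8945.874035579967, 8870.991315561332],
    "2x6": [8756.75421998668, 8849.62325684978, 8961.572548248287,
            8900.170626927027, 8833.82615467076, 8500.80396881872],
}

for cfg in wo_tlb:
    base = sum(wo_tlb[cfg]) / len(wo_tlb[cfg])
    tlb = sum(w_tlb[cfg]) / len(w_tlb[cfg])
    print(f"{cfg}: {base:.1f} -> {tlb:.1f} MB/s ({(tlb / base - 1) * 100:+.2f}%)")
```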


eedalong commented May 26, 2022

@Aiemu Please run this file again with the dimension still set to 128, but don't use an overly large FeatureSize; around 80 GB is enough.
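For reference, an ~80 GB tensor at dim 128 works out to about 168M rows, again assuming 4-byte float32 elements:

```python
# Row count for an ~80 GiB tensor at FeatureDim = 128,
# assuming 4-byte float32 elements.
TARGET_GIB = 80
FEATURE_DIM = 128
BYTES_PER_ELEM = 4  # assumption: float32

num_element = TARGET_GIB * 2**30 // (FEATURE_DIM * BYTES_PER_ELEM)
print(num_element)  # 167772160
```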

@eedalong eedalong assigned yuanzhigang-source and unassigned Aiemu May 26, 2022