How to do performance profiling when running Paddle distributed training? #63858
Comments
Hello, could you provide the relevant commands/screenshots/error messages so that the developers can reproduce the issue? Thanks~
I compiled and installed the MLU backend from PaddleCustomDevice, and added `std::cout << "-----------device id:" << device->id << std::endl;` to the `SetDevice` interface in runtime.cc to print device information. Then I ran the PaddleNLP Llama pretrain script with TP and PP enabled, but the device id printed in the terminal is always 0.
The Llama script is launched as follows: PYTHONPATH=../:$PYTHONPATH
Hello, while the job is running, could you run the `cnmon` command to check whether cards 0-3 are working normally? If they are being used as expected, you can do performance analysis with Paddle's native Profiler. When MLU was adapted to Paddle, the profiler interface defined at https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/mlu/runtime/runtime.cc#L982 was implemented to support the native Paddler Profiler. For usage of the native Profiler, see the documentation at https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/profiler/Profiler_cn.html#profiler. For the code changes, you can refer to the sample code in PR PaddlePaddle/PaddleCustomDevice#785 and modify it roughly as follows: profiler = profiler.Profiler(targets=[profiler.ProfilerTarget.CUSTOM_DEVICE], custom_device_types=['mlu'])
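Expanding the one-line snippet above, a minimal sketch of wrapping a training loop with the native Profiler might look like the following. The scheduler window, output directory, and the dummy loop body are assumptions for illustration, not part of the original answer; only the `targets`/`custom_device_types` arguments come from the snippet in this thread.

```python
# Hedged sketch: Paddle native Profiler targeting a custom device ('mlu').
# The step window (3, 7) and the './profiler_log' path are illustrative choices.
import paddle
import paddle.profiler as profiler

prof = profiler.Profiler(
    targets=[profiler.ProfilerTarget.CUSTOM_DEVICE],
    custom_device_types=['mlu'],
    scheduler=(3, 7),  # profile steps 3..6 only, to limit overhead
    on_trace_ready=profiler.export_chrome_tracing('./profiler_log'),
)

prof.start()
for step in range(10):
    # ... one training step of the Llama pretrain loop goes here ...
    prof.step()  # advance the profiler's step counter
prof.stop()
prof.summary()  # print an operator/kernel time summary
```

The exported Chrome trace under `./profiler_log` can then be inspected per process, so each rank produces its own trace for its own card.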
Please ask your question
When running Paddle distributed training, e.g. with both TP and PP set to 2, I want to do performance analysis. Checking the card occupancy I can indeed see that all 4 cards are in use, but when I print the device it is always card 0. How can I view the information for all cards and analyze them?
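One possible explanation for "always card 0" is that each launched process is typically assigned a single visible device, which then appears as ordinal 0 inside that process, even though the four processes together occupy four physical cards; this is a hypothesis, not confirmed in the thread. A way to tell the processes apart is to tag any diagnostic output with the distributed rank, for example (standard Paddle APIs assumed, a sketch rather than the maintainers' method):

```python
# Hedged sketch: tag prints with the distributed rank so the per-card
# activity of the 4 TP/PP processes can be distinguished in the logs.
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
rank = dist.get_rank()
# Each process reports its own rank and its own (locally visible) device.
print(f"[rank {rank}] running on device: {paddle.device.get_device()}")
```

Combined with the native Profiler mentioned above, each rank then produces its own clearly-labeled trace and log output.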