[Feature Request] Add support to AMD's ROCm GPU #74

Open
2 tasks done
Junyi-99 opened this issue Jun 11, 2023 · 11 comments · May be fixed by #123
Labels: api (Something related to the core APIs), enhancement (New feature or request), upstream (Something upstream related)

Comments

@Junyi-99

Required prerequisites

  • I have searched the Issue Tracker and confirmed that this hasn't already been reported. (Comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

Motivation

I have been using nvitop for monitoring NVIDIA devices and processes, and I find it to be a great tool with a beautiful UI. Thank you for this good project!

However, I noticed that it doesn't support AMD's ROCm GPU platform. As an AMD user (we have an AMD GPU cluster), I can only use "rocm-smi" to monitor my GPUs, and I would love to have a tool like nvitop for ROCm.

I believe that adding support for AMD's ROCm GPU would make nvitop a more versatile and inclusive monitoring tool. It would allow users who work with AMD GPUs to benefit from the same features and options that nvitop provides to NVIDIA users.

Solution

Using rocm-smi
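
As a rough illustration of what "using rocm-smi" could look like from Python, the sketch below shells out to the rocm-smi CLI and parses its JSON output. The function names are hypothetical, and the assumption that per-device entries are keyed "card0", "card1", ... may not hold on every ROCm release (field names also vary between versions), so treat this as a starting point and not as how nvitop or the linked pull request does it.

```python
import json
import shutil
import subprocess


def parse_rocm_smi_json(text):
    """Parse rocm-smi JSON text into {card_name: {field: value}}."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return {}
    # Assumption: per-device entries are keyed "card0", "card1", ...
    return {
        name: fields
        for name, fields in data.items()
        if name.startswith("card") and isinstance(fields, dict)
    }


def query_rocm_smi(args=("--showuse", "--showmemuse")):
    """Run rocm-smi with JSON output; return {} if it is missing or fails."""
    exe = shutil.which("rocm-smi")
    if exe is None:
        return {}
    result = subprocess.run(
        [exe, *args, "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return {}
    try:
        return parse_rocm_smi_json(result.stdout)
    except json.JSONDecodeError:
        return {}
```

Calling query_rocm_smi() on a machine without ROCm simply returns an empty dict, so a monitor built on it can fall back to the NVIDIA path.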

Alternatives

No response

Additional context

No response

Junyi-99 added the enhancement label on Jun 11, 2023
Junyi-99 changed the title from "[Feature Request]" to "[Feature Request] Add support to AMD's ROCm GPU" on Jun 11, 2023
XuehaiPan added the upstream and api labels on Jun 21, 2023
@XuehaiPan
Owner

Hi @Junyi-99, thanks for raising this, and I apologize for the late response. I'm afraid that AMD graphics cards may not be supported in nvitop in the foreseeable future. This is due to a lack of necessary Python dependencies on PyPI (there is only an example in RadeonOpenCompute/rocm_smi_lib) and I personally don't have access to a machine with an AMD graphics card for testing. Sorry about that.

nvitop is completely open-source, and you are welcome to fork and develop your own monitor tool.

@Junyi-99
Author

Thanks for your reply and for digging into ROCm monitoring solutions. I will keep following this project. I think this issue can be closed.

Thank you again.

@yan-rui

yan-rui commented Jun 26, 2023

> Hi @Junyi-99, thanks for raising this, and I apologize for the late response. I'm afraid that AMD graphics cards may not be supported in nvitop in the foreseeable future. This is due to a lack of necessary Python dependencies on PyPI (there is only an example in RadeonOpenCompute/rocm_smi_lib) and I personally don't have access to a machine with an AMD graphics card for testing. Sorry about that.
>
> nvitop is completely open-source, and you are welcome to fork and develop your own monitor tool.

Thank you so much for this work. It has become an important part of my work.
But recently I started working on other GPUs, and it's very inconvenient without nvitop.
Can you help point out which parts need to be modified to replace nvidia-smi related commands?

@Junyi-99
Author

> > Hi @Junyi-99, thanks for raising this, and I apologize for the late response. I'm afraid that AMD graphics cards may not be supported in nvitop in the foreseeable future. This is due to a lack of necessary Python dependencies on PyPI (there is only an example in RadeonOpenCompute/rocm_smi_lib) and I personally don't have access to a machine with an AMD graphics card for testing. Sorry about that.
> >
> > nvitop is completely open-source, and you are welcome to fork and develop your own monitor tool.
>
> Thank you so much for this work. It has become an important part of my work. But recently I started working on other GPUs, and it's very inconvenient without nvitop. Can you help point out which parts need to be modified to replace nvidia-smi related commands?

What do you mean by "to replace nvidia-smi related commands"?

@yan-rui

yan-rui commented Jun 26, 2023

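The monitoring tool for the GPU I'm using now is very similar to NVIDIA's.
So I'm wondering: if I replace the APIs of the NVIDIA monitoring tool with the APIs of my current GPU's monitoring tool, would I be able to use nvitop?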
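
The substitution asked about here can be pictured as a small backend interface that the display code talks to, with one implementation per vendor. The sketch below is only an illustration of that idea: GpuBackend, DeviceSnapshot, FakeBackend, and render are hypothetical names, not taken from the nvitop source, and a real port would have to follow nvitop's actual module layout.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class DeviceSnapshot:
    """Vendor-neutral view of one GPU at a point in time (hypothetical)."""
    index: int
    name: str
    gpu_utilization: int  # percent
    memory_used: int      # bytes
    memory_total: int     # bytes


class GpuBackend(Protocol):
    """What the display code needs from any vendor's monitoring API."""

    def device_count(self) -> int: ...

    def snapshot(self, index: int) -> DeviceSnapshot: ...


class FakeBackend:
    """Stand-in backend with canned data; a real one would wrap a vendor API."""

    def __init__(self, devices):
        self._devices = list(devices)

    def device_count(self) -> int:
        return len(self._devices)

    def snapshot(self, index: int) -> DeviceSnapshot:
        return self._devices[index]


def render(backend: GpuBackend) -> List[str]:
    """Format one status line per device using only the backend interface."""
    lines = []
    for i in range(backend.device_count()):
        s = backend.snapshot(i)
        mem_pct = 100 * s.memory_used // s.memory_total if s.memory_total else 0
        lines.append(f"[{s.index}] {s.name}: GPU {s.gpu_utilization}% | MEM {mem_pct}%")
    return lines
```

Because render only depends on the two backend methods, swapping vendors means writing another class with the same methods; the display code stays unchanged.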

@Junyi-99
Author

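> The monitoring tool for the GPU I'm using now is very similar to NVIDIA's. So I'm wondering: if I replace the APIs of the NVIDIA monitoring tool with the APIs of my current GPU's monitoring tool, would I be able to use nvitop?

I think that would work.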

@yan-rui

yan-rui commented Jun 26, 2023

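> > The monitoring tool for the GPU I'm using now is very similar to NVIDIA's. So I'm wondering: if I replace the APIs of the NVIDIA monitoring tool with the APIs of my current GPU's monitoring tool, would I be able to use nvitop?
>
> I think that would work.

Which parts would I need to modify to achieve this?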

@Junyi-99
Author

Sorry, you'll need to read the source code yourself and find the places that need to be changed.

@yan-rui

yan-rui commented Jun 26, 2023

OK, thanks for the reply. I'll take a closer look at the code.

@Junyi-99
Author

I've successfully adapted nvitop for AMD platforms using ROCm and tested it on MI50, MI100, and MI210 machines without affecting NVIDIA functionality.

I'd like to contribute these changes to extend nvitop's usability to AMD users.

Looking forward to your feedback!

  • MI50: [screenshot: mi50]
  • MI100: [screenshot: mi100]
  • MI210: [screenshot: mi210]

Junyi-99 reopened this on Mar 11, 2024
Junyi-99 linked a pull request on Mar 11, 2024 that will close this issue
@Junyi-99
Author

🤗 Really looking forward to nvitop's support for AMD GPUs, just like ROCm offers CUDA compatibility!

XuehaiPan linked a pull request on Mar 13, 2024 that will close this issue