Hello! I am currently working on bringing AMD GPU support to the Zeus Project.
We are using ROCm 6.0.2 and testing on the AMD HPC Fund Cluster, specifically on an MI100.
We are using amdsmi_get_energy_count() to query the GPU's energy consumption during training. However, the returned values do not appear to update between calls, so the measured energy consumption never changes, as shown below:
>>> import amdsmi
>>> amdsmi.amdsmi_init()
>>> handles = amdsmi.amdsmi_get_processor_handles()
>>> amdsmi.amdsmi_get_energy_count(handles[0])
{'power': 8152136, 'counter_resolution': 15.300000190734863, 'timestamp': 3824190842400799}
>>> amdsmi.amdsmi_get_energy_count(handles[0])
{'power': 8152136, 'counter_resolution': 15.300000190734863, 'timestamp': 3824190842400799} # returns the same power value
Even while the GPU is fully utilized by a PyTorch training script, the return value does not change, so the calculated energy consumption is always 0.
Is there something else that needs to be done for it to return an updated value?
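For context, here is a minimal sketch of how the energy between two readings could be computed. It assumes that the `power` field is a cumulative counter and that `counter_resolution` is the number of microjoules per counter tick; `energy_delta_joules` is a hypothetical helper written for illustration, not part of amdsmi. The readings are the two dicts from the session above.

```python
def energy_delta_joules(before: dict, after: dict) -> float:
    """Energy consumed between two amdsmi_get_energy_count() readings, in joules.

    Assumption: 'power' is a cumulative counter and 'counter_resolution'
    is the number of microjoules represented by one counter tick.
    """
    ticks = after["power"] - before["power"]
    return ticks * after["counter_resolution"] / 1e6  # microjoules -> joules


# The two readings from the session above (identical on our MI100).
first = {"power": 8152136, "counter_resolution": 15.300000190734863,
         "timestamp": 3824190842400799}
second = {"power": 8152136, "counter_resolution": 15.300000190734863,
          "timestamp": 3824190842400799}

print(energy_delta_joules(first, second))  # 0.0, because the counter never advanced
```

Note that the `timestamp` field is also identical in both readings, so the whole returned structure appears to be stale, not just the counter.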