PwrCap - does not stay static #194

Open
saki2fifty opened this issue Sep 21, 2023 · 5 comments
Comments

saki2fifty commented Sep 21, 2023

image

When I set a max power cap on all devices (100 in this screenshot, but any value behaves the same way), the cap does not hold: over time it is ignored and the power limit adjusts on its own.

I loop through each device, but per device the commands are:

Exact PowerShell:

$gpuList = ((echo '{"command": "list_devices"}' | ncat -U /run/lactd.sock | ConvertFrom-Json).data) | Where-Object { $_.name -notmatch 'HD Graphics' }

$powerCap = "100"
$performanceLevel = "manual"
$max_memory_clock = "1900"
$max_core_clock = "1100"

$gpuStats = @()

foreach ($gpu in $gpuList) {

    Write-Host "Setting gpu: $($gpu.id) - Memory: $($max_memory_clock) - Core: $($max_core_clock)" -ForegroundColor Green

    Invoke-Expression -Command "echo '{""command"": ""set_performance_level"", ""args"": {""id"": ""$($gpu.id)"", ""performance_level"": ""$($performanceLevel)""}}' | ncat -U /run/lactd.sock"
    Invoke-Expression -Command "echo '{""command"": ""confirm_pending_config"", ""args"": {""command"": ""confirm""}}' | ncat -U /run/lactd.sock"
    Invoke-Expression -Command "echo '{""command"": ""set_clocks_value"", ""args"": {""id"": ""$($gpu.id)"", ""command"": {""type"": ""max_memory_clock"", ""value"": $($max_memory_clock)}}}' | ncat -U /run/lactd.sock"
    Invoke-Expression -Command "echo '{""command"": ""confirm_pending_config"", ""args"": {""command"": ""confirm""}}' | ncat -U /run/lactd.sock"
    Invoke-Expression -Command "echo '{""command"": ""set_clocks_value"", ""args"": {""id"": ""$($gpu.id)"", ""command"": {""type"": ""max_core_clock"", ""value"": $($max_core_clock)}}}' | ncat -U /run/lactd.sock"
    Invoke-Expression -Command "echo '{""command"": ""confirm_pending_config"", ""args"": {""command"": ""confirm""}}' | ncat -U /run/lactd.sock"
    Invoke-Expression -Command "echo '{""command"": ""set_power_cap"", ""args"": {""id"": ""$($gpu.id)"", ""cap"": $($powerCap)}}' | ncat -U /run/lactd.sock"
    Invoke-Expression -Command "echo '{""command"": ""confirm_pending_config"", ""args"": {""command"": ""confirm""}}' | ncat -U /run/lactd.sock"

}

foreach ($gpu in $gpuList) {

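    # Query the stats of the GPU handled in this iteration, using its own id
    $statsRequest = [ordered]@{ command = 'device_stats'; args = @{ id = $gpu.id } } | ConvertTo-Json -Compress
    $deviceStats = ($statsRequest | ncat -U /run/lactd.sock | ConvertFrom-Json).data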

    # Add the GPU statistics to the array
    $gpuStats += $deviceStats

    # Add custom properties to the last element of the array
    $gpuStats[-1] | Add-Member -MemberType NoteProperty -Name "GPU_ID" -Value $gpu.id
    $gpuStats[-1] | Add-Member -MemberType NoteProperty -Name "GPU_Type" -Value $gpu.name

    Start-Sleep -Seconds 1
}

@saki2fifty
Author

I have many rigs and this happens across all of them. The GPUs in the screenshot are RX 580s.

@ilya-zlobintsev
Owner

Can you provide more info on when the power limit resets? How long does it take for it to reset? Does anything else (clocks, performance level) reset? Is anything printed in system logs (dmesg)?

This could be solved by performing a check every few minutes that compares the actual settings of the GPU to the previously applied ones, though this is not an ideal solution.
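
A minimal sketch of such a periodic check, written in the same PowerShell-over-ncat style as the script in the first comment. It is not part of LACT. The `power.cap_current` field name in the `device_stats` response is an assumption that should be verified against the actual daemon output, and the function names (`Send-LactCommand`, `Test-PowerCapDrift`, `Watch-PowerCap`), the 0.5 W tolerance, and the 5-minute interval are hypothetical choices made for illustration.

```powershell
# Sketch only. ASSUMPTION: device_stats reports the active limit as
# data.power.cap_current; check the real response before relying on this.

function Send-LactCommand {
    param(
        [System.Collections.IDictionary]$Request,
        [string]$Socket = '/run/lactd.sock'
    )
    # Same transport as the original script: one JSON request piped to ncat.
    $json = $Request | ConvertTo-Json -Compress -Depth 5
    $json | ncat -U $Socket | ConvertFrom-Json
}

function Test-PowerCapDrift {
    param(
        $Stats,
        [double]$DesiredCap,
        [double]$Tolerance = 0.5
    )
    # Returns $true when the reported cap differs from the cap that was applied.
    $actual = $Stats.power.cap_current
    if ($null -eq $actual) { return $false }   # field missing: nothing to compare
    return ([math]::Abs([double]$actual - $DesiredCap) -gt $Tolerance)
}

function Watch-PowerCap {
    param(
        [string[]]$GpuIds,
        [double]$DesiredCap,
        [int]$IntervalSeconds = 300
    )
    while ($true) {
        foreach ($id in $GpuIds) {
            $stats = (Send-LactCommand ([ordered]@{ command = 'device_stats'; args = @{ id = $id } })).data
            if (Test-PowerCapDrift -Stats $stats -DesiredCap $DesiredCap) {
                # Log the drift with a timestamp, then reapply and confirm the cap.
                Write-Host "$(Get-Date -Format s) gpu $id reports $($stats.power.cap_current) W, reapplying $DesiredCap W"
                Send-LactCommand ([ordered]@{ command = 'set_power_cap'; args = @{ id = $id; cap = $DesiredCap } }) | Out-Null
                Send-LactCommand ([ordered]@{ command = 'confirm_pending_config'; args = @{ command = 'confirm' } }) | Out-Null
            }
        }
        Start-Sleep -Seconds $IntervalSeconds
    }
}
```

With the `$gpuList` from the original script, this could be started as `Watch-PowerCap -GpuIds $gpuList.id -DesiredCap 100`. Logging the timestamp of each detected drift would also help answer the question of how long a reset takes.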

Also, I have to say that I did not expect LACT to be used for mining rigs. Good to know that the API is useful though.

@saki2fifty
Author

Currently on a work call... but yeah, LACT is 100% useful. I use it instead of rocm-smi, and it is my default tool for overclocking and stats.

I run a check every few minutes to see whether the cap has changed and, if so, force it again.

I'd say the resets start after about 15 minutes, one GPU at a time, and within an hour all of them have reset.

But I'll check those logs and let you know.

pinbuck commented Jan 8, 2024

I have this issue too. No matter which power cap I set, the GPU starts "power throttling" even before it reaches its rated maximum of 145 watts. I didn't get this issue on Linux Mint, but I get it on Fedora 38/39 and Nobara 38, with and without amdgpu.ppfeaturemask=0xffffffff or amdgpu.ppfeaturemask=0xfffd7fff. It may be worth testing older kernels, where this is possibly a non-issue, to find the kernel version after which it becomes a problem.

@ilya-zlobintsev
Owner

@pinbuck this sounds like a kernel-side issue; you should report it at https://gitlab.freedesktop.org/drm/amd.
