Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix temperature readings on Intel CPUs with heterogeneous core design #1269

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dawidpotocki
Copy link

Fixes #1048

I don't really know C, so it is what it is.

This code probably should be changed to also handle CPUs that use ,
instead of - in sysfs. Could probably then replace the code that just
blindly copies the temperatures of the first half to the second half.

Screenshots

Before
Screenshot of htop with incorrect temperature readings

After
Screenshot of htop with correct temperature readings

Example sysfs output of a Raptor Lake CPU (i7-13700K)

P-cores

==> /sys/devices/system/cpu/cpu0/topology/thread_siblings_list <==
0-1
==> /sys/devices/system/cpu/cpu1/topology/thread_siblings_list <==
0-1
==> /sys/devices/system/cpu/cpu2/topology/thread_siblings_list <==
2-3
==> /sys/devices/system/cpu/cpu3/topology/thread_siblings_list <==
2-3
==> /sys/devices/system/cpu/cpu4/topology/thread_siblings_list <==
4-5
==> /sys/devices/system/cpu/cpu5/topology/thread_siblings_list <==
4-5
==> /sys/devices/system/cpu/cpu6/topology/thread_siblings_list <==
6-7
==> /sys/devices/system/cpu/cpu7/topology/thread_siblings_list <==
6-7
==> /sys/devices/system/cpu/cpu8/topology/thread_siblings_list <==
8-9
==> /sys/devices/system/cpu/cpu9/topology/thread_siblings_list <==
8-9
==> /sys/devices/system/cpu/cpu10/topology/thread_siblings_list <==
10-11
==> /sys/devices/system/cpu/cpu11/topology/thread_siblings_list <==
10-11
==> /sys/devices/system/cpu/cpu12/topology/thread_siblings_list <==
12-13
==> /sys/devices/system/cpu/cpu13/topology/thread_siblings_list <==
12-13
==> /sys/devices/system/cpu/cpu14/topology/thread_siblings_list <==
14-15
==> /sys/devices/system/cpu/cpu15/topology/thread_siblings_list <==
14-15

E-cores

==> /sys/devices/system/cpu/cpu16/topology/thread_siblings_list <==
16
==> /sys/devices/system/cpu/cpu17/topology/thread_siblings_list <==
17
==> /sys/devices/system/cpu/cpu18/topology/thread_siblings_list <==
18
==> /sys/devices/system/cpu/cpu19/topology/thread_siblings_list <==
19
==> /sys/devices/system/cpu/cpu20/topology/thread_siblings_list <==
20
==> /sys/devices/system/cpu/cpu21/topology/thread_siblings_list <==
21
==> /sys/devices/system/cpu/cpu22/topology/thread_siblings_list <==
22
==> /sys/devices/system/cpu/cpu23/topology/thread_siblings_list <==
23

@BenBE BenBE added the Linux 🐧 Linux related issues label Jul 17, 2023
@@ -261,6 +261,43 @@ void LibSensors_getCPUTemperatures(CPUData* cpus, unsigned int existingCPUs, uns
goto out;
}

/* Read sysfs to get HT/SMT cores and correctly distribute the temperature values */
if (coreTempCount < activeCPUs) {
double oldData[existingCPUs + 1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable length array is not recommended in C.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid VLA. Although they are part of C99 (which htop targets), there are later language versions that make them optional. Also, having dynamically sized data on the stack is an accident waiting to happen, from a security PoV. Thus: No VLAs!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a late response, but isn't this also a VLA?

double data[existingCPUs + 1];

That's where I got this from.

I don't mind changing it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LibSensors code is not the cleanest in some places … ;-)

That said, I did some cleanup in #1273.

int dataCell = 1;
for (unsigned int i = 0; i < existingCPUs; i++) {
char path[100];
sprintf(path, "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have xSnprintf function in htop codebase. Please use that instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if you are iterating over several items in a common parent directory, it might be worth using things like opendir and openat.

Copy link
Member

@BenBE BenBE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code should best be split in a way that the gathered topology information can be re-used for frequency and temperatures alike.

Also, please take better care of defense-in-depth precautions to avoid security related issues.

memcpy(oldData, data, sizeof(oldData));

int dataCell = 1;
for (unsigned int i = 0; i < existingCPUs; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use size_t for things that represent the number of items/counts of things.

int dataCell = 1;
for (unsigned int i = 0; i < existingCPUs; i++) {
char path[100];
sprintf(path, "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if you are iterating over several items in a common parent directory, it might be worth using things like opendir and openat.

for (unsigned int i = 0; i < existingCPUs; i++) {
char path[100];
sprintf(path, "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", i);
FILE* file = fopen(path, "r");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fopen APIs are usually avoided due to their overhead and performance impact. As mentioned above, use opendir and openat (cf. wrappers in XUtils.h) instead.

Comment on lines +280 to +285
char contents[10];
fgets(contents, 10, file);
fclose(file);

int start, end;
int rangeNums = sscanf(contents, "%d-%d", &start, &end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This buffer is too short for the potential values that could be returned by the kernel. Should be at least 24 bytes (2x 10 bytes for int32_t plus 2 bytes for - and \n, plus one final \0 terminator.

Comment on lines +284 to +285
int start, end;
int rangeNums = sscanf(contents, "%d-%d", &start, &end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer fixed width integer types and suffixes defined in inttypes.h.

int rangeNums = sscanf(contents, "%d-%d", &start, &end);

if (rangeNums == 1) {
data[i+1] = oldData[dataCell];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent use of indexing source. When just one value given, this should equal to running the below code for just the start index, which is not the case at all.

dataCell++;
} else if (rangeNums == 2) {
for (int j = start; j <= end; j++) {
data[j+1] = oldData[dataCell];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unchecked buffer overflow from untrusted/unsanitized input.

fgets(contents, 10, file);
fclose(file);

int start, end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use some sane initialization, even though sscanf will potentially assign those variables.

@eworm-de
Copy link
Contributor

Hmm, this does not give the expected results for me. Actually it show the temperature for one core less than before.

@BenBE
Copy link
Member

BenBE commented May 23, 2024

@dawidpotocki Do you mind updating this PR and incorporate the comments from the different reviews? TIA.

@eworm-de Mind to provide some dump to reproduce this?

@BenBE BenBE added the needs-rebase Pull request needs to be rebased and conflicts to be resolved label May 23, 2024
@eworm-de
Copy link
Contributor

Hmm, #1352 addresses the same issue. Does it make sense to focus on this topic in just one PR?

@BenBE
Copy link
Member

BenBE commented May 23, 2024

Whichever fixes the open issues first … I do not have a particular preference right now …

@eworm-de
Copy link
Contributor

Ok. So what info to provide from my system?

This is from sensors:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +47.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +44.0°C  (high = +100.0°C, crit = +100.0°C)
Core 4:        +44.0°C  (high = +100.0°C, crit = +100.0°C)
Core 8:        +42.0°C  (high = +100.0°C, crit = +100.0°C)
Core 12:       +44.0°C  (high = +100.0°C, crit = +100.0°C)
Core 16:       +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 17:       +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 18:       +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 19:       +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 20:       +43.0°C  (high = +100.0°C, crit = +100.0°C)
Core 21:       +43.0°C  (high = +100.0°C, crit = +100.0°C)
Core 22:       +43.0°C  (high = +100.0°C, crit = +100.0°C)
Core 23:       +43.0°C  (high = +100.0°C, crit = +100.0°C)

BAT1-acpi-0
Adapter: ACPI interface
in0:          17.34 V  
curr1:         0.00 A  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +34.0°C  

nvme-pci-0100
Adapter: PCI adapter
Composite:    +38.9°C  (low  =  -5.2°C, high = +83.8°C)
                       (crit = +87.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +49.8°C  
temp2:        +45.8°C  
temp3:        +47.8°C  
temp4:        +43.8°C  
temp5:        +32.8°C

@dawidpotocki
Copy link
Author

@BenBE I was going to update it ages ago, but stuff happened and forgot about this a bit.

I will not have time to work on this until at least July.

@BenBE
Copy link
Member

BenBE commented May 23, 2024

@BenBE I was going to update it ages ago, but stuff happened and forgot about this a bit.

I will not have time to work on this until at least July.

No worries. Take the time you need.

There were quite a few comments that probably will need a bit of work to address properly. In any case: If you need help or if you have questions feel free to ask.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Linux 🐧 Linux related issues needs-rebase Pull request needs to be rebased and conflicts to be resolved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing temperature reports on Intel Alder Lake Processor cores
4 participants