
Summary table of likwid-perfctr shows incorrect values for "intensive" metrics #539

Open
rrzeschorscherl opened this issue Jul 23, 2023 · 1 comment

@rrzeschorscherl

Bug description
likwid-perfctr incorrectly reports some metrics by adding up core- or socket-local values. This happens, e.g., with:

  • clock frequency
  • CPI
  • runtime
  • operational intensity

These are "intensive" quantities, i.e., they do not scale with the size of the machine but need to be "averaged" (not literally, of course) in the proper way. In contrast, "extensive" quantities like energy consumption, memory data volume, etc, can be added across the machine to yield a useful number.

To Reproduce

  • LIKWID command and/or API usage
    • likwid-perfctr -g MEM_DP -C M0:0@M1:0 likwid-bench -t triad_avx -W N:2GB:2 on dual-socket Ice Lake 6326
    • Operational intensity is correct for each memory domain separately, but the value reported in the summary table is twice as high
    • The same holds for clock, runtime, and CPI (which are accumulated on a per-HW-thread basis, so the deviation grows with the number of threads)
  • LIKWID version: 5.2.2
  • Operating system: Ubuntu 22.04 LTS
  • Are you using the MarkerAPI (CPU code instrumentation) or the NvMarkerAPI (Nvidia GPU code instrumentation)?
    • Yes, but that does not matter for this issue

Suggestion

  • Generalize the formulas by which metrics are calculated and make them configurable as to how different entities (threads, sockets, ...) are handled. For example, operational intensity could be calculated as something like "sum(flops, all cores)/sum(traffic, all domains)", clock could be "sum(cycles, all HW threads)/(time*noOfThreads)", CPI could be "sum(cycles, all HW threads)/(noOfThreads*sum(instructions, all HW threads))", etc. This would reduce hard-coded special cases but make the config files more complex. A sketch of how such formulas could be evaluated follows below.
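
A hypothetical Python sketch of how such configurable formulas could be evaluated from per-entity raw counts; the dictionary names and all numbers are illustrative, not existing LIKWID syntax:

    # Raw counts for one measurement, grouped by the entity they belong to (hypothetical values)
    per_thread = {
        "FLOPS":  [4.0e9, 4.1e9],
        "CYCLES": [2.0e9, 2.1e9],
    }
    per_domain = {"TRAFFIC_BYTES": [8.0e9, 8.2e9]}
    time_s, n_threads = 1.0, 2

    # "sum(flops, all cores)/sum(traffic, all domains)"
    operational_intensity = sum(per_thread["FLOPS"]) / sum(per_domain["TRAFFIC_BYTES"])

    # "sum(cycles, all HW threads)/(time*noOfThreads)"
    clock_hz = sum(per_thread["CYCLES"]) / (time_s * n_threads)

The same pattern would extend to CPI and the other intensive metrics.
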
@TomTheBear
Member

Thanks for your suggestion. I have thought about it, but it will not be in the upcoming 5.3 version.

While the internal calculator already supports functions like SUM(X,Y,Z) or MIN(X,Y,Z), integrating data from other threads can be problematic, especially with the MarkerAPI, where each thread updates its own values. The threads would have to be synchronized after the counter readings to ensure valid metric values.

In order to reduce the changes to the internal calculator, one could use a two-step approach. When creating the internal group structure, we could expand the proposed syntax SUM(<countername>, <topological-info>) to SUM(<countername>_<hw0>, <countername>_<hw1>, ...), with <hw*> being the HW threads responsible for the topological level. This way, we can still use the internal calculator for the final calculation. Of course, it still increases the work in each metric evaluation because we would need to fill the variables map (countername -> value) with the values of all HW threads. On modern systems with hundreds of HW threads, this will cause quite some overhead.
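
A rough Python sketch of that expansion step; the function name and the regular expression are hypothetical, not existing LIKWID code:

    import re

    # Rewrite SUM(<countername>, <level>) into SUM(<countername>_<hw0>, <countername>_<hw1>, ...)
    # so that the existing calculator can evaluate the final expression unchanged.
    def expand_topology(formula, hw_threads_for_level):
        def repl(match):
            func, counter, level = match.groups()
            args = ", ".join(f"{counter}_{t}" for t in hw_threads_for_level[level])
            return f"{func}({args})"
        return re.sub(r"(SUM|MIN|MAX)\(\s*(\w+)\s*,\s*(\w+)\s*\)", repl, formula)

    # Example: expand over the four HW threads of the node-level domain 'N'
    print(expand_topology("SUM(PMC0, N)/SUM(MBOX0C0, N)", {"N": [0, 1, 2, 3]}))
    # -> SUM(PMC0_0, PMC0_1, PMC0_2, PMC0_3)/SUM(MBOX0C0_0, MBOX0C0_1, MBOX0C0_2, MBOX0C0_3)

The variables map would then need an entry such as PMC0_0, PMC0_1, ... for every HW thread, which is where the per-evaluation overhead mentioned above comes from.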

Moreover, it does not change the way the statistics table is calculated, and it is questionable whether that table would still be required at all. All threads would report the same CPI, clock, etc., so calculating min, max, and mean makes no sense for those metrics, unless one magically transforms SUM(cycles, all HW threads) into, e.g., MIN(cycles, all HW threads) and re-calculates for the statistics table.
