
Summary table of likwid-perfctr shows incorrect values for "intensive" metrics #539

Open
rrzeschorscherl opened this issue Jul 23, 2023 · 1 comment

@rrzeschorscherl

Bug description
likwid-perfctr incorrectly reports some metrics by adding up core- or socket-local values. This happens, e.g., with:

  • clock frequency
  • CPI
  • runtime
  • operational intensity

These are "intensive" quantities, i.e., they do not scale with the size of the machine but need to be "averaged" (not literally, of course) in the proper way. In contrast, "extensive" quantities like energy consumption, memory data volume, etc, can be added across the machine to yield a useful number.

To Reproduce

  • LIKWID command and/or API usage
    • likwid-perfctr -g MEM_DP -C M0:0@M1:0 likwid-bench -t triad_avx -W N:2GB:2 on dual-socket Ice Lake 6326
    • Operational intensity is correct for each memory domain separately, but the value reported in the summary table is twice as high
    • The same holds for clock, runtime, and CPI (which are accumulated on a per-HW-thread basis, so the deviation grows with the number of threads)
  • LIKWID version: 5.2.2
  • Operating system: Ubuntu 22.04 LTS
  • Are you using the MarkerAPI (CPU code instrumentation) or the NvMarkerAPI (Nvidia GPU code instrumentation)?
    • Yes, but that does not matter for this issue

Suggestion

  • Generalize the formulas by which metrics are calculated and make them configurable as to how different entities (threads, sockets, ...) are handled. For example, operational intensity could be calculated as something like "sum(flops, all cores)/sum(traffic, all domains)", clock could be "sum(cycles, all HW threads)/(time*noOfThreads)", CPI could be "sum(cycles, all HW threads)/(noOfThreads*sum(instructions, all HW threads))", etc. This would reduce hard-coded special cases but make the config files more complex. A sketch of how such formulas could be evaluated follows below.
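
A hypothetical Python sketch of how such configurable formulas could be evaluated from per-entity raw counts; the dictionary names and all numbers are illustrative, not existing LIKWID syntax:

    # Raw counts for one measurement, grouped by the entity they belong to (hypothetical values)
    per_thread = {
        "FLOPS":  [4.0e9, 4.1e9],
        "CYCLES": [2.0e9, 2.1e9],
    }
    per_domain = {"TRAFFIC_BYTES": [8.0e9, 8.2e9]}
    time_s, n_threads = 1.0, 2

    # "sum(flops, all cores)/sum(traffic, all domains)"
    operational_intensity = sum(per_thread["FLOPS"]) / sum(per_domain["TRAFFIC_BYTES"])

    # "sum(cycles, all HW threads)/(time*noOfThreads)"
    clock_hz = sum(per_thread["CYCLES"]) / (time_s * n_threads)

The same pattern would extend to CPI and the other intensive metrics.
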
@TomTheBear
Member

Thanks for your suggestion. I have thought about it, but it will not be in the upcoming 5.3 version.

While the internal calculator already supports functions like SUM(X,Y,Z) or MIN(X,Y,Z), integrating data from other threads can be problematic, especially with the MarkerAPI, where each thread updates its own values. The threads would have to be synchronized after the counter readings to ensure valid metric values.

In order to reduce the changes to the internal calculator, one could use a two-step approach. When creating the internal group structure, we could expand the proposed syntax SUM(<countername>, <topological-info>) to SUM(<countername>_<hw0>, <countername>_<hw1>, ...), with <hw*> being the HW threads responsible for the topological level. This way, we can still use the internal calculator for the final calculation. Of course, it still increases the work in each metric evaluation because we would need to fill the variables map (countername -> value) with the values of all HW threads. On modern systems with hundreds of HW threads, this will cause quite some overhead.
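
A rough Python sketch of that expansion step; the function name and the regular expression are hypothetical, not existing LIKWID code:

    import re

    # Rewrite SUM(<countername>, <level>) into SUM(<countername>_<hw0>, <countername>_<hw1>, ...)
    # so that the existing calculator can evaluate the final expression unchanged.
    def expand_topology(formula, hw_threads_for_level):
        def repl(match):
            func, counter, level = match.groups()
            args = ", ".join(f"{counter}_{t}" for t in hw_threads_for_level[level])
            return f"{func}({args})"
        return re.sub(r"(SUM|MIN|MAX)\(\s*(\w+)\s*,\s*(\w+)\s*\)", repl, formula)

    # Example: expand over the four HW threads of the node-level domain 'N'
    print(expand_topology("SUM(PMC0, N)/SUM(MBOX0C0, N)", {"N": [0, 1, 2, 3]}))
    # -> SUM(PMC0_0, PMC0_1, PMC0_2, PMC0_3)/SUM(MBOX0C0_0, MBOX0C0_1, MBOX0C0_2, MBOX0C0_3)

The variables map would then need an entry such as PMC0_0, PMC0_1, ... for every HW thread, which is where the per-evaluation overhead mentioned above comes from.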

Moreover, it does not change the way the statistics table is calculated, and it is questionable whether that table would still be required at all. All threads would report the same CPI, clock, etc., so calculating min, max, and mean makes no sense for those metrics, unless one magically transforms SUM(cycles, all HW threads) into, e.g., MIN(cycles, all HW threads) and re-calculates for the statistics table.
