
[DOCS] incomplete/incorrect statements regarding marker API #602

Open
jdomke opened this issue Feb 5, 2024 · 2 comments

@jdomke
Contributor

jdomke commented Feb 5, 2024

The wiki (https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr#using-the-marker-api) states the following:

For a threaded code it is important to call the following sequence of function calls from the serial part of the program:

LIKWID_MARKER_INIT;
[...]
LIKWID_MARKER_CLOSE;

However, if any OpenMP region is opened before the LIKWID_MARKER_INIT call, LIKWID's internal data structures are built incorrectly (or at least may be, depending on the underlying CPU/node architecture), and counters are read incorrectly.

For example, on A64FX with 4 MPI ranks and 6 threads per rank, reading EA_L2 results in rank 0 / thread 0 reading the counter (so far so good), but also rank 1 / threads 0+1, rank 2 / threads 0+1, and rank 3 / threads 0+1 reading the same counter. Thread 1 should not read it, but does, because of an incorrectly created internal topology data structure.
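The ordering required by the wiki quote, combined with the constraint reported here (no OpenMP region before LIKWID_MARKER_INIT), can be sketched in C as follows. This is a minimal sketch, assuming LIKWID 5.x, which provides the `likwid-marker.h` header; building with `-DLIKWID_PERFMON -fopenmp` and linking `-llikwid` enables the markers, while without the define the macros compile to no-ops. The function name `triad_with_markers` and the region tag `"triad"` are hypothetical and only serve the illustration.

```c
#include <stddef.h>

#ifdef LIKWID_PERFMON
#include <likwid-marker.h>
#else
/* No-op fallbacks so the sketch also builds without LIKWID. */
#define LIKWID_MARKER_INIT
#define LIKWID_MARKER_REGISTER(tag)
#define LIKWID_MARKER_START(tag)
#define LIKWID_MARKER_STOP(tag)
#define LIKWID_MARKER_CLOSE
#endif

/* Hypothetical kernel: a[i] = b[i] + s * c[i], measured as region "triad".
 * Returns the sum of a[] as a simple checksum. */
double triad_with_markers(double *a, const double *b, const double *c,
                          size_t n, double s)
{
    double sum = 0.0;

    /* Serial part, BEFORE the first OpenMP region: the process CPUset
     * still contains every HWthread selected via likwid-perfctr. */
    LIKWID_MARKER_INIT;

    #pragma omp parallel
    {
        /* Each thread registers and measures the region itself. */
        LIKWID_MARKER_REGISTER("triad");
        LIKWID_MARKER_START("triad");
        #pragma omp for
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
        LIKWID_MARKER_STOP("triad");
    }

    for (size_t i = 0; i < n; i++)
        sum += a[i];

    /* Serial part again, after all parallel regions have ended. */
    LIKWID_MARKER_CLOSE;
    return sum;
}
```

Compiled without `-DLIKWID_PERFMON`, the same source runs as a plain (optionally OpenMP-parallel) kernel, which keeps the instrumented and uninstrumented builds identical.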

@jdomke
Contributor Author

jdomke commented Feb 6, 2024

The bug with multiple threads reading/reporting counters they should not access (marker API only) seems to go away when a topology file, generated via likwid-genTopoCfg, is present on the node. I assume that either the topology parser (used when there is no topology file) has bugs that need to be fixed, or the topology should not be recreated for threads within the marker ROI. In any case, if you want to reproduce the issue, I suggest starting with this command on an A64FX node (or any other node with multiple NUMA domains):

mpirun -np 4 -x OMP_NUM_THREADS=6 -x OMP_PROC_BIND=close -x XOS_MMM_L_ARENA_LOCK_TYPE=0 -x XOS_MMM_L_HPAGE_TYPE=hugetlbfs -x XOS_MMM_L_PAGING_POLICY=demand:demand:demand --mca btl ^openib,tcp --oversubscribe --map-by slot:pe=6 --bind-to core:overload-allowed --tag-output --merge-stderr-to-stdout likwid-perfctr --marker -g ENERGY ./stream_f.exe

(see PR #603 for the ENERGY.txt file)

@TomTheBear
Member

TomTheBear commented Feb 16, 2024

The issue comes from the CPUset changing in both cases. When an application is started through LIKWID, it initially has a CPUset containing all selected HWthreads. If LIKWID_MARKER_INIT is called at that point, it "sees" all HWthreads that may take part in the computation. As soon as a Pthread is started (e.g., by OpenMP), LIKWID's pinning library pins the application (the master thread) to the first HWthread and the workers to the consecutive HWthreads in the CPUset. If LIKWID_MARKER_INIT is executed afterwards, it "sees" only its single-core CPUset.
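The CPUset effect described above can be reproduced without LIKWID. The following Linux-only C sketch uses glibc's `sched_getaffinity`/`sched_setaffinity` to count the HWthreads in the calling thread's CPUset, then pins the thread to the first HWthread in that set. This only emulates what the comment says LIKWID's pinning library does to the master thread; it is not LIKWID's code. Any topology query made after the pin (such as a LIKWID_MARKER_INIT executed at that point) would see a CPUset of size one. The helper name `pin_to_first_hwthread` is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: report the size of the calling thread's CPUset
 * before and after pinning it to the lowest-numbered HWthread in it.
 * Returns 0 on success, -1 if an affinity call fails. */
int pin_to_first_hwthread(int *before, int *after)
{
    cpu_set_t set;

    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    *before = CPU_COUNT(&set);

    /* Find the first HWthread allowed in the current CPUset. */
    int first = -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            first = cpu;
            break;
        }
    }
    if (first < 0)
        return -1;

    /* Restrict the thread to that single HWthread, emulating what a
     * pinning library does to the master thread. */
    cpu_set_t single;
    CPU_ZERO(&single);
    CPU_SET(first, &single);
    if (sched_setaffinity(0, sizeof(single), &single) != 0)
        return -1;

    /* Query again: only one HWthread is visible now. */
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    *after = CPU_COUNT(&set);
    return 0;
}
```

Run under, e.g., `taskset -c 0-5`, the helper would be expected to report six HWthreads before the pin and one after, mirroring the "all selected HWthreads" versus "single-core CPUset" views described in the comment.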

If a topology file is provided, the application and all threads it starts read their topology from the file. This includes the CPUset (commonly all HWthreads are allowed, because likwid-genTopoCfg is rarely executed in environments with a restricted CPUset).
