What is the correct way to use likwid-perfctr (with pylikwid and marker API) with Python Multiprocessing? #514

Open
suyashbakshi opened this issue Feb 14, 2023 · 2 comments


suyashbakshi commented Feb 14, 2023

My program spawns several processes using the Python multiprocessing module. Within each process, I would like to measure the number of double-precision FLOPs performed in a specific code section.

I initialize the marker API using pylikwid.markerinit() and pylikwid.markerthreadinit() in the parent process. The parent process then spawns several processes using multiprocessing.Pool.starmap, and each process executes another function that carries out a bunch of computation. In that function, I marked the code section whose FLOPS_DP I want to measure with pylikwid.markerstartregion() and pylikwid.markerstopregion(). Before exiting the function, each process calls pylikwid.markerclose().

The program is invoked with likwid-perfctr -m -g FLOPS_DP python3 <my_program.py>
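For reference, a minimal sketch of the setup described above. Only the pylikwid calls named in this issue are used; the region name `compute`, the `work` function, its arguments, and the explicit `fork` start method are hypothetical stand-ins, not the actual program. The `_marker` helper is also an assumption: it turns the marker calls into no-ops when pylikwid cannot be imported, so the sketch runs outside of likwid-perfctr as well.

```python
import multiprocessing as mp

try:
    import pylikwid  # real bindings; only meaningful under `likwid-perfctr -m`
except ImportError:
    pylikwid = None  # assumption: fall back to no-ops so the sketch still runs

REGION = "compute"  # hypothetical region name


def _marker(name, *args):
    """Call pylikwid.<name>(*args) if pylikwid is importable, else do nothing."""
    if pylikwid is not None:
        return getattr(pylikwid, name)(*args)
    return None


def work(n, scale):
    """Hypothetical compute function executed in each worker process."""
    _marker("markerstartregion", REGION)
    total = 0.0
    for i in range(n):
        total += scale * i * 0.5  # the floating-point work to be measured
    _marker("markerstopregion", REGION)
    # As described in the issue: every process closes the marker API itself.
    _marker("markerclose")
    return total


def run(num_procs=4, n=1000):
    # As described in the issue: initialization happens once, in the parent.
    _marker("markerinit")
    _marker("markerthreadinit")
    ctx = mp.get_context("fork")  # assumption: Linux, fork start method
    with ctx.Pool(num_procs) as pool:
        return pool.starmap(work, [(n, float(k)) for k in range(num_procs)])


if __name__ == "__main__":
    print(run())
```

Note that a pool worker may pick up more than one task, so in this sketch `markerclose()` can run in a process that later starts the region again.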

This setup works fine if only one process is spawned. However, with multiple processes (which is how I intend to use it), the behavior varies. Either the program finishes but reports metrics for only one process (or a subset of the processes), and even those metrics are erroneous, or the program does not finish and hangs.

Side note: If I use the Python threading module, I get the expected results from all threads. However, because of the Python GIL, threading does not really solve my problem, as I require parallel execution for performance reasons.

@TomTheBear
Member

This is not easy to answer. The main problem is probably that the multiprocessing module spawns actual processes and I don't know how much of the master process's memory is available for the child processes. With the threading module, you don't get an OS thread for each Python thread, so MarkerAPI is not really usable (the code runs on a single core).
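On the question of how much of the parent's memory the children see, here is a small sketch of plain Python behavior, assuming Linux and the `fork` start method. The `STATE` dictionary and the function names are hypothetical. It shows that a forked worker starts with a copy of whatever the parent had set before the pool was created, and that changes made in the worker do not travel back. It does not show whether LIKWID's internal state remains usable after a fork; that is a separate question.

```python
import multiprocessing as mp
import os

# Hypothetical module-level state, standing in for "something the parent
# initialized before creating the pool".
STATE = {"initialized_by": None}


def report(_):
    """Runs in a worker: read the inherited state, then modify the local copy."""
    seen = STATE["initialized_by"]
    STATE["touched_by_child"] = True  # only changes this worker's copy
    return seen, os.getpid()


def demo(num_procs=2):
    STATE["initialized_by"] = os.getpid()  # set in the parent before forking
    ctx = mp.get_context("fork")  # assumption: fork start method (Linux)
    with ctx.Pool(num_procs) as pool:
        results = pool.map(report, range(num_procs))
    return results, dict(STATE)


if __name__ == "__main__":
    results, parent_state = demo()
    print(results)       # each entry: (pid recorded by the parent, worker pid)
    print(parent_state)  # the parent's copy has no "touched_by_child" key
```

With the `spawn` start method, the workers would re-import the module instead, and state set at runtime in the parent would not be inherited this way.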

The main reason you get results for only a single process (or none at all) is that you seem to call pylikwid.markerclose() inside the child processes. They all write their data to the same file, so either you get a single result, or the file is corrupted and you get no output (and the run may hang).
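The "same file" effect can be illustrated without LIKWID. The following is a minimal sketch in plain Python, assuming Linux and the `fork` start method: several processes each write their result to one shared path, and only one result survives. The path, the file contents, and the function names are hypothetical, and the sketch does not model LIKWID's actual output format or write pattern.

```python
import multiprocessing as mp
import os
import tempfile


def write_result(worker_id, path):
    """Each worker opens the shared path in "w" mode, which truncates it."""
    with open(path, "w") as f:
        f.write(f"worker {worker_id} result\n")
    return worker_id


def collide(num_procs=4):
    """Let num_procs workers write to one file and return what is left in it."""
    fd, path = tempfile.mkstemp(suffix=".txt")  # hypothetical shared output file
    os.close(fd)
    try:
        ctx = mp.get_context("fork")  # assumption: fork start method (Linux)
        with ctx.Pool(num_procs) as pool:
            pool.starmap(write_result, [(k, path) for k in range(num_procs)])
        with open(path) as f:
            return f.read().splitlines()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Four workers wrote a result, but only one line remains in the file.
    print(collide())
```

In plain Python, the usual way around this is to give every process its own output path, for example by including `os.getpid()` in the file name, and to merge the results in the parent. Whether the marker API can be set up in an equivalent way is not established in this thread.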

Can you provide a simple test code?

@TomTheBear
Member

Any updates?
