Unusually Long Cali-Query Time #376

Open
tbroaddus opened this issue Jul 21, 2021 · 2 comments

@tbroaddus

Greetings,

We are trying to create a timeline of MPI function calls amongst 8 ranks that are performing irregular communication in AMR code, so the MPI whitelist consists of only MPI_Isend, MPI_Irecv, and MPI_Allreduce. The ranks generate cali files that are 45 megabytes each. My cali-query command to generate a timeline is the following:

cali-query -t -o query.out msgtrace* --sort-by=time.offset

The command runs but never completes; it is killed after ~50 minutes. This happens even with multiple threads (set with the --threads option), which we tried because there are multiple .cali files.

We have CALI_MARK_FUNCTION macros in every function that contains communication, to track the parent functions of the MPI function calls.

Our caliper configuration within the slurm job script is the following:

export CALI_SERVICES_ENABLE=event,recorder,timestamp,trace
export CALI_EVENT_TRIGGER=function,mpi.function
export CALI_TIMER_SNAPSHOT_DURATION=false
export CALI_TIMER_INCLUSIVE_DURATION=false
export CALI_TIMER_OFFSET=true
export CALI_MPI_MSG_TRACING=true
export CALI_MPI_WHITELIST=MPI_Isend,MPI_Irecv,MPI_Allreduce
export CALI_RECORDER_FILENAME=msgtrace-%mpi.rank%.cali

We are wondering whether this is the appropriate way to use Caliper to create a timeline table of MPI function calls within our code base. Any suggestions are welcome.

Thank you!

@daboehme
Member

Hi @tbroaddus ,

Unfortunately, the .cali format isn't the most efficient. You can try to speed things up by filtering, in particular by reducing the number of records cali-query has to look at. Assuming you want to sort by function start times, you can try this:

cali-query -q "select function,event.begin#mpi.function,mpi.rank,time.offset where event.begin#mpi.function format table order by time.offset" msgtrace*
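
For running the same filtered query once per rank file, a small helper can assemble the query string and the command line. This is only a sketch in Python: the helper names `build_calql` and `build_command` and the output name `rank0.out` are made up for illustration, the input name follows the `msgtrace-%mpi.rank%.cali` pattern from the configuration above, and the command is printed rather than executed.

```python
import shlex


def build_calql(columns, condition, order_by):
    """Assemble a query string: pick a few columns, filter records, sort."""
    return (
        f"select {','.join(columns)} "
        f"where {condition} "
        f"format table order by {order_by}"
    )


def build_command(query, inputs, output):
    """Return the argv list for one cali-query run (nothing is executed)."""
    return ["cali-query", "-q", query, "-o", output, *inputs]


query = build_calql(
    columns=["function", "event.begin#mpi.function", "mpi.rank", "time.offset"],
    condition="event.begin#mpi.function",  # keep only MPI begin events
    order_by="time.offset",
)

# Hypothetical per-rank invocation; file names are assumptions.
cmd = build_command(query, ["msgtrace-0.cali"], "rank0.out")
print(shlex.join(cmd))
```

Keeping the `where` clause restricted to begin events is what cuts down the record count; the column list only affects what is printed.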

However, note that the event timestamps are local - they're set to 0 when each process starts, but MPI ranks may start at different times, so the timestamps will likely be shifted by some amount between MPI ranks. You can't really compare them between processes.
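
To make the offset problem concrete, here is a rough Python sketch of one possible alignment. It is not the heuristic used by the tools mentioned below, just a crude assumption for illustration: every rank's first MPI_Allreduce begin event is treated as if it happened at the same instant, and each rank's timestamps are shifted accordingly. The function name `align_offsets` and the sample values are hypothetical. Ranks can enter a collective at different times, so the result is approximate at best.

```python
def align_offsets(events, sync_function="MPI_Allreduce"):
    """Shift per-rank local timestamps onto a shared, approximate timeline.

    events: list of dicts with keys "rank", "function", "time", where
    "time" is the rank-local offset. Assumption (not from Caliper): the
    first sync_function event is simultaneous on every rank.
    """
    # Earliest sync event seen on each rank, in that rank's local time.
    first_sync = {}
    for ev in events:
        if ev["function"] == sync_function:
            rank = ev["rank"]
            if rank not in first_sync or ev["time"] < first_sync[rank]:
                first_sync[rank] = ev["time"]

    # Use the latest local sync time as the reference so shifts are >= 0.
    reference = max(first_sync.values()) if first_sync else 0.0

    aligned = []
    for ev in events:
        # Ranks without a sync event are left unshifted.
        shift = reference - first_sync.get(ev["rank"], reference)
        aligned.append({**ev, "time": ev["time"] + shift})

    return sorted(aligned, key=lambda ev: (ev["time"], ev["rank"]))


# Made-up sample: rank 0 started about two time units later than rank 1.
events = [
    {"rank": 0, "function": "MPI_Isend", "time": 1.0},
    {"rank": 0, "function": "MPI_Allreduce", "time": 5.0},
    {"rank": 1, "function": "MPI_Irecv", "time": 3.5},
    {"rank": 1, "function": "MPI_Allreduce", "time": 7.0},
]

timeline = align_offsets(events)
for ev in timeline:
    print(ev["time"], ev["rank"], ev["function"])
```

Sorting the raw local offsets would place rank 0's MPI_Isend at 1.0, well before rank 1's MPI_Irecv at 3.5; after the shift they sit at 3.0 and 3.5.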

I have some tools for processing Caliper MPI traces, which among other things have a heuristic to compute global timestamps. Let me know if you're interested in those.

@tbroaddus
Author

Hi @daboehme,

Your explanation of the event timestamps makes sense. I will try your solution as well.

I am also interested in the tools to process Caliper MPI traces if you care to share.

Thanks!
