Performance Metrics for GPU-enabled code #324

rcarson3 · 2020-12-09T01:06:34Z

@daboehme I'm currently using Caliper to get basic timing information out of my code, ExaConstit, which is used within an ECP application. As part of this ECP app, we're looking at various performance tools (TAU, HPCToolkit, etc.) to get information such as FLOP counts or memory usage tracking. I was wondering whether either of these is possible in Caliper for GPU code.

I tried using the PAPI feature on Lassen to get the FLOP metric, based on the config file for LULESH in the caliper-example repo, but have been striking out with that.

daboehme · 2020-12-09T18:02:14Z

Maintainer

Hi @rcarson3,

Getting hardware counters like FLOP counts from GPU kernels is not straightforward. For GPU FLOP metrics and the like, we generally recommend the NVIDIA Nsight tools (nsys and ncu, or the older nvprof). You can forward Caliper-annotated code regions to the NVIDIA tools with the "nvtx" service ("nvprof" in Caliper <= 2.4), e.g.

CALI_SERVICES_ENABLE=nvtx ncu --nvtx ./app

You can then use the Caliper annotations to profile only a specific region; check the NVIDIA tools documentation for how to filter by NVTX range.

You can also use Caliper to profile CUDA API calls, or to trace GPU activities (kernel execution, memcopies, etc.). For both of these you'll need to build Caliper with CUPTI support. CUDA API profiling is supported by many built-in configs, e.g.

CALI_CONFIG=runtime-report,profile.cuda ./app

Activity tracing reports time on the device and host for CUDA activities:

CALI_CONFIG=cuda-activity ./app
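
If you switch between these two modes often, a small wrapper can save retyping the config strings. This is only a sketch: `cali_config_for`, `run_with_caliper`, and the mode names `api` and `activity` are made-up helpers for illustration, not part of Caliper. The config strings are the two shown above.

```shell
# Hypothetical helper: map a mode name to one of the config strings above.
cali_config_for() {
    case "$1" in
        api)      echo "runtime-report,profile.cuda" ;;  # CUDA API call profiling
        activity) echo "cuda-activity" ;;                # host/device time per CUDA activity
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: run a command with CALI_CONFIG set for the chosen mode.
# Both modes assume a Caliper build with CUPTI support.
run_with_caliper() {
    mode=$1; shift
    config=$(cali_config_for "$mode") || return 1
    CALI_CONFIG="$config" "$@"
}
```

Usage would be along the lines of `run_with_caliper activity ./app`, which is equivalent to typing `CALI_CONFIG=cuda-activity ./app`.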

While we currently don't support GPU metrics directly in Caliper, PAPI has a CUDA component - if you get that to work then Caliper should be able to read these metrics via the PAPI service. However, it looks like the PAPI CUDA component is no longer supported on newer devices. I also don't think we have a PAPI installation on Lassen, so you'd have to build it yourself. May be worth a try.

1 reply

rcarson3 Dec 17, 2020
Author

Thanks for the info, @daboehme. I'll definitely give the Nsight route a try.

Also, I was able to find a PAPI installation on Lassen that TAU is making use of, although I was getting some warnings about missing symbols or something like that.

jrmadsen · 2020-12-09T22:48:10Z

For what it's worth, I've worked extensively with the CUPTI Callback API, which underpins nvprof and the PAPI CUDA component. It is a pain to work with, and there is a reason NVIDIA moved away from it. The new CUPTI Profiler API, which underpins Nsight Compute, has a massive problem for PAPI and, in theory, Caliper: it doesn't support getting data values in a nested context. At least, I could not get it to do so. E.g., if marker "A" contains marker "B", any attempt to get the FLOP counts (or any other HW counter metric) exclusively for "B" either invalidated resuming collection for "A" after the "B" metrics were recorded, or returned zeros for "B" because the profiler had not been fully stopped. It appears that NVIDIA designed the API with the expectation that you would only want values at the end of the application, which obviously causes issues for tools like PAPI, Caliper, etc., whose APIs implicitly (e.g. flush output) or explicitly (e.g. PAPI_read, callbacks) expect that one can get the numerical results during the run.
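
To make the nesting requirement concrete, here is a toy sketch of the bookkeeping a region-based tool needs for the "A" contains "B" case. It assumes a counter that can be read at any point during the run, which is exactly what I could not get from the Profiler API. `COUNTER`, `fake_kernel`, and `nested_demo` are made-up names; nothing here touches a GPU or CUPTI.

```shell
# Toy stand-in for a hardware counter that only ever increases.
COUNTER=0
fake_kernel() { COUNTER=$((COUNTER + $1)); }   # pretend a kernel did $1 FLOPs

nested_demo() {
    COUNTER=0
    a_begin=$COUNTER      # read 1: entering region A
    fake_kernel 100       # work in A before B starts
    b_begin=$COUNTER      # read 2: entering region B (A is still open)
    fake_kernel 40        # work inside B
    b_end=$COUNTER        # read 3: leaving B, A must keep counting
    fake_kernel 10        # work in A after B ends
    a_end=$COUNTER        # read 4: leaving region A

    b_inclusive=$((b_end - b_begin))
    a_inclusive=$((a_end - a_begin))
    a_exclusive=$((a_inclusive - b_inclusive))
    echo "A inclusive=$a_inclusive exclusive=$a_exclusive; B inclusive=$b_inclusive"
}
```

Calling `nested_demo` prints `A inclusive=150 exclusive=110; B inclusive=40`. Reads 2 and 3 happen while "A" is still collecting, so a counter source that only yields values once collection has fully stopped cannot supply them.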

1 reply

rcarson3 Dec 17, 2020
Author

@jrmadsen this is very interesting info about PAPI. I actually passed this on to one of my team members, and they were seeing exactly what you were talking about.
