
Add NVTX support and RMM_FUNC_RANGE() macro #1558

Merged: 13 commits into rapidsai:branch-24.06, May 23, 2024

Conversation

@harrism (Member) commented May 9, 2024

Description

Let's get RMM allocations and deallocations showing up in profiler timelines.

Closes #495
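
For readers unfamiliar with NVTX function ranges, here is a minimal sketch of the annotation pattern this PR introduces, assuming the NVTX3 C++ API from nvtx3/nvtx3.hpp (the domain tag, macro body, and example function are illustrative, not necessarily the exact RMM implementation):

    #include <nvtx3/nvtx3.hpp>

    #include <cstddef>

    // Tag type naming an NVTX domain; ranges tagged with it are grouped
    // under "librmm" in profiler timelines such as Nsight Systems.
    struct librmm_domain {
      static constexpr char const* name{"librmm"};
    };

    // Opens an NVTX range named after the enclosing function; the range
    // closes automatically when the function returns (RAII).
    #define RMM_FUNC_RANGE() NVTX3_FUNC_RANGE_IN(librmm_domain)

    // Hypothetical caller: the range appears as "annotated_allocate".
    void* annotated_allocate(std::size_t bytes)
    {
      RMM_FUNC_RANGE();
      return ::operator new(bytes);
    }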

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@harrism added the feature request (New feature or request) and non-breaking (Non-breaking change) labels May 9, 2024
@github-actions bot added the CMake and cpp (Pertains to C++ code) labels May 9, 2024
CMakeLists.txt (outdated review thread, resolved)
# =============================================================================

# This function finds NVTX and sets any additional necessary environment variables.
function(find_and_configure_nvtx)
Contributor:

We should probably set up NVTX in rapids-cmake since we have three repos using the same code (rmm, cudf, cuspatial -- can't recall if we finished cuspatial but it's supposed to use this).

Member Author (@harrism):

I agree; I was surprised to find it wasn't already in rapids-cmake.

@bdice (Contributor) commented May 9, 2024

The build failure shows:

    /usr/bin/sccache /home/coder/.conda/envs/rapids/bin/x86_64-conda-linux-gnu-c++ -DFMT_HEADER_ONLY=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DSPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_ -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -D_torch_allocator_EXPORTS -I/home/coder/rmm/build/conda/cuda-12.2/release/_deps/cccl-src/thrust/thrust/cmake/../.. -I/home/coder/rmm/build/conda/cuda-12.2/release/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -I/home/coder/rmm/build/conda/cuda-12.2/release/_deps/cccl-src/cub/cub/cmake/../.. -isystem /home/coder/rmm/include -isystem /home/coder/rmm/build/conda/cuda-12.2/release/_deps/fmt-src/include -isystem /home/coder/rmm/build/conda/cuda-12.2/release/_deps/spdlog-src/include -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/coder/.conda/envs/rapids/include  -I/home/coder/.conda/envs/rapids/targets/x86_64-linux/include  -L/home/coder/.conda/envs/rapids/targets/x86_64-linux/lib -L/home/coder/.conda/envs/rapids/targets/x86_64-linux/lib/stubs -O3 -DNDEBUG -fPIC -MD -MT rmm/_lib/CMakeFiles/_torch_allocator.dir/_torch_allocator.cpp.o -MF rmm/_lib/CMakeFiles/_torch_allocator.dir/_torch_allocator.cpp.o.d -o rmm/_lib/CMakeFiles/_torch_allocator.dir/_torch_allocator.cpp.o -c /home/coder/rmm/python/rmm/rmm/_lib/_torch_allocator.cpp
    In file included from /home/coder/rmm/include/rmm/mr/device/device_memory_resource.hpp:20,
                     from /home/coder/rmm/include/rmm/mr/device/cuda_memory_resource.hpp:20,
                     from /home/coder/rmm/include/rmm/mr/device/per_device_resource.hpp:21,
                     from /home/coder/rmm/python/rmm/rmm/_lib/_torch_allocator.cpp:19:
    /home/coder/rmm/include/rmm/detail/nvtx/ranges.hpp:19:10: fatal error: nvtx3/nvtx3.hpp: No such file or directory
       19 | #include <nvtx3/nvtx3.hpp>
          |          ^~~~~~~~~~~~~~~~~
    compilation terminated.

This is because we're using NVTX as a BUILD_LOCAL_INTERFACE, so it doesn't seem to get picked up by the rmm Python package from the librmm header-only library:

target_link_libraries(rmm INTERFACE $<BUILD_LOCAL_INTERFACE:nvtx3-cpp>)

I know we had to enable this for cuDF in rapidsai/cudf#15271 because not doing so led to breakage in static builds. I don't know how this should be handled for RMM as a header-only library. Maybe we just need NVTX to have a normal INTERFACE linkage, since it's required by public RMM headers?

edit: I checked with @KyleFromNVIDIA and attempted this change in 702915b.

CMakeLists.txt (outdated review thread, resolved)
rapids_config.cmake (outdated review thread, resolved)
Contributor:

It really makes me happy to see how simple this is :)

@harrism (Member Author) commented May 22, 2024

Depends on rapidsai/rapids-cmake#606

@harrism marked this pull request as ready for review May 22, 2024 01:51
@harrism requested review from a team as code owners May 22, 2024 01:51
@harrism requested review from @rongou and @jrhemstad May 22, 2024 01:51
@wence- (Contributor) left a comment:

Looks good!

@@ -0,0 +1,61 @@
/*
* Copyright (c) 2020-2024, NVIDIA CORPORATION.
Contributor:

Suggested change:
- * Copyright (c) 2020-2024, NVIDIA CORPORATION.
+ * Copyright (c) 2024, NVIDIA CORPORATION.

Member Author (@harrism):

Fixed, thanks.

@@ -117,6 +118,7 @@ class device_memory_resource {
   */
  void* allocate(std::size_t bytes, cuda_stream_view stream = cuda_stream_view{})
  {
    RMM_FUNC_RANGE();
Contributor:

question: Do we want the same set of annotations for the host_memory_resource as well?

Member Author (@harrism):

Yes. I added them. Note that the new pinned_host_memory_resource does not derive from host_memory_resource, so I had to add ranges to it explicitly. Unfortunately, this is how it will have to be done for all MRs that just implement the concepts in the future.
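
For illustration, a sketch of what adding ranges explicitly looks like for a resource that implements the allocation concepts directly instead of overriding a base-class hook, reusing the RMM_FUNC_RANGE macro from the sketch above (the class and its bodies are hypothetical, not the actual pinned_host_memory_resource):

    #include <cuda_runtime_api.h>

    #include <cstddef>
    #include <new>

    // With no virtual do_allocate() inherited from host_memory_resource,
    // each public entry point must open its own NVTX range.
    class example_pinned_resource {
     public:
      void* allocate(std::size_t bytes)
      {
        RMM_FUNC_RANGE();  // added explicitly, not inherited
        void* ptr{nullptr};
        if (cudaMallocHost(&ptr, bytes) != cudaSuccess) { throw std::bad_alloc{}; }
        return ptr;
      }

      void deallocate(void* ptr, std::size_t) noexcept
      {
        RMM_FUNC_RANGE();
        cudaFreeHost(ptr);
      }
    };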

@rapids-bot pushed a commit to rapidsai/rapids-cmake that referenced this pull request on May 22, 2024:
Adds `rapids_cpm_nvtx3`, adapted from the code that is currently implemented in libcudf. This aims to help with rapidsai/rmm#1558.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Mark Harris (https://github.com/harrism)
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)

URL: #606
@vyasr (Contributor) left a comment:

Couple of small things. I would like to hear Robert's opinion on one, since I think it's perpetuating a form of tech debt that we have throughout RAPIDS CMake code and I'd like to establish a different practice. It shouldn't block merging, though, so feel free to ping me tomorrow if it isn't addressed by the freeze deadline.

CMakeLists.txt (outdated review thread, resolved)

endfunction()

find_and_configure_nvtx3()
Contributor:

TBH I find these extra modules to be overkill when they're so trivial and would prefer that we inline them. In a few of the more recent cases we have done so. @robertmaynard WDYT? This feels like a pattern that's been copy-pasted many times into a number of scenarios where it isn't really needed.

Specifically I'm talking about just moving lines 18/19 into CMakeLists.txt and removing this file.

Member Author (@harrism):

While I agree, I also feel that this PR is consistent with the style of 3 other rapids-cmake packages used in RMM (CCCL, fmt, spdlog -- which has additional local cmake code for some reason). Therefore I would like to defer this change to a later PR, which should be applied in unison with the same style change across all of RAPIDS.

@vyasr (Contributor) left a comment:

CMake approval.

@harrism (Member Author) commented May 23, 2024

/merge

@rapids-bot merged commit 4e9482f into rapidsai:branch-24.06 May 23, 2024. 57 checks passed.
target_compile_features(rmm INTERFACE cxx_std_17 $<BUILD_INTERFACE:cuda_std_17>)
target_compile_definitions(rmm INTERFACE LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE)

# Disable NVTX if necessary
if(NOT USE_NVTX)
  target_compile_definitions(rmm INTERFACE NVTX_DISABLE)
Contributor:

The option for disabling NVTX does not appear to be used. This seems like a bug.
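
For context on what that definition is meant to do: NVTX3 is designed so that defining NVTX_DISABLE compiles the annotations away, so this CMake branch only has an effect if USE_NVTX is actually wired up as an option. A minimal sketch of the intended compile-time behavior, assuming the NVTX3 C++ API (the guard and names here are illustrative, not the actual nvtx3.hpp internals):

    // Illustrative guard: when NVTX_DISABLE is defined (as the CMake line
    // above would inject), the range macro expands to nothing.
    #ifdef NVTX_DISABLE
    #define EXAMPLE_FUNC_RANGE()
    #else
    #include <nvtx3/nvtx3.hpp>
    struct example_domain {
      static constexpr char const* name{"librmm"};
    };
    #define EXAMPLE_FUNC_RANGE() NVTX3_FUNC_RANGE_IN(example_domain)
    #endif

    void annotated_work()
    {
      EXAMPLE_FUNC_RANGE();  // a pure no-op in an NVTX_DISABLE build
    }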

Labels: CMake, cpp (Pertains to C++ code), feature request (New feature or request), non-breaking (Non-breaking change)

Projects: Status: Done
Development

Successfully merging this pull request may close these issues:

  • [FEA] Add NVTX ranges to pool allocation/deallocation (#495)
5 participants