
[BUG] No Raw Allocation! #5852

Open
harrism opened this issue Apr 18, 2024 · 0 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers), Tech Debt (Issues related to debt)

harrism commented Apr 18, 2024

Describe the bug

cuML should make more of an effort to use containers (especially RMM device containers) and eradicate all instances of raw allocation.

I have recently been refactoring all RAPIDS repos to use rmm::device_async_resource_ref rather than device_memory_resource*. While these are partially compatible, the former has a different interface for allocate() and deallocate(), defined by cuda::mr::memory_resource concepts.

More disciplined libraries like libcudf and libcuspatial have NO raw allocation. Instead they use storage containers based on rmm::device_buffer (including rmm::device_uvector and rmm::device_scalar). In addition to having strong stream-ordered semantics, these containers take resource_refs and abstract the calls to allocate and deallocate. Therefore, refactoring libcudf and libcuspatial has been nearly trivial, since the interface changes are all covered by RMM classes. The calling library only needs to pass around the resource_refs.
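To make the contrast concrete, here is a minimal sketch of the container-based pattern libcudf and libcuspatial follow (assuming current RMM headers and names; `make_param_array` is a hypothetical helper, not cuML code):

```cpp
#include <cstddef>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/resource_ref.hpp>

// Sketch: a container-owned, stream-ordered allocation. The device_uvector
// calls allocate/deallocate on the resource internally, so a signature that
// takes a device_async_resource_ref needs no other changes at call sites.
template <typename DataT>
rmm::device_uvector<DataT> make_param_array(std::size_t n,
                                            rmm::cuda_stream_view stream,
                                            rmm::device_async_resource_ref mr)
{
  return rmm::device_uvector<DataT>(n, stream, mr);
  // The memory is freed (stream-ordered) automatically when the
  // device_uvector is destroyed -- no manual deallocate() call anywhere.
}
```

Because the container owns the allocation, switching a parameter from `device_memory_resource*` to `device_async_resource_ref` touches only the signature; no allocation call sites change.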

In contrast, cuML has multiple places that directly call the allocate()/deallocate() functions of the memory resource (at least they aren't calling cudaMalloc directly!). For example:

struct ARIMAParams {
  DataT* mu     = nullptr;
  DataT* beta   = nullptr;
  DataT* ar     = nullptr;
  DataT* ma     = nullptr;
  DataT* sar    = nullptr;
  DataT* sma    = nullptr;
  DataT* sigma2 = nullptr;

  /**
   * Allocate all the parameter device arrays
   *
   * @tparam AllocatorT Type of allocator used
   * @param[in] order      ARIMA order
   * @param[in] batch_size Batch size
   * @param[in] stream     CUDA stream
   * @param[in] tr         Whether these are the transformed parameters
   */
  void allocate(const ARIMAOrder& order, int batch_size, cudaStream_t stream, bool tr = false)
  {
    rmm::mr::device_memory_resource* rmm_alloc = rmm::mr::get_current_device_resource();
    if (order.k && !tr) mu = (DataT*)rmm_alloc->allocate(batch_size * sizeof(DataT), stream);
    if (order.n_exog && !tr)
      beta = (DataT*)rmm_alloc->allocate(order.n_exog * batch_size * sizeof(DataT), stream);
    if (order.p) ar = (DataT*)rmm_alloc->allocate(order.p * batch_size * sizeof(DataT), stream);
    if (order.q) ma = (DataT*)rmm_alloc->allocate(order.q * batch_size * sizeof(DataT), stream);
    if (order.P) sar = (DataT*)rmm_alloc->allocate(order.P * batch_size * sizeof(DataT), stream);
    if (order.Q) sma = (DataT*)rmm_alloc->allocate(order.Q * batch_size * sizeof(DataT), stream);
    sigma2 = (DataT*)rmm_alloc->allocate(batch_size * sizeof(DataT), stream);
  }

  /**
   * Deallocate all the parameter device arrays
   *
   * @tparam AllocatorT Type of allocator used
   * @param[in] order      ARIMA order
   * @param[in] batch_size Batch size
   * @param[in] stream     CUDA stream
   * @param[in] tr         Whether these are the transformed parameters
   */
  void deallocate(const ARIMAOrder& order, int batch_size, cudaStream_t stream, bool tr = false)
  {
    rmm::mr::device_memory_resource* rmm_alloc = rmm::mr::get_current_device_resource();
    if (order.k && !tr) rmm_alloc->deallocate(mu, batch_size * sizeof(DataT), stream);
    if (order.n_exog && !tr)
      rmm_alloc->deallocate(beta, order.n_exog * batch_size * sizeof(DataT), stream);
    if (order.p) rmm_alloc->deallocate(ar, order.p * batch_size * sizeof(DataT), stream);
    if (order.q) rmm_alloc->deallocate(ma, order.q * batch_size * sizeof(DataT), stream);
    if (order.P) rmm_alloc->deallocate(sar, order.P * batch_size * sizeof(DataT), stream);
    if (order.Q) rmm_alloc->deallocate(sma, order.Q * batch_size * sizeof(DataT), stream);
    rmm_alloc->deallocate(sigma2, batch_size * sizeof(DataT), stream);
  }
};

Here we have a struct with raw pointer members (first no-no). The struct can either own the memory or be a non-owning view of it (second no-no). This is achieved by providing allocate() and deallocate() member functions that make raw calls to the memory resource's allocate/deallocate (third no-no). I tried changing the struct to use device_uvector instead of raw pointers, but device_uvector can't be used as a non-owning view, so this doesn't work.
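One possible direction, sketched here with hypothetical names (`ARIMAParamsStorage` and `ARIMAParamsView` are illustrations, not a proposed cuML API), is to split ownership from viewing so that containers own the memory and a plain pointer struct serves as the view:

```cpp
#include <cstddef>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

// Non-owning view: just raw pointers, never allocates or deallocates.
template <typename DataT>
struct ARIMAParamsView {
  DataT* mu     = nullptr;
  DataT* sigma2 = nullptr;
  // ... remaining parameter pointers (beta, ar, ma, sar, sma) ...
};

// Owning storage: containers handle allocation and stream-ordered cleanup.
template <typename DataT>
struct ARIMAParamsStorage {
  rmm::device_uvector<DataT> mu;
  rmm::device_uvector<DataT> sigma2;

  ARIMAParamsStorage(std::size_t batch_size, rmm::cuda_stream_view stream)
    : mu(batch_size, stream), sigma2(batch_size, stream)
  {
  }

  // Kernels and algorithms take the cheap, copyable view type.
  ARIMAParamsView<DataT> view() { return {mu.data(), sigma2.data()}; }
};
```

The owning type can never leak or double-free, and the view type can never allocate, so the dual-purpose ambiguity of the current struct disappears.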

Other locations where cuML uses raw allocation include svm/results.cuh and svm/svc_impl.cuh.

Also, cpp/test/sg/svc_test.cu has a BIG no-no: it ONLY calls the current device resource's deallocate() even though the allocation happens elsewhere (somewhere I can't find). This is really bad because there is no guarantee that the same resource was used to allocate the memory.
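For reference, here is a sketch of what correctly paired, stream-ordered allocation and deallocation through a `device_async_resource_ref` looks like (signatures follow the `cuda::mr` concepts; `rmm::CUDA_ALLOCATION_ALIGNMENT` is RMM's default alignment constant). The key point is that the same resource, size, and alignment must be used on both sides:

```cpp
#include <cstddef>
#include <rmm/aligned.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/resource_ref.hpp>

// Sketch: allocate_async/deallocate_async replace the old raw
// allocate()/deallocate() calls on device_memory_resource*.
void roundtrip(rmm::device_async_resource_ref mr,
               std::size_t bytes,
               rmm::cuda_stream_view stream)
{
  void* p = mr.allocate_async(bytes, rmm::CUDA_ALLOCATION_ALIGNMENT, stream);
  // ... use p in stream order ...
  // Deallocation must go through the SAME resource, with the same
  // size and alignment -- exactly what the svc_test.cu pattern violates.
  mr.deallocate_async(p, bytes, rmm::CUDA_ALLOCATION_ALIGNMENT, stream);
}
```

Splitting the two calls across different resources (as the test does by grabbing whatever the current device resource happens to be at teardown) is undefined behavior under these concepts.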

@harrism harrism added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 18, 2024
rapids-bot bot pushed a commit that referenced this issue Apr 22, 2024
…rce_ref (#5853)

Closes #5839

Replaces all occurrences of rmm::mr::device_memory_resource* in parameters with rmm::device_async_resource_ref. Also updates calls to raw `resource->allocate()` to use `resource_ref.allocate_async()` with default alignment argument. Same for `deallocate()`.

Ideally these would be replaced with `device_uvector` / `device_buffer`, but that is outside the scope of this PR. See #5852

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Divye Gala (https://github.com/divyegala)

URL: #5853
@dantegd dantegd added Tech Debt Issues related to debt good first issue Good for newcomers and removed ? - Needs Triage Need team to review and classify labels Apr 23, 2024