
[BUG] No Raw Allocation! #5852

Open
harrism opened this issue Apr 18, 2024 · 0 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers), Tech Debt (Issues related to debt)

harrism commented Apr 18, 2024

Describe the bug

cuML should make more of an effort to use containers (especially RMM device containers) and eradicate all instances of raw allocation.

I have recently been refactoring all RAPIDS repos to use rmm::device_async_resource_ref rather than device_memory_resource*. While these are partially compatible, the former has a different interface for allocate() and deallocate(), defined by cuda::mr::memory_resource concepts.

More disciplined libraries like libcudf and libcuspatial have NO raw allocation. Instead they use storage containers based on rmm::device_buffer (including rmm::device_uvector and rmm::device_scalar). In addition to having strong stream-ordered semantics, these containers take resource_refs and abstract the calls to allocate and deallocate. Therefore, refactoring libcudf and libcuspatial has been nearly trivial, since the interface changes are all covered by RMM classes. The calling library only needs to pass around the resource_refs.
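To make the contrast concrete, here is a minimal sketch of the container-based pattern libcudf and libcuspatial follow (assuming current RMM headers and names; `make_param_array` is a hypothetical helper, not cuML code):

```cpp
#include <cstddef>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/resource_ref.hpp>

// Sketch: a container-owned, stream-ordered allocation. The device_uvector
// calls allocate/deallocate on the resource internally, so a signature that
// takes a device_async_resource_ref needs no other changes at call sites.
template <typename DataT>
rmm::device_uvector<DataT> make_param_array(std::size_t n,
                                            rmm::cuda_stream_view stream,
                                            rmm::device_async_resource_ref mr)
{
  return rmm::device_uvector<DataT>(n, stream, mr);
  // The memory is freed (stream-ordered) automatically when the
  // device_uvector is destroyed -- no manual deallocate() call anywhere.
}
```

Because the container owns the allocation, switching a parameter from `device_memory_resource*` to `device_async_resource_ref` touches only the signature; no allocation call sites change.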

In contrast, cuML has multiple places that directly call the allocate()/deallocate() functions of the memory resource (at least they aren't calling cudaMalloc directly!). For example:

struct ARIMAParams {
  DataT* mu     = nullptr;
  DataT* beta   = nullptr;
  DataT* ar     = nullptr;
  DataT* ma     = nullptr;
  DataT* sar    = nullptr;
  DataT* sma    = nullptr;
  DataT* sigma2 = nullptr;

  /**
   * Allocate all the parameter device arrays
   *
   * @tparam AllocatorT Type of allocator used
   * @param[in] order      ARIMA order
   * @param[in] batch_size Batch size
   * @param[in] stream     CUDA stream
   * @param[in] tr         Whether these are the transformed parameters
   */
  void allocate(const ARIMAOrder& order, int batch_size, cudaStream_t stream, bool tr = false)
  {
    rmm::mr::device_memory_resource* rmm_alloc = rmm::mr::get_current_device_resource();
    if (order.k && !tr) mu = (DataT*)rmm_alloc->allocate(batch_size * sizeof(DataT), stream);
    if (order.n_exog && !tr)
      beta = (DataT*)rmm_alloc->allocate(order.n_exog * batch_size * sizeof(DataT), stream);
    if (order.p) ar = (DataT*)rmm_alloc->allocate(order.p * batch_size * sizeof(DataT), stream);
    if (order.q) ma = (DataT*)rmm_alloc->allocate(order.q * batch_size * sizeof(DataT), stream);
    if (order.P) sar = (DataT*)rmm_alloc->allocate(order.P * batch_size * sizeof(DataT), stream);
    if (order.Q) sma = (DataT*)rmm_alloc->allocate(order.Q * batch_size * sizeof(DataT), stream);
    sigma2 = (DataT*)rmm_alloc->allocate(batch_size * sizeof(DataT), stream);
  }

  /**
   * Deallocate all the parameter device arrays
   *
   * @tparam AllocatorT Type of allocator used
   * @param[in] order      ARIMA order
   * @param[in] batch_size Batch size
   * @param[in] stream     CUDA stream
   * @param[in] tr         Whether these are the transformed parameters
   */
  void deallocate(const ARIMAOrder& order, int batch_size, cudaStream_t stream, bool tr = false)
  {
    rmm::mr::device_memory_resource* rmm_alloc = rmm::mr::get_current_device_resource();
    if (order.k && !tr) rmm_alloc->deallocate(mu, batch_size * sizeof(DataT), stream);
    if (order.n_exog && !tr)
      rmm_alloc->deallocate(beta, order.n_exog * batch_size * sizeof(DataT), stream);
    if (order.p) rmm_alloc->deallocate(ar, order.p * batch_size * sizeof(DataT), stream);
    if (order.q) rmm_alloc->deallocate(ma, order.q * batch_size * sizeof(DataT), stream);
    if (order.P) rmm_alloc->deallocate(sar, order.P * batch_size * sizeof(DataT), stream);
    if (order.Q) rmm_alloc->deallocate(sma, order.Q * batch_size * sizeof(DataT), stream);
    rmm_alloc->deallocate(sigma2, batch_size * sizeof(DataT), stream);
  }
};

Here we have a struct with raw pointer members (first no-no). The struct can either own the memory or be a non-owning view of it (second no-no). This is achieved by providing allocate() and deallocate() member functions that make raw calls to the memory resource's allocate/deallocate (third no-no). I tried changing the struct to use device_uvector instead of raw pointers, but device_uvector can't be used as a non-owning view, so this doesn't work.
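One possible direction, sketched here with hypothetical names (`ARIMAParamsStorage` and `ARIMAParamsView` are illustrations, not a proposed cuML API), is to split ownership from viewing so that containers own the memory and a plain pointer struct serves as the view:

```cpp
#include <cstddef>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

// Non-owning view: just raw pointers, never allocates or deallocates.
template <typename DataT>
struct ARIMAParamsView {
  DataT* mu     = nullptr;
  DataT* sigma2 = nullptr;
  // ... remaining parameter pointers (beta, ar, ma, sar, sma) ...
};

// Owning storage: containers handle allocation and stream-ordered cleanup.
template <typename DataT>
struct ARIMAParamsStorage {
  rmm::device_uvector<DataT> mu;
  rmm::device_uvector<DataT> sigma2;

  ARIMAParamsStorage(std::size_t batch_size, rmm::cuda_stream_view stream)
    : mu(batch_size, stream), sigma2(batch_size, stream)
  {
  }

  // Kernels and algorithms take the cheap, copyable view type.
  ARIMAParamsView<DataT> view() { return {mu.data(), sigma2.data()}; }
};
```

The owning type can never leak or double-free, and the view type can never allocate, so the dual-purpose ambiguity of the current struct disappears.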

Other locations where cuML uses raw allocation include svm/results.cuh and svm/svc_impl.cuh.

Also, cpp/test/sg/svc_test.cu has a BIG no-no: it ONLY calls the current device resource's deallocate() even though the allocation happens elsewhere (somewhere I can't find). This is really bad because there is no guarantee that the same resource was used to allocate the memory.
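For reference, here is a sketch of what correctly paired, stream-ordered allocation and deallocation through a `device_async_resource_ref` looks like (signatures follow the `cuda::mr` concepts; `rmm::CUDA_ALLOCATION_ALIGNMENT` is RMM's default alignment constant). The key point is that the same resource, size, and alignment must be used on both sides:

```cpp
#include <cstddef>
#include <rmm/aligned.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/resource_ref.hpp>

// Sketch: allocate_async/deallocate_async replace the old raw
// allocate()/deallocate() calls on device_memory_resource*.
void roundtrip(rmm::device_async_resource_ref mr,
               std::size_t bytes,
               rmm::cuda_stream_view stream)
{
  void* p = mr.allocate_async(bytes, rmm::CUDA_ALLOCATION_ALIGNMENT, stream);
  // ... use p in stream order ...
  // Deallocation must go through the SAME resource, with the same
  // size and alignment -- exactly what the svc_test.cu pattern violates.
  mr.deallocate_async(p, bytes, rmm::CUDA_ALLOCATION_ALIGNMENT, stream);
}
```

Splitting the two calls across different resources (as the test does by grabbing whatever the current device resource happens to be at teardown) is undefined behavior under these concepts.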

@harrism harrism added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 18, 2024
rapids-bot bot pushed a commit that referenced this issue Apr 22, 2024
…rce_ref (#5853)

Closes #5839

Replaces all occurrences of rmm::mr::device_memory_resource* in parameters with rmm::device_async_resource_ref. Also updates calls to raw `resource->allocate()` to use `resource_ref.allocate_async()` with default alignment argument. Same for `deallocate()`.

Ideally these would be replaced with `device_uvector` / `device_buffer`, but that is outside the scope of this PR. See #5852

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Divye Gala (https://github.com/divyegala)

URL: #5853
@dantegd dantegd added Tech Debt Issues related to debt good first issue Good for newcomers and removed ? - Needs Triage Need team to review and classify labels Apr 23, 2024