CUDA invalid configuration if no Gaussian is in view #84

Open
Xanthorapedia opened this issue Dec 15, 2023 · 2 comments · May be fixed by #100

Comments

@Xanthorapedia

A CUDA invalid configuration exception will be thrown from kernel launches that perform per-point processing if the number of points is 0, which can happen if no Gaussian is in the view.

This is caused by kernel launches with 0 blocks. In the example below, for instance, blocks evaluates to 0 if num_pts is 0:

int blocks = (num_pts + N_THREADS - 1) / N_THREADS;
compute_cov2d_bounds_kernel<<<blocks, N_THREADS>>>(

I would suggest checking num_pts before the kernel launch, i.e.,

if (num_pts > 0) {
    compute_cov2d_bounds_kernel<<<(num_pts + N_THREADS - 1) / N_THREADS, N_THREADS>>>(
        num_pts,
        covs2d.contiguous().data_ptr<float>(),
        conics.contiguous().data_ptr<float>(),
        radii.contiguous().data_ptr<float>()
    );
}

I'm happy to submit a PR. Please let me know if there are better alternatives.
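
For reference, the block-count arithmetic can be checked on the host without a GPU. This is only a sketch: N_THREADS = 256 is an assumed value (the thread does not state the real one), and num_blocks and should_launch are hypothetical helper names, not functions from the library.

```python
# Assumed block size for illustration; the real N_THREADS is not given in this thread.
N_THREADS = 256


def num_blocks(num_pts: int, n_threads: int = N_THREADS) -> int:
    """Ceil-divide num_pts by n_threads, mirroring the launch snippet above."""
    return (num_pts + n_threads - 1) // n_threads


def should_launch(num_pts: int) -> bool:
    """Hypothetical guard: skip the kernel launch when there are no points."""
    return num_pts > 0


if __name__ == "__main__":
    for n in (0, 1, 256, 257):
        print(n, num_blocks(n), should_launch(n))
```

With num_pts = 0 the grid size comes out as 0, which CUDA rejects as an invalid launch configuration, so the guard skips exactly that case and leaves every other launch unchanged.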

@kerrj
Collaborator

kerrj commented Dec 15, 2023

That would be great, thanks! If there are no Gaussians in view, it seems reasonable to just return a torch.full() with dimensions HxWxC filled with the background color.
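
A dependency-free sketch of that early return, using nested lists in place of a tensor so it runs anywhere. The name empty_view_fallback and its arguments are hypothetical, and the real rasterization path is deliberately left out. In the actual code this would presumably be a tensor operation. Note that torch.full() takes a single scalar fill value, so a per-channel background color would need to be broadcast to HxWxC some other way.

```python
from typing import List, Optional, Sequence

Image = List[List[List[float]]]  # H x W x C nested list standing in for a tensor


def empty_view_fallback(
    num_visible: int, height: int, width: int, background: Sequence[float]
) -> Optional[Image]:
    """Return a background-only image when nothing is in view, else None.

    Hypothetical helper: a None result means the caller should continue
    with the normal rasterization path, which is outside this sketch.
    """
    if num_visible > 0:
        return None
    # Copy the color per pixel so that pixels do not share one list object.
    return [[list(background) for _ in range(width)] for _ in range(height)]
```

Returning before any per-point kernel is launched would also sidestep the zero-block launches described in the first comment.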

@Xanthorapedia
Author

Xanthorapedia commented Jan 6, 2024

As a related issue, access to tile_bins in the kernel below sometimes also throws OOM at the line:

int2 range = tile_bins[tile_id];

The reason is that tile_bins, as returned by get_tile_bin_edges_tensor, is sized by the number of intersections, which might be very low or even 0, instead of by the actual number of tiles:

torch::Tensor get_tile_bin_edges_tensor(
    int num_intersects, const torch::Tensor &isect_ids_sorted
) {
    CHECK_INPUT(isect_ids_sorted);
    torch::Tensor tile_bins = torch::zeros(
        {num_intersects, 2}, isect_ids_sorted.options().dtype(torch::kInt32)
    );

I would propose to fix this by changing

tile_bins = get_tile_bin_edges(num_intersects, isect_ids_sorted)

to

    num_tiles = tile_bounds[0] * tile_bounds[1]
    tile_bins = get_tile_bin_edges(num_tiles, isect_ids_sorted)

It might also be helpful to return the tile_bins with shape (tile_bounds[0], tile_bounds[1]) or just tile_bounds instead of (tile_bounds[0] * tile_bounds[1],).

Please let me know how you like it. Thanks!
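
To illustrate why the buffer has to be sized by the tile count, here is a CPU reference sketch of the bin-edge computation. It assumes each sorted intersection id packs the tile index in the upper 32 bits and a depth key in the lower 32 bits. That packing, and the names pack and tile_bin_edges, are assumptions for illustration and are not taken from the code quoted above.

```python
from typing import List


def pack(tile_id: int, depth_key: int) -> int:
    """Assumed id layout: tile index in the high 32 bits, depth key in the low 32."""
    return (tile_id << 32) | (depth_key & 0xFFFFFFFF)


def tile_bin_edges(num_tiles: int, isect_ids_sorted: List[int]) -> List[List[int]]:
    """Return one [start, end) range into the sorted ids per tile.

    The result has num_tiles rows, so any valid tile index can be looked up
    even when only a few (or zero) intersections exist.
    """
    tile_bins = [[0, 0] for _ in range(num_tiles)]
    n = len(isect_ids_sorted)
    for idx, isect_id in enumerate(isect_ids_sorted):
        tile = isect_id >> 32
        # First intersection of this tile: record the start of its range.
        if idx == 0 or (isect_ids_sorted[idx - 1] >> 32) != tile:
            tile_bins[tile][0] = idx
        # Last intersection of this tile: record the exclusive end.
        if idx == n - 1 or (isect_ids_sorted[idx + 1] >> 32) != tile:
            tile_bins[tile][1] = idx + 1
    return tile_bins


if __name__ == "__main__":
    ids = sorted([pack(0, 5), pack(3, 1), pack(0, 9)])
    # 2 x 2 tile grid: 4 tiles but only 3 intersections.
    print(tile_bin_edges(4, ids))
```

In this example tile 3 is hit while there are only 3 intersections, so a buffer with num_intersects rows would have no row 3. Sizing by tile_bounds[0] * tile_bounds[1] keeps every tile index in range, and tiles with no intersections keep an empty [0, 0) range.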
