
broadcast_inputs triggers tensor storage copy, peaks CUDA memory consumption #252

Open
hxu296 opened this issue Jun 26, 2023 · 1 comment

hxu296 commented Jun 26, 2023

Summary

It seems that the following line in `def broadcast_inputs(x, y)` triggers a tensor storage copy, which caused CUDA memory to overflow when I tried to run a small bundle adjustment dataset with 31,843 pixel observations. Both `reshape` and `contiguous` can trigger a memory copy. If we can avoid the copy in `broadcast_inputs`, we can avoid exhausting CUDA memory at this step.

x = x.expand(shape+(x.shape[-1],)).reshape(-1,x.shape[-1]).contiguous()
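Here is a minimal sketch of what appears to happen (the shapes are illustrative; only the 31,843 observation count comes from the report above): `expand` is a zero-copy view, but flattening it with `reshape`/`contiguous` materializes the full broadcast buffer.

```python
import torch

# Illustrative shapes only; the observation count is from the report above.
x = torch.randn(1, 1, 7)                  # one parameter block with broadcastable batch dims
shape = (2, 31843)                        # hypothetical batch shape (e.g. cameras x observations)

# expand() is a zero-copy view: it only adjusts sizes/strides, no new storage.
view = x.expand(shape + (x.shape[-1],))   # (2, 31843, 7), still backed by x's storage
assert view.data_ptr() == x.data_ptr()

# Flattening the broadcast view and forcing it contiguous materializes the full
# (2 * 31843, 7) buffer -- this allocation is what peaks CUDA memory on the GPU.
flat = view.reshape(-1, x.shape[-1]).contiguous()
assert flat.data_ptr() != x.data_ptr()
print(flat.shape, flat.element_size() * flat.nelement(), "bytes")
```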


Improvements

Refactor `broadcast_inputs` so that it does not rely on `reshape` and `contiguous` (a rough sketch of one possibility follows).
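One possible shape for the refactor, assuming downstream operations accept batched (broadcast) inputs rather than flattened ones. The function name mirrors the issue, but the `None` handling and return values below are assumptions for illustration, not the actual pypose API:

```python
import torch

def broadcast_inputs(x, y):
    """Hypothetical copy-free variant: return broadcast *views* instead of
    flattened, contiguous copies, deferring any materialization to the
    downstream batched ops (which broadcast natively)."""
    if y is None:                          # assumed; the real function may differ
        return x, None
    # Compute the common batch shape without touching storage.
    batch = torch.broadcast_shapes(x.shape[:-1], y.shape[:-1])
    # expand() only changes sizes/strides; no CUDA memory is allocated here.
    x = x.expand(batch + (x.shape[-1],))
    y = y.expand(batch + (y.shape[-1],))
    return x, y
```

The trade-off is that callers can no longer assume a flattened `(N, D)` layout, which matches the suggestion in the comment below to verify that the functions called afterwards are batched.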

Risks

TBD

Involved components

Optional: Intended side effects

TBD

Optional: Missing test coverage

TBD

wang-chen (Member) commented
@hxu296 This is a historical issue: these functions were originally implemented in CUDA, and the tensor copy behavior was expected there. It no longer seems necessary. Similar functionality is provided by torch.broadcast_tensors and torch.broadcast_shapes. You may re-check the functions called after broadcast_inputs; if they are implemented in a batched way, then broadcast_inputs can be safely removed.
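For reference, a quick check (not from the issue itself) that both suggested functions avoid storage copies: `torch.broadcast_tensors` returns expanded views, and `torch.broadcast_shapes` only computes the resulting shape.

```python
import torch

a = torch.randn(1, 7)
b = torch.randn(31843, 1)

# broadcast_tensors returns expanded views -- no storage is copied.
a_b, b_b = torch.broadcast_tensors(a, b)
print(a_b.shape, b_b.shape)                      # torch.Size([31843, 7]) for both
print(a_b.data_ptr() == a.data_ptr())            # True: a_b is still a view of a

# broadcast_shapes only computes the shape, allocating nothing.
print(torch.broadcast_shapes(a.shape, b.shape))  # torch.Size([31843, 7])
```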
