We should shuffle around the realization order to minimize peak memory usage #8150

Open
abadams opened this issue Mar 12, 2024 · 3 comments
Labels
enhancement New user-visible features or improvements to existing features.

Comments

@abadams
Member

abadams commented Mar 12, 2024

Consider a pipeline with three outputs f2, g2, h2, which call Funcs f1, g1, h1 respectively. Everything is compute_root. The realization order f1 f2 g1 g2 h1 h2 is going to use a lot less intermediate memory than the order f1 g1 h1 f2 g2 h2, because each intermediate can be freed before the next one is allocated.
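To make the difference concrete, here is a rough stdlib-only C++ sketch (not Halide code) that counts live intermediate memory for the two orders. The `Stage` struct, the one-unit buffer sizes, and the choice to count outputs as zero (on the assumption that the caller owns those buffers) are all made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a compute_root stage: an intermediate buffer of
// `size` units that stays live until its last consumer has been realized.
// Outputs get size 0 here, assuming the caller owns their buffers.
struct Stage {
    std::size_t size;
    std::vector<std::string> consumers;
};

// Peak number of intermediate units live at once for a realization order.
std::size_t peak_intermediate_memory(const std::map<std::string, Stage> &stages,
                                     const std::vector<std::string> &order) {
    assert(order.size() == stages.size());

    // Position of each stage in the order.
    std::map<std::string, std::size_t> pos;
    for (std::size_t i = 0; i < order.size(); i++) pos[order[i]] = i;

    std::size_t peak = 0;
    for (std::size_t t = 0; t < order.size(); t++) {
        std::size_t live = 0;
        for (const auto &kv : stages) {
            // A buffer is live from its own realization to its last consumer.
            std::size_t start = pos.at(kv.first);
            std::size_t end = start;
            for (const auto &c : kv.second.consumers) end = std::max(end, pos.at(c));
            if (start <= t && t <= end) live += kv.second.size;
        }
        peak = std::max(peak, live);
    }
    return peak;
}

// The three-output pipeline from the issue, with assumed unit-size intermediates.
std::map<std::string, Stage> three_output_pipeline() {
    return {
        {"f1", {1, {"f2"}}}, {"g1", {1, {"g2"}}}, {"h1", {1, {"h2"}}},
        {"f2", {0, {}}},     {"g2", {0, {}}},     {"h2", {0, {}}},
    };
}
```

Under these assumptions, calling `peak_intermediate_memory(three_output_pipeline(), order)` gives a peak of 1 unit for f1 f2 g1 g2 h1 h2 and 3 units for f1 g1 h1 f2 g2 h2.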

We should reorder the realizations at each loop level in schedule_functions to minimize the number of overlapping lifetimes. This could be done by identifying each loop level used in a compute_at, and then, for each one, coming up with a new realization order. This would have to be done at the level of fused groups, not individual Funcs.
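A minimal sketch of what such a reordering could look like for a single loop level, assuming a hypothetical `Group` struct standing in for a fused group. It brute-forces every dependency-respecting order and keeps the one with the lowest peak. This is stdlib-only C++, not Halide internals, and the factorial search is only meant to show the objective, not to suggest an algorithm for schedule_functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one fused group realized at a given loop level.
struct Group {
    std::size_t size;                 // intermediate memory it allocates
    std::vector<std::size_t> inputs;  // indices of the groups it reads from
};

// Returned for orders that realize a group before one of its inputs.
const std::size_t kInvalid = std::numeric_limits<std::size_t>::max();

// Peak live memory for `order`, or kInvalid if the order breaks a dependency.
std::size_t peak_for_order(const std::vector<Group> &groups,
                           const std::vector<std::size_t> &order) {
    const std::size_t n = groups.size();
    assert(order.size() == n);
    std::vector<std::size_t> pos(n), last_use(n);
    for (std::size_t t = 0; t < n; t++) pos[order[t]] = t;
    for (std::size_t g = 0; g < n; g++) last_use[g] = pos[g];
    for (std::size_t g = 0; g < n; g++) {
        for (std::size_t in : groups[g].inputs) {
            if (pos[in] > pos[g]) return kInvalid;
            last_use[in] = std::max(last_use[in], pos[g]);
        }
    }
    std::size_t peak = 0;
    for (std::size_t t = 0; t < n; t++) {
        std::size_t live = 0;
        for (std::size_t g = 0; g < n; g++) {
            if (pos[g] <= t && t <= last_use[g]) live += groups[g].size;
        }
        peak = std::max(peak, live);
    }
    return peak;
}

// Exhaustive search: the lexicographically first valid order with minimal peak.
std::vector<std::size_t> best_order(const std::vector<Group> &groups) {
    std::vector<std::size_t> order(groups.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::vector<std::size_t> best = order;
    std::size_t best_peak = kInvalid;
    do {
        std::size_t p = peak_for_order(groups, order);
        if (p < best_peak) {
            best_peak = p;
            best = order;
        }
    } while (std::next_permutation(order.begin(), order.end()));
    return best;
}

// The pipeline from the issue: indices 0..2 are f1, g1, h1; 3..5 are f2, g2, h2.
std::vector<Group> example_groups() {
    return {
        Group{1, {}},  Group{1, {}},  Group{1, {}},
        Group{0, {0}}, Group{0, {1}}, Group{0, {2}},
    };
}
```

On `example_groups()`, `best_order` picks 0 3 1 4 2 5, which is the f1 f2 g1 g2 h1 h2 order from the example. The unit sizes and zero-size outputs are assumptions.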

@abadams abadams added the enhancement New user-visible features or improvements to existing features. label Mar 12, 2024
@abadams abadams changed the title We should shuffle around the realization order to minimize peak memory usages We should shuffle around the realization order to minimize peak memory usage Mar 12, 2024
@rootjalex
Member

This seems like something that should be schedulable, instead of automatic?

@steven-johnson
Contributor

> This seems like something that should be schedulable, instead of automatic?

Is there ever a situation where we'd choose to use more than the minimum?

@abadams
Member Author

abadams commented Mar 19, 2024

It also affects locality, so there might be a trade-off here. Also, if the allocations are all dynamically sized, the peak usage, and thus the best order, will depend on those sizes, so the compiler won't be able to infer it.

You can already sort of schedule it with compute_at(the_func_you_want_to_go_before, Var::outermost())
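On the dynamic-size point, a small stdlib-only C++ sketch (not Halide code) of why the best order can flip with the sizes. It assumes a made-up pipeline with two independent chains, each a large intermediate reduced to a one-unit result, where both results stay live until a final output consumes them. The `Step` struct and `peak_chain_first` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One step of a realization order: allocate `alloc` units before the step
// runs, then release `release` units once it has finished.
struct Step {
    std::size_t alloc;
    std::size_t release;
};

// Peak live memory over a sequence of steps.
std::size_t peak_of(const std::vector<Step> &steps) {
    std::size_t live = 0, peak = 0;
    for (const Step &s : steps) {
        live += s.alloc;
        peak = std::max(peak, live);
        assert(s.release <= live);
        live -= s.release;
    }
    return peak;
}

// Peak when the chain with a `first`-sized intermediate is realized before
// the chain with a `second`-sized intermediate. Each chain reduces its
// intermediate to an assumed 1-unit result that lives until the output runs.
std::size_t peak_chain_first(std::size_t first, std::size_t second) {
    return peak_of({
        {first, 0},    // big intermediate of the first chain
        {1, first},    // its reduction; the big buffer is freed afterwards
        {second, 0},   // big intermediate of the second chain
        {1, second},   // its reduction; the big buffer is freed afterwards
        {0, 2},        // output consumes both 1-unit results
    });
}
```

With sizes 8 and 2, realizing the 8-unit chain first peaks at 9 units and realizing it second peaks at 10, so the larger chain should go first. Which chain is larger is only known at run time when the sizes are dynamic, which is why a compile-time ordering can't settle this case.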
