FSDP unnecessarily clones buffers in state_dict()? #966

Open
rohan-varma opened this issue Mar 25, 2022 · 1 comment
Labels
FSDP FullyShardedDataParallel (zero-3) question Further information is requested

Comments

@rohan-varma (Contributor) commented Mar 25, 2022

My understanding is that FSDP does not shard model buffers. As a result, unlike parameters (which are freed and returned to their sharded state after state_dict()/summon_full_params()), buffer storage persists across state_dict(). Yet buffers are still cloned, which seems unnecessary; skipping the clone for buffers could be a small optimization: https://github.com/facebookresearch/fairscale/blob/main/fairscale/nn/data_parallel/fully_sharded_data_parallel.py#L2516
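To make the proposed optimization concrete, here is a schematic, framework-free sketch (the function name `selective_clone_state_dict` is hypothetical, not fairscale's actual API). The idea: parameters must be cloned because their gathered full-sized storage is freed once they return to their sharded state after state_dict(), while buffers are never sharded, so their storage stays alive and returning them without a clone should be safe.

```python
def selective_clone_state_dict(state_dict, buffer_names):
    """Clone only entries whose backing storage will be freed.

    Parameters are gathered into temporary full-sized storage that is
    freed after state_dict() returns, so they must be cloned. Buffers
    are never sharded by FSDP, so their storage persists and a clone
    only wastes memory.
    """
    out = {}
    for name, tensor in state_dict.items():
        if name in buffer_names:
            out[name] = tensor          # safe: buffer storage persists
        else:
            out[name] = list(tensor)    # "clone": copy of the param data
    return out

# Toy model state: plain lists stand in for tensors.
state = {"weight": [1.0, 2.0], "running_mean": [0.5, 0.5]}
result = selective_clone_state_dict(state, buffer_names={"running_mean"})

assert result["running_mean"] is state["running_mean"]  # buffer: no copy
assert result["weight"] is not state["weight"]          # param: copied
```

Besides avoiding an unnecessary copy, this would also lower peak memory during state_dict(), since the clone of each buffer is exactly what can push a large model over the edge into OOM.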

@anj-s (Contributor) commented Mar 28, 2022

Are you talking about buffers, which are separate from model parameters? Or about the state_dict itself, which we end up calling clone on?

Yes, I do think we need to optimize the part where we call clone, since it can run into OOM errors.

Do you have any suggestions for what we can do to improve this?

@anj-s anj-s added question Further information is requested FSDP FullyShardedDataParallel (zero-3) labels Mar 28, 2022