FSDP unnecessarily clones buffers in state_dict()? #966

Open
rohan-varma opened this issue Mar 25, 2022 · 1 comment
Labels
FSDP FullyShardedDataParallel (zero-3) question Further information is requested

Comments

@rohan-varma (Contributor) commented Mar 25, 2022

My understanding is that FSDP does not shard model buffers. As a result, unlike parameters (which are freed and returned to their sharded state after state_dict()/summon_full_params()), buffer storage persists across state_dict(). Yet buffers are still cloned, which seems unnecessary; skipping the clone for buffers could be a small optimization: https://github.com/facebookresearch/fairscale/blob/main/fairscale/nn/data_parallel/fully_sharded_data_parallel.py#L2516
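To make the proposed optimization concrete, here is a schematic, framework-free sketch (the function name `selective_clone_state_dict` is hypothetical, not fairscale's actual API). The idea: parameters must be cloned because their gathered full-sized storage is freed once they return to their sharded state after state_dict(), while buffers are never sharded, so their storage stays alive and returning them without a clone should be safe.

```python
def selective_clone_state_dict(state_dict, buffer_names):
    """Clone only entries whose backing storage will be freed.

    Parameters are gathered into temporary full-sized storage that is
    freed after state_dict() returns, so they must be cloned. Buffers
    are never sharded by FSDP, so their storage persists and a clone
    only wastes memory.
    """
    out = {}
    for name, tensor in state_dict.items():
        if name in buffer_names:
            out[name] = tensor          # safe: buffer storage persists
        else:
            out[name] = list(tensor)    # "clone": copy of the param data
    return out

# Toy model state: plain lists stand in for tensors.
state = {"weight": [1.0, 2.0], "running_mean": [0.5, 0.5]}
result = selective_clone_state_dict(state, buffer_names={"running_mean"})

assert result["running_mean"] is state["running_mean"]  # buffer: no copy
assert result["weight"] is not state["weight"]          # param: copied
```

Besides avoiding an unnecessary copy, this would also lower peak memory during state_dict(), since the clone of each buffer is exactly what can push a large model over the edge into OOM.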

@anj-s (Contributor) commented Mar 28, 2022

Are you talking about buffers, which are separate from model parameters? Or about the state_dict itself, which we end up calling clone on?

Yes, I do think we need to optimize the part where we call clone, since it can run into OOM errors.

Do you have any suggestions for what we can do to improve this?

@anj-s anj-s added question Further information is requested FSDP FullyShardedDataParallel (zero-3) labels Mar 28, 2022