FSDP is both model-parallel and data-parallel: each GPU holds only a shard of the model's parameters at any given time, and each GPU also processes only its own shard of the data. So the doc is correct.
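To make the "both" claim concrete, here is a minimal pure-Python sketch of the idea (this is *not* the real `torch.distributed.fsdp` API; the function names `shard`, `all_gather`, and `reduce_scatter` here are illustrative stand-ins for the collectives FSDP issues):

```python
# Conceptual sketch of FSDP's sharding scheme, simulated on one process.
# Model-parallel aspect: each rank permanently owns only a shard of the
# flat parameter. Data-parallel aspect: each rank computes gradients on
# its own data batch, and gradients are averaged across ranks.

def shard(flat_param, world_size):
    """Split a flat parameter into one contiguous shard per rank."""
    per = (len(flat_param) + world_size - 1) // world_size
    return [flat_param[r * per:(r + 1) * per] for r in range(world_size)]

def all_gather(shards):
    """Each rank reconstructs the full parameter just-in-time for compute."""
    return [x for s in shards for x in s]

def reduce_scatter(grads_per_rank, world_size):
    """Sum gradients across ranks, then keep only each rank's own shard."""
    summed = [sum(g) for g in zip(*grads_per_rank)]
    return shard(summed, world_size)

world_size = 2
flat_param = [1.0, 2.0, 3.0, 4.0]

shards = shard(flat_param, world_size)   # rank 0 owns [1, 2], rank 1 owns [3, 4]
full = all_gather(shards)                # every rank sees [1, 2, 3, 4] for forward/backward
grads = [[0.1, 0.1, 0.1, 0.1],           # rank 0's gradient from its data batch
         [0.3, 0.3, 0.3, 0.3]]           # rank 1's gradient from a different batch
grad_shards = reduce_scatter(grads, world_size)  # each rank keeps its summed shard
```

The parameters are sharded like model parallelism, but every rank still runs the full model on its own data like data parallelism, which is why both labels apply.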
📚 Documentation
In https://lightning.ai/docs/pytorch/stable/advanced/model_init.html, PL lists FSDP under "model-parallel training".
But FSDP is commonly described as a data-parallel method.
For example, https://huggingface.co/docs/transformers/fsdp says FSDP is data parallel.
So I think something may be wrong here.
cc @Borda