This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

FSDP Issues Tracker #4518

Open
Rebecca-Qian opened this issue Apr 27, 2022 · 2 comments
Labels: Agents, Bug, donotreap (Avoid automatically marking as stale.), P3

Comments

Rebecca-Qian (Contributor) commented Apr 27, 2022

Description
Tracking known issues during training with FSDP.

  • Resizing embedding dimensions fails in distributed training
    • Behavior: Training throws an exception reporting that embedding sizes are out of bounds
    • Repro: Train a model with --ddp-backend zero2 while setting --special-tok-lst
  • T5 model parallelism is incompatible with the zero2 ddp-backend (this may also affect other HuggingFace agents)
    • Behavior: The training thread appears to hang indefinitely
    • Repro: Train a model with both --t5-model-parallel and --ddp-backend zero2
  • FiD does not work with FSDP when batchsize > 1 (see "Cannot train Seeker with batch size > 1", #4531)
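
The two repro recipes above could be assembled roughly as follows. This is only a sketch: the flags `--ddp-backend zero2`, `--special-tok-lst`, and `--t5-model-parallel` come from the list above, while the `parlai multiprocessing_train` launcher, the task and model names, the token values, the comma-separated token format, and the explicit `true` value for `--t5-model-parallel` are assumptions used as placeholders.

```python
import shlex

# Assumed launcher; the issue does not say which training entry point was used.
BASE = ["parlai", "multiprocessing_train"]


def embedding_resize_repro(task="my_task", model="my_model",
                           special_tokens=("<tok_a>", "<tok_b>")):
    """Command for the embedding-resize failure.

    task, model, and special_tokens are hypothetical placeholders.
    The comma-separated token format is an assumption.
    """
    return BASE + [
        "--task", task,
        "--model", model,
        "--ddp-backend", "zero2",                       # from the issue
        "--special-tok-lst", ",".join(special_tokens),  # flag from the issue
    ]


def t5_hang_repro(task="my_task", model="my_t5_model"):
    """Command for the T5 model-parallel hang.

    task and model are hypothetical placeholders; passing an explicit
    "true" to --t5-model-parallel is an assumption.
    """
    return BASE + [
        "--task", task,
        "--model", model,
        "--t5-model-parallel", "true",  # flag from the issue
        "--ddp-backend", "zero2",       # from the issue
    ]


if __name__ == "__main__":
    # Print shell-quoted commands so they can be copied into a terminal.
    for argv in (embedding_resize_repro(), t5_hang_repro()):
        print(shlex.join(argv))
```

Replace the placeholder task, model, and token values with the ones from the failing run before using either command.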

@Rebecca-Qian (Contributor, Author)

Related fix: #4505

@github-actions

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

github-actions bot added the stale label Jun 17, 2022
klshuster added the donotreap label (Avoid automatically marking as stale.) and removed the stale label Jun 17, 2022