
Support BF16 for FSDP #963

Open
yuvalkirstain opened this issue Mar 22, 2022 · 8 comments
Assignees
Labels
FSDP FullyShardedDataParallel (zero-3)

Comments

@yuvalkirstain

Feature Request

Please support BF16 mixed-precision

Additional context

Training with BF16 is usually more stable than FP16, which is especially important when training large models. Additionally, many models (e.g. T5) are pretrained in BF16, and continuing to train them with FP16 mixed precision results in NaNs.

Thank you!

@anj-s
Contributor

anj-s commented Mar 22, 2022

Thank you for this issue! We are currently working on adding support for bf16 and hope to have it done very soon :)

I'm assuming you meant bf16 support with FSDP? Or were you thinking of another API?

@anj-s anj-s self-assigned this Mar 22, 2022
@yuvalkirstain
Author

Exactly, bf16 with FSDP!

@anj-s anj-s changed the title Support BF16 Support BF16 for FSDP Mar 22, 2022
@anj-s anj-s added the FSDP FullyShardedDataParallel (zero-3) label Mar 22, 2022
@yuvalkirstain
Author

@anj-s please let me know if there is anything we can do to help; having support for bf16 with FSDP in Fairseq would really help us! :)

@yuvalkirstain
Author

Hi, has there been any progress with resolving this issue? @anj-s
Thank you so much

@anj-s
Contributor

anj-s commented May 25, 2022

> Hi, has there been any progress with resolving this issue? @anj-s Thank you so much

Hi @yuvalkirstain, I think this should work without any issues. Can you try using bfloat16 by passing the right compute_dtype argument when using FSDP? Unfortunately I haven't had a chance to add a unit test, but perhaps someone else on the team has looked into this. cc @anupambhatnagar @min-xu-ai
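
For reference, a minimal sketch of what that suggestion might look like, assuming fairscale's FSDP accepts torch.bfloat16 through its compute_dtype argument as described above (untested; the module below is just a placeholder):

```python
# Minimal sketch (untested): pass bfloat16 as the compute dtype to fairscale FSDP.
# Assumes torch.distributed is already initialized (e.g. launched via torchrun).
import torch
import torch.nn as nn
from fairscale.nn import FullyShardedDataParallel as FSDP

model = nn.Linear(1024, 1024).cuda()  # placeholder module

fsdp_model = FSDP(
    model,
    mixed_precision=True,          # keep full-precision master params, cast for compute
    compute_dtype=torch.bfloat16,  # run forward/backward in bf16 instead of the fp16 default
)
```

With bf16 one would presumably also skip the FP16 loss scaler, since bf16 has the same exponent range as fp32 and should not need loss scaling.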

@wangleiofficial

bfloat16 support with PyTorch Lightning would be better; have you considered this?

@toriving

toriving commented Jul 29, 2022

Is there currently any progress on this issue?
Or would it work if I just applied the branch mentioned above?

@anupambhatnagar

There has been no progress on this so far.

Projects
None yet
Development

No branches or pull requests

5 participants