Issues: pytorch/torchtitan
Issues list
numerical difference for SDPA between non-dtensor vs dtensor, when math attention and fp16 are used
bug
Something isn't working
#317
opened May 8, 2024 by
tianyu-l
freqs_cis in llama model should be a non-persistent buffer
bug
#316
opened May 8, 2024 by
tianyu-l
Question on Model Init
question
Further information is requested
#312
opened May 6, 2024 by
XinDongol
add doc for adding custom dataset
documentation
Improvements or additions to documentation
enhancement
New feature or request
#311
opened May 5, 2024 by
lessw2020
freezing some part of the model
enhancement
New feature or request
#306
opened May 3, 2024 by
tianyu-l
reload existing llama checkpoints
enhancement
New feature or request
#305
opened May 3, 2024 by
tianyu-l
add config option to only produce tensorboard logs on rank 0
enhancement
New feature or request
#304
opened May 3, 2024 by
tianyu-l
[Feature] Add gradient accumulation
enhancement
New feature or request
#292
opened May 1, 2024 by
XinDongol
[Feature] Plan to add model_register
enhancement
New feature or request
#282
opened Apr 28, 2024 by
XinDongol
numerical issue when running SDPA with DTensor
bug
Something isn't working
help wanted
Extra attention is needed
#267
opened Apr 24, 2024 by
tianyu-l
Fused RMSNorm incompatible with PP tracing (dynamic stride)
bug
Something isn't working
#217
opened Apr 10, 2024 by
wconstab
add unit test for ongoing numerical verification of fusedRMSNorm
better_engineering
Repo code quality improvements
#205
opened Apr 5, 2024 by
lessw2020
Make fused RMSNorm a registered op
bug
Something isn't working
enhancement
New feature or request
#199
opened Apr 5, 2024 by
lessw2020
Verify that we can do eval / inference
enhancement
New feature or request
#192
opened Apr 4, 2024 by
gnadathur
Add support for MoE model architecture
enhancement
New feature or request
#184
opened Apr 2, 2024 by
gnadathur
Starting off with different models across ranks and FSDP doesn't synchronise
bug
Something isn't working
#166
opened Mar 26, 2024 by
BadrYoubiIdrissi
Loss curve spikes on amalgamated datasets - need full scale shuffler in dataloader
enhancement
New feature or request
metrics - add L1 gradient norm tracking
enhancement
New feature or request
#119
opened Mar 8, 2024 by
lessw2020
consider - enable streaming attention as default for llama models (1-4M context)
enhancement
New feature or request
#86
opened Feb 25, 2024 by
lessw2020
add 'dry run' flag - one iter, no saving, as quick proof to check basic perf and verify that environ is ready to go
enhancement
New feature or request
#73
opened Feb 23, 2024 by
lessw2020
Python Zip and Strict = True is 3.10 only...fails on 3.9 with TypeError: zip() takes no keyword arguments
documentation
Improvements or additions to documentation
#62
opened Feb 16, 2024 by
lessw2020