Issues: microsoft/DeepSpeed
Issues list
[BUG] Uneven work distribution caused by get_shard_size changes
#5515
opened May 9, 2024 by
oelayan7
[BUG] When initializing model_engine, if an mpu is specified, it can lead to an excessively large checkpoint size, and the checkpoint may not be convertible through the zero_to_fp32.py script.
bug
Something isn't working
training
#5514
opened May 9, 2024 by
Kwen-Chen
FileNotFoundError: [Errno 2] No such file or directory: ':/usr/local/cuda/bin/nvcc'
bug
Something isn't working
compression
#5511
opened May 9, 2024 by
1148514800
[REQUEST] Launcher mode with SSH bypass
enhancement
New feature or request
#5510
opened May 8, 2024 by
dogacancolak-kensho
[BUG] Mismatch between dtype settings in model and ds_config results in NaN loss
bug
Something isn't working
training
#5509
opened May 8, 2024 by
Taiki-azrs
[REQUEST] Enable both CPU and NVMe for optimizer
enhancement
New feature or request
#5508
opened May 8, 2024 by
shanhx2000
[BUG] Unexpected High Memory Usage (OOM) when finetuning Llama2-7B
bug
Something isn't working
training
#5507
opened May 8, 2024 by
shanhx2000
[BUG] 3 GPUs is not as good as expected compared with 2 GPUs; NV vs AMD performance; flash attention not supported for AMD GPUs
bug
Something isn't working
training
#5503
opened May 6, 2024 by
0781532
[BUG] Jamba (Mamba+MoE) + ZeRO3 + LoRA training hangs
bug
Something isn't working
training
#5502
opened May 6, 2024 by
hijkzzz
[REQUEST] Any arguments for disabling saving global steps?
enhancement
New feature or request
#5499
opened May 4, 2024 by
annopackage
[REQUEST] Add documentation on how to run fast inference of transformers models with ZeRO-3
enhancement
New feature or request
#5498
opened May 3, 2024 by
lewtun
[BUG] import deepspeed, MissingCUDAException
bug
Something isn't working
build
Improvements to the build and testing systems.
#5497
opened May 3, 2024 by
zsaladin
[BUG] Memory Leak in Stage 2 Optimizer
bug
Something isn't working
training
#5496
opened May 2, 2024 by
chiragjn
[BUG] Training crashes with "'Tensor' object has no attribute 'ds_id'"
bug
Something isn't working
training
#5495
opened May 2, 2024 by
oscarkey
[BUG] Frozen Parameters not saved when bf16 enabled but are when fp16 enabled
bug
Something isn't working
training
#5489
opened May 1, 2024 by
ethansmith2000
[REQUEST] How to finetune ONLY certain subset of the network parameters
enhancement
New feature or request
#5486
opened Apr 30, 2024 by
JasonLeeFdu
[BUG] Fails to finetune certain subset of parameters via torch.optim.AdamW code (not .json setting)
bug
Something isn't working
training
#5485
opened Apr 30, 2024 by
JasonLeeFdu
[BUG] Deepspeed memory allocation estimation different than real!
bug
Something isn't working
training
#5484
opened Apr 30, 2024 by
mmarouen
RuntimeError: cannot pin 'CUDABFloat16Type' only dense CPU tensors can be pinned
bug
Something isn't working
training
#5472
opened Apr 27, 2024 by
cooper12121
[REQUEST] Use python sysconfig to generate CFLAGs
enhancement
New feature or request
#5471
opened Apr 26, 2024 by
williamtwomey