Issues: microsoft/DeepSpeed
Issues list
[BUG] RuntimeError: Error building extension 'fused_adam' Loading extension module fused_adam
bug
Something isn't working
compression
#5623
opened Jun 6, 2024 by
JinQiangWang2021
[REQUEST] Moving a trainable model with an optimiser between GPU and CPU
enhancement
New feature or request
#5620
opened Jun 5, 2024 by
kfertakis
[BUG] Pipeline Dataloader Samler: shuffle=False
bug
Something isn't working
training
#5619
opened Jun 5, 2024 by
Coobiw
[BUG] ZeRO3 partition parameters after fully load to each GPU!
bug
Something isn't working
training
#5617
opened Jun 5, 2024 by
CHNRyan
[REQUEST] Upstream modifications of PaRO
enhancement
New feature or request
#5607
opened Jun 3, 2024 by
youshaox
[BUG] cannot import name '_get_socket_with_port' from 'torch.distributed.elastic.agent.server.api'
bug
Something isn't working
training
#5603
opened Jun 3, 2024 by
fahadh4ilyas
[BUG] Zero3 causes AttributeError: 'NoneType' object has no attribute 'numel' in continual training
bug
Something isn't working
training
#5602
opened Jun 3, 2024 by
thkimYonsei
[BUG] CUDA OOM error when Hugging Face ignore_mismatched_sizes is enabled
bug
Something isn't working
training
#5599
opened May 31, 2024 by
matthewclso
[BUG] M1 Mac has an issue with hostname -I not being a valid command
bug
Something isn't working
build
Improvements to the build and testing systems.
#5597
opened May 31, 2024 by
AbhinavMir
[REQUEST] Supporting custom generation loop (outlines, LMQL, guidance) in DeepSpeedHybridEngine
enhancement
New feature or request
#5595
opened May 31, 2024 by
Atry
[BUG] RepeatingLoader may be invalid in the pipe stages neither the fist nor last
bug
Something isn't working
training
#5593
opened May 31, 2024 by
janelu9
[BUG] DeepSpeed is loads the whole model to every GPUs instead of partitioning
bug
Something isn't working
inference
#5592
opened May 31, 2024 by
yunoJ
different setting for same (num_gpus * batch_size * grad_accum_steps) output different loss and gradient norm
bug
Something isn't working
training
#5583
opened May 29, 2024 by
SeunghyunSEO
Data Loading for DeepSpeed Ulysses and Data Parallelism
bug
Something isn't working
training
#5582
opened May 29, 2024 by
zijian-hu
[BUG] deepspeed amp seems to convert all input to specific dtype
bug
Something isn't working
training
#5580
opened May 29, 2024 by
rangehow
[BUG] fp6 can't load qwen1.5-34b-chat
bug
Something isn't working
inference
#5579
opened May 29, 2024 by
pointerhacker
[BUG]I found that the parameters of model will be fully transferred to the VRAM of each process. Is this abnormal in my understanding?
bug
Something isn't working
deepspeed-chat
Related to DeepSpeed-Chat
#5575
opened May 28, 2024 by
tiandazhao
[BUG] Does deepspeed work with torch amp autocast?
bug
Something isn't working
training
#5573
opened May 28, 2024 by
lqniunjunlper
RuntimeError: Error(s) in loading state_dict
bug
Something isn't working
training
#5570
opened May 27, 2024 by
lxd551326
Use Pipeline Parallelism and get stuck in the mid[BUG]
bug
Something isn't working
training
#5568
opened May 26, 2024 by
HackGiter
[BUG] The specified pointer resides on host memory and is not registered with any CUDA device.
bug
Something isn't working
inference
#5561
opened May 22, 2024 by
La1c