Pull requests: NVIDIA/Megatron-LM

Pull requests list

[bug] fix xavier uniform init for output layers
#814 opened May 8, 2024 by hjlee1371
Support for Megatron-VLM training
#806 opened May 5, 2024 by 1049451037
Add dataset packing
#802 opened May 2, 2024 by shamanez
fix finalize_model_grads when sp is on
#798 opened Apr 29, 2024 by zhaoyinglia
Speed up the creation of attention mask
#797 opened Apr 29, 2024 by yuantailing
Fix incorrect src argument in broadcast_params function
#796 opened Apr 26, 2024 by Yuxin-CV
modifed the model parreleized gpt pre-trainign script
#789 opened Apr 22, 2024 by shamanez
forward step missing arg
#784 opened Apr 18, 2024 by malay-nagda
fix a mistake when check if num_layers dividable by vpp
#781 opened Apr 16, 2024 by constroy
Fix llama converter.
#777 opened Apr 12, 2024 by Victarry
Update pretrain_bert.py
#772 opened Apr 9, 2024 by ocryptocode
[very simple change] Remove duplicated code
#765 opened Apr 3, 2024 by NoelBird
fix new bucket when param require new bucket
#762 opened Apr 2, 2024 by wangxicoding
Updated fused_kernels import path
#760 opened Mar 31, 2024 by Yazeed7
use new methods for communication
#758 opened Mar 30, 2024 by mayank31398
drop redundant check
#757 opened Mar 30, 2024 by mayank31398
Fix typo in README.md
#751 opened Mar 26, 2024 by HashiamKadhim
[BUG FIX] Fix world_size bug in QuickStart Example
#747 opened Mar 22, 2024 by Mr-Philo
fix torch softmax masking
#731 opened Mar 12, 2024 by JRD971000
Support S3 data loading
#729 opened Mar 11, 2024 by jrocmar