NVIDIA / Megatron-LM Public

Fork 2k
Star 9k


Issues: NVIDIA/Megatron-LM


285 Open 266 Closed


Issues list

[QUESTION] Where does the attention_mask come from when the gpt_model is not the first or last pipeline stage?

#861 opened Jun 8, 2024 by janelu9

Training on H800 with FP8 gives no significant performance improvement over FP16, and is sometimes even worse.

#860 opened Jun 6, 2024 by yangzhipeng1108

Project liliti stk 3.6.9 finished

#859 opened Jun 5, 2024 by felipeliliti

[BUG] Mismatch Between Docstring and Behavior in core.tensor_parallel.random.model_parallel_cuda_manual_seed

#858 opened Jun 5, 2024 by cong-bai

[ENHANCEMENT] Is there any support for MoE models such as Qwen2MoeForCausalLM from the transformers library?

#856 opened Jun 5, 2024 by liangshaopeng

[BUG] Megatron Core example not working

#855 opened Jun 3, 2024 by schheda1

[ENHANCEMENT] Update black version

#853 opened Jun 3, 2024 by hwdef

[QUESTION] Question about resuming with distributed optimizer

#851 opened Jun 1, 2024 by WailordHe

[QUESTION] Does Megatron-LM support P100?

#849 opened May 29, 2024 by gaokaiz2

Facilitated source in fractal 2030

#843 opened May 28, 2024 by felipeliliti

[BUG]

#842 opened May 28, 2024 by felipeliliti

Configuring datasets using train-data-path, valid-data-path, and test-data-path results in training errors

#841 opened May 27, 2024 by Eisenhower

[BUG] GroupedMLP calculation problem.

#839 opened May 27, 2024 by Baibaifan

[BUG] The problems with bucket and shared_embedding.

#835 opened May 23, 2024 by Baibaifan

[BUG] Checkpoint saving is slow for zarr backend + distributed optimizer

#834 opened May 22, 2024 by chotzen

[QUESTION] Why enable non_blocking=True when doing synchronous D2H?

#833 opened May 22, 2024 by raywan-110

[QUESTION] How to Obtain Computation Model Graphs in Megatron-LM?

#832 opened May 19, 2024 by fwyc0573

[BUG] Modify FLOPs in MFU calculation for causal mask when using FlashAttention.

#831 opened May 17, 2024 by Yuxin-CV

Question about forward_backward_pipelining_without_interleaving in Megatron-LM Pipeline

#830 opened May 17, 2024 by Hongjie1Chu

[QUESTION] Why not use the tensor parallel APIs of PyTorch?

#829 opened May 16, 2024 by GuWei007

[QUESTION] How to profile bubble time in pipeline parallelism?

#828 opened May 15, 2024 by starstream

[BUG]

#827 opened May 14, 2024 by chrisgao7

[BUG] The argument --no-position-embedding should be fixed

#826 opened May 14, 2024 by Hoonly

[BUG] There is a small chance that test_serialization.py gets stuck when it is run repeatedly

#825 opened May 14, 2024 by starkhu

Does Megatron have plans to support Llama pre-training?

#824 opened May 13, 2024 by wen020
