hello, I meet a problem #386

Open
etoilestar opened this issue May 22, 2023 · 8 comments

@etoilestar

Hello, when I run the script to train a GPT model, I hit an assertion error: "Not sure how to proceed, we were given deepspeed configs in the deepspeed arguments and deepspeed." The script I used is https://github.com/bigscience-workshop/Megatron-DeepSpeed#deepspeed-pp-and-zero-dp. Can you tell me why?
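
For reference, here is a minimal sketch of what triggers this assertion, assuming current deepspeed.initialize() behavior; the file name ds_config.json and the toy model are illustrative, not from the Megatron-DeepSpeed script:

import argparse
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # toy model, just to make the call concrete

parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed, --deepspeed_config, ...
# Simulate launching with: --deepspeed --deepspeed_config ds_config.json
args = parser.parse_args(["--deepspeed", "--deepspeed_config", "ds_config.json"])

# Supplying the config both through `config=` and through args.deepspeed_config
# is ambiguous, so DeepSpeed raises the assertion quoted above.
engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # conflicts with args.deepspeed_config
)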

@tjruwase
Collaborator

Can you please share the assertion message and stack trace?

@tjruwase
Collaborator

Please try https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/run_bf16.sh or the equivalent run_fp16.sh.

@etoilestar
Author

OK, I will give it a try. On another note, I cannot find the BF16Optimizer mentioned at https://huggingface.co/blog/zh/bloom-megatron-deepspeed#bf16optimizer; could you give me some tips?

@tjruwase
Collaborator

@hymie122

hymie122 commented Jun 6, 2023

I met the same problem when following "start_fast.md". I want to know how to solve it, thank you!

@AoZhang

AoZhang commented Jun 13, 2023

Commenting out args=args on line 429 of megatron/training.py will solve this problem:

model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    config=config,
    # args=args,  # removed: the config is already supplied via `config=`
)

@murphypei

deepspeed.initialize() can't be given both config= and args.deepspeed_config; you should remove one of them.
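
In other words, the two valid patterns look roughly like this; a sketch reusing the model, optimizer, lr_scheduler, config, and args already in scope in megatron/training.py:

# Option 1: pass the config explicitly and keep it out of args.
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    config=config,
)

# Option 2: pass args and let DeepSpeed read args.deepspeed_config
# (set by the --deepspeed_config flag); do not pass config= at all.
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    args=args,
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
)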

@divisionblur

> Commenting out args=args on line 429 of megatron/training.py will solve this problem.

jesus!!!!!!
