
Megatron-LM fine-tuning: No such file or directory model_optim_rng.pt #14

Open
zxyscz opened this issue Aug 25, 2023 · 16 comments

@zxyscz commented Aug 25, 2023

I want to fine-tune with Megatron-LM, but when I run the process I get an error: No such file or directory: model_optim_rng.pt

@Muennighoff (Collaborator)
Hmm do you have --no_load_optim and --no_load_rng in your script?

model_optim_rng.pt files are not needed and not in the checkpoint I think
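For reference, those two flags go on the fine-tuning launch command. A minimal sketch, assuming a typical Megatron-LM-style launch; the script name and the `--load` path are placeholders, not taken from this thread:

```shell
# Placeholders: finetune.py and the --load path are examples only.
# The two --no_load_* flags skip restoring optimizer and RNG state,
# so the missing model_optim_rng.pt file is never requested.
torchrun finetune.py \
    --load /path/to/starcoderbase-megatron-checkpoint \
    --no_load_optim \
    --no_load_rng
```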

@zxyscz (Author) commented Aug 30, 2023

> Hmm do you have --no_load_optim and --no_load_rng in your script?
>
> model_optim_rng.pt files are not needed and not in the checkpoint I think

It runs now, but I don't know how to merge the many checkpoint partitions and convert them to Hugging Face format. Can you help me?

@Muennighoff (Collaborator)
This is the script for merging & converting:


Let me know if it does not work for you

@zxyscz (Author) commented Aug 30, 2023

Yes, but which branch should I git clone? Is it bigcode-project/Megatron-LM#40? When I used the mtf branch, it did not run.

@Muennighoff (Collaborator)
Yeah, that one. It is already merged into main, so you can probably also use the main branch. It was merged slightly after the mtf branch was created, hence the code is not in the mtf branch, but you can maybe also merge main into the mtf branch if you want to.

@zxyscz (Author) commented Aug 30, 2023

> Yeah, that one. It is already merged into main, so you can probably also use the main branch. It was merged slightly after the mtf branch was created, hence the code is not in the mtf branch, but you can maybe also merge main into the mtf branch if you want to.

thx

@zxyscz (Author) commented Aug 31, 2023

> Yeah, that one. It is already merged into main, so you can probably also use the main branch. It was merged slightly after the mtf branch was created, hence the code is not in the mtf branch, but you can maybe also merge main into the mtf branch if you want to.

I have a question: I find that the HumanEval (Python) pass@1 value drops a lot after fine-tuning.

@Muennighoff (Collaborator)
> Yeah, that one. It is already merged into main, so you can probably also use the main branch. It was merged slightly after the mtf branch was created, hence the code is not in the mtf branch, but you can maybe also merge main into the mtf branch if you want to.
>
> I have a question: I find that the HumanEval (Python) pass@1 value drops a lot after fine-tuning.

Yeah, that's why we only fine-tune for a few steps, e.g. OctoCoder is only fine-tuned for 2M tokens.

@zxyscz (Author) commented Aug 31, 2023

> Yeah, that one. It is already merged into main, so you can probably also use the main branch. It was merged slightly after the mtf branch was created, hence the code is not in the mtf branch, but you can maybe also merge main into the mtf branch if you want to.
>
> I have a question: I find that the HumanEval (Python) pass@1 value drops a lot after fine-tuning.
>
> Yeah, that's why we only fine-tune for a few steps, e.g. OctoCoder is only fine-tuned for 2M tokens.
In addition, I evaluated the starcoderbase checkpoint at step 25000; its HumanEval pass@1 was 25%, which is lower than 30%. Is it because the open-source starcoderbase Megatron-LM checkpoint is not the final checkpoint?

@Muennighoff (Collaborator)
What script are you using to evaluate it? That may explain the small difference.
It should be the final checkpoint.

@zxyscz (Author) commented Sep 1, 2023

> What script are you using to evaluate it? That may explain the small difference.
> It should be the final checkpoint.

First, I converted the checkpoint to HF format, then evaluated using greedy decoding.

@zxyscz (Author) commented Sep 1, 2023

I converted it to HF format like this; is it right? [screenshot]

@Muennighoff (Collaborator)
Yeah that looks correct. I think for pass@1 HumanEval StarCoder is evaluated using temperature=0.2. Also I would set n_samples=20.
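As a concrete illustration of those settings, here is a minimal sketch of sampling 20 completions per prompt with transformers' `generate()`. This is not the actual evaluation script; the tiny randomly initialised model and the prompt ids are stand-ins for the converted checkpoint and a tokenized HumanEval problem:

```python
# Sketch: the sampling settings suggested above (temperature=0.2, n_samples=20)
# expressed with transformers' generate(). A tiny random GPT-2 stands in for
# the converted StarCoder checkpoint.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100))
model.eval()

prompt_ids = torch.tensor([[1, 2, 3]])  # stand-in for a tokenized prompt
samples = model.generate(
    prompt_ids,
    do_sample=True,           # sampling, not greedy decoding
    temperature=0.2,          # temperature suggested for pass@1
    num_return_sequences=20,  # n_samples=20 completions per problem
    max_new_tokens=8,
    pad_token_id=0,
)
print(samples.shape)  # 20 completions, each prompt + new tokens
```

With greedy decoding all 20 samples would be identical; low-temperature sampling keeps pass@1 estimates close to greedy while still giving usable pass@k statistics.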

@zxyscz (Author) commented Sep 1, 2023

> Yeah that looks correct. I think for pass@1 HumanEval StarCoder is evaluated using temperature=0.2. Also I would set n_samples=20.
I converted the Megatron model to HF as shown below, but loading the model is slow.
[screenshot]
How can I convert the model into many partitions like this?
[screenshot]

@Muennighoff (Collaborator)
You have to shard it into multiple files when saving it
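If the checkpoint was converted with transformers, one way to do this is the `max_shard_size` argument of `save_pretrained` (available in transformers >= 4.18). A sketch; the tiny randomly initialised GPT-2 below stands in for the converted checkpoint, which you would instead load with `AutoModelForCausalLM.from_pretrained("path/to/converted-hf-checkpoint")`:

```python
# Sketch: re-save a Hugging Face checkpoint split into multiple shard files
# via save_pretrained(..., max_shard_size=...). A tiny random GPT-2 stands in
# for the converted StarCoder model.
import os
import tempfile

from transformers import GPT2Config, GPT2LMHeadModel

# In practice: model = AutoModelForCausalLM.from_pretrained("path/to/converted-hf-checkpoint")
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=1000))

out_dir = tempfile.mkdtemp()
# Weights are split so that no single file exceeds max_shard_size, and an
# index JSON maps every tensor name to the shard file that contains it.
model.save_pretrained(out_dir, max_shard_size="200KB")

weight_files = [f for f in os.listdir(out_dir) if f.endswith((".bin", ".safetensors"))]
print(sorted(os.listdir(out_dir)))
```

`from_pretrained` reads the index JSON and loads the shards one by one, which also keeps peak memory lower than loading one huge file.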

@zxyscz (Author) commented Sep 1, 2023

> You have to shard it into multiple files when saving it

How do I shard it into multiple files? Is there any code I can refer to?
