eval.py error while benchmarking T5 #460

Open
sigjhl opened this issue Jul 14, 2023 · 1 comment
Labels: bug (Something isn't working)

sigjhl commented Jul 14, 2023

Console

[Eval batch=1/1289] Eval on lambada_openai/0-shot data
[Eval batch=130/1289] Eval on lambada_openai/0-shot data
[Eval batch=259/1289] Eval on lambada_openai/0-shot data
[Eval batch=387/1289] Eval on lambada_openai/0-shot data
[Eval batch=516/1289] Eval on lambada_openai/0-shot data
[Eval batch=645/1289] Eval on lambada_openai/0-shot data
[Eval batch=774/1289] Eval on lambada_openai/0-shot data
[Eval batch=903/1289] Eval on lambada_openai/0-shot data
[Eval batch=1031/1289] Eval on lambada_openai/0-shot data
[Eval batch=1160/1289] Eval on lambada_openai/0-shot data
/home/codeless/Desktop/llm-foundry/mosaic/lib/python3.10/site-packages/composer/core/data_spec.py:35: UserWarning: Cannot split tensor of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
warnings.warn(f'Cannot split tensor of length {len(t)} into batches of size {microbatch_size}. '
/home/codeless/Desktop/llm-foundry/mosaic/lib/python3.10/site-packages/composer/core/data_spec.py:26: UserWarning: Cannot split list of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
warnings.warn(f'Cannot split list of length {len(l)} into batches of size {microbatch_size}. '
[Eval batch=1289/1289] Eval on lambada_openai/0-shot data
[Eval batch=1/919] Eval on piqa/10-shot data
[Eval batch=93/919] Eval on piqa/10-shot data
[Eval batch=185/919] Eval on piqa/10-shot data
[Eval batch=276/919] Eval on piqa/10-shot data
[Eval batch=368/919] Eval on piqa/10-shot data
[Eval batch=460/919] Eval on piqa/10-shot data
[Eval batch=552/919] Eval on piqa/10-shot data
[Eval batch=644/919] Eval on piqa/10-shot data
[Eval batch=735/919] Eval on piqa/10-shot data
[Eval batch=827/919] Eval on piqa/10-shot data
[Eval batch=919/919] Eval on piqa/10-shot data
[Eval batch=1/10042] Eval on hellaswag/10-shot data
[Eval batch=1005/10042] Eval on hellaswag/10-shot data
[Eval batch=2009/10042] Eval on hellaswag/10-shot data
[Eval batch=3013/10042] Eval on hellaswag/10-shot data
[Eval batch=4017/10042] Eval on hellaswag/10-shot data
[Eval batch=5022/10042] Eval on hellaswag/10-shot data
[Eval batch=6026/10042] Eval on hellaswag/10-shot data
[Eval batch=7030/10042] Eval on hellaswag/10-shot data
[Eval batch=8034/10042] Eval on hellaswag/10-shot data
[Eval batch=9038/10042] Eval on hellaswag/10-shot data
[Eval batch=10042/10042] Eval on hellaswag/10-shot data
[Eval batch=1/2376] Eval on arc_easy/10-shot data
[Eval batch=238/2376] Eval on arc_easy/10-shot data
[Eval batch=476/2376] Eval on arc_easy/10-shot data
[Eval batch=714/2376] Eval on arc_easy/10-shot data
[Eval batch=951/2376] Eval on arc_easy/10-shot data
[Eval batch=1188/2376] Eval on arc_easy/10-shot data
[Eval batch=1426/2376] Eval on arc_easy/10-shot data
[Eval batch=1664/2376] Eval on arc_easy/10-shot data
[Eval batch=1901/2376] Eval on arc_easy/10-shot data
[Eval batch=2138/2376] Eval on arc_easy/10-shot data
[Eval batch=2376/2376] Eval on arc_easy/10-shot data
[Eval batch=1/1172] Eval on arc_challenge/10-shot data
[Eval batch=118/1172] Eval on arc_challenge/10-shot data
[Eval batch=235/1172] Eval on arc_challenge/10-shot data
[Eval batch=352/1172] Eval on arc_challenge/10-shot data
[Eval batch=469/1172] Eval on arc_challenge/10-shot data
[Eval batch=586/1172] Eval on arc_challenge/10-shot data
[Eval batch=704/1172] Eval on arc_challenge/10-shot data
[Eval batch=821/1172] Eval on arc_challenge/10-shot data
[Eval batch=938/1172] Eval on arc_challenge/10-shot data
[Eval batch=1055/1172] Eval on arc_challenge/10-shot data
[Eval batch=1172/1172] Eval on arc_challenge/10-shot data
[Eval batch=1/50] Eval on copa/0-shot data
[Eval batch=6/50] Eval on copa/0-shot data
[Eval batch=11/50] Eval on copa/0-shot data
[Eval batch=16/50] Eval on copa/0-shot data
[Eval batch=21/50] Eval on copa/0-shot data
[Eval batch=26/50] Eval on copa/0-shot data
[Eval batch=30/50] Eval on copa/0-shot data
[Eval batch=35/50] Eval on copa/0-shot data
[Eval batch=40/50] Eval on copa/0-shot data
[Eval batch=45/50] Eval on copa/0-shot data
[Eval batch=50/50] Eval on copa/0-shot data
[Eval batch=1/1635] Eval on boolq/10-shot data
[Eval batch=164/1635] Eval on boolq/10-shot data
[Eval batch=328/1635] Eval on boolq/10-shot data
[Eval batch=491/1635] Eval on boolq/10-shot data
[Eval batch=655/1635] Eval on boolq/10-shot data
[Eval batch=818/1635] Eval on boolq/10-shot data
[Eval batch=981/1635] Eval on boolq/10-shot data
[Eval batch=1145/1635] Eval on boolq/10-shot data
[Eval batch=1308/1635] Eval on boolq/10-shot data
[Eval batch=1472/1635] Eval on boolq/10-shot data
[Eval batch=1635/1635] Eval on boolq/10-shot data
Ran google/flan-t5-xl eval in: 13817.477584123611 seconds
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/codeless/Desktop/llm-foundry/scripts/eval/eval.py:252 in │
│ │
│ 249 │ │ yaml_cfg = om.load(f) │
│ 250 │ cli_cfg = om.from_cli(args_list) │
│ 251 │ cfg = om.merge(yaml_cfg, cli_cfg) │
│ ❱ 252 │ main(cfg) │
│ 253 │
│ │
│ /home/codeless/Desktop/llm-foundry/scripts/eval/eval.py:126 in main │
│ │
│ 123 │ │ │ │ │ │ │ │ │ │ │ model_gauntlet_df) │
│ 124 │ │ │
│ 125 │ │ if model_gauntlet_callback is not None: │
│ ❱ 126 │ │ │ composite_scores = model_gauntlet_callback.eval_end( │
│ 127 │ │ │ │ None, in_memory_logger) │
│ 128 │ │ │
│ 129 │ │ benchmark_to_taxonomy = {} │
│ │
│ /home/codeless/Desktop/llm-foundry/llmfoundry/callbacks/model_gauntlet_callback.py:112 in │
│ eval_end │
│ │
│ 109 │ │ return {k: sum(v) / len(v) for k, v in results.items()} │
│ 110 │ │
│ 111 │ def eval_end(self, state: State, logger: Logger): │
│ ❱ 112 │ │ new_metrics = self.compute_averages(logger) │
│ 113 │ │ composite_scores = {} │
│ 114 │ │ for category in self.categories: │
│ 115 │ │ │ composite_scores[category['name']] = [] │
│ │
│ /home/codeless/Desktop/llm-foundry/llmfoundry/callbacks/model_gauntlet_callback.py:92 in │
│ compute_averages │
│ │
│ 89 │ │ │ 'metrics/(.*?)/(\d+)-shot(/.*?)?/InContextLearning(.*)') │
│ 90 │ │ for key in self.logger_keys: │
│ 91 │ │ │ match = pat.match(key) │
│ ❱ 92 │ │ │ val = logger_data.data[key][0][1].item() │
│ 93 │ │ │ │
│ 94 │ │ │ if match: │
│ 95 │ │ │ │ eval_name = match.group(1) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'metrics/lambada_openai/0-shot/InContextLearningLMAccuracy'
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
Global rank 0 (PID 11800) exited with code 1
ERROR:composer.cli.launcher:Global rank 0 (PID 11800) exited with code 1

To reproduce

I pip-installed mosaicml and the llm-foundry requirements yesterday, and ran the eval.py script on a flan-t5-xl model according to the quickstart guide.
The only changes were max_seq_len and icl_seq_len set to 512, model_name_or_path set to google/flan-t5-xl, and the model name changed to hf_t5, in hf_eval.yaml and tasks_light.yaml.
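
Roughly, the hf_eval.yaml overrides looked like the sketch below (reconstructed against the quickstart layout, so field names in the shipped config may differ slightly):

# Sketch (not the exact file) of the hf_eval.yaml changes described above;
# field names are assumed from the quickstart example.
max_seq_len: 512
model_name_or_path: google/flan-t5-xl

models:
  - model_name: ${model_name_or_path}
    model:
      name: hf_t5                          # changed from the default hf_causal_lm
      pretrained_model_name_or_path: ${model_name_or_path}
      init_device: cpu
      pretrained: true
    tokenizer:
      name: ${model_name_or_path}
      kwargs:
        model_max_length: ${max_seq_len}

# icl_seq_len was set to 512 in tasks_light.yaml as well
icl_tasks: 'eval/yamls/tasks_light.yaml'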

Expected behavior

Successful benchmarking.

Additional context

I can't figure out why it couldn't find the key in the logger. I lack the experience to dig into it more, so I hope this info is enough for you guys to figure out what's wrong.
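
From the traceback, the failing pattern looks like the minimal sketch below (a simplified stand-in for compute_averages in model_gauntlet_callback.py, not the actual code): each expected key is looked up in the in-memory logger's data before the regex match is checked, so a metric that this T5 run never logged raises the KeyError.

# Minimal sketch of the lookup that fails, simplified from the traceback above;
# the data below is hypothetical.
import re

# Keys the gauntlet callback expects to average ...
logger_keys = ['metrics/lambada_openai/0-shot/InContextLearningLMAccuracy']
# ... versus what the in-memory logger actually recorded for this run
# (assumption: the metric was logged under a different key, or not at all).
logger_data = {}

pat = re.compile(r'metrics/(.*?)/(\d+)-shot(/.*?)?/InContextLearning(.*)')
for key in logger_keys:
    match = pat.match(key)
    val = logger_data[key]  # KeyError here when the metric was never logged
    if match:
        print(match.group(1), val)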

By the way, where are the benchmark results saved?

sigjhl added the bug (Something isn't working) label on Jul 14, 2023
hanlint (Collaborator) commented Jul 23, 2023

cc: @bmosaicml, who worked on the evaluation code, to take a look.
