[Eval batch=1/1289] Eval on lambada_openai/0-shot data
[Eval batch=130/1289] Eval on lambada_openai/0-shot data
[Eval batch=259/1289] Eval on lambada_openai/0-shot data
[Eval batch=387/1289] Eval on lambada_openai/0-shot data
[Eval batch=516/1289] Eval on lambada_openai/0-shot data
[Eval batch=645/1289] Eval on lambada_openai/0-shot data
[Eval batch=774/1289] Eval on lambada_openai/0-shot data
[Eval batch=903/1289] Eval on lambada_openai/0-shot data
[Eval batch=1031/1289] Eval on lambada_openai/0-shot data
[Eval batch=1160/1289] Eval on lambada_openai/0-shot data
/home/codeless/Desktop/llm-foundry/mosaic/lib/python3.10/site-packages/composer/core/data_spec.py:35: UserWarning: Cannot split tensor of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
warnings.warn(f'Cannot split tensor of length {len(t)} into batches of size {microbatch_size}. '
/home/codeless/Desktop/llm-foundry/mosaic/lib/python3.10/site-packages/composer/core/data_spec.py:26: UserWarning: Cannot split list of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
warnings.warn(f'Cannot split list of length {len(l)} into batches of size {microbatch_size}. '
[Eval batch=1289/1289] Eval on lambada_openai/0-shot data
[Eval batch=1/919] Eval on piqa/10-shot data
[Eval batch=93/919] Eval on piqa/10-shot data
[Eval batch=185/919] Eval on piqa/10-shot data
[Eval batch=276/919] Eval on piqa/10-shot data
[Eval batch=368/919] Eval on piqa/10-shot data
[Eval batch=460/919] Eval on piqa/10-shot data
[Eval batch=552/919] Eval on piqa/10-shot data
[Eval batch=644/919] Eval on piqa/10-shot data
[Eval batch=735/919] Eval on piqa/10-shot data
[Eval batch=827/919] Eval on piqa/10-shot data
[Eval batch=919/919] Eval on piqa/10-shot data
[Eval batch=1/10042] Eval on hellaswag/10-shot data
[Eval batch=1005/10042] Eval on hellaswag/10-shot data
[Eval batch=2009/10042] Eval on hellaswag/10-shot data
[Eval batch=3013/10042] Eval on hellaswag/10-shot data
[Eval batch=4017/10042] Eval on hellaswag/10-shot data
[Eval batch=5022/10042] Eval on hellaswag/10-shot data
[Eval batch=6026/10042] Eval on hellaswag/10-shot data
[Eval batch=7030/10042] Eval on hellaswag/10-shot data
[Eval batch=8034/10042] Eval on hellaswag/10-shot data
[Eval batch=9038/10042] Eval on hellaswag/10-shot data
[Eval batch=10042/10042] Eval on hellaswag/10-shot data
[Eval batch=1/2376] Eval on arc_easy/10-shot data
[Eval batch=238/2376] Eval on arc_easy/10-shot data
[Eval batch=476/2376] Eval on arc_easy/10-shot data
[Eval batch=714/2376] Eval on arc_easy/10-shot data
[Eval batch=951/2376] Eval on arc_easy/10-shot data
[Eval batch=1188/2376] Eval on arc_easy/10-shot data
[Eval batch=1426/2376] Eval on arc_easy/10-shot data
[Eval batch=1664/2376] Eval on arc_easy/10-shot data
[Eval batch=1901/2376] Eval on arc_easy/10-shot data
[Eval batch=2138/2376] Eval on arc_easy/10-shot data
[Eval batch=2376/2376] Eval on arc_easy/10-shot data
[Eval batch=1/1172] Eval on arc_challenge/10-shot data
[Eval batch=118/1172] Eval on arc_challenge/10-shot data
[Eval batch=235/1172] Eval on arc_challenge/10-shot data
[Eval batch=352/1172] Eval on arc_challenge/10-shot data
[Eval batch=469/1172] Eval on arc_challenge/10-shot data
[Eval batch=586/1172] Eval on arc_challenge/10-shot data
[Eval batch=704/1172] Eval on arc_challenge/10-shot data
[Eval batch=821/1172] Eval on arc_challenge/10-shot data
[Eval batch=938/1172] Eval on arc_challenge/10-shot data
[Eval batch=1055/1172] Eval on arc_challenge/10-shot data
[Eval batch=1172/1172] Eval on arc_challenge/10-shot data
[Eval batch=1/50] Eval on copa/0-shot data
[Eval batch=6/50] Eval on copa/0-shot data
[Eval batch=11/50] Eval on copa/0-shot data
[Eval batch=16/50] Eval on copa/0-shot data
[Eval batch=21/50] Eval on copa/0-shot data
[Eval batch=26/50] Eval on copa/0-shot data
[Eval batch=30/50] Eval on copa/0-shot data
[Eval batch=35/50] Eval on copa/0-shot data
[Eval batch=40/50] Eval on copa/0-shot data
[Eval batch=45/50] Eval on copa/0-shot data
[Eval batch=50/50] Eval on copa/0-shot data
[Eval batch=1/1635] Eval on boolq/10-shot data
[Eval batch=164/1635] Eval on boolq/10-shot data
[Eval batch=328/1635] Eval on boolq/10-shot data
[Eval batch=491/1635] Eval on boolq/10-shot data
[Eval batch=655/1635] Eval on boolq/10-shot data
[Eval batch=818/1635] Eval on boolq/10-shot data
[Eval batch=981/1635] Eval on boolq/10-shot data
[Eval batch=1145/1635] Eval on boolq/10-shot data
[Eval batch=1308/1635] Eval on boolq/10-shot data
[Eval batch=1472/1635] Eval on boolq/10-shot data
[Eval batch=1635/1635] Eval on boolq/10-shot data
Ran google/flan-t5-xl eval in: 13817.477584123611 seconds
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/codeless/Desktop/llm-foundry/scripts/eval/eval.py:252 in │
│ │
│ 249 │ │ yaml_cfg = om.load(f) │
│ 250 │ cli_cfg = om.from_cli(args_list) │
│ 251 │ cfg = om.merge(yaml_cfg, cli_cfg) │
│ ❱ 252 │ main(cfg) │
│ 253 │
│ │
│ /home/codeless/Desktop/llm-foundry/scripts/eval/eval.py:126 in main │
│ │
│ 123 │ │ │ │ │ │ │ │ │ │ │ model_gauntlet_df) │
│ 124 │ │ │
│ 125 │ │ if model_gauntlet_callback is not None: │
│ ❱ 126 │ │ │ composite_scores = model_gauntlet_callback.eval_end( │
│ 127 │ │ │ │ None, in_memory_logger) │
│ 128 │ │ │
│ 129 │ │ benchmark_to_taxonomy = {} │
│ │
│ /home/codeless/Desktop/llm-foundry/llmfoundry/callbacks/model_gauntlet_callback.py:112 in │
│ eval_end │
│ │
│ 109 │ │ return {k: sum(v) / len(v) for k, v in results.items()} │
│ 110 │ │
│ 111 │ def eval_end(self, state: State, logger: Logger): │
│ ❱ 112 │ │ new_metrics = self.compute_averages(logger) │
│ 113 │ │ composite_scores = {} │
│ 114 │ │ for category in self.categories: │
│ 115 │ │ │ composite_scores[category['name']] = [] │
│ │
│ /home/codeless/Desktop/llm-foundry/llmfoundry/callbacks/model_gauntlet_callback.py:92 in │
│ compute_averages │
│ │
│ 89 │ │ │ 'metrics/(.*?)/(\d+)-shot(/.*?)?/InContextLearning(.*)') │
│ 90 │ │ for key in self.logger_keys: │
│ 91 │ │ │ match = pat.match(key) │
│ ❱ 92 │ │ │ val = logger_data.data[key][0][1].item() │
│ 93 │ │ │ │
│ 94 │ │ │ if match: │
│ 95 │ │ │ │ eval_name = match.group(1) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'metrics/lambada_openai/0-shot/InContextLearningLMAccuracy'
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
Global rank 0 (PID 11800) exited with code 1
ERROR:composer.cli.launcher:Global rank 0 (PID 11800) exited with code 1
To reproduce
I pip-installed the mosaicml and llm-foundry requirements yesterday and ran the eval.py script on a flan-t5-xl model, following the quickstart guide.
The only changes I made were in hf_eval.yaml and tasks_light.yaml: I set max_seq_len and icl_seq_len to 512, model_name_or_path to google/flan-t5-xl, and the model name to hf_t5.
Expected behavior
Successful benchmarking.
Additional context
I can't figure out why the key couldn't be found in the logger. I don't have the experience to dig into it further, so I hope this information is enough for you to figure out what's wrong.
By the way, where are the benchmark results saved?
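For what it's worth, here is a minimal sketch of what the crash in the traceback looks like, reconstructed purely from the frames above (the function and data shapes are assumptions, not the actual llm-foundry code): `compute_averages` appears to index `logger_data.data[key]` for every key in `logger_keys` before checking the regex match, so a metric key that was registered but never actually logged raises the `KeyError`. A guarded lookup would skip the missing metric instead of crashing:

```python
# Hypothetical reconstruction of the lookup in compute_averages; all names
# here are assumptions based on the traceback, not the real implementation.
import re

def compute_averages(logger_keys, logger_data):
    """Guarded version: skip keys that were never logged instead of crashing."""
    pat = re.compile(r'metrics/(.*?)/(\d+)-shot(/.*?)?/InContextLearning(.*)')
    results = {}
    for key in logger_keys:
        match = pat.match(key)
        # The original code seems to index logger_data[key] unconditionally,
        # which raises KeyError when a metric exists in logger_keys but was
        # never written to the in-memory logger.
        if match is None or key not in logger_data:
            continue
        # logger_data[key] is assumed to be a list of (timestamp, value) pairs
        results[key] = logger_data[key][0][1]
    return results

# The reported crash: the lambada key is in logger_keys but absent from the data.
keys = ['metrics/lambada_openai/0-shot/InContextLearningLMAccuracy']
data = {}  # nothing was logged for this benchmark
print(compute_averages(keys, data))  # returns {} instead of raising KeyError
```

This is only a guess at the failure mode; it would be consistent with the T5 eval run finishing but never logging the `InContextLearningLMAccuracy` metric under that exact key.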