marcopoli/bbh_ita

Italian Version of the BBH Dataset

Dataset based on the Italian translation provided by the works cited below.

Citations

@article{suzgun2022challenging,
title={Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them},
author={Suzgun, Mirac and Scales, Nathan and Sch{\"a}rli, Nathanael and Gehrmann, Sebastian and Tay, Yi and Chung, Hyung Won and Chowdhery, Aakanksha and Le, Quoc V and Chi, Ed H and Zhou, Denny and Wei, Jason},
journal={arXiv preprint arXiv:2210.09261},
year={2022}
}

@inproceedings{RanaldiPRZF23,
title={Teasing LLMs Adapted to Italian},
author={Ranaldi, Leonardo and Pucci, Giulia and Ruzzetti, Elena Sofia and Zanzotto, Fabio Massimo and Freitas, Andr{\'e}},
booktitle={Proceedings of the 9th Italian Conference on Computational Linguistics, Venice, Italy, November 30 - December 2, 2023},
series={{CEUR} Workshop Proceedings},
volume={3596},
publisher={CEUR-WS.org},
year={2023},
url={https://ceur-ws.org/Vol-3596/short18.pdf}
}

@misc{basile2023llamantino,
    title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language}, 
    author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
    year={2023},
    eprint={2312.09993},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Description

BBH focuses on a suite of 23 challenging BIG-Bench tasks which we call BIG-Bench Hard (BBH). These are the tasks for which prior language model evaluations did not outperform the average human-rater. We find that applying chain-of-thought (CoT) prompting to BBH tasks enables PaLM to surpass the average human-rater performance on 10 of the 23 tasks, and Codex (code-davinci-002) to surpass the average human-rater performance on 17 of the 23 tasks. Since many tasks in BBH require multi-step reasoning, few-shot prompting without CoT, as done in the BIG-Bench evaluations (Srivastava et al., 2022), substantially underestimates the best performance and capabilities of language models, which is better captured via CoT prompting. As further analysis, we explore the interaction between CoT and model scale on BBH, finding that CoT enables emergent task performance on several BBH tasks with otherwise flat scaling curves.
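The contrast described above is between answer-only few-shot prompts and few-shot prompts whose exemplars include a written-out rationale. A minimal sketch of how the two prompt styles might be assembled (the example question and rationale are illustrative, not taken from the dataset files):

```python
def build_prompt(examples, question, use_cot):
    """Assemble a few-shot prompt from (question, rationale, answer)
    triples. The rationale is included only when use_cot is True."""
    parts = []
    for q, rationale, answer in examples:
        if use_cot:
            # Chain-of-thought exemplar: rationale precedes the answer.
            parts.append(f"Q: {q}\nA: {rationale} So the answer is {answer}.")
        else:
            # Answer-only exemplar, as in the original BIG-Bench evaluations.
            parts.append(f"Q: {q}\nA: {answer}")
    # The test question is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Illustrative exemplar (not from the BBH task files).
examples = [
    ("Is 7 + 5 greater than 11?",
     "7 + 5 equals 12, and 12 is greater than 11.",
     "yes"),
]
print(build_prompt(examples, "Is 3 + 4 greater than 9?", use_cot=True))
```

Evaluating the same model with `use_cot=True` and `use_cot=False` on identical exemplars isolates the effect of the rationale, which is the comparison the paragraph above summarizes.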

Homepage

URL: https://github.com/suzgunmirac/BIG-Bench-Hard
