MEAL: Stable and Active Learning for Few-Shot Prompting

The code and data for our EMNLP 2023 Findings paper: MEAL (paper).

Our key findings are:

Prompt-based fine-tuning suffers from major instability caused by run variability and training data selection. As the figure above shows, accuracy may vary from 52% to 78%.
Multiprompt fine-tuning and ensembling techniques significantly reduce run variability (see Table 1 in the paper).
We evaluate various active learning / data selection strategies to address the variability caused by training data selection. We propose a novel strategy, IPUSD, which relies on variance across different prompts. IPUSD outperforms other active learning strategies in terms of both accuracy and variance (see the table below).
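To make the idea of "variance across different prompts" concrete, here is a minimal sketch that scores each unlabeled example by how much its predicted class probabilities differ between prompts and then picks the highest-scoring examples. It is only an illustration of the general idea, not the authors' IPUSD implementation, which is specified in the paper. The function names, the averaging over classes, and the toy probabilities are all assumptions made for this example.

```python
from statistics import pvariance


def prompt_disagreement(probs_per_prompt):
    """Score one example by the variance of its class probabilities across prompts.

    probs_per_prompt: one class-probability list per prompt (hypothetical layout).
    """
    n_classes = len(probs_per_prompt[0])
    # Variance of each class probability across prompts, averaged over classes.
    per_class = [
        pvariance([probs[c] for probs in probs_per_prompt])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


def select_by_prompt_variance(pool_probs, k):
    """Return the indices of the k examples on which the prompts disagree most."""
    scores = [prompt_disagreement(example) for example in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy pool: 3 examples, 2 prompts, 2 classes (made-up numbers).
pool_probs = [
    [[0.9, 0.1], [0.9, 0.1]],  # prompts agree
    [[0.9, 0.1], [0.2, 0.8]],  # prompts disagree strongly
    [[0.6, 0.4], [0.5, 0.5]],  # mild disagreement
]
selected = select_by_prompt_variance(pool_probs, k=2)
print(selected)
```

In this toy pool the second example is ranked first because the two prompts give it nearly opposite predictions, while the first example, on which both prompts agree, is never selected.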

Few-shot Active Learning / Data Selection Strategies

Our modified active learning pipeline for data selection is illustrated with an example sentence and two prompts for sentiment analysis. The PLM outputs several features in a zero-shot manner. AL selects a few-shot training set based on these output features.
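The two stages described above (zero-shot outputs from the PLM, then selection by an active learning strategy) can be sketched roughly as follows. The sketch assumes the PLM outputs are already available as per-prompt class probabilities and uses highest entropy, one of the baseline strategies in the results table, purely to show the shape of the pipeline. The names `mean_distribution` and `select_few_shot` and the toy numbers are hypothetical and do not come from the repository.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_distribution(prompt_probs):
    """Average the class distributions that different prompts give one example."""
    n_prompts = len(prompt_probs)
    n_classes = len(prompt_probs[0])
    return [sum(p[c] for p in prompt_probs) / n_prompts for c in range(n_classes)]


def select_few_shot(zero_shot_outputs, k, score_fn=entropy):
    """Pick k example indices with the highest score on the prompt-averaged output."""
    scored = [
        (score_fn(mean_distribution(outputs)), index)
        for index, outputs in enumerate(zero_shot_outputs)
    ]
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


# Toy zero-shot outputs: 3 sentences, 2 prompts, 2 sentiment classes (made-up numbers).
zero_shot_outputs = [
    [[0.95, 0.05], [0.90, 0.10]],  # confident prediction
    [[0.55, 0.45], [0.50, 0.50]],  # very uncertain prediction
    [[0.70, 0.30], [0.80, 0.20]],  # moderately uncertain prediction
]
chosen = select_few_shot(zero_shot_outputs, k=2)
print(chosen)
```

Swapping `score_fn` for another scoring function changes the selection strategy without touching the rest of the pipeline, which is the point of keeping the feature extraction and selection stages separate.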

Results

Strategy	Acc ↑	Rank ↓	Div. ↑	Repr. ↑	Ent. ↓
Random	72.6±2.8	4.0	13.6	17.6	2.0
Entropy	70.9	6.4	13.3	16.9	6.1
LC	70.9	5.6	13.5	17.2	5.3
BT	72.1	4.0	13.4	17.1	5.6
PP-KL (Ours)	69.1	5.6	13.4	16.9	9.0
CAL	70.4	4.4	13.1	17.1	23.5
BADGE	73.2±3.3	3.0	13.6	17.6	2.2
IPUSD (Ours)	73.9±2.3	3.0	13.5	17.6	2.0

IPUSD, our proposed data selection strategy for few-shot prompting, achieves the highest accuracy with lower variance (±2.3, compared with ±2.8 for Random and ±3.3 for BADGE) across RTE, SST-2, SST-5, TREC, and MRPC. Heuristics such as random selection or highest entropy lead to lower accuracy (72.6 and 70.9, respectively, compared with 73.9 for IPUSD).

Check out the data splits for different active learning strategies (including unlabeled and evaluation data splits) in the Datasets folder.

Citation

@inproceedings{koksal-etal-2023-meal,
    title = "{MEAL}: Stable and Active Learning for Few-Shot Prompting",
    author = {K{\"o}ksal, Abdullatif  and
      Schick, Timo  and
      Schuetze, Hinrich},
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.36",
    doi = "10.18653/v1/2023.findings-emnlp.36",
    pages = "506--517"
}
