
Code for the EMNLP 2021 paper "Active Learning by Acquiring Contrastive Examples" and the ACL 2022 paper "On the Importance of Effectively Adapting Pretrained Language Models for Active Learning".


Contrastive Active Learning (CAL)

Active Learning by Acquiring Contrastive Examples
Katerina Margatina, Giorgos Vernikos, Loic Barrault, Nikolaos Aletras
Empirical Methods in Natural Language Processing (EMNLP) 2021.


Quick Overview

In our paper, we propose a new acquisition function for active learning, named CAL (Contrastive Active Learning). CAL selects contrastive examples: data points that are similar in the model feature space but for which the model outputs maximally different predictive likelihoods. This repository contains code for running active learning with CAL and several baseline acquisition functions.
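
At a high level, CAL ranks each unlabeled candidate by how strongly its predictive distribution diverges from those of its nearest labeled neighbours in the model feature space. Below is a minimal NumPy/SciPy sketch of that scoring idea; the function and variable names are illustrative and do not reflect the repository's actual API:

import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import rel_entr

def cal_scores(pool_feats, pool_probs, labeled_feats, labeled_probs, k=10):
    # pool_feats: (N, d) encoder features of the unlabeled candidates
    # pool_probs: (N, C) their predicted class probabilities
    # labeled_feats / labeled_probs: the same quantities for the labeled set
    dists = cdist(pool_feats, labeled_feats)    # (N, M) Euclidean distances
    knn = np.argsort(dists, axis=1)[:, :k]      # k nearest labeled neighbours
    scores = np.empty(len(pool_feats))
    for i, nbrs in enumerate(knn):
        # KL(neighbour || candidate), averaged over the k neighbours
        kl = rel_entr(labeled_probs[nbrs], pool_probs[i][None, :]).sum(axis=1)
        scores[i] = kl.mean()
    return scores    # acquire the highest-scoring candidates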

Acquisition functions

Specifically, there is code for running active learning with the following acquisition functions: CAL, Entropy, Least Confidence, BALD, BatchBALD, ALPS, BADGE, BertKM and Random sampling.
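
The uncertainty-based baselines above reduce to simple scoring functions over the model's predicted probabilities for the pool. As a sketch (again with illustrative names, not the repository's API; the actual implementations live in the acquisition directory):

import numpy as np

def entropy_scores(probs):                  # probs: (N, C)
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def least_confidence_scores(probs):
    return 1.0 - probs.max(axis=1)          # higher = less confident top prediction

In both cases, the highest-scoring pool examples are acquired at each iteration.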

Tasks & Datasets

We evaluate the aforementioned AL algorithms on four Natural Language Processing (NLP) tasks and seven datasets:

  • Sentiment analysis: SST-2, IMDB
  • Topic classification: AGNEWS, DBPEDIA, PUBMED
  • Natural language inference: QNLI
  • Paraphrase detection: QQP

Models

So far we have used only BERT-BASE, but the code can support any other model (e.g. from HuggingFace) with minimal changes.
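
For instance, assuming the standard HuggingFace Auto* loading pattern (the repository's own model-loading code may differ), switching encoders amounts to changing one string:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-cased"    # e.g. swap in "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)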


Installation

This project is implemented with Python 3, PyTorch 1.9.0 and transformers 3.1.0.

Create Environment (Optional): Ideally, you should create a conda environment for the project.

conda create -n cal python=3.7
conda activate cal

Then install the required PyTorch package (*):

conda install pytorch==1.9.0 torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia

Finally install the rest of the requirements:

pip install -r requirements.txt

(*) Please check here for information on how to properly install the required PyTorch version for your machine (CUDA). This is important! Do not copy-paste the above line without first checking which CUDA version your machine supports. You can run nvcc --version in your terminal to check.
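
After installation, a quick sanity check that the expected versions are present and that PyTorch can see your GPU:

import torch
import transformers

print(torch.__version__)            # expect 1.9.0
print(transformers.__version__)     # expect 3.1.0
print(torch.cuda.is_available())    # expect True on a CUDA machine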


Download data

To download the datasets we used, run the following script:

bash get_data.sh

DBPedia is too large, so download it manually from here.


Organization

The repository is organized as follows:

  • acquisition: implementation of acquisition functions
  • analysis: scripts for analysis (see §6 of our paper)
  • cache: models downloaded from HuggingFace
  • checkpoints: model checkpoints
  • data: datasets
  • utilities: helper scripts (e.g. data loaders and processors)

Usage

The main script to run any AL experiment is run_al.py.

Example:

python run_al.py --dataset_name sst-2 --acquisition cal
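
For example, to compare CAL against an uncertainty baseline on the same task (the identifier strings for the other acquisition strategies are assumptions here; check the argparse options in run_al.py for the exact accepted values):

python run_al.py --dataset_name sst-2 --acquisition cal
python run_al.py --dataset_name sst-2 --acquisition entropy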

Acknowledgements

We would like to thank the community for releasing their code! This repository contains code from the HuggingFace, ALPS, and BatchBALD repositories.


Reference

Please feel free to cite our paper if you use our code or the proposed algorithm. :blush:

@inproceedings{margatina-etal-2021-active,
    title = "Active Learning by Acquiring Contrastive Examples",
    author = {Margatina, Katerina  and
      Vernikos, Giorgos  and
      Barrault, Lo{\"\i}c  and
      Aletras, Nikolaos},
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.51",
    pages = "650--663",
    abstract = "Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better or equal than the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and we further analyze all actively acquired datasets showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.",
}

Contact

Please feel free to raise an issue or contact me if you need any help setting up the repo! :blush:
