kfolden-ood-detection

The repository contains the code for the recent research advances at Shannon.AI.

kFolden: k-Fold Ensemble for Out-Of-Distribution Detection
Xiaoya Li, Jiwei Li, Xiaofei Sun, Chun Fan, Tianwei Zhang, Fei Wu, Yuxian Meng and Jun Zhang
EMNLP 2021, paper
If you find this repository helpful, please cite the following:

 @article{li2021k,
  title={$ k $ Folden: $ k $-Fold Ensemble for Out-Of-Distribution Detection},
  author={Li, Xiaoya and Li, Jiwei and Sun, Xiaofei and Fan, Chun and Zhang, Tianwei and Wu, Fei and Meng, Yuxian and Zhang, Jun},
  journal={arXiv preprint arXiv:2108.12731},
  year={2021}
}

Benchmarks

In this paper, we construct semantic shift and non-semantic shift benchmarks for out-of-distribution detection.
You can download the benchmarks following this guidline. This repository contains code and scripts for generating our benchmarks from their original datafiles.
The unzipped dataset directory should have the following structure:

<benchmark-name>
├── dev
│   ├── id_dev.csv
│   └── ood_dev.csv
├── test
│   ├── id_test.csv
│   └── ood_test.csv
└── train
    └── train.csv

Every dataset directory contains three subdirectories train/, dev/, and test/, each containing the randomly sampled training, development, and testing subsets, respectively.
For example, the testing set for in-distribution can be found in the <benchmark-name>/test/id_test.csv file. And the <benchmark-name>/test/ood_test.csv file contains out-of-distribution test data instances.
More details can be found in the paper (Section 5 and Appendix).

Requirements

If you are working on a GPU machine with CUDA 10.1, please run the following command to setup environment.

$ conda create -n kfolden-env python=3.6
$ conda activate kfolden-env
$ pip3 install -r requirements.txt 
$ pip3 install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Notice: please check your CUDA version and install compatible pytorch referring to pytorch.org.

1. Download Glove, BERT, and RoBERTa

Before start training models on benchmark datasets, please first download required files (e.g., Glove, BERT, RoBERTa).

For CNN/LSTM models, run bash ./scripts/download/word_embedding.sh to obtain Glove-300D the vocab file and the weight file.
For pretrained mlm models, run bash ./scripts/download/pretrained_lm.sh to obtain BERT and RoBERTa model files.

2. Train and Evaluate

Please change DATA_DIR, BERT_DIR, OUTPUT_DIR to your own data directory, BERT/RoBERTa directory and output directory, respectively.

2.1 Vanilla Models

For CNN/LSTM models, scripts for reproducing experimental results can be found under the ./scripts/<dataset_name>/vanilla/ folder.
During training, the trainer saves intermediate logs to the $OUTPUT_DIR/eval_result_log.txt file.
After training, the trainer loads the best_ckpt_on_dev model and evaluates it on in-distribution and out-of-distribution test sets. Evaluation results are saved to $OUTPUT_DIR/eval_result_log.txt.
For pretrained masked lm models, scripts for reproducing experimental results can be found under the ./scripts/<dataset_name>/vanilla/ folder.
During training, the trainer saves intermediate logs to the $OUTPUT_DIR/eval_result_log.txt file.
After training, the trainer loads the best_ckpt_on_dev model and evaluates it on in-distribution and out-of-distribution test sets. Evaluation results are saved to $OUTPUT_DIR/eval_result_log.txt.

2.2 kFolden Models

k denotes the number of labels for in-distribution data.

For CNN/LSTM models, scripts for reproducing experimental results can be found under the ./scripts/<dataset_name>/kfolden/ folder.
During training, the trainer creates k subfolders under $OUTPUT_DIR (from 0 to k-1) and saves intermediate logs to the $OUTPUT_DIR/eval_result_log.txt file.
After training, the trainer loads k best_ckpt_on_dev models and evaluates them on in-distribution and out-of-distribution test sets. Evaluation results are saved to $OUTPUT_DIR/eval_result_log.txt.
For pretrained mlm models, scripts for reproducing experimental results can be found under the ./scripts/<dataset_name>/kfolden/ folder.
During training, the trainer creates k subfolders under $OUTPUT_DIR (from 0 to k-1) and saves intermediate logs to the $OUTPUT_DIR/eval_result_log.txt file.
After training, the trainer loads k best_ckpt_on_dev models and evaluates them on in-distribution and out-of-distribution test sets. Evaluation results are saved to $OUTPUT_DIR/eval_result_log.txt.

Note: for <model-type>+<confidence-score-strategy> results in the paper (Table 2 and Table 3), you should run bash ./scripts/<dataset_name>/<vanilla-or-kfolden>/<model-type>.sh.
After training, the model trainer evaluates on in-distribution and out-of-distribution datasets with various calibration strategies.
For RoBERTa, RoBERTa+Scaling, and RoBERTa+Mahalanobis kfolden model results on 20Newsgroups-6S dataset, you should run bash ./nss_20newsgroups_6s/kfolden/kfolden_roberta.sh.
After training, the evaluation results can be found at $OUTPUT_DIR/<k-1>/eval_result_log.txt.

Contact

If you have any issues or questions about this repo, please feel free to contact xiaoya_li [AT] shannonai.com .
Any discussions, suggestions and questions are welcome !

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
data		data
metrics		metrics
models		models
modules		modules
scripts		scripts
task		task
utils		utils
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

metrics

metrics

models

models

modules

modules

scripts

scripts

task

task

utils

utils

.gitignore

.gitignore

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

kfolden-ood-detection

Benchmarks

Requirements

1. Download Glove, BERT, and RoBERTa

2. Train and Evaluate

2.1 Vanilla Models

2.2 kFolden Models

Contact

About

Releases

Packages

Contributors 2

Languages

ShannonAI/kfolden-ood-detection

Folders and files

Latest commit

History

Repository files navigation

kfolden-ood-detection

Benchmarks

Requirements

1. Download Glove, BERT, and RoBERTa

2. Train and Evaluate

2.1 Vanilla Models

2.2 kFolden Models

Contact

About

Resources

Stars

Watchers

Forks

Languages