Maitreyapatel/reliability-checklist

Description

reliability-checklist is a Python framework (available via a CLI) for comprehensively evaluating the reliability of NLP systems.

reliability-checklist accepts any model and dataset as input and facilitates comprehensive evaluation across a wide range of reliability-related aspects such as accuracy, selective prediction, novelty detection, stability, sensitivity, and calibration.

Why you might want to use it:

✅ No coding needed
Pre-defined templates are available so you can integrate your models/datasets from the command line alone.

✅ Bring Your own Model (BYoM)
Is your model template missing? We have you covered: check out BYoM to create your own model-specific config file.

✅ Bring Your own Data (BYoD)
Is your dataset template missing? Check out BYoD to create your own dataset-specific config file.

✅ Reliability metrics
Currently, we support the following reliability-related aspects:

  • Accuracy/F1/Precision/Recall
  • Calibration: Reliability Diagram, Expected Calibration Error (ECE), Expected Overconfidence Error (EOE) (see the ECE sketch after this list)
  • Selective Prediction: Risk-Coverage Curve (RCC), AUC of risk-coverage curve
  • Sensitivity
  • Stability
  • Out-of-Distribution
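
As a quick reference, here is a minimal NumPy sketch of Expected Calibration Error. It is a generic illustration of the metric, not reliability-checklist's own implementation, and the function name is used only for this example.

# Generic ECE sketch: bin predictions by confidence and average the
# |accuracy - confidence| gap, weighted by the fraction of samples per bin.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)  # max softmax probability per sample
    correct = np.asarray(correct, dtype=float)          # 1.0 if the prediction was right, else 0.0
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for low, high in zip(bins[:-1], bins[1:]):
        mask = (confidences > low) & (confidences <= high)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap                     # mask.mean() == |bin| / n
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))  # ~0.2375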

Upcoming Reliability Aspects:

  • Adversarial Attack: Model-in-the-loop adversarial attacks to evaluate a model's robustness.
  • Task-Specific Augmentations: Task-specific augmentations to check the reliability on augmented inputs.
  • Novelty
  • Other Measures: We plan to incorporate other measures such as bias, fairness, toxicity, and faithfulness of models. We also plan to measure the reliability of generative models on crucial parameters such as hallucinations.

Workflow

✅ Want to integrate more features?
Our easy-to-extend infrastructure lets developers seamlessly contribute models, datasets, augmentations, and evaluation metrics to the workflow.

[workflow diagram]

How to install?

# install reliability-checklist from GitHub
pip install git+https://github.com/Maitreyapatel/reliability-checklist

# download the spaCy model and NLTK wordnet data required by the framework
python -m spacy download en_core_web_sm
python -c "import nltk;nltk.download('wordnet')"

How to use?

Evaluate an example model/dataset with the default configuration

# eval on CPU
recheck

# eval on GPU
recheck trainer=gpu +trainer.gpus=[1,2,3]

Evaluate a model with a chosen dataset-specific experiment configuration from reliability_checklist/configs/task/

recheck task=<task_name>

Specify a custom model_name as shown in the following MNLI example

# if the same model_name is used for the tokenizer as well
recheck task=mnli custom_model="bert-base-uncased-mnli"

# if the tokenizer uses a different model_name
recheck task=mnli custom_model="bert-base-uncased-mnli" custom_model.tokenizer.model_name="ishan/bert-base-uncased-mnli"

Add custom_model config

# create config folder structure similar to reliability_checklist/configs/
mkdir ./configs/
mkdir ./configs/custom_model/

# run the following command after creating a new config file inside ./configs/custom_model/<your-config>.yaml
recheck task=mnli custom_model=<your-config>
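
For illustration only, the snippet below writes a minimal config file whose keys simply mirror the custom_model and custom_model.tokenizer.model_name overrides shown earlier; the actual schema is defined by the BYoM templates in reliability_checklist/configs/custom_model/, so treat these field names as assumptions.

# hypothetical sketch: write ./configs/custom_model/my-model.yaml
# the keys below only mirror the CLI overrides above and may differ from the real BYoM schema
from pathlib import Path

config_dir = Path("./configs/custom_model")
config_dir.mkdir(parents=True, exist_ok=True)
(config_dir / "my-model.yaml").write_text(
    'model_name: "bert-base-uncased-mnli"\n'
    "tokenizer:\n"
    '  model_name: "ishan/bert-base-uncased-mnli"\n'
)

You can then point the CLI at it, e.g. recheck task=mnli custom_model=my-model, following the pattern above.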

Visualization of results

reliability-checklist supports a wide range of visualization tools. By default it logs to the wandb online visualizer, and it also generates highly informative plots that are stored in the logs directory.

🤝 Contributing to reliability-checklist

Any kind of positive contribution is welcome! Please help us grow by contributing to the project.

If you wish to contribute, you can work on any of the features/issues listed here or create one of your own. After adding your code, please send us a Pull Request.

Please read CONTRIBUTING for details on our CODE OF CONDUCT and the process for submitting pull requests to us.


A ⭐️ for reliability-checklist helps us build more reliable language models.