Bio-Benchmarks for Protein Engineering

This repository accompanies the paper submitted to the NeurIPS 2021 Datasets and Benchmarks track.

Folder breakdown

collect_splits contains notebooks that process the raw datasets collected from various sources.
splits contains all splits, along with a brief description of how each was processed and the logic behind its train/test partition.
baselines contains the code used to compute the baselines.

A .gitignored folder called data contains the raw data used to produce all splits. Because the folder is large, it is not included in the GitHub repository. It can be downloaded from: http://data.bioembeddings.com/public/FLIP

All FLIP datasets are available in FASTA format (following the standardization proposed in biotrainer).
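
As a rough illustration of working with such files, the sketch below parses FASTA records whose header lines carry space-separated KEY=VALUE attributes. The attribute names used here (TARGET, SET, VALIDATION), the parse_fasta helper, and the toy sequences are assumptions made for the example; check the actual files and the biotrainer documentation for the exact header layout.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, Dict[str, str], str]]:
    """Yield (identifier, attributes, sequence) for each FASTA record."""
    seq_id, attrs, chunks = None, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, attrs, "".join(chunks)
            fields = line[1:].split() or [""]
            seq_id = fields[0]
            # Assumption: remaining header fields are KEY=VALUE pairs.
            attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            chunks = []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, attrs, "".join(chunks)


# Toy records for illustration only, not real FLIP data.
example = """>Seq1 TARGET=0.52 SET=train VALIDATION=False
MKTAYIAK
QRQISFVK
>Seq2 TARGET=1.30 SET=test VALIDATION=False
MKTAYLAK
"""

records = list(parse_fasta(example))
train = [r for r in records if r[1].get("SET") == "train"]
print(len(records), len(train))
```

Attribute values are kept as strings, so a numeric target would still need to be converted with float() before use.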

Find out more about the splits

The goal of the splits in this repository is to assess how well machine learning models that take protein sequences as input can represent different dimensions relevant for protein design. The main place to find out about the splits is the splits folder. Each set contains a zip file with one or more "splits", each of which is a train/test partition motivated by biological or statistical intuition.
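
A minimal sketch of reading one such zip file is shown below. It assumes that each split is stored as a CSV file inside the archive, with a set column marking rows as train or test; the file name, the column names, and the read_split helper are hypothetical, so consult the description in each set's folder for the real layout. The archive is built in memory so that the example runs without downloading anything.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_split(archive: zipfile.ZipFile, member: str) -> Dict[str, List[dict]]:
    """Group the rows of one CSV inside the archive by their `set` column."""
    groups: Dict[str, List[dict]] = {}
    with archive.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        for row in csv.DictReader(text):
            groups.setdefault(row["set"], []).append(row)
    return groups


# Build a toy archive in memory (hypothetical file name and columns).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "toy_split.csv",
        "sequence,target,set\nMKTA,0.5,train\nMKTV,0.7,train\nMKTL,1.1,test\n",
    )

# Read every split in the archive and report how many rows each set has.
with zipfile.ZipFile(buffer) as zf:
    for name in zf.namelist():
        groups = read_split(zf, name)
        print(name, {k: len(v) for k, v in groups.items()})
```

To read a downloaded archive instead, pass its path to zipfile.ZipFile in place of the in-memory buffer.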

Split semaphore

Each split is tagged with a semaphore that indicates what it may be used for:

🟢: active splits, which can be used to evaluate the accuracy of your machine learning models
🟠: splits that should not be used to make performance comparisons, because they may overestimate performance or because other active splits have similar discriminative ability
🔴: splits that should not be used or are considered obsolete. Please do not use these to report performance.
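
One way to apply the semaphore in an evaluation script is to record a status for each split and report results only for the active ones. The sketch below is illustrative: the SplitStatus enum, the reportable helper, and the split names are all hypothetical and do not come from this repository.

```python
from enum import Enum
from typing import Dict, List


class SplitStatus(Enum):
    ACTIVE = "🟢"       # may be used to evaluate model accuracy
    DISCOURAGED = "🟠"  # not for performance comparisons
    OBSOLETE = "🔴"     # do not use to report performance


def reportable(splits: Dict[str, SplitStatus]) -> List[str]:
    """Return the sorted names of splits whose status is active."""
    return sorted(n for n, s in splits.items() if s is SplitStatus.ACTIVE)


# Hypothetical split names, for illustration only.
statuses = {
    "split_a": SplitStatus.ACTIVE,
    "split_b": SplitStatus.DISCOURAGED,
    "split_c": SplitStatus.OBSOLETE,
}
print(reportable(statuses))
```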
