Improving antibody language models with native pairing

To determine whether, and to what extent, training with natively paired antibody sequence data can improve antibody-specific language models (LMs), we trained three baseline antibody language model (BALM) variants: BALM-paired, trained using only natively paired data; BALM-shuffled, trained using randomly paired data; and BALM-unpaired, trained using the same antibody sequences but without pairing information. Additionally, we performed full fine-tuning of the state-of-the-art general protein LM ESM-2 using the same natively paired dataset used to train BALM-paired. The Jupyter notebooks in this repository contain all code necessary to re-train each of these models from scratch:

  • BALM-paired: downloads training data (if necessary) and trains BALM-paired.
  • BALM-shuffled: the training data must first be processed to randomly shuffle heavy/light-chain pairing (see the sketch after this list); training then uses the same script as BALM-paired.
  • BALM-unpaired: downloads training data (if necessary) and trains BALM-unpaired.
  • ESM-2 fine-tuning: downloads training data (if necessary) and performs full fine-tuning of ESM-2.
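
As a rough illustration of the shuffling step used for BALM-shuffled, the sketch below randomly re-pairs heavy and light chains within a paired dataset. The file names, CSV format, and `heavy`/`light` column names are assumptions for illustration, not the repository's actual data schema:

```python
import pandas as pd

# Load the natively paired sequences. The file name and the
# "heavy"/"light" column names are hypothetical placeholders,
# not the repository's actual schema.
pairs = pd.read_csv("paired_sequences.csv")

# Randomly permute the light chains so each heavy chain is
# re-paired with a light chain drawn from a different antibody,
# destroying native pairing while keeping the same set of sequences.
shuffled = pairs.copy()
shuffled["light"] = (
    pairs["light"]
    .sample(frac=1.0, random_state=42)
    .reset_index(drop=True)
)

shuffled.to_csv("shuffled_sequences.csv", index=False)
```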

pre-trained models

Weights for each of the aforementioned models can be downloaded from Zenodo.
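
If the downloaded weights are stored in Hugging Face transformers format (an assumption; check the Zenodo archive for the actual layout), loading a model for masked-token inference might look like the sketch below. The directory name and example sequence are placeholders:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Path to the unpacked Zenodo download; the directory name is
# a hypothetical placeholder.
model_dir = "./BALM-paired"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForMaskedLM.from_pretrained(model_dir)

# Toy example: predict a masked residue in a heavy-chain fragment.
sequence = "EVQLVESGG" + tokenizer.mask_token + "LVQPGGSLRLSCAAS"
inputs = tokenizer(sequence, return_tensors="pt")
outputs = model(**inputs)  # outputs.logits holds per-token scores
```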

how should I cite BALM?

BALM has been published in Patterns and can be cited as:

Burbach, S.M., & Briney, B. (2024). Improving antibody language models with native pairing.
Patterns. https://doi.org/10.1016/j.patter.2024.100967

The current version of the BALM dataset (v2024.02.20) can be cited as:

Burbach, S.M., & Briney, B. (2023). Improving antibody language models with native pairing (v2024.02.20) [Data set].
Zenodo. https://doi.org/10.5281/zenodo.10684811