
long-context-transformers

A repository for training transformers to access longer context in causal language models. Most of these methods are still in testing; try them out if you'd like, but please let me know your results so we don't duplicate work :) Also exploring finetuning public checkpoints on filtered datasets to extend the context range of pre-trained models, à la MPT-7B.

Currently supported

Currently has code for Flash Attention + QLoRA, tested to work with NeoX models. A rough sketch of that setup is shown below.
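
A minimal sketch of what such a setup might look like, using transformers, bitsandbytes, and peft. The checkpoint name, LoRA hyperparameters, and the commented Flash Attention flag are assumptions for illustration; the repo's own scripts may wire this up differently.

```python
# Minimal QLoRA setup sketch for a GPT-NeoX-family checkpoint (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "EleutherAI/pythia-160m"  # placeholder NeoX-family checkpoint

# 4-bit NF4 quantization -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    # On recent transformers versions, Flash Attention can be requested with
    #   attn_implementation="flash_attention_2"   (requires flash-attn installed);
    # older setups typically monkey-patch the attention module instead.
)

# Prepare the quantized model for training and attach LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],  # NeoX fuses Q/K/V into one projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```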

Also has code for patching NeoX models with Blockwise Parallel Transformer attention (able to support 42k tokens on a 160M model with a single A100 GPU).
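
The core idea behind the blockwise approach is to process queries and keys/values in chunks while accumulating running softmax statistics, so the full attention matrix is never materialized. Below is a toy, self-contained sketch of that computation (causal masking and the blockwise feedforward that BPT also uses are omitted); it illustrates the idea and is not the repo's actual NeoX patch.

```python
# Toy sketch of blockwise (memory-efficient) attention: queries and keys/values
# are processed in chunks with a streaming softmax, so the full
# (seq_len x seq_len) attention matrix is never materialized.
import math
import torch

def blockwise_attention(q, k, v, q_block=512, kv_block=512):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scale = 1.0 / math.sqrt(q.shape[-1])
    out = torch.zeros_like(q)
    for qs in range(0, q.shape[2], q_block):
        qi = q[:, :, qs:qs + q_block] * scale
        # Running accumulators for a numerically stable streaming softmax.
        acc = torch.zeros_like(qi)
        row_max = torch.full(qi.shape[:-1] + (1,), float("-inf"),
                             device=q.device, dtype=q.dtype)
        row_sum = torch.zeros_like(row_max)
        for ks in range(0, k.shape[2], kv_block):
            kj = k[:, :, ks:ks + kv_block]
            vj = v[:, :, ks:ks + kv_block]
            scores = qi @ kj.transpose(-1, -2)         # (b, h, q_block, kv_block)
            block_max = scores.amax(dim=-1, keepdim=True)
            new_max = torch.maximum(row_max, block_max)
            correction = torch.exp(row_max - new_max)  # rescale previous blocks
            p = torch.exp(scores - new_max)
            acc = acc * correction + p @ vj
            row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
            row_max = new_max
        out[:, :, qs:qs + q_block] = acc / row_sum
    return out

# Quick check against standard full attention on random inputs.
if __name__ == "__main__":
    q, k, v = (torch.randn(1, 4, 1024, 64) for _ in range(3))
    ref = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(64), dim=-1) @ v
    assert torch.allclose(blockwise_attention(q, k, v), ref, atol=1e-4)
```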

Support for Longformer and landmark attention will be set up soon.

Training examples WIP

Multiple GPUs

Multiple GPUs should be supported via 🤗 Accelerate, since QLoRA already relies on it, but I have not tested this yet. A bare-bones sketch is shown below.
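
For reference, this is roughly how a training loop could be wrapped with 🤗 Accelerate and started with `accelerate launch`. It is a generic, hypothetical skeleton; how it interacts with a 4-bit quantized QLoRA model in practice is exactly the part that remains untested.

```python
# Hypothetical multi-GPU training skeleton using Accelerate.
# Launch with: accelerate launch train.py
import torch
from accelerate import Accelerator

def train(model, train_dataloader, num_epochs=1, lr=2e-4):
    accelerator = Accelerator()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    # Accelerate handles device placement and, if configured, data parallelism.
    model, optimizer, train_dataloader = accelerator.prepare(
        model, optimizer, train_dataloader
    )

    model.train()
    for _ in range(num_epochs):
        for batch in train_dataloader:
            outputs = model(**batch)
            accelerator.backward(outputs.loss)
            optimizer.step()
            optimizer.zero_grad()
```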
