Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

This clone of fairseq supports Knowledge Distillation, Recurrent Stacking, LoRA, RoPE, YaRN, and ALiBi for the Transformer model and the translation task. You can add the following flags to `fairseq-train`/`fairseq-interactive`/`fairseq-generate` to use them:
| Name and Citation | Description | Flags to Activate | Source |
|---|---|---|---|
| Knowledge Distillation (Hinton et al., Kim & Rush, Wang et al., Gumma et al.) | Transfers soft information from a pretrained teacher model to a smaller student model | `--teacher-checkpoint-path $teacher_ckpt --task translation_with_kd --criterion label_smoothed_cross_entropy_with_kd --kd-args '{"strategy": "word_level"}'` | Selective Distillation |
| Recurrent Stacking (Dabre & Fujita) | Extreme parameter sharing technique in which all layers in the encoder/decoder are shared | `--encoder-recurrent-stacking $encoder_recurrent_stacking --decoder-recurrent-stacking $decoder_recurrent_stacking` | - |
| Low-Rank Adaptation (LoRA) (Hu et al.) | Efficient model adaptation technique that modifies a small number of model parameters while freezing the rest | `--lora-args '{"r": 8, "alpha": 16, "dropout": 0.05, "bias": "none", "target_modules": "k_proj,v_proj"}' --use-native-attention --load-checkpoint-liberally` | LoRA Implementation |
| Rotary Positional Embedding (RoPE) (Su et al.) | Encodes absolute position with a rotation matrix and incorporates explicit relative position dependency in the self-attention formulation | `--rope-args '{"max_position_embeddings": 2048, "base": 10000, "type": "vanilla"}' --use-native-attention --no-token-positional-embeddings` | RoPE Implementation |
| Yet another RoPE extensioN method (YaRN) (Peng et al.) | Compute-efficient method to extend the context window of models | `--yarn-args '{"max_position_embeddings": 2048, "base": 10000, "type": "vanilla", "original_max_position_embeddings": 256, "extrapolation_factor": 1, "attn_factor": 1, "beta_fast": 32, "beta_slow": 1}' --use-native-attention --no-token-positional-embeddings` | YaRN Implementation |
| Attention with Linear Biases (ALiBi) (Press et al.) | Simple and efficient position method that biases query-key attention scores with a penalty proportional to their distance | `--alibi-args '{"alibi_asymmetrical": "false"}' --no-token-positional-embeddings --load-checkpoint-liberally` | ALiBi Implementation |
| Factorized Embedding Parameterization (Lan et al.) | Parameterizes large embeddings by adding an intermediate bottleneck layer | `--encoder-factorized-embed-dim $encoder_fac_embed_dim --decoder-factorized-embed-dim $decoder_fac_embed_dim --factorized-embed-activation-fn $fac_embed_activation_fn` | - |
| Penultimate Linear Transformation Activation | Adds an activation to the penultimate linear transformation before the final projection onto the vocabulary | `--decoder-output-activation-fn $decoder_out_activation_fn` | - |
| Sanity Validation Steps | Runs a full pass over the validation set at the beginning of training | `--run-sanity-validation-steps` | - |
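For example, here is a minimal sketch of a training command that enables RoPE from the table above. The data directory, architecture, and optimization flags are illustrative placeholders, not settings prescribed by this repo:

```bash
# Illustrative sketch: $DATA_BIN points to a dataset binarized with fairseq-preprocess.
fairseq-train $DATA_BIN \
    --task translation \
    --arch transformer \
    --optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --rope-args '{"max_position_embeddings": 2048, "base": 10000, "type": "vanilla"}' \
    --use-native-attention --no-token-positional-embeddings
```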
- PyTorch version >= 2.1.1
- Python version >= 3.8
- For training new models, you'll also need an NVIDIA GPU and NCCL
- To install fairseq and develop locally:

```bash
git clone https://github.com/VarunGumma/fairseq
cd fairseq
pip install -e ./
```

- For faster training, install NVIDIA's apex library:

```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
    --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
    --global-option="--fast_multihead_attn" ./
```

- For large datasets, install PyArrow:

```bash
pip install pyarrow
```

- If you use Docker, make sure to increase the shared memory size, either with `--ipc=host` or `--shm-size` as command-line options to `nvidia-docker run`.
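As a rough example of where those options go (the image name is a placeholder, not something shipped with this repo):

```bash
# Illustrative only: replace $IMAGE with a CUDA-enabled image that has fairseq installed.
nvidia-docker run --ipc=host -it $IMAGE bash
```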
fairseq(-py) is MIT-licensed.
The license applies to the pre-trained models as well.
Please cite as:
```bibtex
@misc{gumma2024fairseq,
  author = {Varun Gumma},
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/VarunGumma/fairseq}},
}
```

```bibtex
@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
```
I will try my best to keep this repo synced with the upstream fairseq repository. This clone is under active development and may occasionally have broken features, so feel free to raise issues or open pull requests to fix bugs or introduce new features.