Releases: facebookresearch/fairseq

v0.7.0

20 Jun 03:03

Notable (possibly breaking) changes:

  • d45db80: Move checkpoint utility functions from utils.py into checkpoint_utils.py
  • f2563c2: Move LM definitions into separate files
  • dffb167: Updates to model API (see the sketch after this list):
    • FairseqModel -> FairseqEncoderDecoderModel
    • add FairseqDecoder.extract_features and FairseqDecoder.output_layer
    • encoder_out_dict -> encoder_out
    • remove unused remove_head functions
  • 34726d5: Move distributed_init into DistributedFairseqModel
  • cf17068: Simplify distributed launch by automatically spawning one process per visible GPU on each node (so only one job needs to be launched per node instead of one per GPU)
  • d45db80: Change default LR scheduler from reduce_lr_on_plateau to fixed
  • 96ac28d: Rename --sampling-temperature -> --temperature
  • fc1a19a: Deprecate dummy batches
  • a1c997b: Add memory mapped datasets
  • 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"
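
For illustration, here is a minimal sketch of the decoder API split described in the dffb167 item above. Only the extract_features/output_layer method names come from the release notes; the toy modules, names, and sizes are invented for the example and are not fairseq code.

```python
import torch
import torch.nn as nn

# Sketch of the new decoder API split: forward() composes extract_features()
# (which returns hidden states) with output_layer() (which projects features
# to vocabulary logits). Everything except the two method names is illustrative.
class ToyDecoder(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.proj = nn.Linear(embed_dim, vocab_size)

    def extract_features(self, prev_output_tokens, encoder_out=None):
        # Hidden states only, so callers can reuse features without
        # projecting to the vocabulary.
        x, _ = self.rnn(self.embed(prev_output_tokens))
        return x

    def output_layer(self, features):
        # Project decoder features to vocabulary logits.
        return self.proj(features)

    def forward(self, prev_output_tokens, encoder_out=None):
        features = self.extract_features(prev_output_tokens, encoder_out)
        return self.output_layer(features)


tokens = torch.randint(0, 1000, (2, 7))  # (batch, tgt_len)
logits = ToyDecoder()(tokens)            # (2, 7, 1000)
```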

Plus many additional features and bugfixes

v0.6.2

15 Mar 17:28

Changelog:

  • 998ba4f: Add language models from Baevski & Auli (2018)
  • 4294c4f: Add mixture of experts code from Shen et al. (2019)
  • 0049349: Add example for multilingual training
  • 48d9afb: Speed improvements, including fused operators from apex
  • 44d27e6: Add TensorBoard support
  • d17fa85: Add Adadelta optimizer
  • 9e1c880: Add FairseqEncoderModel
  • b65c579: Add FairseqTask.inference_step to modularize generate.py
  • 2ad1178: Add back --curriculum
  • Misc bug fixes and other features

v0.6.1

09 Feb 18:37

Bumping version number for PyPI release.

v0.6.0

26 Sep 17:16

Changelog:

  • 4908863: Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
    • no more FP16Trainer; FP16 training is now handled by an FP16Optimizer wrapper
    • most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
    • Trainer now requires an extra dummy_batch argument at initialization. When workers have an uneven number of batches, we do fwd/bwd on the dummy batch and hide its gradients by multiplying the loss by 0 (see the sketch after this list)
    • Trainer.train_step now takes a list of samples, which allows a cleaner implementation of --update-freq
  • 1c56b58: parallelize preprocessing
  • Misc bug fixes and features
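
A minimal sketch of the dummy-batch trick described above, not the actual Trainer code; the model, data, and train_step helper are invented for the example.

```python
import torch
import torch.nn as nn

# When a worker runs out of real batches it still does fwd/bwd on a dummy
# batch, so distributed gradient synchronization stays in step across workers,
# but the loss is multiplied by 0 so the dummy batch contributes no gradient.
model = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()

def train_step(batch, is_dummy=False):
    x, y = batch
    loss = criterion(model(x), y)
    if is_dummy:
        loss = loss * 0.0  # gradients from this batch become exactly zero
    loss.backward()        # backward still runs, keeping collectives in sync
    return loss.item()

real = (torch.randn(4, 8), torch.randint(0, 2, (4,)))
dummy = (torch.zeros(4, 8), torch.zeros(4, dtype=torch.long))
train_step(real)
train_step(dummy, is_dummy=True)
```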

v0.5.0: 0.4.0 -> 0.5.0

15 Jun 20:38
388c520

Changelog:

- 97b58b4: add Transformer model from Vaswani et al. (2017)
- b2374e5: faster Transformer inference with improved caching
- 2d27ae0: simulate large mini-batch training with delayed updates (`--update-freq`; see the sketch after this list)
- 7ee1d28: add FP16 training support (`--fp16`)
- 2a84f46: faster inference by removing completed sentences from the batch
- 663fd80: batched interactive generation
- 4c2ef2d: add language modeling / gated convolutional model from Dauphin et al. (2017)
- b59815b: add Hierarchical Neural Story Generation model from Fan et al. (2018)
- ff68a9e: add FairseqTask to modularize task definitions (e.g., translation, language modeling)
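
A minimal sketch of the delayed-update idea behind `--update-freq`, not fairseq's actual training loop; the model, data, and update_freq value are invented for the example.

```python
import torch
import torch.nn as nn

# Accumulate gradients over `update_freq` mini-batches and take one optimizer
# step, simulating a mini-batch that is `update_freq` times larger.
update_freq = 4
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches, start=1):
    # Scale each loss so the accumulated gradient matches the mean over the
    # combined (larger) batch.
    loss = criterion(model(x), y) / update_freq
    loss.backward()  # gradients accumulate in .grad across iterations
    if i % update_freq == 0:
        optimizer.step()       # one parameter update per `update_freq` batches
        optimizer.zero_grad()
```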

v0.4.0

15 Jun 19:07
ec0031d

Merge internal changes (#163)