
v0.9.0

Released by @myleott on 04 Dec 14:31

Possibly breaking changes:

  • Set global numpy seed (4a7cd58)
  • Split in_proj_weight into separate k, v, q projections in MultiheadAttention (fdf4c3e); a checkpoint-migration sketch follows this list
  • TransformerEncoder returns namedtuples instead of dict (27568a7)
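
If you load a pre-0.9.0 checkpoint into the new MultiheadAttention, the combined in_proj_weight (and in_proj_bias) has to be split into per-projection parameters. A minimal migration sketch, assuming the combined matrix is laid out in q, k, v order along dim 0 and that the new parameters are named q_proj/k_proj/v_proj; both are assumptions, not confirmed by these notes:

```python
# Hypothetical migration helper, not fairseq's actual upgrade code.
# Splits a combined in_proj_weight/in_proj_bias into separate q/k/v entries.
import torch

def split_in_proj(state_dict, prefix="encoder.layers.0.self_attn."):
    for old, new_names in [
        ("in_proj_weight", ("q_proj.weight", "k_proj.weight", "v_proj.weight")),
        ("in_proj_bias", ("q_proj.bias", "k_proj.bias", "v_proj.bias")),
    ]:
        key = prefix + old
        if key in state_dict:
            # combined tensor has shape (3 * embed_dim, ...); assumed q, k, v order
            for name, chunk in zip(new_names, state_dict.pop(key).chunk(3, dim=0)):
                state_dict[prefix + name] = chunk
    return state_dict
```

Run this once per attention-module prefix before calling load_state_dict on the new model.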

New features:

  • Add --fast-stat-sync option (e1ba32a)
  • Add --empty-cache-freq option (315c463)
  • Support criterions with parameters (ba5f829); see the sketch after this list
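
"Criterions with parameters" means a criterion can now carry trainable tensors of its own, optimized alongside the model. A minimal sketch of the idea in plain PyTorch rather than the fairseq criterion API; the learnable temperature is a made-up example parameter:

```python
# Sketch of a criterion that owns a trainable parameter: a learnable
# temperature applied to the logits before cross-entropy. Illustrative
# only; not the FairseqCriterion interface.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureScaledCrossEntropy(nn.Module):
    def __init__(self):
        super().__init__()
        # trainable parameter owned by the criterion itself
        self.log_temperature = nn.Parameter(torch.zeros(()))

    def forward(self, logits, target):
        scaled = logits / self.log_temperature.exp()
        return F.cross_entropy(scaled, target)

# Because the criterion has parameters, they must be passed to the optimizer:
# optim = torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()))
```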

New papers:

  • Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
  • Levenshtein Transformer (86857a5, ...)
  • Cross+Self-Attention for Transformer Models (4ac2c5f)
  • Jointly Learning to Align and Translate with Transformer Models (1c66792)
  • Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
  • Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
  • CamemBERT: a French BERT (b31849a)

Speed improvements:

  • Add CUDA kernels for LightConv and DynamicConv (f840564)
  • Cythonization of various dataloading components (4fc3953, ...)
  • Skip the output projection for positions that don't contribute to the MLM loss (718677e); see the sketch after this list
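
Why the MLM change is faster: the masked-LM loss only touches the small fraction of positions that were masked (typically ~15%), so the expensive vocabulary-sized projection can be applied to just those positions instead of the whole sequence. A hedged sketch in plain PyTorch with illustrative shapes, not fairseq's implementation:

```python
# Project only masked positions through the (large) output layer.
import torch
import torch.nn as nn

batch, seq_len, embed_dim, vocab = 8, 512, 768, 50000
features = torch.randn(batch, seq_len, embed_dim)   # encoder output
masked = torch.rand(batch, seq_len) < 0.15          # bool mask of MLM targets
output_proj = nn.Linear(embed_dim, vocab)

# Naive: project all batch * seq_len positions, then select the masked ones.
logits_all = output_proj(features)[masked]

# Faster: select the ~15% masked positions first, then project only those.
logits_masked = output_proj(features[masked])

assert torch.allclose(logits_all, logits_masked, atol=1e-5)
```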