# History

Metaseq originated as a fork of fairseq that merged FSDP with Megatron's tensor-parallel libraries in order to train a 175B-parameter model on 1k 80GB A100 GPUs.

To enable faster iteration, we have removed most features offered by fairseq, leaving only the bare minimum needed to work at 175B scale. We have also renamed many of the Fairseq* classes with a Base* or Metaseq* prefix. The full list of renamed classes follows:

- Training internals renaming (optimizer-related changes + dropout)
  - FairseqOptimizer → BaseOptimizer
  - LegacyFairseqOptimizer → LegacyOptimizer
  - FairseqLRScheduler → BaseLRScheduler
  - FairseqCriterion → BaseCriterion
  - FairseqIncrementalState → IncrementalState
  - FairseqAdam → MetaseqAdam
    - FairseqAdamConfig → MetaseqAdamConfig
  - FairseqSGDW → MetaseqSGDW
  - FairseqDropout → Dropout
- Model architecture renaming
  - FairseqDecoder → BaseDecoder (since replaced by IncrementalDecoder)
  - FairseqEncoder → BaseEncoder
  - DistributedFairseqModel → DistributedModel
  - BaseFairseqModel → BaseModel
  - FairseqEncoderDecoderModel → EncoderDecoderModel (to be ripped out; only affects tests)
  - FairseqLanguageModel → LanguageModel
- Config and circuitry renaming
  - FairseqTask → BaseTask
  - LegacyFairseqTask → LegacyTask
  - FairseqDataclass → MetaseqDataclass
  - FairseqConfig → MetaseqConfig
  - FairseqDataset → BaseDataset
- Module renaming (see the import sketch after this list)
  - fairseq → metaseq
  - fairseq_cli → metaseq.cli
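
As a rough before/after sketch of how these renames surface in downstream code: the top-level module and class names come from the list above, but the exact submodule paths (e.g. `metaseq.tasks`, `metaseq.modules`) are assumptions and may differ from the actual tree, so those lines are left commented out.

```python
# Before, against fairseq:
#   import fairseq
#   import fairseq_cli
#   from fairseq.tasks import FairseqTask        # submodule path assumed
#   from fairseq.modules import FairseqDropout   # submodule path assumed

# After, against metaseq:
import metaseq       # fairseq -> metaseq
import metaseq.cli   # fairseq_cli -> metaseq.cli
# from metaseq.tasks import BaseTask             # FairseqTask -> BaseTask (path assumed)
# from metaseq.modules import Dropout            # FairseqDropout -> Dropout (path assumed)
```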