Stella Biderman edited this page Jan 30, 2021 · 4 revisions

Welcome to the gpt-neox wiki!

The purpose of this wiki is to organize the terminology and ideas that appear across the DeepSpeed papers: how they connect to each other, what benefits they provide, and why we care about them.

To Do List:

Each item on this list should have its own page.

ZeRO

  • ZeRO
  • ZeRO Stage 1 vs 2 vs 3
  • ZeRO-Offload

Structural Optimizations

  • Pipeline Parallelism
  • Kernel Optimization
  • Gradient Clipping
  • Progressive Layer Dropping

Attention

  • Sparse Attention

Checkpointing

  • Model Checkpointing
  • Activation Checkpointing

Optimizers

  • Adam
  • 1-Bit Adam

Networking

  • TCP
  • InfiniBand
  • PCIe
  • NVLink
  • MPI
  • NCCL