Releases: Azure/MS-AMP

Release MS-AMP v0.4.0

26 Feb 10:34
51f34ac

MS-AMP Improvements

  • Improve GPT-3 performance by fusing kernels in FP8 gradient accumulation
  • Support FP8 in FSDP
  • Support DeepSpeed+TE+MSAMP and add cifar10 example
  • Support MSAMP+TE+DDP
  • Update DeepSpeed to latest version
  • Update TransformerEngine to V1.1 and flash-attn to the latest version
  • Support CUDA 12.2
  • Fix several bugs in DeepSpeed integration

MS-AMP-Examples Improvements

  • Improve documentation for GPT-3 data processing
  • Add launch script for pretraining GPT-6b7
  • Use new API of TransformerEngine in Megatron-LM

Document Improvements

  • Add Docker usage to the Installation page
  • Explain how to run the FSDP and DeepSpeed+TE+MSAMP examples on the "Run Examples" page

Release MS-AMP v0.3.0

03 Nov 10:41
3b0567a

MS-AMP 0.3.0 Release Notes

MS-AMP Improvements

  • Integrate latest Transformer Engine into MS-AMP
  • Integrate with latest Megatron-LM
  • Add a website for MS-AMP and improve documents
  • Add a custom DistributedDataParallel that supports FP8 and computation/communication overlap
  • Refactor code in dist_op module
  • Support unit tests (UT) for distributed testing
  • Integrate with MSCCL

MS-AMP-Examples Improvements

  • Support pretraining GPT-3 with Megatron-LM and MS-AMP
  • Provide a tool to print the traffic per second of NVLINK and InfiniBand
  • Print TFLOPS and throughput metrics in all examples

Document Improvements

  • Add performance numbers to the Introduction page
  • Enhance Usage page and Optimization Level page
  • Add Container Images page
  • Add Developer Guide section

Release MS-AMP v0.2.0

20 Jul 13:07
716ad89

MS-AMP 0.2.0 Release Notes

MS-AMP Improvements

  • Add O3 optimization for supporting FP8 in distributed training frameworks
  • Support ScalingTensor in functional.linear
  • Support customized attributes in FP8Linear
  • Improve performance
  • Add Dockerfiles for PyTorch 1.14 + CUDA 11.8 and PyTorch 2.1 + CUDA 12.1
  • Support PyTorch 2.1
  • Add performance results and TE results to the homepage
  • Cache TE build in pipeline

MS-AMP-Examples Improvements

Add 3 examples using MS-AMP:

Release MS-AMP v0.1.0

21 Apr 09:01
ac659d5

MS-AMP 0.1.0 Release Notes

MS-AMP package

  • Support the new FP8 feature that is introduced by latest accelerators (e.g. H100).
  • Speed up math-intensive operations, such as linear layers, by using Tensor Cores.
  • Speed up memory-bound operations by accessing one byte per element instead of the two or four bytes used by half or single precision.
  • Reduce memory requirements for training models, enabling larger models or larger minibatches.
  • Speed up communication for distributed models by transmitting lower-precision gradients.
  • Support two optimization levels: O1 and O2.
  • Support two optimizers: Adam and AdamW.
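
The memory and bandwidth points above follow from simple per-element arithmetic. The sketch below is illustrative only and does not use the MS-AMP API: the helper names and the 1-billion-element tensor are hypothetical, and it counts raw payload bytes only, ignoring any scaling factors or metadata that an FP8 implementation keeps alongside the tensor.

```python
# Storage size per element for each format. FP8 is one byte by definition;
# FP16 and FP32 are two and four bytes.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "fp8": 1}


def tensor_bytes(num_elements: int, fmt: str) -> int:
    """Raw payload size in bytes of a tensor stored in the given format."""
    return num_elements * BYTES_PER_ELEMENT[fmt]


def reduction_factor(fmt: str, baseline: str) -> float:
    """How many times smaller `fmt` is than `baseline` per element."""
    return BYTES_PER_ELEMENT[baseline] / BYTES_PER_ELEMENT[fmt]


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical tensor with one billion elements
    for fmt in ("fp32", "fp16", "fp8"):
        gib = tensor_bytes(n, fmt) / 2**30
        print(f"{fmt}: {gib:.2f} GiB")
    print("fp8 vs fp16:", reduction_factor("fp8", "fp16"), "x smaller")
    print("fp8 vs fp32:", reduction_factor("fp8", "fp32"), "x smaller")
```

The same ratio applies to gradients sent between workers: under these assumptions, an FP8 gradient buffer carries half the bytes of an FP16 one and a quarter of an FP32 one for the same number of elements.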

Examples using MS-AMP