This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Feature

  1. Supports both GPU and CPU distributed training
  2. Automatically cleans up the PS (parameter server) when the whole FrameworkAttempt is completed
  3. No need to modify an existing TensorFlow image
  4. No need to set up Kubernetes DNS or a Kubernetes Service
  5. Common Feature
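
To illustrate why no Kubernetes DNS or Service is needed, here is a minimal sketch of how a TensorFlow cluster spec could be assembled directly from Pod IP addresses. It is not taken from this repository: the function name `build_tf_config`, the default port `2222`, and the way the IP lists reach the training process are all assumptions made for illustration. Only the `TF_CONFIG` JSON layout (`cluster` and `task` keys) follows TensorFlow's documented format.

```python
import json


def build_tf_config(ps_ips, worker_ips, task_type, task_index, port=2222):
    """Build a TF_CONFIG JSON string from raw Pod IPs (hypothetical helper).

    ps_ips / worker_ips: lists of Pod IP strings, assumed to be discovered
    by some mechanism outside this sketch.
    port: assumed to be the same for every task; 2222 is only a convention.
    """
    cluster = {
        "ps": [f"{ip}:{port}" for ip in ps_ips],
        "worker": [f"{ip}:{port}" for ip in worker_ips],
    }
    task = {"type": task_type, "index": task_index}
    return json.dumps({"cluster": cluster, "task": task})


if __name__ == "__main__":
    # Example: the second worker in a cluster with one PS and two workers.
    print(build_tf_config(["10.0.0.1"], ["10.0.0.2", "10.0.0.3"], "worker", 1))
```

Because every address in the spec is a plain `ip:port` pair, the tasks can reach each other without any DNS name or Service object, provided that Pod IPs are routable inside the cluster.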

Prerequisite

  1. See the [PREREQUISITE] section in each specific Framework yaml file
  2. Set up Kubernetes Cluster-Level Logging if you need to persist and expose the logs of deleted Pods

Quick Start

  1. Common Quick Start
  2. CPU Example
  3. GPU Example