
Betha

Betha is a tribute to Alpa

While Alpa is the nuclear weapon for large-scale model training and serving, Betha is a toy example of model-parallel training of a simple GPT-J implementation using manual sharding and Ray Core.
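A minimal sketch of what "manual sharding" can mean here: cutting a stack of transformer blocks into contiguous stages, each of which becomes one pipeline shard. The helper name `make_stages` and the toy block stack are hypothetical, not code from this repo; GPT-J-6B has 28 transformer layers, hence the 28 below.

```python
import torch.nn as nn

def make_stages(blocks: nn.ModuleList, num_stages: int) -> list[nn.Sequential]:
    """Split a stack of transformer blocks into contiguous pipeline stages."""
    per_stage = (len(blocks) + num_stages - 1) // num_stages
    return [
        nn.Sequential(*blocks[i : i + per_stage])
        for i in range(0, len(blocks), per_stage)
    ]

# Toy stand-in for GPT-J's 28 transformer blocks.
blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(28))
stages = make_stages(blocks, num_stages=4)  # 4 shards of 7 blocks each
```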

model.py contains the GPT-J model broken up into shardable pieces. shard.py is the Ray actor wrapper that flows activations and gradients between nodes.
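A hedged sketch of the general pattern such a Ray actor wrapper follows; the class name `Shard`, the `forward`/`backward` methods, and the optimizer choice are assumptions for illustration, not the repo's actual API. Each actor owns one stage, runs the forward pass on activations it receives, and backpropagates gradients flowing back from the next stage:

```python
import ray
import torch
import torch.nn as nn

@ray.remote
class Shard:
    """One pipeline stage: owns a slice of the model plus its optimizer."""

    def __init__(self, stage: nn.Module, lr: float = 1e-4):
        self.stage = stage
        self.opt = torch.optim.AdamW(self.stage.parameters(), lr=lr)
        self.inp = None   # input activations, kept for backward
        self.out = None   # output activations, kept for backward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-attach the incoming activation to this shard's autograd graph.
        self.inp = x.detach().requires_grad_(True)
        self.out = self.stage(self.inp)
        return self.out.detach()  # ship activations to the next shard

    def backward(self, grad_out: torch.Tensor) -> torch.Tensor:
        # Backprop through this stage, update its weights, and return
        # the gradient for the previous stage's output.
        self.out.backward(grad_out)
        self.opt.step()
        self.opt.zero_grad()
        return self.inp.grad

# Chain the stages: activations flow left-to-right, gradients right-to-left.
ray.init()
shards = [Shard.remote(nn.Linear(64, 64)) for _ in range(4)]

x = torch.randn(8, 64)
for s in shards:                   # forward pass across actors
    x = ray.get(s.forward.remote(x))

grad = torch.ones_like(x)          # stand-in for the loss gradient
for s in reversed(shards):         # backward pass across actors
    grad = ray.get(s.backward.remote(grad))
```

The detach/re-attach trick is what lets each actor keep a self-contained autograd graph while still passing plain tensors over the wire; this sketch runs one batch end-to-end with no microbatching or overlap.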

To run: python train.py --model_dir=<cached HF GPT-J model>
