voltronformers

Assembling the best SotA AI techniques into a unified model

- 13B parameter BitNet + Infini-Attention + DenseFormer + MoD +
  In-Context Pretraining + two-stage pretraining
- upcycle with c-BTX to an 8-expert sparse MoE + MoA (a rough sketch of the
  planned assembly follows below)

https://twitter.com/winglian/status/1778675583817326842
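To make the plan above concrete, here is a purely illustrative configuration sketch of how the pieces are intended to fit together. The class and field names are hypothetical and do not come from this repository's code.

```python
from dataclasses import dataclass

@dataclass
class VoltronformerPlan:
    """Hypothetical summary of the intended recipe; names and values are illustrative."""
    # Stage 1: a dense 13B base model
    n_params: str = "13B"
    linear_layers: str = "BitLinear"              # BitNet 1-bit linear layers
    attention: str = "Infini-Attention"           # compressive memory for long context
    residual_stream: str = "DenseFormer"          # depth-weighted averaging of block outputs
    token_routing: str = "Mixture-of-Depths"      # per-token compute allocation
    data_ordering: str = "In-Context Pretraining" # related documents packed into one context
    schedule: str = "two-stage (MiniCPM-style)"   # stable phase, then decay on higher-quality data

    # Stage 2: sparse upcycling
    upcycling: str = "c-BTX"                      # cluster (c-BTM) + Branch-Train-MiX merge
    n_experts: int = 8                            # sparse MoE feed-forward experts
    attention_moe: str = "Mixture of Attention Heads"
```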

References

BitNet

BitNet: Scaling 1-bit Transformers for Large Language Models
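A minimal sketch of the core idea, assuming a PyTorch setting: weights are binarized to {-1, +1} with a per-tensor scale, activations are absmax-quantized to 8 bits, and straight-through estimators keep the latent full-precision weights trainable. This illustrates the technique, not this repository's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Sketch of a BitNet-style 1-bit linear layer; not the paper's exact code."""
    def forward(self, x):
        w = self.weight
        # Binarize weights to {-1, +1} around their mean, with a per-tensor scale beta.
        alpha = w.mean()
        w_bin = torch.sign(w - alpha)
        beta = (w - alpha).abs().mean()
        # Straight-through estimator: gradients flow to the latent full-precision weights.
        w_q = w + (w_bin * beta - w).detach()
        # Absmax-quantize activations to 8 bits, also with a straight-through estimator.
        gamma = x.abs().max().clamp(min=1e-5)
        x_q = x + ((x * 127.0 / gamma).round().clamp(-128, 127) * gamma / 127.0 - x).detach()
        return F.linear(x_q, w_q, self.bias)
```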

DenseFormer

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
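The core DenseFormer operation is a depth-weighted average: after block i, the representation is replaced by a learned scalar mixture of the embedding output and all earlier block outputs. A toy sketch of that module (hypothetical, not from this repo):

```python
import torch
import torch.nn as nn

class DepthWeightedAverage(nn.Module):
    """After block i, mix the outputs of all earlier blocks (and the embeddings)
    with learned scalar weights, initialized to pass block i through unchanged."""
    def __init__(self, depth_index: int):
        super().__init__()
        init = torch.zeros(depth_index + 1)
        init[-1] = 1.0
        self.alpha = nn.Parameter(init)

    def forward(self, history):  # history: list of tensors [x_0, ..., x_i], same shape
        stacked = torch.stack(history, dim=0)           # (i+1, batch, seq, dim)
        return (self.alpha.view(-1, 1, 1, 1) * stacked).sum(dim=0)
```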

Mixture-of-Depths

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
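Mixture-of-Depths adds a small per-block router that picks a fixed fraction of tokens to receive full block compute while the rest ride the residual stream. A simplified sketch, assuming a generic PyTorch block and ignoring the paper's causal-sampling details:

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Only the top-k scoring tokens pass through the expensive block;
    the rest skip via the residual path. Written for clarity, not speed."""
    def __init__(self, block: nn.Module, dim: int, capacity: float = 0.125):
        super().__init__()
        self.block = block
        self.router = nn.Linear(dim, 1)
        self.capacity = capacity            # fraction of tokens processed per block

    def forward(self, x):                   # x: (batch, seq, dim)
        b, s, _ = x.shape
        k = max(1, int(s * self.capacity))
        scores = self.router(x).squeeze(-1) # (batch, seq)
        topk = scores.topk(k, dim=-1).indices
        out = x.clone()
        for i in range(b):
            sel = topk[i]
            # Weight the block output by the router score to keep routing differentiable.
            w = torch.sigmoid(scores[i, sel]).unsqueeze(-1)
            out[i, sel] = x[i, sel] + w * self.block(x[i, sel].unsqueeze(0)).squeeze(0)
        return out
```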

In-Context Pretraining

In-Context Pretraining: Language Modeling Beyond Document Boundaries
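In-Context Pretraining changes only the data order: related documents are packed into the same training context instead of being shuffled at random. A naive greedy version of that ordering is sketched below; the paper uses an approximate traveling-salesman-style pass over document embeddings, and `embed` here is a hypothetical embedding function.

```python
import numpy as np

def order_by_similarity(docs, embed):
    """Greedily chain documents so that neighbours in the training stream are similar."""
    vecs = np.stack([embed(d) for d in docs])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    order, remaining = [0], set(range(1, len(docs)))
    while remaining:
        last = vecs[order[-1]]
        # Append the nearest unvisited neighbour (naive stand-in for the paper's method).
        nxt = max(remaining, key=lambda j: float(last @ vecs[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return [docs[i] for i in order]
```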

MiniCPM (Two Stage Pre-training Strategy)

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
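The two-stage strategy rides on MiniCPM's WSD (warmup-stable-decay) learning-rate schedule: a long stable phase on general data, then a short decay phase during which higher-quality data is mixed in. A sketch of such a schedule, with illustrative parameter names:

```python
def wsd_lr(step, max_lr, warmup, stable, decay, min_lr=0.0):
    """Warmup-stable-decay schedule: linear warmup, flat peak, then linear decay."""
    if step < warmup:                       # warmup phase
        return max_lr * (step + 1) / warmup
    if step < warmup + stable:              # stable phase at peak LR (stage 1 data mix)
        return max_lr
    t = min((step - warmup - stable) / decay, 1.0)
    return max_lr + (min_lr - max_lr) * t   # decay phase (stage 2 data mix)
```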

Cluster-Branch-Train-Merge (c-BTM)

Scaling Expert Language Models with Unsupervised Domain Discovery
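c-BTM clusters the corpus without supervision, trains one expert LM per cluster, and ensembles the experts at inference weighted by how close the current context is to each cluster. A rough inference-time sketch; `experts`, `centers`, and `embed_context` are assumed stand-ins rather than real APIs:

```python
import numpy as np

def cbtm_next_token_probs(context, experts, centers, embed_context, temperature=0.1):
    """Ensemble per-cluster expert LMs, weighted by distance to each cluster center."""
    h = embed_context(context)                      # embed the current context
    d = np.linalg.norm(centers - h, axis=1)         # distance to each cluster center
    w = np.exp(-d / temperature)
    w /= w.sum()                                    # soft cluster weights
    probs = [expert.next_token_probs(context) for expert in experts]
    return sum(wi * pi for wi, pi in zip(w, probs)) # weighted ensemble of distributions
```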

Branch-Train-MiX (BTX)

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
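BTX branches several domain experts from a seed model, then merges them back: each expert's feed-forward weights become one expert of an MoE layer, the remaining parameters are averaged, and the merged model is briefly finetuned with a fresh router. An illustrative merge over plain state dicts; the `.mlp.` key pattern is an assumption about layer naming, not a real checkpoint format.

```python
import torch

def btx_merge(expert_state_dicts):
    """Merge branch-trained experts: FFN weights become MoE experts, the rest is averaged."""
    merged, _ = {}, len(expert_state_dicts)
    for key in expert_state_dicts[0]:
        tensors = [sd[key] for sd in expert_state_dicts]
        if ".mlp." in key:
            # Keep every expert's FFN: it becomes expert e of the new MoE layer.
            for e, t in enumerate(tensors):
                merged[key.replace(".mlp.", f".moe.experts.{e}.")] = t
        else:
            # Attention, norms, embeddings: simple parameter average.
            merged[key] = torch.stack(tensors).mean(dim=0)
    return merged  # router weights are initialized fresh and finetuned afterwards
```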

Mixture Of Attention Heads

Mixture of Attention Heads: Selecting Attention Heads Per Token
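Mixture of Attention Heads applies the MoE idea to attention itself: a router sends each token to a few attention "experts" (their own query/output projections) over shared keys and values. A simplified single-head-per-expert sketch that omits the load-balancing losses:

```python
import torch
import torch.nn as nn

class MixtureOfAttentionHeads(nn.Module):
    """Route each token to k of E attention experts over a shared key/value space."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)
        self.q_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.o_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.kv = nn.Linear(dim, 2 * dim)               # keys/values shared by all experts

    def forward(self, x):                               # x: (batch, seq, dim)
        k_, v = self.kv(x).chunk(2, dim=-1)
        gates = self.router(x).softmax(dim=-1)          # (batch, seq, E)
        topv, topi = gates.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for e in range(len(self.q_proj)):
            mask = (topi == e).any(dim=-1)              # tokens routed to expert e
            if not mask.any():
                continue
            q = self.q_proj[e](x)
            attn = torch.softmax(q @ k_.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
            y = self.o_proj[e](attn @ v)
            w = (topv * (topi == e)).sum(dim=-1, keepdim=True)  # gate weight for expert e
            out = out + mask.unsqueeze(-1) * w * y
        return out
```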
