
MoE-Kit

This repository contains implementations of mixture-of-experts models such as the Switch Transformer (Fedus et al., 2021). It explores how conditional computation can be used to scale model parameter count independently of per-token compute, and how this affects performance and training time.
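Conditional computation here means each token is routed to a single expert feed-forward network, so adding experts grows the parameter count without increasing the work done per token. As a rough, self-contained sketch of this top-1 ("switch") routing idea in PyTorch (not the repository's actual implementation; the class and parameter names such as `SwitchFFN` and `num_experts` are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchFFN(nn.Module):
    """Top-1 ("switch") routed feed-forward layer: each token is processed by a
    single expert, so parameters scale with num_experts while per-token compute
    stays roughly constant. Hypothetical sketch, not this repo's implementation."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten into a stream of tokens
        tokens = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(tokens), dim=-1)   # (num_tokens, num_experts)
        gate, expert_idx = probs.max(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale the expert output by its gate probability so the
                # router still receives a gradient signal.
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(x.shape)


layer = SwitchFFN(d_model=64, d_ff=256, num_experts=4)
y = layer(torch.randn(2, 10, 64))   # output has the same shape as the input
```

The full Switch Transformer additionally uses expert capacity limits and an auxiliary load-balancing loss to keep tokens spread across experts; the sketch above omits those details.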

For more details, please see ROADMAP.md.

Introduction

For questions or points of contact, please reach out to the authors of this repository directly.

Data Acknowledgements

