ml-for-speech/speechtoolkit

SpeechToolkit

NOTE: This project is still in an early alpha stage and is not ready for production yet.

A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, and more!

Because the toolkit is in early alpha, not all features have been implemented yet.

If you would rather use the models individually instead of through SpeechToolkit, please check out the ML for Speech page.

Implemented Features

  • Text-to-speech
    • StyleTTS 2
    • MetaVoice
    • Parler TTS
    • XTTS
  • Voice conversion
    • LVC-VC
    • NaturalSpeech3 Voice Conversion
    • StyleTTS2-VC
  • Automatic speech recognition
    • Whisper
    • Distil-Whisper
    • Canary
  • Audio classification
    • Language detection

Installation & Usage

Documentation is available online.

Disclaimer

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Provided models may make mistakes.

THE MODEL IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS MODEL INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS MODEL.

About

[EARLY PUBLIC ALPHA] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activity detection, and more!