tts_decode.py (Core Tools)

kimjohn1 edited this page Mar 21, 2020 · 1 revision

I hope this question is appropriate for this site.

I am a university computer science student working on a term project that uses text-to-speech (TTS). I came across your excellent end-to-end speech processing toolkit and have listened to some of the TTS output samples (https://espnet.github.io/espnet-tts-sample/). I have explored the online demo at https://colab.research.google.com/github/espnet/notebook/blob/master/tts_realtime_demo.ipynb, which is great. I have also installed and run the "Synthesized speech using pretrained models" section of the Pretrained Model page in the ESPnet (0.6.2) documentation (https://espnet.github.io/espnet/notebook/pretrained.html), and have used it to try several of your available pretrained feature generation and vocoder models.

My project involves using TTS in a program that will run on a computer without GPU acceleration. I notice that you list tts_decode.py as a Core Tool of ESPnet (https://espnet.github.io/espnet/apis/espnet_bin.html), described as synthesizing speech from text using a TTS model on a single CPU. That sounds ideal for my use case. The description of tts_decode.py and its named arguments is helpful, but not quite enough for me to know how to use the tool, and I have not found a usage guide or any examples. Is there additional documentation that describes the prerequisites and gives some simple examples of using tts_decode.py that you could direct me to?

Thanks, Kim