
Roboy Soncreo

Roboy Soncreo (from Latin sonus, "sound", and creō, "I create, make, produce") is a library for speech generation based on deep learning models.

A PyTorch implementation that combines Tacotron2 and nv-wavenet to synthesize audio from text. It also supports interfacing via ROS2 (experimental; see the optional section at the end).

Prerequisites

  1. NVIDIA GPU with CUDA and cuDNN
  2. PyTorch 1.0
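
A quick sanity check that PyTorch was built with CUDA support and can see the GPU (a minimal sketch, not part of the repo):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"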

Setup

  1. Clone this repo: git clone https://github.com/Roboy/soncreo
  2. Initialize submodules: git submodule init; git submodule update
  3. Download and extract the LJ Speech dataset (example commands below)
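
For reference, one way to fetch and unpack the dataset (the URL points at the usual LJ Speech mirror and may change over time):

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjf LJSpeech-1.1.tar.bz2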

To build the nv-wavenet wrapper for PyTorch

  1. cd nv-wavenet/pytorch
  2. Update the Makefile with the appropriate ARCH value (e.g. ARCH=sm_70). Find your GPU's compute capability here: https://developer.nvidia.com/cuda-gpus. For example, the NVIDIA Titan V has compute capability 7.0; therefore, its correct ARCH parameter is sm_70. (A lookup sketch follows this list.)
  3. Build nv-wavenet and C-wrapper: make
  4. Install the PyTorch extension: python build.py install
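
To look up the compute capability of the installed GPU and patch the Makefile in one go (the compute_cap query field only exists in recent nvidia-smi versions, and the sed pattern assumes the Makefile defines a plain ARCH=... line, so adjust as needed):

nvidia-smi --query-gpu=compute_cap --format=csv,noheader   # e.g. prints 7.0
sed -i 's/^ARCH=.*/ARCH=sm_70/' Makefile                   # substitute your own sm_XY value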

Training Tacotron2

  1. cd tacotron2 and then update .wav paths: sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt
  2. cd back into the parent soncreo directory: cd ..
  3. python interface.py --output_directory=output --log_directory=logdir
  4. (OPTIONAL) tensorboard --logdir=output/logdir

Training NV-Wavenet

Make a list of the file names to use for training/testing. The first 10 files are held out for testing; note tail -n+11 so the two lists do not overlap:
ls ljs_dataset_folder/*.wav | tail -n+11 > train_files.txt
ls ljs_dataset_folder/*.wav | head -n10 > test_files.txt
Train the model
python interface_wavenet.py -c nv-wavenet/pytorch/config.json
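
config.json in the nv-wavenet submodule is what points the trainer at these lists. The excerpt below follows the upstream nv-wavenet example config; the exact field names are an assumption, so verify them against the file in your checkout:

{
  "data_config": {
    "training_files": "train_files.txt"
  }
}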

Text-to-Speech Inference

To synthesize and play audio from text

python combine.py --default=False --text='Write your text here' --checkpoint_tac='checkpoints/tac' --checkpoint_wav='checkpoints/wav' --batch=1 --output_directory='./output' --implementation='persistent'

To run inference with our pretrained models for Tacotron2 and WaveNet

  1. Download pretrained models here
  2. Create a folder named checkpoints and copy the Tacotron2 and WaveNet pretrained models into it: mkdir checkpoints
  3. Create a folder called output (used to save the produced wav files): mkdir output
  4. Run the following command: python combine.py --default=True --text="Write your text here"
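
Putting the steps together (the name of the wav file that combine.py writes is an assumption here; check the output folder after the run):

mkdir checkpoints output
# copy the downloaded Tacotron2 and WaveNet models into checkpoints/ first
python combine.py --default=True --text="Write your text here"
aplay output/*.wav   # or any other wav player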

(Optional) Connect the Text to Speech Inference via ROS2

This repo contains a ROS2 service server (written in Python) that allows other ROS2 nodes to request speech synthesis.

  1. Start the ROS2 service: python3 TTS_srv.py
  2. Call the service from a client (a simple example client for Roboy is Pyroboy); see the sketch below
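
A minimal rclpy client sketch. The service name (/tts) and the Talk service type are illustrative assumptions; check TTS_srv.py for the interface it actually advertises:

import rclpy
from rclpy.node import Node
# Hypothetical service type: a Talk.srv with a single string field "text".
# Replace with the type TTS_srv.py actually uses.
from roboy_cognition_msgs.srv import Talk

def main():
    rclpy.init()
    node = Node('tts_client')
    client = node.create_client(Talk, '/tts')        # assumed service name
    client.wait_for_service()
    request = Talk.Request()
    request.text = 'Write your text here'            # assumed request field
    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)   # block until the service replies
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()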