
Releases: DigitalPhonetics/IMS-Toucan

ChallengeDataContribution

01 Dec 15:40
Pre-release
v2.asvspoof

Fixes a popping noise and an incorrect path in the downloader.

ToucanTTS

10 Apr 18:22

We pack a number of designs into a new architecture, which will be the basis for our multilingual and low-resource research going forward. We call it ToucanTTS and, as usual, provide pretrained models. Synthesis quality is very good, training is very stable, and it requires few datapoints when training from scratch and even fewer for finetuning. These properties are hard to quantify, so it's probably best to try it out yourself.

We also offer the option to use a BigVGAN vocoder, which sounds very good but is a bit slow on CPU. On GPU, the new vocoder is definitely recommended.

Blizzard Challenge 2023

04 Apr 14:15

Improved Controllable Multilingual

22 Feb 17:08

This release extends the toolkit's functionality and provides new checkpoints.

  • new sampling rate for the vocoder: Using 24kHz instead of 48kHz lowers the theoretical upper bound for quality, but produces fewer artifacts in practice.
  • flow-based PostNet from PortaSpeech: included in the new TTS model, it brings cleaner results at basically no extra cost
  • new controllability options through artificial speaker generation in a lower-dimensional space with a better embedding function
  • quality-of-life changes, such as an integrated finetuning example, an arbiter that selects which train loop to use, and vocoder finetuning (although that should rarely be necessary)
  • diverse bugfixes and speed increases
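The artificial speaker generation above can be pictured as sampling a point in a small latent space and projecting it into the speaker-embedding space. A stdlib-only sketch under that assumption; the dimensions and the random linear projection here are purely illustrative stand-ins for the toolkit's learned embedding function:

```python
import random

LOW_DIM = 16     # hypothetical size of the controllable latent space
EMBED_DIM = 64   # hypothetical size of the speaker-embedding space

random.seed(0)
# Stand-in for the learned projection from latent space to embedding space.
projection = [[random.gauss(0.0, 1.0) for _ in range(LOW_DIM)]
              for _ in range(EMBED_DIM)]

def sample_artificial_speaker():
    """Draw a random point in the low-dimensional space and project it up."""
    z = [random.gauss(0.0, 1.0) for _ in range(LOW_DIM)]
    return [sum(w * zi for w, zi in zip(row, z)) for row in projection]

speaker_embedding = sample_artificial_speaker()
```

Sampling in the low-dimensional space keeps the controls intuitive: each draw lands in a region the embedding function maps to plausible voices.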

This release breaks backwards compatibility. Please download the new models, or stick to a prior release if you rely on your old models.

Future releases will include one more change to the vocoder (a BigVGAN generator) and many changes to scale up the multilingual capabilities of a single model.

Controllable Speakers

25 Oct 15:16

This release extends the toolkit's functionality and provides new checkpoints.

  • self-contained embeddings: we no longer use an external embedding model for TTS conditioning; instead, we train one that is specifically tailored to this use
  • new vocoder: Avocodo replaces HiFi-GAN
  • new controllability options through artificial speaker generation
  • quality-of-life changes, such as Weights & Biases integration, a graphical demo script, and automated model downloading
  • diverse bugfixes and speed increases
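One simple way an embedding-conditioned model like this can be steered between voices is linear interpolation of two speaker embeddings. This is a generic illustration with toy vectors, not the toolkit's actual API:

```python
def interpolate_speakers(emb_a, emb_b, alpha):
    """Blend two speaker embeddings; alpha=0 gives speaker A, alpha=1 speaker B."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]

speaker_a = [0.0, 1.0, -0.5, 2.0]   # toy embeddings for illustration
speaker_b = [1.0, -1.0, 0.5, 0.0]
halfway = interpolate_speakers(speaker_a, speaker_b, 0.5)
# halfway == [0.5, 0.0, 0.0, 1.0]
```

Sweeping alpha from 0 to 1 moves the conditioning smoothly between the two voices, which is the kind of control an embedding space makes possible.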

This release breaks backwards compatibility. Please download the new models, or stick to a prior release if you rely on your old models.

Support all Types of Languages

20 May 10:04
1ae0202

This release extends the toolkit's functionality and provides new checkpoints.

New Features:

  • support for all phonemes in the IPA standard through an extended lookup of articulatory features
  • support for some suprasegmental markers in the IPA standard through parsing (tone, lengthening, primary stress)
  • praat-parselmouth for greatly improved pitch extraction
  • faster phonemization
  • word boundaries are added, which are invisible to the aligner and the decoder, but can help the encoder in multilingual scenarios
  • tonal languages added, tested and included into the pretraining (Chinese, Vietnamese)
  • Scorer class to inspect data given a trained model and dataset cache (provided pretrained models can be used for this)
  • intuitive controls for scaling durations and variance in pitch and energy
  • diverse bugfixes and speed increases
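The duration, pitch, and energy controls mentioned above can be sketched as simple scaling operations: durations are stretched by a factor, while pitch and energy contours are scaled around their mean so that a factor below 1 flattens prosody and a factor above 1 exaggerates it. A minimal stdlib-only sketch of that idea (the function names and the frame-based representation are assumptions for illustration):

```python
def scale_variance(values, factor):
    """Scale the spread of a contour around its mean:
    factor < 1 flattens it, factor > 1 exaggerates it."""
    mean = sum(values) / len(values)
    return [mean + factor * (v - mean) for v in values]

def scale_durations(durations, factor):
    """Stretch or compress phoneme durations (in frames) by a factor,
    keeping every phoneme at least one frame long."""
    return [max(1, round(d * factor)) for d in durations]

pitch = [100.0, 120.0, 80.0, 100.0]
flat = scale_variance(pitch, 0.0)       # collapses to the mean (100.0)
slower = scale_durations([3, 5, 2], 1.5)
```

The same variance-scaling transform applies unchanged to energy contours.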

Note:

  • This release breaks backwards compatibility. Make sure you are using the associated pretrained models. Old checkpoints and dataset caches become incompatible. Only HiFiGAN remains compatible.
  • Work on upcoming releases is already in progress. Improved voice adaptation will be our next goal.
  • To use the pretrained checkpoints, download them, create their corresponding directories and place them into your clone as follows (you have to rename the HiFiGAN and FastSpeech2 checkpoints once in place):
...
Models
├─ Aligner
│  └─ aligner.pt
├─ FastSpeech2_Meta
│  └─ best.pt
└─ HiFiGAN_combined
   └─ best.pt
...
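The layout above can be prepared with a few lines of Python; this sketch only creates the directories, and you then move the downloaded checkpoints into them (renaming the FastSpeech2 and HiFiGAN checkpoints to best.pt as noted above):

```python
import os

# Create the expected checkpoint layout inside the clone.
for subdir in ("Aligner", "FastSpeech2_Meta", "HiFiGAN_combined"):
    os.makedirs(os.path.join("Models", subdir), exist_ok=True)

# After downloading, the files go here:
#   Models/Aligner/aligner.pt
#   Models/FastSpeech2_Meta/best.pt
#   Models/HiFiGAN_combined/best.pt
```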

Multi Language and Multi Speaker

01 Mar 20:37
81075a6
  • self-contained aligner to get high-quality durations quickly and easily, without reliance on external tools or knowledge distillation
  • modelling speakers and languages jointly but disentangled, so you can use speakers across languages
  • look at the demo section for an interactive online demo

A pretrained FastSpeech2 model that can speak many languages in many voices, a HiFiGAN model, and an Aligner model are attached to this commit.

Articulatory Features and LAML

28 Feb 20:36
6f180c7

This release includes our new text frontend, which uses articulatory features of phonemes instead of phoneme identities, as well as checkpoints trained with a variant of model-agnostic meta-learning (MAML) that are well suited as a basis for fine-tuning a single-speaker model on very little data in many different languages.
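The idea behind the articulatory frontend can be illustrated with a toy lookup: each phoneme becomes a bundle of features rather than an opaque ID, so a phoneme unseen in training still shares dimensions with ones that were seen. The feature inventory below is a deliberately tiny, made-up subset, not the toolkit's actual lookup:

```python
# Toy articulatory-feature lookup (illustrative subset only).
FEATURES = {
    "p": {"voiced": 0, "place": "bilabial", "manner": "plosive"},
    "b": {"voiced": 1, "place": "bilabial", "manner": "plosive"},
    "m": {"voiced": 1, "place": "bilabial", "manner": "nasal"},
}

PLACES = ["bilabial", "alveolar", "velar"]
MANNERS = ["plosive", "nasal", "fricative"]

def featurize(phoneme):
    """Turn a phoneme into a flat binary feature vector:
    [voicing] + one-hot place + one-hot manner."""
    f = FEATURES[phoneme]
    vec = [f["voiced"]]
    vec += [1 if f["place"] == p else 0 for p in PLACES]
    vec += [1 if f["manner"] == m else 0 for m in MANNERS]
    return vec

p_vec = featurize("p")
b_vec = featurize("b")
```

Here "p" and "b" differ in exactly one dimension (voicing), which is what lets knowledge transfer between related phonemes across languages.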

Tacotron2 FastSpeech2 HiFiGAN basic implementation complete

14 Jan 16:49
17d3dda

The basic versions of Tacotron 2, FastSpeech 2, and HiFiGAN are complete. A pretrained model for HiFiGAN is attached to this release.

Future updates will include different models, new features, and changes to existing models that will break backwards compatibility. This version is the most basic, but it is complete.