X-vector based TTS model packaging broken in tts.sh #5713

G-Thor · 2024-03-21T21:58:35Z

Describe the bug
PR #5579 broke xvector-conditioned TTS model packaging. In stage 9 of tts.sh, spk_xvector.ark was replaced with {spk_embed_tag}.ark, which in my recipe resolves to xvector.ark. That file does not exist, whereas spk_xvector.ark does.

Basic environments:

OS information: Linux 4.18.0-513.18.1.el8_9.x86_64 #1 SMP Wed Feb 21 21:34:36 UTC 2024 x86_64
python version: 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
espnet version: espnet 202402
pytorch version: pytorch 2.1.0
Git hash: d0047402e830a3c53e8b590064af4bf70415fb3b
- Commit date: Mon Mar 4 22:19:02 2024 +0000

Task information:

Task: TTS
Recipe: talromur (but applies to all)
ESPnet2

To Reproduce
Steps to reproduce the behavior:

1. Run any xvector-based recipe (e.g. jtubespeech) up to, but not including, the model packaging stage:
   cd egs2/jtubespeech/tts1; ./run.sh --stop-stage 8
2. Execute ./run.sh --stage 9 --stop-stage 9
3. Observe the command output.
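To confirm which file names are actually present before running stage 9, a small check along these lines may help. This is only a sketch: check_spk_embed_files is a hypothetical helper, not part of tts.sh, and it assumes the embeddings live under ${dumpdir}/${spk_embed_tag}/${dset}/ as in the lines quoted in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tts.sh): report which of the two candidate
# embedding file names exist under ${dumpdir}/${spk_embed_tag}/${dset}/.
check_spk_embed_files() {
    local dumpdir="$1" spk_embed_tag="$2" dset="$3"
    local dir="${dumpdir}/${spk_embed_tag}/${dset}"
    local name ext
    # Candidate 1: the name stage 9 looks for after PR #5579.
    # Candidate 2: the "spk_"-prefixed name reported to exist on disk.
    for name in "${spk_embed_tag}" "spk_${spk_embed_tag}"; do
        for ext in scp ark; do
            if [ -e "${dir}/${name}.${ext}" ]; then
                echo "found: ${name}.${ext}"
            else
                echo "missing: ${name}.${ext}"
            fi
        done
    done
}
```

For example, check_spk_embed_files dump xvector tr_no_dev, where dump and tr_no_dev are placeholder values for the dump directory and the dataset name in your recipe.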

To Fix
The error originates in the lines quoted below. It can be fixed by modifying lines 1133 and 1134 of tts.sh, changing {spk_embed_tag}.scp to spk_{spk_embed_tag}.scp and {spk_embed_tag}.ark to spk_{spk_embed_tag}.ark:

espnet/egs2/TEMPLATE/tts1/tts.sh

Lines 1131 to 1135 in ca7716f

    
        if "${use_spk_embed}"; then
            for dset in "${train_set}" ${test_sets}; do
                _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/${spk_embed_tag}.scp"
                _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/${spk_embed_tag}.ark"
            done
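For illustration, the change proposed in this issue would look roughly like the following. It is a standalone sketch, not the merged fix: the variable values are placeholders chosen so the snippet runs on its own, and only the spk_ prefix on the two file names differs from the quoted lines.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions for this sketch, not taken from any recipe).
dumpdir="dump"
spk_embed_tag="xvector"
train_set="tr_no_dev"
test_sets="dev eval1"
use_spk_embed=true

_opts=""
if "${use_spk_embed}"; then
    # ${test_sets} is intentionally unquoted so it splits into one word per set.
    for dset in "${train_set}" ${test_sets}; do
        # Proposed change: prefix the file names with "spk_".
        _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/spk_${spk_embed_tag}.scp"
        _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/spk_${spk_embed_tag}.ark"
    done
fi
echo "${_opts}"
```

With spk_embed_tag set to xvector, every packaged path then ends in spk_xvector.scp or spk_xvector.ark, matching the files reported to exist on disk.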

I'm not sure whether or how this affects the new speaker embedding implementation. Perhaps the PR author @ftshijt has insight into that?
By the way, thanks for the great work on better integrating speaker embeddings into TTS recipes. I look forward to training an Icelandic speaker embedding model for multi-speaker TTS.


ftshijt · 2024-03-22T01:36:14Z

Thanks for the note! You are correct, the packing should be updated. I will check tomorrow and make a PR soon (hopefully also uploading the pre-trained TTS model with the new speaker embedding at the same time).

G-Thor added the Bug label Mar 21, 2024

ftshijt mentioned this issue Mar 25, 2024

Fix tts packing with new spk embedding #5715

Merged
