X-vector based TTS model packaging broken in tts.sh #5713

Open
G-Thor opened this issue Mar 21, 2024 · 1 comment
Labels
Bug bug should be fixed

Comments

@G-Thor
Contributor

G-Thor commented Mar 21, 2024

Describe the bug
PR #5579 broke xvector-conditioned TTS model packaging. In stage 9 of tts.sh, spk_xvector.ark was replaced with {spk_embed_tag}.ark, which in my recipe resolves to xvector.ark. That file does not exist, whereas spk_xvector.ark does.

Basic environments:

  • OS information: Linux 4.18.0-513.18.1.el8_9.x86_64 #1 SMP Wed Feb 21 21:34:36 UTC 2024 x86_64
  • python version: 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
  • espnet version: espnet 202402
  • pytorch version: pytorch 2.1.0
  • Git hash: d0047402e830a3c53e8b590064af4bf70415fb3b
    • Commit date: Mon Mar 4 22:19:02 2024 +0000

Task information:

  • Task: TTS
  • Recipe: talromur (but applies to all)
  • ESPnet2

To Reproduce
Steps to reproduce the behavior:

  1. Run any xvector-based recipe up until the model packaging stage (e.g. jtubespeech)
    • e.g. cd egs2/jtubespeech/tts1; ./run.sh --stop-stage 8
  2. Execute ./run.sh --stage 9 --stop-stage 9
  3. Observe the command output
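
Before re-running stage 9, it may help to confirm which embedding files are actually present. The following is only a minimal sketch: the helper name check_embed_files is hypothetical, and the directory layout (${dumpdir}/${spk_embed_tag}/<dataset>/) and the default values dump and xvector are assumptions based on the paths quoted in this issue, so adjust them to your recipe.

```shell
#!/usr/bin/env bash
# Assumed defaults; in a real recipe these are set by tts.sh / run.sh.
dumpdir=${dumpdir:-dump}
spk_embed_tag=${spk_embed_tag:-xvector}

# Hypothetical helper: report which of the two candidate file names
# exists in a given dataset directory.
check_embed_files() {
    local dir=$1
    local name
    for name in "${spk_embed_tag}.ark" "spk_${spk_embed_tag}.ark"; do
        if [ -f "${dir}/${name}" ]; then
            echo "found:   ${dir}/${name}"
        else
            echo "missing: ${dir}/${name}"
        fi
    done
}

# Walk every dataset directory under the assumed embedding dump location.
for dir in "${dumpdir}/${spk_embed_tag}"/*/; do
    [ -d "${dir}" ] || continue   # skip if the glob matched nothing
    check_embed_files "${dir%/}"
done
```

If the bug described here applies, each dataset directory should report the spk_-prefixed file as found and the unprefixed one as missing.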

To Fix
This error originates in the following lines of tts.sh. It can be fixed by modifying lines 1133 and 1134, changing ${spk_embed_tag}.scp to spk_${spk_embed_tag}.scp and ${spk_embed_tag}.ark to spk_${spk_embed_tag}.ark:

if "${use_spk_embed}"; then
    for dset in "${train_set}" ${test_sets}; do
        _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/${spk_embed_tag}.scp"
        _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/${spk_embed_tag}.ark"
    done
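
For illustration, here is a standalone sketch of how the loop might look with the proposed spk_ prefix applied. The variable values (dump, xvector, tr_no_dev, dev, eval1) are hypothetical stand-ins so the snippet runs on its own; in tts.sh they come from the recipe configuration. This is not the merged fix, only the change suggested in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in values; tts.sh sets these from the recipe.
use_spk_embed=true
spk_embed_tag=xvector
dumpdir=dump
train_set=tr_no_dev
test_sets="dev eval1"

_opts=""
if "${use_spk_embed}"; then
    # ${test_sets} is intentionally unquoted so it splits into separate names.
    for dset in "${train_set}" ${test_sets}; do
        # Proposed change: prefix the file names with "spk_" so they match
        # the files that exist on disk (e.g. spk_xvector.scp / spk_xvector.ark).
        _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/spk_${spk_embed_tag}.scp"
        _opts+=" --option ${dumpdir}/${spk_embed_tag}/${dset}/spk_${spk_embed_tag}.ark"
    done
fi

# Print the accumulated options for inspection.
echo "${_opts}"
```

Only the file names change; the directory component still uses the unprefixed ${spk_embed_tag}, as in the original lines.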

I'm not sure whether or how this may affect the new speaker embedding implementation. Perhaps the PR author @ftshijt has insight into that?
By the way, thanks for the great work on better integrating speaker embeddings into TTS recipes. I look forward to training an Icelandic speaker embedding model for multi-speaker TTS.

@G-Thor G-Thor added the Bug bug should be fixed label Mar 21, 2024
@ftshijt
Collaborator

ftshijt commented Mar 22, 2024

Thanks for the note! You are correct, the packaging should be updated. I will check tomorrow and make a PR soon (hopefully also uploading the pre-trained TTS model with the new speaker embedding at the same time).
