Describe the bug
[Bug]
Hello,
In XTTSv2, the dataloader loads the condition indices (condition start and end index) in audio-sample units; the audio itself is later compressed by a factor of 256 through mel-spectrogram extraction. These condition start and end indices are then used to mask the ground-truth audio codes. Since the audio codes are compressed 1024x relative to the raw samples, the indices should also be divided by 1024; instead they are divided by perceiver_cond_length_compression, which is set to 256 by the default GPT args. As a result, the condition indices in the DVAE code domain point 4x further than they should, so the wrong part of the target audio tokens gets masked. I couldn't find any place where they are correctly set to 1024, and I don't understand how training and fine-tuning can work with this bug.
I'd appreciate it if anyone could shed light on this topic.
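To make the scaling mismatch concrete, here is a minimal sketch of the arithmetic described above. The constant names (`MEL_HOP_LENGTH`, `DVAE_COMPRESSION`) and the helper function are illustrative assumptions, not identifiers from the XTTS codebase; only the factors 256 and 1024 come from the issue.

```python
# Illustrative constants (assumed names, values taken from the issue text):
MEL_HOP_LENGTH = 256      # mel-spectrogram extraction: 256 audio samples per frame
DVAE_COMPRESSION = 1024   # DVAE codes: 1024 audio samples per audio code token

def cond_idx_in_code_domain(sample_idx: int, divisor: int) -> int:
    """Map a condition index given in raw audio samples into a token index."""
    return sample_idx // divisor

# Hypothetical conditioning-window start, in raw audio samples
# (49152 samples is roughly 3 s of 16 kHz audio).
cond_start_samples = 49_152

# What the masking logic should use (audio codes are compressed 1024x):
correct = cond_idx_in_code_domain(cond_start_samples, DVAE_COMPRESSION)  # 48

# What it actually uses (perceiver_cond_length_compression defaults to 256):
actual = cond_idx_in_code_domain(cond_start_samples, MEL_HOP_LENGTH)     # 192

# The resulting index lands 4x too far into the code sequence, so the
# mask covers the wrong region of the target audio tokens.
print(correct, actual, actual // correct)  # 48 192 4
```

Under these assumptions, any start index divided by 256 instead of 1024 overshoots its intended position in the code sequence by a factor of 1024/256 = 4.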
Error lines:
- TTS/TTS/tts/layers/xtts/gpt.py, line 110 in dbf1a08
- TTS/TTS/tts/layers/xtts/gpt.py, line 414 in dbf1a08
To Reproduce
Expected behavior
...
Logs
Environment
Additional context
No response