Does training work on the v4 branch? #33

Open
lpscr opened this issue Dec 15, 2023 · 17 comments
@lpscr

lpscr commented Dec 15, 2023

Hi! Thank you very much for your work and this amazing repo.

I'm trying to train the v4 branch and something is going very wrong: after about 3 hours of training nothing changes, I get only noise at every step. These are the steps I use:

1. python preprocess.py
2. python model1.py

[image: output at 29000 steps, v4 branch]

On v3 or the main branch, after some steps I get this:

[image: output at 5000 steps, v3 or main branch]

As you can see, in v4 I get only noise. Am I doing something wrong?

Can you please tell me whether training works in v4, or what I am doing wrong?

Thank you for your time.

@adelacvg
Owner

You haven't done anything wrong. Because the v4 model has over 200 million parameters, training is very slow. I am currently experimenting with features such as offset noise, normalization, and CFG to make training more stable. Your results look quite normal; theoretically, the convergence time of the v4 model is close to that of SD 1.5. The previous three versions used smaller noise and predicted x0, resulting in faster training, whereas v4 uses the classic approach of predicting the noise as the target.
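
For anyone reading along, the difference between the two objectives can be sketched in a few lines of PyTorch. This is a generic DDPM-style illustration of x0-prediction vs. noise-prediction (with optional offset noise), not the repo's actual training code; `model` and `alphas_cumprod` are placeholders:

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, alphas_cumprod, predict_noise=True, offset_noise=0.0):
    """Generic DDPM training step: sample t, noise x0, regress the target.

    `model(x_t, t)` and `alphas_cumprod` are placeholders, not the repo's names.
    """
    b = x0.shape[0]
    broadcast = [1] * (x0.dim() - 1)
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    noise = torch.randn_like(x0)
    if offset_noise > 0:
        # Offset noise: add a per-sample constant shift so the model can also
        # learn very low-frequency content, which plain noise under-covers.
        noise = noise + offset_noise * torch.randn(b, *broadcast, device=x0.device)
    a = alphas_cumprod[t].view(b, *broadcast)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward process q(x_t | x_0)
    pred = model(x_t, t)
    # v1-v3 regress x0 directly (fast to train at small noise levels);
    # v4 uses the classic objective of regressing the added noise.
    target = noise if predict_noise else x0
    return F.mse_loss(pred, target)
```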

@lpscr
Author

lpscr commented Dec 16, 2023

This is so cool! I understand now. I'm going to retrain and see.
Thank you very much for the explanation and the quick reply.

@rishikksh20

@lpscr were you able to get the model to converge?

@rishikksh20

@adelacvg I see you updated the model architecture on v4. Is the implementation complete, and does the new model converge faster?
I have collected a lot of audio data and am now waiting for GPU availability to start training.

@adelacvg
Owner

adelacvg commented Jan 9, 2024

Yes, the previous training process was slow to converge due to issues with the UNet. Additionally, there were semantic problems caused by a bug in the diffusion training architecture from ControlNet. The current diffusion training framework is now based on Tortoise, eliminating the semantic faults. Furthermore, the architecture uses transformer blocks without up/down-sampling, leading to much faster convergence.
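
As a rough picture of what "transformer blocks without up/down-sampling" can look like: a time-conditioned stack of plain transformer layers that keeps the sequence at full resolution throughout. This is a generic sketch, not the actual v4 module; every name here is a placeholder:

```python
import torch
import torch.nn as nn

class TransformerDenoiser(nn.Module):
    """Sketch of a diffusion denoiser built from plain transformer blocks.

    Unlike a UNet, the sequence stays at full resolution throughout,
    so nothing is lost to down/up-sampling.
    """
    def __init__(self, dim=512, depth=8, heads=8):
        super().__init__()
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, t):
        # x: (batch, seq_len, dim) noisy latents; t: (batch,) diffusion timesteps
        temb = self.time_mlp(t.float().unsqueeze(-1)).unsqueeze(1)  # (batch, 1, dim)
        return self.out(self.blocks(x + temb))  # condition by adding time embedding
```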

@rishikksh20

Thanks :)
Are you using HuBERT only for the content vector?
My use case is a non-English language, so I thought I'd use Whisper layer-24 features rather than HuBERT.
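
In case it's useful to others: pulling an intermediate encoder layer out of Whisper is straightforward with Hugging Face `transformers`. A minimal sketch, assuming the large-v2 checkpoint and the layer index mentioned above; adjust both for your setup:

```python
import torch
from transformers import WhisperModel, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperModel.from_pretrained("openai/whisper-large-v2").eval()

def whisper_layer_features(audio, layer=24, sr=16000):
    """Return hidden states of one encoder layer for a mono waveform."""
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        enc = model.get_encoder()(inputs.input_features, output_hidden_states=True)
    # hidden_states[0] is the embedding output; [layer] is after that layer.
    return enc.hidden_states[layer]  # (batch, frames, hidden_dim)
```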

@adelacvg
Owner

Regarding ContentVec, I chose it primarily to prevent timbre leakage. HuBERT and Whisper have noticeable timbre-leakage issues when trained with self-supervision. I have trained a model, and although there is some loss in audio quality in zero-shot scenarios, it performs better than the previous model at the same data scale.

@rishikksh20

rishikksh20 commented Feb 28, 2024

Hi @adelacvg, is it possible to also transfer a bit of prosody and style with the NS2VC architecture, not just voice?
For simple voice conversion it is working well; the voice doesn't match exactly, but it's still fine.

@adelacvg
Owner

Certainly, but I believe that prosody and speed are better suited for GPT or an acoustic model. The diffusion part, working as a good decoder, should suffice.

@rishikksh20

Just one more question: do semantic tokens like HuBERT, wav2vec, and ContentVec carry prosody information?

@adelacvg
Owner

Of course, prosody encompasses fundamental frequency, pause duration, intonation, and other essential information. Semantic tokens inherently carry duration information and intonation.

@rishikksh20

Yes, I have the same intuition because pronunciation is an integral part of linguistics.

@rishikksh20

Hi @adelacvg, have you checked out YODAS (https://huggingface.co/datasets/espnet/yodas), a 370k-hour dataset? The data quality is uneven, some samples contain music or are empty, but it's still good data for VC pretraining.
If you are not GPU-poor 😢 you could pretrain on YODAS 😅.

@adelacvg
Owner

adelacvg commented Mar 1, 2024

@rishikksh20 Thank you very much for the suggestion. However, I'm currently short on GPU resources, and all GPUs are being used for experiments with the GPT-based AR TTS model. A pretrained model may be trained once GPUs become available.

@rishikksh20

@adelacvg Everyone is GPU-poor; I am also waiting for my GPUs to free up. By the way, how is TTTS training progressing? Do you have any samples to share?
I have tested HierSpeech++'s non-autoregressive text-to-vector module together with NS2VC, which acts as an end-to-end TTS, and it performs well. The GPT-based text-to-vector approach I tested before shows a lot of hallucination.

@adelacvg
Owner

adelacvg commented Mar 1, 2024

@rishikksh20 The model in the master branch of TTTS is based on Tortoise, and the results are comparable to Tortoise. I have provided a Colab link for testing the pre-trained model. For the v2 version, I would like to use a training method similar to VALL-E's, while still using diffusion as the decoder, in the hope of achieving better zero-shot results.

@rishikksh20

For v4 I am planning to train on EnCodec features for better speaker generalization, as commented in #16 (comment).
Has anyone tried this before, or does anyone have any thoughts or heads-ups?
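
For anyone attempting the same, extracting EnCodec features with Meta's `encodec` package looks roughly like this. The file path is a placeholder, and whether a model like v4 would consume the discrete codes or the continuous encoder latents is an open design choice:

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# 24 kHz EnCodec model; the target bandwidth controls how many codebooks are used.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

wav, sr = torchaudio.load("sample.wav")  # placeholder path
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    # Discrete RVQ codes: (batch, n_codebooks, frames)
    frames = model.encode(wav)
    codes = torch.cat([code for code, _ in frames], dim=-1)
    # Continuous pre-quantization latents: (batch, 128, frames)
    latents = model.encoder(wav)
```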
