Retrieval-based-Voice-Conversion-WebUI (RVC), Colab version.
Check out our Demo Video here!
The pretrained base model is trained on nearly 50 hours of high-quality audio from the open-source VCTK dataset.
High-quality licensed song datasets will be added to the training set over time, so you can use them without worrying about copyright infringement.
Look forward to the RVCv3 pretrained base model: it has more parameters, is trained on more data, gives better results at unchanged inference speed, and needs less data for training.
- Reduce tone leakage by replacing source features with training-set features using top-1 retrieval;
- Easy and fast training, even on low-end graphics cards;
- Training with a small amount of data (>=10 min of low-noise speech recommended);
- Model fusion to change timbres (via the ckpt processing tab -> ckpt merge);
- Easy-to-use WebUI;
- UVR5 model to quickly separate vocals and instruments;
- The InterSpeech2023-RMVPE high-pitch voice extraction algorithm to prevent the muted-sound problem. It provides the best results by a significant margin, and is faster and less resource-hungry than Crepe_full;
- AMD/Intel graphics cards acceleration supported;
- Intel ARC graphics cards acceleration with IPEX supported.
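
The top-1 retrieval idea from the feature list can be illustrated with a minimal sketch. This is not the project's actual code: the function name `top1_replace`, the `mix` blending parameter, and the toy 2-D feature vectors are all assumptions made for illustration, and a plain linear scan with Euclidean distance stands in for whatever index the real pipeline uses.

```python
import math


def top1_replace(source_feats, train_feats, mix=1.0):
    """Replace each source frame with its nearest training-set frame.

    Hypothetical helper for illustration only.
    mix=1.0 uses the retrieved frame as-is; mix=0.0 keeps the source frame.
    """
    converted = []
    for frame in source_feats:
        # Top-1 retrieval: the single closest training frame (Euclidean distance).
        nearest = min(train_feats, key=lambda t: math.dist(frame, t))
        # Blend the retrieved frame with the original source frame.
        converted.append(
            [mix * n + (1.0 - mix) * s for n, s in zip(nearest, frame)]
        )
    return converted


# Toy data: two source frames, three training-set frames.
source = [[0.9, 0.1], [0.0, 1.2]]
train = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

converted = top1_replace(source, train)
print(converted)  # [[1.0, 0.0], [0.0, 1.0]]
```

With `mix=1.0`, every output frame comes from the training set, which is the intuition behind reduced tone leakage: the source speaker's features are replaced by the target speaker's.
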
- ContentVec
- VITS
- HIFIGAN
- Gradio
- FFmpeg
- Ultimate Vocal Remover
- audio-slicer
- Vocal pitch extraction: RMVPE