blaise-tk/RVC_CLI

RVC_CLI: Retrieval-based Voice Conversion Command Line Interface

Open In Colab

Table of Contents

  1. Installation
  2. Getting Started
  3. API
  4. Credits

Installation

Ensure that you have the necessary Python packages installed by following these steps (Python 3.9 is recommended):

Windows

Run the install.bat file to set up and activate a Conda environment. Afterward, launch the application with env/python.exe main.py instead of the conventional python main.py command.

Linux

chmod +x install.sh
./install.sh

Getting Started

Download the necessary models and executables by running the following command:

python main.py prerequisites

More information about the prerequisites command is available in the Prerequisites Download section below.

For detailed information and command-line options, refer to the help command:

python main.py -h

This command provides a clear overview of the available modes and their corresponding parameters, facilitating effective utilization of the RVC CLI.

Inference

Single Inference

python main.py infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_path "input_path" --output_path "output_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format" --embedder_model "embedder_model" --upscale_audio "upscale_audio"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| f0up_key | No | 0 | -24 to +24 | Set the pitch of the audio; the higher the value, the higher the pitch. |
| filter_radius | No | 3 | 0 to 10 | If the value is three or greater, median filtering is applied to the collected pitch results, which can reduce breathiness. |
| index_rate | No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. Lower values can help mitigate artifacts present in the audio. |
| hop_length | No | 128 | 1 to 512 | Denotes how long the system takes to transition to a significant pitch change. Smaller hop lengths require more inference time but tend to yield higher pitch accuracy. |
| rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed. |
| protect | No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers comprehensive protection; reducing the value decreases the extent of protection while potentially mitigating the indexing effect. |
| f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions. |
| f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default, rmvpe, is recommended for most cases. |
| input_path | Yes | None | Full path to the input audio file | Full path to the input audio file |
| output_path | Yes | None | Full path to the output audio file | Full path to the output audio file |
| pth_path | Yes | None | Full path to the pth file | Full path to the pth file |
| index_path | Yes | None | Full path to the index file | Full path to the index file |
| split_audio | No | False | True or False | Split the audio into chunks for inference; can improve results in some cases. |
| clean_audio | No | False | True or False | Clean your audio output using noise detection algorithms; recommended for spoken audio. |
| clean_strength | No | 0.7 | 0.0 to 1.0 | Set the clean-up level; the higher the value, the more the audio is cleaned, but it may also sound more compressed. |
| export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Output audio file format |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default, hubert, is recommended for most cases. |
| upscale_audio | No | False | True or False | Upscale the audio to 48 kHz for better results. |

Refer to python main.py infer -h for additional help.
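Single inference can also be driven from a script by assembling the argument list and handing it to subprocess. The sketch below is illustrative: the file paths are placeholders, and only a subset of the optional flags is shown.

```python
import subprocess

def build_infer_args(input_path, output_path, pth_path, index_path,
                     f0up_key=0, f0method="rmvpe", index_rate=0.3,
                     export_format="WAV"):
    """Assemble the argument list for `python main.py infer`.

    Only the four required paths must be supplied; the optional flags
    mirror the defaults documented in the table above.
    """
    return [
        "python", "main.py", "infer",
        "--input_path", input_path,
        "--output_path", output_path,
        "--pth_path", pth_path,
        "--index_path", index_path,
        "--f0up_key", str(f0up_key),
        "--f0method", f0method,
        "--index_rate", str(index_rate),
        "--export_format", export_format,
    ]

if __name__ == "__main__":
    args = build_infer_args("input.wav", "output.wav",
                            "model.pth", "model.index", f0up_key=2)
    print(" ".join(args))
    # subprocess.run(args, check=True)  # uncomment when run inside the RVC_CLI folder
```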

Batch Inference

python main.py batch_infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_folder_path "input_folder_path" --output_folder_path "output_folder_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format" --embedder_model "embedder_model" --upscale_audio "upscale_audio"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| f0up_key | No | 0 | -24 to +24 | Set the pitch of the audio; the higher the value, the higher the pitch. |
| filter_radius | No | 3 | 0 to 10 | If the value is three or greater, median filtering is applied to the collected pitch results, which can reduce breathiness. |
| index_rate | No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. Lower values can help mitigate artifacts present in the audio. |
| hop_length | No | 128 | 1 to 512 | Denotes how long the system takes to transition to a significant pitch change. Smaller hop lengths require more inference time but tend to yield higher pitch accuracy. |
| rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed. |
| protect | No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers comprehensive protection; reducing the value decreases the extent of protection while potentially mitigating the indexing effect. |
| f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions. |
| f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default, rmvpe, is recommended for most cases. |
| input_folder_path | Yes | None | Full path to the input audio folder (the folder may only contain audio files) | Full path to the input audio folder |
| output_folder_path | Yes | None | Full path to the output audio folder | Full path to the output audio folder |
| pth_path | Yes | None | Full path to the pth file | Full path to the pth file |
| index_path | Yes | None | Full path to the index file | Full path to the index file |
| split_audio | No | False | True or False | Split the audio into chunks for inference; can improve results in some cases. |
| clean_audio | No | False | True or False | Clean your audio output using noise detection algorithms; recommended for spoken audio. |
| clean_strength | No | 0.7 | 0.0 to 1.0 | Set the clean-up level; the higher the value, the more the audio is cleaned, but it may also sound more compressed. |
| export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Output audio file format |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default, hubert, is recommended for most cases. |
| upscale_audio | No | False | True or False | Upscale the audio to 48 kHz for better results. |

Refer to python main.py batch_infer -h for additional help.

TTS Inference

python main.py tts_infer --tts_text "tts_text" --tts_voice "tts_voice" --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --output_tts_path "output_tts_path" --output_rvc_path "output_rvc_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format" --embedder_model "embedder_model" --upscale_audio "upscale_audio"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| tts_text | Yes | None | Text for TTS synthesis | Text for TTS synthesis |
| tts_voice | Yes | None | Voice for TTS synthesis | Voice for TTS synthesis |
| f0up_key | No | 0 | -24 to +24 | Set the pitch of the audio; the higher the value, the higher the pitch. |
| filter_radius | No | 3 | 0 to 10 | If the value is three or greater, median filtering is applied to the collected pitch results, which can reduce breathiness. |
| index_rate | No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. Lower values can help mitigate artifacts present in the audio. |
| hop_length | No | 128 | 1 to 512 | Denotes how long the system takes to transition to a significant pitch change. Smaller hop lengths require more inference time but tend to yield higher pitch accuracy. |
| rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed. |
| protect | No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers comprehensive protection; reducing the value decreases the extent of protection while potentially mitigating the indexing effect. |
| f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions. |
| f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default, rmvpe, is recommended for most cases. |
| output_tts_path | Yes | None | Full path to the output TTS audio file | Full path to the output TTS audio file |
| output_rvc_path | Yes | None | Full path to the output RVC audio file | Full path to the output RVC audio file |
| pth_path | Yes | None | Full path to the pth file | Full path to the pth file |
| index_path | Yes | None | Full path to the index file | Full path to the index file |
| split_audio | No | False | True or False | Split the audio into chunks for inference; can improve results in some cases. |
| clean_audio | No | False | True or False | Clean your audio output using noise detection algorithms; recommended for spoken audio. |
| clean_strength | No | 0.7 | 0.0 to 1.0 | Set the clean-up level; the higher the value, the more the audio is cleaned, but it may also sound more compressed. |
| export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Output audio file format |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default, hubert, is recommended for most cases. |
| upscale_audio | No | False | True or False | Upscale the audio to 48 kHz for better results. |

Refer to python main.py tts_infer -h for additional help.

Training

Preprocess Dataset

python main.py preprocess --model_name "model_name" --dataset_path "dataset_path" --sampling_rate "sampling_rate"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| model_name | Yes | None | Name of the model | Name of the model |
| dataset_path | Yes | None | Full path to the dataset folder (the folder may only contain audio files) | Full path to the dataset folder |
| sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |

Refer to python main.py preprocess -h for additional help.

Extract Features

python main.py extract --model_name "model_name" --rvc_version "rvc_version" --f0method "f0method" --hop_length "hop_length" --sampling_rate "sampling_rate" --embedder_model "embedder_model"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| model_name | Yes | None | Name of the model | Name of the model |
| rvc_version | No | v2 | v1 or v2 | Version of the model |
| f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe | Pitch extraction algorithm to use for the audio conversion. The default, rmvpe, is recommended for most cases. |
| hop_length | No | 128 | 1 to 512 | Denotes how long the system takes to transition to a significant pitch change. Smaller hop lengths require more inference time but tend to yield higher pitch accuracy. |
| sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default, hubert, is recommended for most cases. |

Start Training

python main.py train --model_name "model_name" --rvc_version "rvc_version" --save_every_epoch "save_every_epoch" --save_only_latest "save_only_latest" --save_every_weights "save_every_weights" --total_epoch "total_epoch" --sampling_rate "sampling_rate" --batch_size "batch_size" --gpu "gpu" --pitch_guidance "pitch_guidance" --overtraining_detector "overtraining_detector" --overtraining_threshold "overtraining_threshold" --pretrained "pretrained" --custom_pretrained "custom_pretrained" [--g_pretrained "g_pretrained"] [--d_pretrained "d_pretrained"]
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| model_name | Yes | None | Name of the model | Name of the model |
| rvc_version | No | v2 | v1 or v2 | Version of the model |
| save_every_epoch | Yes | None | 1 to 50 | Determines how often, in epochs, the model is saved. |
| save_only_latest | No | False | True or False | Enabling this setting will result in the G and D files saving only their most recent versions, effectively conserving storage space. |
| save_every_weights | No | True | True or False | This setting enables you to save the weights of the model at the conclusion of each epoch. |
| total_epoch | No | 1000 | 1 to 10000 | Specifies the overall number of epochs for the model training process. |
| sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
| batch_size | No | 8 | 1 to 50 | It's advisable to align it with the available VRAM of your GPU. A setting of 4 offers improved accuracy but slower processing, while 8 provides faster, standard results. |
| gpu | No | 0 | 0 to ∞ separated by - | Specify the GPU indices to use for training, separated by hyphens (-). |
| pitch_guidance | No | True | True or False | By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential. |
| overtraining_detector | No | False | True or False | Utilize the overtraining detector to prevent overfitting. This feature is particularly valuable for scenarios where the model is at risk of overfitting. |
| overtraining_threshold | No | 50 | 1 to 100 | Set the threshold for the overtraining detector. The lower the value, the more sensitive the detector will be. |
| pretrained | No | True | True or False | Utilize pretrained models when training your own. This approach reduces training duration and enhances overall quality. |
| custom_pretrained | No | False | True or False | Utilizing custom pretrained models can lead to superior results, as selecting the most suitable pretrained models tailored to the specific use case can significantly enhance performance. |
| g_pretrained | No | None | Full path to pretrained file G, only if you have used custom_pretrained | Full path to pretrained file G |
| d_pretrained | No | None | Full path to pretrained file D, only if you have used custom_pretrained | Full path to pretrained file D |

Refer to python main.py train -h for additional help.
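A full training run chains preprocess, extract, train, and index in that order. The following sketch builds the four invocations from the tables above; the model name, dataset path, and epoch values are placeholders to adapt to your setup.

```python
import subprocess

def pipeline_commands(model_name, dataset_path, sampling_rate,
                      rvc_version="v2", total_epoch=1000):
    """Return the four CLI invocations for a full training run:
    preprocess -> extract -> train -> index. Flag names follow the
    tables above; only required flags plus a few common ones are set."""
    sr = str(sampling_rate)
    return [
        ["python", "main.py", "preprocess",
         "--model_name", model_name,
         "--dataset_path", dataset_path,
         "--sampling_rate", sr],
        ["python", "main.py", "extract",
         "--model_name", model_name,
         "--rvc_version", rvc_version,
         "--sampling_rate", sr],
        ["python", "main.py", "train",
         "--model_name", model_name,
         "--rvc_version", rvc_version,
         "--save_every_epoch", "10",
         "--total_epoch", str(total_epoch),
         "--sampling_rate", sr],
        ["python", "main.py", "index",
         "--model_name", model_name,
         "--rvc_version", rvc_version],
    ]

if __name__ == "__main__":
    for cmd in pipeline_commands("my_voice", "datasets/my_voice", 48000):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment when run inside the RVC_CLI folder
```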

Generate Index File

python main.py index --model_name "model_name" --rvc_version "rvc_version"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| model_name | Yes | None | Name of the model | Name of the model |
| rvc_version | Yes | None | v1 or v2 | Version of the model |

Refer to python main.py index -h for additional help.

UVR

python uvr.py [audio_file] [options]

Info and Debugging

| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| audio_file | Yes | None | Any valid audio file path | The path to the audio file you want to separate, in any common format. |
| -d, --debug | No | False | | Enable debug logging. |
| -e, --env_info | No | False | | Print environment information and exit. |
| -l, --list_models | No | False | | List all supported models and exit. |
| --log_level | No | info | info, debug, warning | Log level. |

Separation I/O Params

| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| -m, --model_filename | No | UVR-MDX-NET-Inst_HQ_3.onnx | Any valid model file path | Model to use for separation. |
| --output_format | No | WAV | Any common audio format | Output format for separated files. |
| --output_dir | No | None | Any valid directory path | Directory to write output files. |
| --model_file_dir | No | /tmp/audio-separator-models/ | Any valid directory path | Model files directory. |

Common Separation Parameters

| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --invert_spect | No | False | | Invert secondary stem using spectrogram. |
| --normalization | No | 0.9 | Any float value | Max peak amplitude to normalize input and output audio to. |
| --single_stem | No | None | Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other | Output only a single stem. |
| --sample_rate | No | 44100 | Any integer value | Modify the sample rate of the output audio. |

MDXC Architecture Parameters

| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --mdxc_segment_size | No | 256 | Any integer value | Size of segments for MDXC architecture. |
| --mdxc_use_model_segment_size | No | False | | Use model default segment size instead of the value from the config file for MDXC architecture. |
| --mdxc_overlap | No | 8 | 2 to 50 | Amount of overlap between prediction windows for MDXC architecture. |
| --mdxc_batch_size | No | 1 | Any integer value | Batch size for MDXC architecture. |
| --mdxc_pitch_shift | No | 0 | Any integer value | Shift audio pitch by a number of semitones while processing for MDXC architecture. |

MDX Architecture Parameters

| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --mdx_segment_size | No | 256 | Any integer value | Size of segments for MDX architecture. |
| --mdx_overlap | No | 0.25 | 0.001 to 0.999 | Amount of overlap between prediction windows for MDX architecture. |
| --mdx_batch_size | No | 1 | Any integer value | Batch size for MDX architecture. |
| --mdx_hop_length | No | 1024 | Any integer value | Hop length for MDX architecture. |
| --mdx_enable_denoise | No | False | | Enable denoising during separation for MDX architecture. |

Demucs Architecture Parameters

| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --demucs_segment_size | No | Default | Any integer value | Size of segments for Demucs architecture. |
| --demucs_shifts | No | 2 | Any integer value | Number of predictions with random shifts for Demucs architecture. |
| --demucs_overlap | No | 0.25 | 0.001 to 0.999 | Overlap between prediction windows for Demucs architecture. |
| --demucs_segments_enabled | No | True | | Enable segment-wise processing for Demucs architecture. |

VR Architecture Parameters

| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| --vr_batch_size | No | 4 | Any integer value | Batch size for VR architecture. |
| --vr_window_size | No | 512 | Any integer value | Window size for VR architecture. |
| --vr_aggression | No | 5 | -100 to 100 | Intensity of primary stem extraction for VR architecture. |
| --vr_enable_tta | No | False | | Enable Test-Time-Augmentation for VR architecture. |
| --vr_high_end_process | No | False | | Mirror the missing frequency range of the output for VR architecture. |
| --vr_enable_post_process | No | False | | Identify leftover artifacts within vocal output for VR architecture. |
| --vr_post_process_threshold | No | 0.2 | 0.1 to 0.3 | Threshold for post-process feature for VR architecture. |

Additional Features

Model Extract

python main.py model_extract --pth_path "pth_path" --model_name "model_name" --sampling_rate "sampling_rate" --pitch_guidance "pitch_guidance" --rvc_version "rvc_version" --epoch "epoch" --step "step"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| pth_path | Yes | None | Path to the pth file | Full path to the pth file |
| model_name | Yes | None | Name of the model | Name of the model |
| sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
| pitch_guidance | Yes | None | True or False | By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential. |
| rvc_version | Yes | None | v1 or v2 | Version of the model |
| epoch | Yes | None | 1 to 10000 | Specifies the overall number of epochs for the model training process. |
| step | Yes | None | 1 to ∞ | Specifies the overall number of steps for the model training process. |

Model Information

python main.py model_information --pth_path "pth_path"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| pth_path | Yes | None | Path to the pth file | Full path to the pth file |

Model Blender

python main.py model_blender --model_name "model_name" --pth_path_1 "pth_path_1" --pth_path_2 "pth_path_2" --ratio "ratio"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| model_name | Yes | None | Name of the model | Name of the model |
| pth_path_1 | Yes | None | Path to the first pth file | Full path to the first pth file |
| pth_path_2 | Yes | None | Path to the second pth file | Full path to the second pth file |
| ratio | No | 0.5 | 0.0 to 1.0 | Value for blender ratio |
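One plausible reading of the ratio parameter is a per-weight linear interpolation between the two checkpoints. The sketch below illustrates that idea on plain name-to-value dicts; it is an assumption about the blending scheme, not the project's actual implementation, which operates on .pth tensors.

```python
def blend_weights(weights_a, weights_b, ratio=0.5):
    """Linearly interpolate two checkpoints represented as
    name -> float dicts. With this convention, ratio=1.0 keeps
    model A entirely and ratio=0.0 keeps model B entirely.
    (Illustrative sketch only, not RVC_CLI's actual code.)"""
    return {name: ratio * weights_a[name] + (1 - ratio) * weights_b[name]
            for name in weights_a}

print(blend_weights({"w": 1.0}, {"w": 3.0}, ratio=0.5))
```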

Launch TensorBoard

python main.py tensorboard

Download Models

Run the download script with the following command:

python main.py download --model_link "model_link"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| model_link | Yes | None | Link of the model (enclosed in double quotes; Google Drive or Hugging Face) | Link of the model |

Refer to python main.py download -h for additional help.

Audio Analyzer

python main.py audio_analyzer --input_path "input_path"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| input_path | Yes | None | Full path to the input audio file | Full path to the input audio file |

Refer to python main.py audio_analyzer -h for additional help.

Prerequisites Download

python main.py prerequisites --pretraineds_v1 "pretraineds_v1" --pretraineds_v2 "pretraineds_v2" --models "models" --exe "exe"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| pretraineds_v1 | No | True | True or False | Download pretrained models for v1 |
| pretraineds_v2 | No | True | True or False | Download pretrained models for v2 |
| models | No | True | True or False | Download models for v1 and v2 |
| exe | No | True | True or False | Download the necessary executable files for the CLI to function properly (FFmpeg and FFprobe) |

API

python main.py api --host "host" --port "port"
| Parameter Name | Required | Default | Valid Options | Description |
|---|---|---|---|---|
| host | No | 127.0.0.1 | Value for host IP | Value for host IP |
| port | No | 8000 | Value for port number | Value for port number |

To use the RVC CLI via the API, start the server with the command above and make requests to the following endpoints:

  • Docs: /docs
  • Ping: /ping
  • Infer: /infer
  • Batch Infer: /batch_infer
  • TTS: /tts
  • Preprocess: /preprocess
  • Extract: /extract
  • Train: /train
  • Index: /index
  • Model Information: /model_information
  • Model Fusion: /model_fusion
  • Download: /download

Make POST requests to these endpoints with the same required parameters as in CLI mode.
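As a sketch, a POST request to the /infer endpoint can be built with the standard library as below. The host and port match the defaults above; the exact JSON field names the server expects are an assumption here, mirroring the CLI parameters.

```python
import json
from urllib import request

API_BASE = "http://127.0.0.1:8000"  # default host/port from `python main.py api`

def build_infer_request(payload):
    """Build (but do not send) a JSON POST request to the /infer endpoint.
    The payload fields mirror the CLI parameters; verify the field names
    against your running server before relying on them."""
    return request.Request(
        API_BASE + "/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_infer_request({
    "input_path": "input.wav",
    "output_path": "output.wav",
    "pth_path": "model.pth",
    "index_path": "model.index",
})
print(req.get_method(), req.full_url)
# with request.urlopen(req) as resp:   # uncomment with the API server running
#     print(resp.read())
```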

Credits

The RVC CLI builds upon the foundations of several open-source projects, and we acknowledge and appreciate the contributions of their respective authors and communities.