Fix D4C waveform decompositioning threshold (improves sound quality of variance models) #187
Hello all,
Recently, users from the DiffSinger community have been experimenting with lowering the threshold of the D4C waveform decomposition step found in binarizer_utils.py. The default setting for this is quite high, which can cause the following issues in models using variance parameters (tension and voicing in particular):

I've set the threshold to 0.25. There have been suggestions from the community to go even lower, though I have not tested that myself. This value has already significantly improved the quality of my latest model, which supports the tension parameter. The improvement seems to be consistent across the board, with multiple positive reports from other users so far. This is why I think a lower threshold should become the new default for waveform decomposition.
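For readers unfamiliar with the parameter, here is a minimal sketch of how a D4C threshold is typically passed through pyworld. It is not the code from binarizer_utils.py: the helper names `check_threshold` and `decompose_aperiodicity` are hypothetical, and the 0-to-1 range and the 0.85 library default are taken from my reading of the pyworld documentation, so treat them as assumptions.

```python
# Hypothetical sketch; the helper names are illustrative and not taken from DiffSinger.
D4C_THRESHOLD = 0.25  # value proposed in this PR (pyworld's own default is assumed to be 0.85)


def check_threshold(threshold):
    """Validate a D4C voiced/unvoiced threshold (assumed valid range: 0 to 1)."""
    value = float(threshold)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"D4C threshold must be within [0, 1], got {value}")
    return value


def decompose_aperiodicity(waveform, f0, time_axis, sample_rate,
                           threshold=D4C_THRESHOLD):
    """Estimate band aperiodicity with D4C.

    waveform, f0 and time_axis are expected to be float64 NumPy arrays,
    as required by pyworld.
    """
    import pyworld as pw  # imported lazily so the module loads without pyworld

    return pw.d4c(waveform, f0, time_axis, sample_rate,
                  threshold=check_threshold(threshold))
```

With a lower threshold, fewer frames that the F0 estimator marked as voiced are reclassified as unvoiced (fully aperiodic) by D4C; a threshold of 0 would keep all voiced frames voiced.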
Initial findings were done by @UtaUtaUtau, who had this to say about it:
Regards,
Lotte V