
VoiceSplit

Unofficial PyTorch implementation of VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Final project for SCC5830 - Image Processing @ ICMC/USP.

Dataset

We initially use the LibriSpeech dataset for this task. However, since LibriSpeech contains clean single-speaker utterances, we need to generate audio samples with overlapping voices from it.
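A minimal sketch of how such overlapped samples could be generated: mix a target utterance with an interfering utterance at a chosen signal-to-noise ratio. The function name, the SNR parameter, and the numpy-array representation of the waveforms are assumptions for illustration, not the repository's actual preprocessing code.

```python
import numpy as np

def mix_utterances(target, interferer, snr_db=0.0):
    """Mix a target utterance with an interferer at a given SNR (dB).

    Both inputs are 1-D float arrays of waveform samples; the interferer
    is tiled or truncated to match the target's length.
    """
    # Match lengths: repeat the interferer if it is shorter, then truncate.
    reps = int(np.ceil(len(target) / len(interferer)))
    interferer = np.tile(interferer, reps)[: len(target)]

    # Scale the interferer so the mixture has the requested target/interferer SNR.
    target_power = np.mean(target ** 2)
    interf_power = np.mean(interferer ** 2)
    scale = np.sqrt(target_power / (interf_power * 10 ** (snr_db / 10)))
    return target + scale * interferer
```

The clean target waveform can then be kept as the training label, with the mixture (plus a reference utterance of the target speaker) as the model input.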

Improvements

  • We use Si-SNR with PIT instead of the power-law compressed loss, because it achieves better results (comparison available at: https://github.com/Edresson/VoiceSplit).
  • We use the Mish activation function instead of ReLU, which improved the results.
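For reference, a sketch of Si-SNR with permutation-invariant training (PIT): Si-SNR projects the estimate onto the reference to remove scale differences, and PIT scores every speaker-to-estimate assignment and keeps the best one. This numpy version is for illustration only; the repository's actual loss is implemented in PyTorch.

```python
import numpy as np
from itertools import permutations

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR (dB) between an estimated and a reference signal."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference; the residual is treated as noise.
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

def pit_si_snr(estimates, references):
    """Permutation-invariant Si-SNR: try every speaker assignment, keep the best."""
    n = len(estimates)
    best = -np.inf
    for perm in permutations(range(n)):
        score = np.mean([si_snr(estimates[i], references[p]) for i, p in enumerate(perm)])
        best = max(best, score)
    return best  # a training loss would minimize the negative of this
```

Because the score is permutation-invariant, the model is not penalized for outputting the separated speakers in a different order than the labels.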
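The Mish activation mentioned above is defined as x * tanh(softplus(x)); a one-line numpy version (illustrative, not the repository's PyTorch implementation):

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    return x * np.tanh(np.log1p(np.exp(x)))
```

Unlike ReLU, Mish is smooth everywhere and allows small negative outputs, which is often credited with improving gradient flow.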

Report

You can see a report of what was done in this repository here

Demos

Colab notebook demos:

Exp 1: link

Exp 2: link

Exp 3: link

Exp 4: link

Exp 5 (best): link

Site demo for the experiment with best results (Exp 5): https://edresson.github.io/VoiceSplit/

ToDos:

  • Create documentation for the repository and remove unused code

Future Works

  • Train VoiceSplit model with GE2E3k and Mean Squared Error loss function

Acknowledgment:

This repository contains code from other contributors; due credit is given in the functions used:

Preprocessing: Eren Gölge @erogol

VoiceFilter Model: Seungwon Park @seungwonpark