Support additional source separation models #19

Open
JeffreyCA opened this issue Oct 11, 2020 · 15 comments
Labels
enhancement New feature or request

Comments

@JeffreyCA
Owner

JeffreyCA commented Oct 11, 2020

Summary of other models

| Model | Supported? | Paper | Source code | Vocals (SDR) | Drums (SDR) | Bass (SDR) | Other (SDR) | Avg (SDR) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Spleeter | Yes | Link | Yes | 6.55 | 5.93 | 5.10 | 4.24 | 5.46 | |
| Demucs | Yes | Link | Yes | 6.29 | 6.08 | 5.83 | 4.12 | 5.58 | |
| Conv-Tasnet | Yes | Link | Yes | 6.81 | 6.08 | 5.66 | 4.37 | 5.73 | Worse perceived quality than Demucs |
| X-UMX | Yes | Link | Yes | 5.53 | 6.33 | 4.54 | 6.50 | 5.73 | Slow CPU separation |
| D3Net | Yes | Link | Yes | 7.24 | 7.01 | 5.25 | 4.53 | 6.01 | Slow CPU separation |
| MMDenseLSTM | No | Link | Yes | 6.6 | 6.43 | 5.16 | 4.15 | 5.59 | No pretrained models |
| Meta-TasNet | No | Link | Yes | 6.4 | 5.91 | 5.58 | 4.19 | 5.52 | Issues with higher frequencies (sum of sources do not equal original) (pfnet-research/meta-tasnet#4) |
| Nachmani et al. | No | Link | No | 6.92 | 6.15 | 5.88 | 4.32 | 5.82 | |
| LaSAFT | No | Link | Yes | 7.33 | 5.68 | 5.63 | 4.87 | 5.88 | Looks promising! Sum of sources do not equal original (ws-choi/Conditioned-Source-Separation-LaSAFT#3 (comment)) |
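
The Avg (SDR) column appears to be the plain arithmetic mean of the four per-stem scores. A minimal sketch that checks this and ranks the models by that mean follows; the numbers are copied from the table, while the names `REPORTED`, `mean_sdr` and `rank_by_average` are made up for this illustration.

```python
# Per-stem SDR values (dB) copied from the summary table:
# (vocals, drums, bass, other, reported average)
REPORTED = {
    "Spleeter": (6.55, 5.93, 5.10, 4.24, 5.46),
    "Demucs": (6.29, 6.08, 5.83, 4.12, 5.58),
    "Conv-Tasnet": (6.81, 6.08, 5.66, 4.37, 5.73),
    "X-UMX": (5.53, 6.33, 4.54, 6.50, 5.73),
    "D3Net": (7.24, 7.01, 5.25, 4.53, 6.01),
    "MMDenseLSTM": (6.6, 6.43, 5.16, 4.15, 5.59),
    "Meta-TasNet": (6.4, 5.91, 5.58, 4.19, 5.52),
    "Nachmani et al.": (6.92, 6.15, 5.88, 4.32, 5.82),
    "LaSAFT": (7.33, 5.68, 5.63, 4.87, 5.88),
}


def mean_sdr(vocals, drums, bass, other):
    """Arithmetic mean of the four per-stem SDR scores (assumed definition of Avg)."""
    return (vocals + drums + bass + other) / 4


def rank_by_average(table):
    """Model names sorted from highest to lowest mean SDR."""
    return sorted(table, key=lambda name: mean_sdr(*table[name][:4]), reverse=True)


for name in rank_by_average(REPORTED):
    *stems, reported = REPORTED[name]
    print(f"{name:16s} mean={mean_sdr(*stems):.3f} reported={reported:.2f}")
```

A single average hides per-stem differences: LaSAFT has the best vocals score in the table, but D3Net has the best average.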
@JeffreyCA JeffreyCA changed the title Support other source separation models Support other source separation models (e.g. Demucs) Oct 11, 2020
@JeffreyCA JeffreyCA pinned this issue Oct 12, 2020
@JeffreyCA JeffreyCA added the enhancement New feature or request label Dec 18, 2020
@JeffreyCA JeffreyCA added this to the 1.2 milestone Dec 20, 2020
@JeffreyCA
Owner Author

JeffreyCA commented Dec 22, 2020

I will prioritize adding the following models:

@JeffreyCA JeffreyCA changed the title Support other source separation models (e.g. Demucs) Support additional source separation models Dec 22, 2020
@JeffreyCA JeffreyCA modified the milestones: 1.2, 1.3 Dec 23, 2020
@JeffreyCA JeffreyCA removed this from the 2.0 milestone Jan 9, 2021
@ws-choi

ws-choi commented Feb 1, 2021

Hi Jeffrey! I recommend postponing the addition of LaSAFT features. We are going to reorganize the code structure to align it with the camera-ready version of our paper, which was accepted to ICASSP 2021, and that might cause conflicts. We'll also upload checkpoints of models trained at a larger scale (n_fft of 4096; currently we only support 2048). We will finish refactoring by March. Thank you.

@JeffreyCA
Owner Author

JeffreyCA commented Feb 1, 2021

Hi @ws-choi, thanks for the update! I meant to comment earlier but I intend to only support models where the separated sources closely add up to the original source. Will your changes help with this?
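
To make that criterion concrete, here is a rough sketch of one way to measure how closely separated stems add up to the original mix. It is an illustration under stated assumptions, not how Spleeter Web checks this: the function name `reconstruction_snr_db` is hypothetical, signals are plain lists of mono samples to keep the example dependency-free, and any acceptance threshold would have to be chosen separately.

```python
import math


def reconstruction_snr_db(mixture, stems):
    """Energy of the mixture relative to the residual (mixture - sum of stems), in dB.

    Higher is better; math.inf means the stems add up to the mixture exactly.
    """
    if any(len(stem) != len(mixture) for stem in stems):
        raise ValueError("every stem must have the same length as the mixture")

    # Sample-wise sum of all stems.
    summed = [sum(samples) for samples in zip(*stems)]

    signal_energy = sum(x * x for x in mixture)
    residual_energy = sum((x - y) ** 2 for x, y in zip(mixture, summed))

    if residual_energy == 0:
        return math.inf
    if signal_energy == 0:
        return -math.inf
    return 10 * math.log10(signal_energy / residual_energy)


# Toy example: two stems that add up to the mixture by construction.
vocals = [0.1, -0.2, 0.3]
accompaniment = [0.05, 0.1, -0.1]
mixture = [v + a for v, a in zip(vocals, accompaniment)]
print(reconstruction_snr_db(mixture, [vocals, accompaniment]))

# The same stems attenuated by 10% no longer sum to the mixture.
attenuated = [[0.9 * x for x in vocals], [0.9 * x for x in accompaniment]]
print(reconstruction_snr_db(mixture, attenuated))
```

In the second case the residual is 10% of the mixture amplitude, which corresponds to about 20 dB.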

I'm not very familiar with these conferences, so this is the first time I'm hearing about ICASSP, and it's being hosted in Toronto this year (although virtually)! Do you know what other conferences are related to this research field?

@ws-choi

ws-choi commented Feb 1, 2021

"Will your changes help with this?" =>
This update will not support it, but future updates might.
Since I would have to change the overall training structure for it, I need more time.
I'll let you know if LaSAFT-Net provides such a feature :)

"What other conferences are there related to this research field?" =>
The International Society for Music Information Retrieval (ISMIR) conference is the most relevant one.

Other ML conferences such as NeurIPS, ICLR, ICML, AAAI, IJCAI, IJCNN, and ECAI,
or signal processing conferences such as ICASSP and Interspeech, might also include state-of-the-art papers in this domain.

@JeffreyCA
Owner Author

Awesome, thanks!

@JeffreyCA
Owner Author

D3Net support is coming very soon!

@jacksongoode

Would love to see LaSAFT!

@Ma5onic
Contributor

Ma5onic commented Apr 26, 2022

@JeffreyCA, could you please add support for the kuielab MDX-Net models, for both leaderboard A and leaderboard B?
Their best model scored an SDR of 9.00 for vocal separation, compared to the Hybrid Demucs model, which scored an SDR of 8.13.
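
For a sense of scale, a gap in dB can be converted into a ratio of signal-to-distortion power. The sketch below assumes both scores are in dB and were measured on the same test set; the helper name `db_gap_to_power_ratio` is made up for this illustration.

```python
def db_gap_to_power_ratio(gap_db):
    """Convert a difference in dB into the corresponding power ratio."""
    return 10 ** (gap_db / 10)


# Vocal SDR scores quoted in this comment.
mdx_net_vocals_sdr = 9.00
hybrid_demucs_vocals_sdr = 8.13

gap = mdx_net_vocals_sdr - hybrid_demucs_vocals_sdr
ratio = db_gap_to_power_ratio(gap)
print(f"gap = {gap:.2f} dB, signal-to-distortion power ratio x{ratio:.2f}")
```

A gap of 0.87 dB works out to a signal-to-distortion power ratio roughly 1.22 times higher.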

Model Comparison:
https://paperswithcode.com/sota/music-source-separation-on-musdb18
That list is a good reference, as it includes open-source models with better scores than the ones in the "Summary of other models" table from your original comment.

Could you also update the Demucs installer to include their hybrid model?
I've used both before and will try to help out, but I can't guarantee that I can get it integrated.

@JeffreyCA
Owner Author

Thanks for the suggestion, I'll check that out. The latest Spleeter Web already supports Demucs v3, which is the Hybrid version.
I'm always open to contributions 🙂

@Ma5onic
Contributor

Ma5onic commented Apr 26, 2022

Awesome! I'll look at the way you deploy your containers and try to follow the same structure.

Here is a presentation that breaks down how it works:
https://ws-choi.github.io/personal/presentations/slide/2021-08-21-aicrowd

The readme was updated since I forked it:
kuielab/mdx-net-submission@80f5983
They finally added notes for adding custom models and it seems that someone already trained an improved version using the UVR dataset:
kuielab/mdx-net-submission@3dc5581
https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/MDX-Net-B
The model achieved an SDR score of 9.708 on AIcrowd's private test set.

@JeffreyCA
Owner Author

It's also a bit more complex as it requires Demucs 2, and Spleeter Web uses v3.

@Ma5onic
Contributor

Ma5onic commented May 10, 2022

I'll try to get an isolated container working for the default kuielab code, then I'll see if it will work with Demucs v3 by changing the requirements.txt to the latest demucs pip release. (I highly doubt that it'll be that easy, but I'll try nonetheless.)
I do have hope, however, because the Demucs v3 README mentions that model a couple of times and makes direct comparisons to it:

When trained only on MusDB HQ, Hybrid Demucs achieved a SDR of 7.33 on the MDX test set, and 8.11 dB with 200 extra training tracks. It is particularly efficient for drums and bass extraction, although KUIELAB-MDX-Net performs better for vocals and other accompaniments.

@Ma5onic
Contributor

Ma5onic commented Oct 2, 2022

After further investigation, I found that mdx-net uses the Demucs v2 code but downloads the Demucs v3 model. It can be installed without conflict by using a separate Anaconda/Miniconda environment.
I also just realized that @ws-choi is one of the main contributors to that project (mdx-net).
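
One way to confirm which Demucs release an isolated environment actually ended up with is to query the installed package metadata from inside that environment. This is only a sketch: it assumes the distribution is installed under the name `demucs`, that its version string starts with a plain integer major version, and the helper names `parse_major` and `installed_major_version` are hypothetical.

```python
from importlib import metadata


def parse_major(version):
    """Return the leading integer of a version string such as '3.0.4'."""
    head = version.split(".", 1)[0]
    if not head.isdigit():
        raise ValueError(f"unexpected version format: {version!r}")
    return int(head)


def installed_major_version(package):
    """Major version of an installed distribution, or None if it is not installed."""
    try:
        return parse_major(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


major = installed_major_version("demucs")
if major is None:
    print("demucs is not installed in this environment")
else:
    print(f"demucs major version in this environment: {major}")
```

Running this once in the mdx-net environment and once in the Spleeter Web environment would show whether the two really use different Demucs versions.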

@dts350z

dts350z commented Dec 22, 2022

Can we have the Spleeter model with Piano (5 stems instead of 4)?

@Ma5onic
Contributor

Ma5onic commented Jan 15, 2023

@dts350z, I think that @JeffreyCA implemented the changes that you asked for:
See pull #458
The 5 Stem Spleeter model got merged to the main branch 😄
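
For context, the difference between the 4-stem and 5-stem configurations can be sketched as a small lookup. The configuration identifiers and stem names below follow Spleeter's published pretrained configurations; the `SPLEETER_STEMS` table and the `extra_stems` helper are hypothetical and do not reflect how Spleeter Web wires the model in.

```python
# Stems produced by each pretrained Spleeter configuration.
SPLEETER_STEMS = {
    "spleeter:2stems": ("vocals", "accompaniment"),
    "spleeter:4stems": ("vocals", "drums", "bass", "other"),
    "spleeter:5stems": ("vocals", "drums", "bass", "piano", "other"),
}


def extra_stems(base, extended):
    """Stems that `extended` produces in addition to those of `base`."""
    base_stems = set(SPLEETER_STEMS[base])
    return [stem for stem in SPLEETER_STEMS[extended] if stem not in base_stems]


print(extra_stems("spleeter:4stems", "spleeter:5stems"))
```

Here the only stem added by the 5-stem configuration is `piano`.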
