
Additional questions #49

Open
Minseung-Kim opened this issue Feb 4, 2022 · 7 comments

Comments

@Minseung-Kim

Hello again,

I am trying to reproduce the DeepXi framework in PyTorch (TensorFlow is not so familiar to me.. lol) and have some questions.

  1. The DEMAND voicebank (Valentini) dataset provides a training set in the form of (noisy, clean) pairs for each utterance.

When we subtract the clean signal from the noisy signal, we get the corresponding noise signal.

For the DEMAND voicebank dataset, did you use only those provided pairs, or an additional clean or noise dataset?

In my previous question, you said that the noise recording used to corrupt the clean speech is randomly selected (this implies the noise recording should be longer than the clean speech).

If so, could you tell me what kind of additional noise recordings you used? And have you used additional clean speech beyond what is provided in the DEMAND voicebank dataset?

  2. In the training step, DeepXi uses both a training set and a validation set.

As far as I know, the validation set is often used for early stopping. Is the validation set in the DeepXi framework also used for this purpose?

Could you explain how the validation set was used?

Thank you!

@anicolson
Owner

Hi Minseung-Kim,

For 1):

That is indeed how we get the noise for DEMAND-VB, and we only use the noise from DEMAND-VB (no external noise set is used).

Any one of those noise samples can then be used to corrupt a clean speech recording, i.e., we no longer treat them as (clean speech, noise) pairs; we treat them as two independent sets (this applies to the training set only).
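As a quick illustration of recovering the noise from a pair (my own sketch, not DeepXi code; it assumes the noisy and clean signals are time-aligned arrays of equal length):

```python
import numpy as np

def extract_noise(noisy, clean):
    """Recover the noise component from a (noisy, clean) pair by
    subtraction. Both signals must be time-aligned and equal length."""
    noisy = np.asarray(noisy, dtype=np.float64)
    clean = np.asarray(clean, dtype=np.float64)
    if noisy.shape != clean.shape:
        raise ValueError("noisy and clean must have the same length")
    return noisy - clean
```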

If you look at `def add_noise(self, s, d, s_len, d_len, snr):`,

A noise sample is randomly selected to corrupt a clean speech recording (only if its length is equal to or greater than the clean speech recording). If a noise sample does not meet this condition, another noise sample is selected. This continues until a noise sample is randomly selected that meets the condition.
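That selection loop can be sketched like this (an illustrative re-implementation, not the actual `add_noise` code; the function name and the random segment cropping are my own assumptions):

```python
import random

def pick_noise_sample(noise_set, s_len):
    """Randomly select a noise recording whose length is equal to or
    greater than the clean speech length s_len, re-drawing until the
    condition is met, then crop a random segment of length s_len."""
    while True:
        d = random.choice(noise_set)
        if len(d) >= s_len:
            start = random.randint(0, len(d) - s_len)
            return d[start:start + s_len]
```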

For 2):

If I were re-implementing this framework now, I would certainly use early stopping. But, back in 2019, a maximum number of epochs was specified, and the epoch that attained the highest validation scores was selected as the epoch to be tested.
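In other words, model selection here amounts to the following (a sketch of the idea, not the repository's actual code):

```python
def select_best_epoch(val_scores):
    """Given {epoch: validation_score} gathered over a fixed number of
    training epochs, return the epoch with the highest score, i.e. the
    checkpoint chosen for testing."""
    return max(val_scores, key=val_scores.get)
```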

I hope this helps, please let me know if something I said is not clear.

@anicolson
Owner

On a side note, I am also using PyTorch and PyTorch Lightning now, let me know if you are interested in helping to update this repository to something PyTorch based :)

@Minseung-Kim
Author

Thank you for the reply! Now I understand.
DEMAND-VB has two versions of the training set, 28spk and 56spk.
Which version did you use?

@anicolson
Owner

We have only used the 28 speaker version.

@Minseung-Kim
Author

Oh, thank you for the response.
In the 28-speaker case, since the DEMAND-VB dataset doesn't provide a separate validation set, is it OK to use the utterances of 2 of the 28 speakers (e.g., p.286 and p.287) as a validation set, with the remaining 26 speakers as the training set?

Or, in your experience, is there a different way to set up a validation set?

@anicolson
Owner

Hi Minseung-Kim,

Using two of the speakers for the validation set has been the standard way. I have not personally seen it done another way :)
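A minimal sketch of such a speaker-held-out split (my own illustration; it assumes the utterances are represented as (speaker_id, path) pairs):

```python
def split_by_speaker(utterances, val_speakers):
    """Split a list of (speaker_id, path) pairs into training and
    validation sets by holding out the given speakers."""
    train = [u for u in utterances if u[0] not in val_speakers]
    val = [u for u in utterances if u[0] in val_speakers]
    return train, val
```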

@zuowanbushiwo

@anicolson
Where can I download the DEMAND-VB dataset (like deep_xi_dataset.zip)? Is there a script for preparing the DEMAND-VB data?
Thanks!
