How to create the dataset #11

Open
pankaj2701 opened this issue May 21, 2018 · 2 comments

Comments

@pankaj2701

I am wondering how to create the training dataset. I understand the format but don't know how to create one. Do we have to annotate the training dataset manually? Manual annotation would be difficult. Is there any utility that can create approximate training data, which can then be refined manually?

@pankaj2701
Author

Manual annotation of data in which music, noise, and speech are mixed would be difficult.
Can we provide noise, speech, or music data separately? Labelling would be easier in that case.

@jtkim-kaist
Owner

Our provided recorded dataset was manually annotated by two professional engineers.

If you want to construct the training set, do the following:

  1. Run the VAD on the clean speech to get the labels.

  2. Add noise to that clean speech using the FaNT tool or VOICEBOX (implemented in MATLAB).

  3. The labels from step 1 then also apply to the noisy speech.

If you don't have clean speech, manual annotation cannot be avoided.
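The three steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's own pipeline: `energy_vad_labels` is a hypothetical frame-energy detector standing in for the repository's VAD, `add_noise` is a simplified substitute for FaNT or VOICEBOX that sets the SNR over the whole utterance (those tools may measure the speech level differently), and the 10 ms frame size and -40 dB threshold are arbitrary illustrative values.

```python
import numpy as np


def energy_vad_labels(clean, sr, frame_ms=10, threshold_db=-40.0):
    """Hypothetical stand-in for the VAD in step 1: a simple frame-energy detector.

    Returns one 0/1 label per `frame_ms` frame. A frame is labelled speech (1)
    when its energy is within `threshold_db` of the loudest frame.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(clean) // frame_len
    frames = clean[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1) + 1e-12  # avoid log10(0) on digital silence
    energy_db = 10 * np.log10(energy / energy.max())
    return (energy_db > threshold_db).astype(np.int8)


def add_noise(clean, noise, snr_db):
    """Simplified step 2 (assumption): mix noise into clean speech at a global SNR."""
    # Tile, then trim, the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(p_clean / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(sr) / sr
    # Synthetic "clean speech": 0.5 s of silence followed by 0.5 s of a 440 Hz tone.
    clean = np.where(t >= 0.5, 0.5 * np.sin(2 * np.pi * 440 * t), 0.0)
    noise = rng.standard_normal(sr // 4)

    labels = energy_vad_labels(clean, sr)        # step 1: labels from clean speech
    noisy = add_noise(clean, noise, snr_db=5.0)  # step 2: corrupt the audio
    # Step 3: `labels` are reused unchanged as the targets for `noisy`.
    print(len(labels), labels[:50].sum(), labels[50:].sum())  # prints "100 0 50"
```

The key point is step 3: because the noise is added after labelling, the labels computed on the clean signal stay valid for every noisy copy, so one clean utterance can be mixed with many noise types and SNRs without relabelling.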
