Training the net with smaller batch sizes #29

Open
lucashcarneiro opened this issue Aug 14, 2019 · 0 comments

Hi,

I am a DSP engineer and acoustician. I'm new to the field of machine learning and neural networks, and I am learning a lot by working on your code :).
I am trying to assess whether your VAD graph architecture can be slimmed down enough to run in real time. Right now I am able to retrain the neural net with a few modified parameters (mainly the frame window and overlap) using the D2 dataset from your paper. However, it looks like the best ways to reduce the graph's computational cost are:

  1. To use a less complex feature extractor than MRCG (I intend to investigate this hypothesis later).
  2. To retrain the neural net with smaller batch sizes on D2. In your paper you use 4096, which corresponds to several seconds of audio.
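To make the batch-size/duration relationship concrete, here is a small sketch. It assumes that each training example is one feature frame and that the frames in a batch are contiguous in time; the 25 ms window and 10 ms hop are hypothetical placeholders, not the values used in the paper, so substitute the actual frame settings.

```python
def batch_duration_s(batch_frames: int, window_ms: float, hop_ms: float) -> float:
    """Seconds of audio spanned by `batch_frames` contiguous frames.

    Assumes one example per frame and a fixed hop between frame starts:
    the first frame covers a full window, and each later frame adds one hop.
    """
    if batch_frames <= 0:
        return 0.0
    return ((batch_frames - 1) * hop_ms + window_ms) / 1000.0


if __name__ == "__main__":
    # Hypothetical frame settings; replace with the ones actually used for training.
    WINDOW_MS, HOP_MS = 25.0, 10.0
    for batch in (4096, 2048, 1024, 512):
        print(f"batch={batch:5d} -> {batch_duration_s(batch, WINDOW_MS, HOP_MS):.2f} s")
```

Halving the batch halves the span, so with these placeholder values even a 512-frame batch still covers about five seconds of audio.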

I've tried to run this training multiple times with different batch sizes (2048, 1024, 512, ...), without success so far. Sometimes the training accuracy reaches a high value, but it never generalizes to the test data.

I believe something is wrong with the training parameters, which is why I am contacting you for guidance. Have you ever tried training the neural net with smaller batch sizes? Should something be changed in the network architecture or in the training parameters?
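One commonly used heuristic when changing the batch size, offered only as a possible starting point and not verified for this model, is to rescale the learning rate together with the batch: either linearly (the "linear scaling rule" described by Goyal et al., 2017, for SGD) or by the square root of the batch-size ratio, another rule of thumb seen in practice. The base learning rate below is hypothetical, and `scaled_learning_rate` is an illustrative helper, not part of this repository.

```python
import math


def scaled_learning_rate(base_lr: float, base_batch: int, new_batch: int,
                         rule: str = "linear") -> float:
    """Rescale a learning rate tuned at `base_batch` for use at `new_batch`.

    "linear": the learning rate scales proportionally with the batch size.
    "sqrt":   the learning rate scales with the square root of the batch-size ratio.
    """
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    BASE_LR = 1e-3  # hypothetical; use the learning rate from the original 4096-batch config
    for batch in (2048, 1024, 512):
        print(batch,
              scaled_learning_rate(BASE_LR, 4096, batch),
              scaled_learning_rate(BASE_LR, 4096, batch, rule="sqrt"))
```

A large gap between training and test accuracy can also come from other causes (for example, how batches are sampled from D2), so this is at most one knob to try.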

What is the recommended size of the test audio relative to the batch size? In other words, can I test a graph trained with a large batch size on a very short audio sample? My intuition is that it is not possible. That is why I would like to retrain the net with a smaller batch size, so that I could run sequential tests on an audio buffer of reasonable size (say, 500 ms).
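The sketch below shows how a 500 ms buffer could be split into frames for sequential inference, and how a short buffer might be zero-padded if a graph happens to require a fixed batch dimension. The 16 kHz sample rate, 400-sample window (25 ms) and 160-sample hop (10 ms) are assumptions, and `frame_signal` and `pad_to_batch` are hypothetical helpers, not functions from this repository.

```python
from typing import List, Tuple

Frame = List[float]


def frame_signal(samples: List[float], win: int, hop: int) -> List[Frame]:
    """Split a 1-D signal into overlapping frames of `win` samples, one every `hop` samples.

    Trailing samples that do not fill a whole frame are dropped; in a streaming
    setting they would be carried over into the next buffer.
    """
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, hop)]


def pad_to_batch(frames: List[Frame], batch: int, win: int) -> Tuple[List[Frame], List[bool]]:
    """Zero-pad `frames` up to `batch` entries and return a validity mask.

    Only needed if the graph's input has a fixed batch dimension (an assumption);
    predictions for padded entries (mask == False) should be discarded.
    """
    if len(frames) > batch:
        raise ValueError("more frames than the fixed batch size")
    n_pad = batch - len(frames)
    padded = frames + [[0.0] * win for _ in range(n_pad)]
    mask = [True] * len(frames) + [False] * n_pad
    return padded, mask


if __name__ == "__main__":
    SR, WIN, HOP = 16_000, 400, 160   # assumed: 16 kHz, 25 ms window, 10 ms hop
    buffer = [0.0] * (SR // 2)        # 500 ms of (silent) audio
    frames = frame_signal(buffer, WIN, HOP)
    batch, mask = pad_to_batch(frames, 512, WIN)
    print(len(frames), len(batch), sum(mask))  # prints: 48 512 48
```

Under these assumed values, padding a 48-frame buffer to 512 entries would waste most of each forward pass, which is consistent with the intuition that a smaller batch (or a variable batch dimension) suits streaming better.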

Finally, do you have any recommendations for reducing the network's computational complexity without sacrificing too much performance?

Thank you for taking the time to share your knowledge and experience.
Cheers,
Lucas
