Running out of ram #1

Open
jesuistay opened this issue Oct 1, 2017 · 3 comments

Comments

@jesuistay

jesuistay commented Oct 1, 2017

I couldn't get HTK to work properly, possibly due to a bad installation, but it seemed to work fine with librosa.

However, when it gets to '===> Reading audio files...', it seems like the for loop going over the audio paths just fills up my 8 GB of RAM and swap. And this is only on the 28539 files from train-clean-100. It also doesn't produce any files at this stage.
Is there a trick I am missing to get the preprocessor going without reading everything into RAM all at once?
The ETA was over 1 hour, and it broke down after 54% of the train-clean-100 dataset.

@hirofumi0810
Owner

Hi, @jesuistay

Only one file is loaded in each loop iteration for memory efficiency, so I don't know why this happens.
Which loop do you mean?
There are 3 loops in librispeech/inputs/input_data.py.

@jesuistay
Author

jesuistay commented Oct 2, 2017

First one: for i, audio_path in enumerate(tqdm(audio_paths)):
To me it looks like it traverses the entire dataset and builds the dict, in order to calculate the mean and std, I assume.

For now I've just skipped all the normalization and am just writing the npy files right after I get the input_data_utt from librosa.
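
If the dict is only there to get a global mean and std, one possible way around it is a two-pass approach: accumulate running sums on the first pass, then normalize and write each utterance on the second. This is only a rough sketch and not the code from this repo. The names streaming_mean_std, normalize_and_save, load_features and out_path_for are made up for illustration, and load_features stands in for whatever returns a (frames, feature_dim) array for one file (librosa or HTK).

```python
import numpy as np


def streaming_mean_std(audio_paths, load_features):
    """Pass 1: per-dimension mean/std without keeping utterances in memory.

    load_features is a hypothetical callable: path -> (frames, dim) array.
    """
    frame_count = 0
    total = None      # running sum of features per dimension
    total_sq = None   # running sum of squared features per dimension
    for audio_path in audio_paths:
        feats = np.asarray(load_features(audio_path), dtype=np.float64)
        if total is None:
            total = np.zeros(feats.shape[1])
            total_sq = np.zeros(feats.shape[1])
        total += feats.sum(axis=0)
        total_sq += (feats ** 2).sum(axis=0)
        frame_count += feats.shape[0]
        # feats is rebound on the next iteration, so only one
        # utterance is held in memory at a time.
    if frame_count == 0:
        raise ValueError("no frames found")
    mean = total / frame_count
    # Population variance via E[x^2] - E[x]^2, clipped against rounding error.
    var = np.maximum(total_sq / frame_count - mean ** 2, 0.0)
    return mean, np.sqrt(var)


def normalize_and_save(audio_paths, load_features, mean, std, out_path_for):
    """Pass 2: reload each utterance, normalize it, and write one .npy file.

    out_path_for is a hypothetical callable: audio path -> output path.
    """
    safe_std = np.maximum(std, 1e-8)  # avoid division by zero
    for audio_path in audio_paths:
        feats = np.asarray(load_features(audio_path), dtype=np.float64)
        normed = (feats - mean) / safe_std
        np.save(out_path_for(audio_path), normed.astype(np.float32))
```

The trade-off is that every file gets read (and its features computed) twice, so it is slower, but memory use should stay at roughly one utterance plus two small accumulator vectors.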

@jesuistay
Author

I managed to get HTK working, but the RAM problem still confuses me. I had to increase my swap partition to 16 GB (on top of 8 GB of RAM) just to manage to preprocess train-clean-100.

wolverineq pushed a commit to wolverineq/asr_preprocessing that referenced this issue Feb 14, 2018