
The length of the audio after noise reduction is shortened #71

Open
1003657663 opened this issue May 22, 2023 · 3 comments

Comments

@1003657663

Hello, I am looking for a fast noise-reduction model, and yours meets my requirements exactly. Thank you for your efforts.

I'm using this model as a preprocessing step for my speech recognition model. My pipeline receives chunks of audio over a WebSocket, denoises each chunk, runs VAD, and then splices the chunks back together into a full utterance for speech recognition. So each chunk I denoise is only part of a whole sentence.

My requirement is that the denoised voice segments can be spliced together seamlessly, but after I process the sound with real_time_processing_tf_lite.py, there are blank parts in the spliced audio that make the speech stutter.

In the figure below, the upper waveform is the audio before processing and the lower one is the audio after processing. The processed audio has the same total length as the input, but the part containing the waveform is shorter, so the processed chunks cannot be stitched together directly.

Can multiple segments of noise-reduced audio be spliced together seamlessly? I'm new to coding and not very familiar with this. Could you help me achieve it?

[image: waveform comparison, before (top) and after (bottom) processing]

@WaterBoiledPizza

Assuming you haven't changed the parameters in the code, the block length is still 512 and the block shift is still 128 (75% overlap):

The enhancement does not start right at the beginning of the input audio: the output is zeros for roughly three block shifts (3 × 128 = 384 samples) while the input audio gradually shifts into and out of the input buffer 128 samples at a time, hence the extra silence at the start:
[image: processed waveform showing the leading silence]
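The buffering behavior described above can be illustrated with a minimal NumPy sketch (parameters taken from this thread; the model call is replaced by simply reading back the oldest part of the buffer, so this only demonstrates the latency, not the enhancement):

```python
import numpy as np

# Assumed parameters from the thread: the input buffer holds
# block_len = 512 samples and advances by block_shift = 128 per step.
block_len, block_shift = 512, 128
audio = np.arange(1, 1025, dtype=np.float32)  # dummy 1024-sample signal

in_buffer = np.zeros(block_len, dtype=np.float32)
out = np.zeros(len(audio), dtype=np.float32)
first_nonzero_out = None

for i in range(len(audio) // block_shift):
    chunk = audio[i * block_shift:(i + 1) * block_shift]
    # shift the buffer left by block_shift and append the new chunk
    in_buffer = np.roll(in_buffer, -block_shift)
    in_buffer[-block_shift:] = chunk
    # (the model would process in_buffer here; we just emit its oldest part)
    out_block = in_buffer[:block_shift]
    out[i * block_shift:(i + 1) * block_shift] = out_block
    if first_nonzero_out is None and np.any(out_block != 0):
        first_nonzero_out = i * block_shift

print(first_nonzero_out)  # → 384
```

The first non-silent output sample appears at index 384, i.e. after three block shifts, matching the leading silence seen in the figure.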

If you slice your audio without any overlap, the reconstructed audio will have a blank part at the start of each slice. So add overlap to your slicing, which can hide the effect of the blank parts.
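The overlap idea above can be sketched as follows. This is a hypothetical illustration: `split_overlap` and `splice_crossfade` are helper names I made up, the "processing" is identity (no actual denoising), and the overlap of 384 samples is chosen to cover the model's warm-up silence:

```python
import numpy as np

def split_overlap(x, seg_len, ov):
    # cut x into segments of seg_len samples, each overlapping the
    # previous one by ov samples
    hop = seg_len - ov
    return [x[i:i + seg_len] for i in range(0, len(x) - ov, hop)]

def splice_crossfade(segments, ov):
    # splice segments back together, cross-fading the ov-sample overlap
    # so the warm-up region of each segment is masked by the previous tail
    out = segments[0].astype(np.float32).copy()
    fade_in = np.linspace(0.0, 1.0, ov, dtype=np.float32)
    for seg in segments[1:]:
        seg = seg.astype(np.float32)
        out[-ov:] = out[-ov:] * (1.0 - fade_in) + seg[:ov] * fade_in
        out = np.concatenate([out, seg[ov:]])
    return out

x = np.random.randn(4000).astype(np.float32)
segs = split_overlap(x, seg_len=1024, ov=384)   # 384 ≈ model warm-up
y = splice_crossfade(segs, ov=384)
print(len(y), np.allclose(y, x[:len(y)], atol=1e-5))  # → 4000 True
```

With identity processing the splice reconstructs the input exactly; with the real model, the cross-fade replaces each segment's silent lead-in with already-enhanced audio from the previous segment.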

@1003657663
Author

Here comes another problem.
I tried cutting a complete wav into multiple overlapping segments and compared the result with feeding the entire wav file in directly.
If I run real_time_processing_tf_lite.py fresh each time, creating new input_details_1, input_details_2, output_details_1, output_details_2, the output waveform never matches the one I get from processing the complete wav.
If I instead keep reusing input_details_1, input_details_2, output_details_1, output_details_2 as global variables, then when I capture sound from the microphone and feed it in continuously, after enough samples the output waveform accumulates a lot of noise.
I understand that real_time_processing_tf_lite.py is meant to process a long wav file. If I want to read the audio stream from the microphone and denoise it continuously, how should I modify it so that the noise reduction result for each short chunk is consistent with the result for the whole long audio?

@zqlsnr

zqlsnr commented Aug 18, 2023

Implement it with PyAudio or sounddevice. Still feed the model 512 samples each time, and just replace 256 of those 512 samples with new data on each call.
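A minimal sketch of that suggestion, assuming the 512/256 figures from the comment above. In a real application the chunks would arrive from a PyAudio or sounddevice stream callback; here a NumPy array stands in for the microphone, and `model` is a placeholder for the DTLN TF-Lite inference call:

```python
import numpy as np

BLOCK = 512  # samples fed to the model per call
HOP = 256    # replace 256 of the 512 with fresh data each time

def model(block):
    # placeholder for the TF-Lite interpreter invocation
    # (set_tensor / invoke / get_tensor in the real script)
    return block

mic = np.random.randn(BLOCK * 4).astype(np.float32)  # fake microphone feed
in_buf = np.zeros(BLOCK, dtype=np.float32)           # persistent input buffer
outputs = []

for i in range(len(mic) // HOP):
    chunk = mic[i * HOP:(i + 1) * HOP]
    in_buf = np.concatenate([in_buf[HOP:], chunk])   # slide in new samples
    enhanced = model(in_buf)
    outputs.append(enhanced[:HOP])                   # emit the oldest half

stream_out = np.concatenate(outputs)
print(stream_out.shape)  # → (2048,)
```

The key point is that `in_buf` (and, in the real script, the model's LSTM state tensors) must persist across calls rather than being re-created per chunk, which addresses the mismatch described in the previous comment.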
