Changes in trim_silence function regarding the aggressive trimming issue #46

Open · wants to merge 10 commits into base: master

Conversation


@amoljagadambe commented on May 18, 2021

Description

Update the trimming function to use rolling windows to create a mask around the signal.

Fixes #35
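As a rough sketch of that flow, assuming a mono recording read with soundfile (the function name and file paths are illustrative, not the exact code in this PR):

import pandas as pd
import soundfile as sf

def build_silence_mask(samples, sample_rate, threshold=0.0007):
    """Boolean mask: True wherever the rolling amplitude envelope exceeds the threshold."""
    envelope = pd.Series(samples).abs().rolling(
        window=int(sample_rate / 20), min_periods=1, center=True).max()
    return (envelope > threshold).to_numpy()

# Read the recording, build the mask, and keep only the masked samples.
samples, rate = sf.read("recording.wav")
trimmed = samples[build_silence_mask(samples, rate)]
sf.write("recording_trimmed.wav", trimmed, rate)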

Type of PR

  • [ ] Bugfix
  • [ ] Feature implementation
  • [x] Refactor of code (without functional changes)
  • [ ] Documentation improvements
  • [ ] Test improvements

Documentation

The most important and tweakable part is threshold_value; play around with its value to get your desired amount of trimming.

CLA

To protect you, the project, and those who choose to use Mycroft technologies in systems they build, we ask all contributors to sign a Contributor License Agreement.

This agreement clarifies that you are granting a license to the Mycroft Project to freely use your work. Additionally, it establishes that you retain the ownership of your contributed code and intellectual property. As the owner, you are free to use your code in other work, obtain patents, or do anything else you choose with it.

If you haven't already signed the agreement and been added to our public Contributors repo then please head to https://mycroft.ai/cla to initiate the signing process.

@krisgesling added the "CLA: Yes" label (Contributor License Agreement exists, see https://github.com/MycroftAI/contributors) on May 19, 2021
Collaborator

@krisgesling left a comment


Hey there, thanks for helping out with the project!

This is an interesting approach. Quite different to what I was expecting to address this issue.

The biggest question I have is whether the benefits of this approach outweigh the added complexity of the implementation?

The complexity arises in that we would be adding both numpy and pandas as requirements for this small task. It's also more complex for people wanting to read, understand and work on the code in the future.

Interested to hear your thoughts, and what benefits this approach brings over doing something like adding a manual buffer of x seconds to the start and end trims.

Thanks again :)

@@ -1,28 +1,28 @@
"""audio processing, etc"""
from pydub import AudioSegment
import pandas as pd
import numpy as np
import soundfile as sf


class Audio:
silence_threshold = -50.0
Collaborator


It seems this would no longer be used anywhere, is that right?

Author


No, it will be used when saving the audio.

while sound[trim_ms:trim_ms + Audio.chunk_size].dBFS \
< Audio.silence_threshold and trim_ms < len(sound):
trim_ms += Audio.chunk_size
def _detect_leading_silence(sound: bytearray, sample_rate: int) -> list:
Collaborator


The name of this method should probably be changed to reflect its new nature. It is no longer "detecting leading silence"; it is generating a mask.

Author


Yes, I am working on it.
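A possible shape for the renamed function, as a hedged sketch (the name is only a suggestion, not the final choice from this PR):

def _generate_silence_mask(sound: bytearray, sample_rate: int) -> list:
    """Return a per-sample boolean mask: True where the signal is above
    the threshold, False where it should be treated as silence."""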



class Audio:
silence_threshold = -50.0
chunk_size = 10
threshold_value = 0.0007 # Tweak the value of threshold to get the aggressive trimming
Collaborator


What does this threshold value represent? i.e. what are the units?

Author


This is the mask threshold value: it is applied to the mean values obtained from the rolling windows, and anything below the threshold is discarded.

Comment on lines 15 to 17
y_mean = y.rolling(window=int(sample_rate / 20),
min_periods=1,
center=True).max()
Collaborator


Can you break this down for me? What are we getting from this and how do we know that it detects silence?
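Not an answer from the thread, but a minimal sketch of what that rolling computation produces, using a toy signal (all names and values are illustrative):

import numpy as np
import pandas as pd

# Toy signal: 0.25 s of silence, 0.5 s of a quiet 440 Hz tone, 0.25 s of silence.
sample_rate = 16000
tone = 0.1 * np.sin(2 * np.pi * 440 * np.arange(sample_rate // 2) / sample_rate)
signal = np.concatenate([np.zeros(sample_rate // 4), tone, np.zeros(sample_rate // 4)])

# Absolute amplitude, then a centred rolling maximum over a 50 ms window
# (sample_rate / 20 samples): each sample becomes the local peak amplitude,
# i.e. a smoothed envelope of the recording.
y = pd.Series(signal).abs()
envelope = y.rolling(window=int(sample_rate / 20), min_periods=1, center=True).max()

# Samples whose envelope exceeds the threshold count as speech.
mask = envelope > 0.0007
print(bool(mask.iloc[0]), bool(mask.iloc[len(mask) // 2]))   # False (silence), True (tone)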


return trim_ms
return [True if mean > Audio.threshold_value else False for mean in y_mean]
Collaborator


The result of a comparison is already a Boolean, e.g. x > y evaluates to either True or False. So this can be simplified to something like:

return [mean > Audio.threshold_value for mean in y_mean]

Author


Yep, I think this is right.
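As a further aside (not from the review): since y_mean is a pandas Series, the comparison can also be vectorised directly, e.g.

return (y_mean > Audio.threshold_value).tolist()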

trimmed_sound = sound[start_trim:duration - end_trim]
sound, rate = sf.read(path + ".wav")
mask = Audio._detect_leading_silence(sound, rate)
trimmed_sound = sound[mask]
Collaborator


Is this going to remove all periods of silence, rather than only silence at the beginning and end of the clip?

Author


It will remove all of the silence from the signal.

Collaborator


I may be misunderstanding, but won't this also remove any apparent "silence" during the speech? Meaning any natural pauses will be cut out and the speech will sound very run together?

I don't think this is the behaviour we are after for TTS recording. Is that your intended use case?
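For contrast, a hedged sketch (not part of this PR) of using the same boolean mask to trim only the leading and trailing silence, so that pauses inside the speech are preserved:

import numpy as np

def trim_edges(sound: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep everything between the first and last True entry in the mask."""
    if not mask.any():
        return sound                                # nothing above the threshold
    start = int(np.argmax(mask))                    # index of the first True
    end = len(mask) - int(np.argmax(mask[::-1]))    # one past the last True
    return sound[start:end]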

Comment on lines 3 to 5
import pandas as pd
import numpy as np
import soundfile as sf
Collaborator


We generally keep these as their full-length names. This makes the code more readable throughout, as you don't need to go back and check what sf refers to, for example.
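For instance, mirroring the read call in the diff above (path is a placeholder here):

import soundfile

path = "recording"   # placeholder base path
sound, rate = soundfile.read(path + ".wav")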

@amoljagadambe
Author

I think your statement above is right: NumPy and pandas put a lot of strain on the Alpine version of Python. Meanwhile, I am also working on a different approach that is lighter and more computationally efficient.

@amoljagadambe
Author

The above approach also needs some tweaks in the Dockerfile.

@amoljagadambe
Author

'ffmpeg -i {} -ab 160k -ac 2 -ar 44100 -vn {}.wav -y'.format( webm_file_name, path )

Why are we using 2 channels when saving the file?
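For reference, a hedged sketch of the same conversion writing mono instead of stereo (-ac 1); the file names are placeholders:

import subprocess

webm_file_name = "recording.webm"   # placeholder input
path = "recording"                  # placeholder output base name

# -ac 1 writes a single (mono) channel instead of the 2 (stereo) used above.
cmd = 'ffmpeg -i {} -ab 160k -ac 1 -ar 44100 -vn {}.wav -y'.format(webm_file_name, path)
subprocess.run(cmd.split(), check=True)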

@krisgesling
Collaborator

Hey, glad that you're finding the project helpful and able to modify it to fit your use case.

The removal of all silence makes more sense now as it seems you're recording single words rather than whole sentences.

It feels like these changes would be well suited to being configuration options set in the docker-compose.yaml, e.g. (a sketch of reading such options follows the list):

  • stereo vs mono
  • remove all silence vs trim start and finish
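A minimal sketch of how such options might be read, assuming they are passed as environment variables from docker-compose.yaml (the names STEREO_AUDIO and TRIM_ALL_SILENCE are hypothetical, not existing project settings):

import os

# Hypothetical settings; the strings would be set under environment: in docker-compose.yaml.
STEREO_AUDIO = os.environ.get("STEREO_AUDIO", "true").lower() == "true"
TRIM_ALL_SILENCE = os.environ.get("TRIM_ALL_SILENCE", "false").lower() == "true"

channels = 2 if STEREO_AUDIO else 1   # stereo vs mono when saving
# TRIM_ALL_SILENCE would choose between removing every silent region
# and only trimming silence at the start and end of the clip.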
