Improve alignment accuracy by normalizing audio features #625

IbrahimAmin1 · 2023-12-13T09:39:30Z

Audio data should be pre-processed with the Wav2Vec2Processor (Wav2Vec2FeatureExtractor) before the forward pass. I have noticed a considerable improvement in alignment (lower mean absolute error) when the audio is normalized to zero mean and unit variance by the processor.
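For reference, the normalization itself is small. The following is a minimal sketch in plain Python of zero-mean, unit-variance scaling; it is not the code in this PR. The function name `zero_mean_unit_var` and the `eps` value are illustrative assumptions, chosen to resemble what the feature extractor does when `do_normalize` is enabled.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Scale a mono waveform to zero mean and (approximately) unit variance.

    `eps` is an assumed small constant that avoids division by zero
    on silent input.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Population variance (divide by n, not n - 1).
    var = sum((s - mean) ** 2 for s in samples) / n
    scale = math.sqrt(var + eps)
    return [(s - mean) / scale for s in samples]


# A quiet waveform with a DC offset: small amplitude, non-zero mean.
waveform = [0.10, 0.12, 0.08, 0.11, 0.09]
normalized = zero_mean_unit_var(waveform)
```

After scaling, quiet and loud recordings reach the model in the same numeric range, which is presumably the condition the models were fine-tuned under.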

In addition, each Hugging Face Wav2Vec2 feature extractor should be loaded with the same configuration that was used when the model was fine-tuned (e.g. normalization, attention_mask usage, etc.).

A typical Hugging Face Wav2Vec2 feature extractor config file looks like this:

{
  "do_normalize": true,
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": true,
  "sampling_rate": 16000
}

To maintain backwards compatibility, the user can choose whether pre-processing is applied. Pre-processing is enabled by default because it improves alignment considerably.
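The opt-out behaviour can be sketched roughly as below. This is an illustration under stated assumptions, not the actual change to alignment.py: `prepare_input` is a hypothetical helper, the `preprocess` keyword is modelled on the argument name mentioned in the commits, and the config is a trimmed copy of the example above.

```python
import json
import math

# Trimmed copy of the example feature extractor config (assumed values).
CONFIG = json.loads('{"do_normalize": true, "sampling_rate": 16000}')


def prepare_input(samples, preprocess=True, config=CONFIG, eps=1e-7):
    """Return the waveform that would be passed to the model.

    Hypothetical helper: with preprocess=False the samples are returned
    unchanged (the previous behaviour); otherwise they are normalized
    only if the config asks for it.
    """
    if not (preprocess and config.get("do_normalize", False)):
        return list(samples)
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    scale = math.sqrt(var + eps)
    return [(s - mean) / scale for s in samples]


raw = prepare_input([1.0, 3.0], preprocess=False)  # old behaviour, untouched
scaled = prepare_input([1.0, 3.0])                 # default: normalized
```

Reading `do_normalize` from the config rather than hard-coding it keeps inference consistent with however a given model was fine-tuned.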

Improve alignment accuracy by normalizing audio features using Wav2Ve…

356b5f7

…c2Processor before the Forward pass

IbrahimAmin1 changed the title ~~Improve alignment accuracy by normalizing audio features using Wav2Ve…~~ Improve alignment accuracy by normalizing audio features Dec 13, 2023

Update alignment.py

4c7631c

Fix a typo in the preprocess argument

gillens mentioned this pull request Feb 21, 2024

Question about PyPi releases #700

Open

HHousen mentioned this pull request Mar 25, 2024

AttributeError: 'Wav2Vec2Processor' object has no attribute 'sampling_rate' #722

Open
