CQCC size? #61

JJun-Guo · 2023-05-10T07:29:03Z

Why does the time dimension become 66 after extracting CQCC features, instead of reflecting the duration of the original audio?

SuperKogito · 2023-05-10T07:39:22Z

The resulting cqcc features should be a 2d array with the shape (num_frames x num_ceps).
The duration of the signal is not part of the dimension but the number of frames analysed is.
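
As a rough sketch of how the number of frames follows from the signal length, the snippet below counts the full frames that fit into a signal. It assumes the final partial frame is dropped rather than padded, and the helper name `expected_num_frames` is hypothetical (it is not part of spafe, whose framing code may differ in detail).

```python
def expected_num_frames(num_samples, fs, win_len=0.025, win_hop=0.010):
    """Count full frames in a signal (assumption: no padding of the last frame)."""
    frame_len = round(win_len * fs)  # samples per frame
    hop_len = round(win_hop * fs)    # samples between frame starts
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop_len


fs = 16000
for seconds in (1, 5, 10):
    # Longer signals should yield more frames, so num_frames should vary.
    print(seconds, expected_num_frames(seconds * fs, fs))
```

Under these assumptions, signals of different durations give different frame counts, so a constant first dimension across inputs would not be expected.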

JJun-Guo · 2023-05-10T11:20:02Z

Yes, but I get CQCC features of the same size for audio samples of different durations; they all have num_frames equal to 66.

SuperKogito · 2023-05-10T11:31:17Z

This could be a bug. Try varying the frame length, the frame hop, and the number of ceps. If the error persists, please provide a small reproducible example in Python that displays the error, and I will try to review the code this weekend. If you have a possible solution, feel free to open a PR.

guanlongzhao · 2024-03-20T18:05:04Z

I ran into the same issue. I think it is due to incorrect shape handling here.

The output of dct(x=resampled_features, type=dct_type, axis=1, norm="ortho") actually has shape (num_original_ceps, num_frames), so [:, :num_ceps] reduces cqccs to (num_original_ceps, num_ceps) instead of (num_frames, num_ceps).
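
The slicing behaviour described above can be illustrated with a placeholder array. This is only a sketch: `dct_out` stands in for the DCT output (a DCT along one axis preserves the input shape), and the sizes are illustrative values, not ones taken from spafe.

```python
import numpy as np

# Illustrative sizes only (assumptions, not values read from spafe).
num_original_ceps, num_frames, num_ceps = 50, 998, 40

# Placeholder for the DCT output, laid out as (num_original_ceps, num_frames).
dct_out = np.zeros((num_original_ceps, num_frames))

# Slicing the second axis crops the frames, not the coefficients.
buggy = dct_out[:, :num_ceps]

# Slicing the first axis keeps num_ceps coefficients for every frame;
# a transpose is then needed to reach the documented (num_frames, num_ceps).
proposed = dct_out[:num_ceps, :].T

print(buggy.shape)     # (50, 40)
print(proposed.shape)  # (998, 40)
```

With this layout the first slice yields the same shape regardless of how many frames the signal has, which matches the symptom reported earlier in the thread.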

A simple unit test to reproduce this (you can run it in Google Colab):

!pip install spafe

import numpy as np
from spafe.features import cqcc
from spafe.utils import preprocessing

fs = 16000

# 10s audio, 998 10ms frames with window size 25ms 
wav = np.zeros(fs * 10, dtype=np.float32)

cqccs = cqcc.cqcc(wav,
                  fs=fs,
                  num_ceps=40,
                  pre_emph=True,
                  pre_emph_coeff=0.97,
                  window=preprocessing.SlidingWindow(0.025, 0.01, "hanning"),
                  nfft=512,
                  low_freq=0,
                  high_freq=fs/2,
                  dct_type=2,
                  lifter=None,
                  normalize=None,
                  )
assert cqccs.shape == (998, 40), f'Expect shape (num_frames x num_ceps) 998 x 40, actual shape {cqccs.shape}'

I'm not too familiar with cqcc, but if these lines are working as intended (i.e., resampling the frequency bins), then the fix is simply changing this line to cqccs = dct(x=resampled_features, type=dct_type, axis=1, norm="ortho")[:num_ceps, :]

@SuperKogito could you please confirm if I understood correctly? Thanks!

SuperKogito · 2024-03-22T14:05:33Z

Thank you both for reporting this.
The error is in the processing, not in the cropping of the array as you suggested. Please refer to #63 for a solution. It includes two changes:

1. Use resampling_ratio=1.0
2. Change

https://github.com/SuperKogito/spafe/blob/b6b1428df52694c95bb295a6ec291ae442053fcc/spafe/features/cqcc.py#L286C27-L286C43
to
log_features = np.log(features_no_zero.T)

I still need to review the literature and update the docs before publishing this. I will try to do this on the weekend or next week. In the meantime, you can use #63.
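
The effect of the transpose in the change above can be sketched as follows. This is a minimal illustration, not the code from #63: it assumes `features_no_zero` is laid out as (frequency bins, frames), uses random positive values in place of real constant-Q features, skips the resampling step entirely, and uses `scipy.fft.dct` as a stand-in for whichever DCT spafe imports.

```python
import numpy as np
from scipy.fft import dct

# Illustrative sizes only (assumptions, not values read from spafe).
num_bins, num_frames, num_ceps = 50, 998, 40

# Stand-in for features_no_zero: strictly positive values,
# assumed layout (num_bins, num_frames).
rng = np.random.default_rng(0)
features_no_zero = rng.uniform(0.1, 1.0, size=(num_bins, num_frames))

# Transposing before the log puts frames on axis 0.
log_features = np.log(features_no_zero.T)  # (num_frames, num_bins)

# The DCT along axis=1 now runs over frequency bins, so the existing
# [:, :num_ceps] crop keeps num_ceps coefficients per frame.
cqccs = dct(x=log_features, type=2, axis=1, norm="ortho")[:, :num_ceps]

print(cqccs.shape)  # (998, 40)
```

Under these assumptions the output shape tracks the number of frames, as the documented (num_frames x num_ceps) layout requires.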

SuperKogito self-assigned this Mar 22, 2024

SuperKogito added the bug label Mar 22, 2024

SuperKogito linked a pull request Mar 22, 2024 that will close this issue

fix cqcc size bug #63

Open
