CQCC size? #61

Open
JJun-Guo opened this issue May 10, 2023 · 5 comments · May be fixed by #63
@JJun-Guo

Why is the time dimension 66 after extracting cqcc features, instead of reflecting the duration of the original audio?

@SuperKogito
Owner

The resulting cqcc features should be a 2d array with the shape (num_frames x num_ceps).
The duration of the signal is not part of the dimension but the number of frames analysed is.
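To make the link between duration and num_frames concrete, here is a minimal sketch of the usual frame-count arithmetic. It assumes a 25 ms window, a 10 ms hop, and no padding of the final partial frame; the helper name expected_num_frames is hypothetical and is not part of spafe, whose own framing may pad and differ by a frame.

```python
def expected_num_frames(num_samples, fs, win_len=0.025, win_hop=0.010):
    """Count full frames that fit in a signal (assumes no padding)."""
    win = int(round(win_len * fs))  # window length in samples
    hop = int(round(win_hop * fs))  # hop length in samples
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


fs = 16000
# The frame count should grow with the duration of the signal.
counts = {seconds: expected_num_frames(seconds * fs, fs) for seconds in (1, 5, 10)}
print(counts)
```

Under these assumptions, the first dimension of the output should change with the length of the input rather than stay fixed.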

@JJun-Guo
Author

The resulting cqcc features should be a 2d array with the shape (num_frames x num_ceps). The duration of the signal is not part of the dimension but the number of frames analysed is.

Yes, but I get cqcc features of the same size for audio samples of different durations: they all have num_frames = 66.

@SuperKogito
Owner

This could be a bug. Try adjusting the frame length, the frame hop, and the number of ceps. If the error persists, please provide a small reproducible example in Python that shows the error, and I will try to review the code this weekend. If you have a possible solution, feel free to open a PR.

@guanlongzhao

I ran into the same issue; I think it is due to incorrect shape handling here.

The output of dct(x=resampled_features, type=dct_type, axis=1, norm="ortho") actually has shape (num_original_ceps, num_frames), so [:, :num_ceps] reduces cqccs to (num_original_ceps, num_ceps).

A simple unit test to reproduce this (you can run it in Google Colab):

!pip install spafe

import numpy as np
from spafe.features import cqcc
from spafe.utils import preprocessing

fs = 16000

# 10s audio, 998 10ms frames with window size 25ms 
wav = np.zeros(fs * 10, dtype=np.float32)

cqccs = cqcc.cqcc(wav,
                  fs=fs,
                  num_ceps=40,
                  pre_emph=True,
                  pre_emph_coeff=0.97,
                  window=preprocessing.SlidingWindow(0.025, 0.01, "hanning"),
                  nfft=512,
                  low_freq=0,
                  high_freq=fs/2,
                  dct_type=2,
                  lifter=None,
                  normalize=None,
                  )
assert cqccs.shape == (998, 40), f'Expect shape (num_frames x num_ceps) 998 x 40, actual shape {cqccs.shape}'

I'm not too familiar with cqcc, but if these lines are working as intended (i.e., resampling the frequency bins), then the fix is simply changing this line to cqccs = dct(x=resampled_features, type=dct_type, axis=1, norm="ortho")[:num_ceps, :]
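The two slicing options can be compared on a toy array without running spafe. This is only a sketch: the random matrix stands in for resampled_features, and the sizes (66 rows, 998 frames, 40 ceps) are assumed values chosen to echo the numbers in this thread, not values read from the library.

```python
import numpy as np
from scipy.fft import dct

# Toy sizes (assumptions for illustration only).
num_rows, num_frames, num_ceps = 66, 998, 40

# Stand-in for resampled_features with layout (num_rows, num_frames).
rng = np.random.default_rng(0)
resampled_features = rng.standard_normal((num_rows, num_frames))

transformed = dct(resampled_features, type=2, axis=1, norm="ortho")

cropped_cols = transformed[:, :num_ceps]  # current crop: truncates the frame axis
cropped_rows = transformed[:num_ceps, :]  # proposed crop: truncates the row axis

print(cropped_cols.shape)  # (num_rows, num_ceps)
print(cropped_rows.shape)  # (num_ceps, num_frames)
```

In this toy layout the proposed crop keeps every frame, but its shape is (num_ceps, num_frames), so it would still need a transpose to match the documented (num_frames x num_ceps).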

@SuperKogito could you please confirm if I understood correctly? Thanks!

SuperKogito self-assigned this on Mar 22, 2024
SuperKogito added the bug label on Mar 22, 2024
SuperKogito linked a pull request on Mar 22, 2024 that will close this issue
@SuperKogito
Owner

Thank you both for reporting this.
The error is in the processing, not in the cropping of the array as suggested. Please refer to #63 for a solution, which includes two changes:

  • Use resampling_ratio=1.0
  • Change

https://github.com/SuperKogito/spafe/blob/b6b1428df52694c95bb295a6ec291ae442053fcc/spafe/features/cqcc.py#L286C27-L286C43
to
log_features = np.log(features_no_zero.T)
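As a rough illustration of why the transpose matters, the sketch below applies it to a toy matrix and then reuses the existing [:, :num_ceps] crop. It assumes features_no_zero is laid out as (num_bins, num_frames), and it skips the resampling step entirely, which is only an approximation of a ratio of 1.0. The sizes are made up, and this is not the code from #63.

```python
import numpy as np
from scipy.fft import dct

# Toy sizes (assumptions for illustration only).
num_bins, num_frames, num_ceps = 66, 998, 40

# Strictly positive stand-in for features_no_zero, assumed (num_bins, num_frames).
rng = np.random.default_rng(0)
features_no_zero = rng.uniform(0.1, 1.0, size=(num_bins, num_frames))

# Transposing first puts frames on axis 0 and frequency bins on axis 1.
log_features = np.log(features_no_zero.T)

# The DCT now runs over the bins of each frame, and the crop keeps num_ceps columns.
cqccs = dct(log_features, type=2, axis=1, norm="ortho")[:, :num_ceps]

print(cqccs.shape)  # (num_frames, num_ceps)
```

With frames on the first axis, the unchanged crop yields one row per frame, which is the shape the unit test earlier in the thread expects.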

I still need to review the literature and update the docs before publishing this. I will try to do this on the weekend or next week. In the meantime, you can use #63.
