The replicated results don't match the demo. #21

Open
quizt35 opened this issue May 11, 2024 · 3 comments
quizt35 commented May 11, 2024

Hello! Thanks for sharing the pre-trained models and demos.
I would like to replicate the demo results using a pretrained model. For convenience, I used the data from the first row of the double-talk examples and converted the mp3 files to wav format (single channel, 16000 Hz, 16 bit). Based on the speech titles downloaded from the demo page, I selected the matching pkl file to process the original speech. However, there is a significant difference between the spectrograms on the demo page and those generated with the pre-trained model. I've checked every step and can't find the reason. Could you help me understand why?

(screenshots: spectrogram from the demo page vs. spectrogram of the pre-trained model output)

model tag: v1.0.1
The code I used is below:

import os

import librosa
import numpy as np
import soundfile as sf

from aec_eval import get_system_ckpt

# Location of the pre-trained AEC checkpoint (model tag v1.0.1)
ckpt_dir = "v1.0.1_models/aec/"
name = "meta_aec_16_combo_rl_4_1024_512_r2"
date = "2022_10_19_23_43_22"
epoch = 110

ckpt_loc = os.path.join(ckpt_dir, name, date)

# Restore the system and build the inference function
system, kwargs, outer_learnable = get_system_ckpt(
    ckpt_loc,
    epoch,
)
fit_infer = system.make_fit_infer(outer_learnable=outer_learnable)
fs = 16000

out_dir = "metaAF_output"
os.makedirs(out_dir, exist_ok=True)

# Far-end (u), mic (d), and near-end (s) signals; the echo is e = d - s
u, _ = librosa.load("u.wav", sr=fs)
d, _ = librosa.load("d.wav", sr=fs)
s, _ = librosa.load("s.wav", sr=fs)
e = d - s

# Add batch and channel dimensions: (batch, samples, channels)
d_input = {
    "u": u[None, :, None],
    "d": d[None, :, None],
    "s": s[None, :, None],
    "e": e[None, :, None],
}
pred = system.infer({"signals": d_input, "metadata": {}}, fit_infer=fit_infer)[0]
pred = np.array(pred[0, :, 0])

sf.write(os.path.join(out_dir, "_out.wav"), pred, fs)
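
For reference, this is a minimal sketch of how the spectrograms can be plotted for comparison with the demo page; the plot_spectrogram helper is only an illustration and assumes librosa and matplotlib are installed:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def plot_spectrogram(wav_path, fs=16000, n_fft=512, hop=256):
    # Log-magnitude STFT of a mono wav file, for visual comparison only
    y, _ = librosa.load(wav_path, sr=fs)
    S_db = librosa.amplitude_to_db(
        np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)), ref=np.max
    )
    librosa.display.specshow(S_db, sr=fs, hop_length=hop, x_axis="time", y_axis="hz")
    plt.colorbar(format="%+2.0f dB")
    plt.show()

plot_spectrogram("d.wav")                   # mic signal before cancellation
plot_spectrogram("metaAF_output/_out.wav")  # model output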

Looking forward to hearing from you, thanks!

@jmcasebeer
Collaborator

Hello and thanks for the question.

The demo files are all rescaled to [-1, 1] for playback (see the website footnote), which is not how the AEC data was set up for training. A previous GitHub issue noted the same problem and worked around it by rescaling with d = d / 10.
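
As a rough sketch of that workaround (the factor of 10 and applying it only to d follow the earlier issue; the exact per-file scales used by the demo normalization are not known):

import librosa

fs = 16000

# The demo wavs are peak-normalized to [-1, 1] for playback, so their
# absolute levels no longer match the training data.
d, _ = librosa.load("d.wav", sr=fs)

# Heuristic from the earlier issue: bring d back toward the training scale.
# The factor of 10 is approximate, not the original scale.
d = d / 10.0

If the files were normalized independently, the same consideration may apply to u, s, and e as well before feeding them to the model.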

If you want to replicate my results fully, I would recommend downloading the data from the AEC challenge and using that.


quizt35 commented May 14, 2024

Thanks for your reply. By applying a scale factor, I can get a more reasonable result, but there are still some minor issues. As shown in the figure below, there are similar impulses in the first few seconds of the speech. I'm wondering if this is due to the windowing or the format of the original speech. I will also follow your suggestion and test on the AEC Challenge datasets.
(screenshot: output spectrogram with impulses in the first few seconds)


quizt35 commented May 14, 2024

Additionally, should the URL for JAX in the ‘README - GPU Setup’ section be https://storage.googleapis.com/jax-releases/jax_cuda_releases.html?
