Some minor bugs inside Hifi-Codec code #7

rishikksh20 · 2023-05-06T08:13:38Z

While analyzing the new HiFi-Codec code, I encountered three small bugs:

Torchaudio MelSpectrogram:
Here:

AcademiCodec/HiFi-Codec/train.py

Line 31 in 3ee7baf

melspec = MelSpectrogram(sample_rate=24000, n_fft=s, hop_length=s//4, n_mels=64, wkwargs={"device": device}).to(device)

MelSpectrogram is not imported before use. The missing import is:

from torchaudio.transforms import MelSpectrogram

The modules package is not present inside HiFi-Codec:
Here:

AcademiCodec/HiFi-Codec/msstftd.py

Line 16 in 3ee7baf

from modules import NormConv2d

modules is not present inside the HiFi-Codec folder, so it needs to be copied in, or the import needs to be changed to reference another model's modules implementation.
Shape of the input tensor x, here:

AcademiCodec/HiFi-Codec/vqvae.py

Line 33 in 3ee7baf

c = self.encoder(x.unsqueeze(1))

In my testing with a 24 kHz mono wav, the shape of x before line 33 is [batch, samples, 1], and after the .unsqueeze(1) operation at line 33 it becomes [batch, 1, samples, 1], a 4D tensor where a 3D tensor is expected. So the shape of x needs to be checked before line 33: if it has 3 dimensions and the last dimension is 1, the last dimension needs to be squeezed.
After correcting the shape of x, the code runs without error and I am able to get the desired output.
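
The shape check described above could look roughly like the following. This is only a sketch: the helper name to_encoder_input is hypothetical and not part of the repository, and it assumes the encoder expects a [batch, 1, samples] tensor, as implied by the x.unsqueeze(1) call at line 33.

```python
import torch


def to_encoder_input(x: torch.Tensor) -> torch.Tensor:
    """Return x shaped [batch, 1, samples] (hypothetical helper, assumed layout)."""
    # Drop a trailing singleton channel: [batch, samples, 1] -> [batch, samples]
    if x.dim() == 3 and x.shape[-1] == 1:
        x = x.squeeze(-1)
    # Add the channel axis: [batch, samples] -> [batch, 1, samples]
    if x.dim() == 2:
        x = x.unsqueeze(1)
    # Anything else is not a mono batch in the assumed layout
    if x.dim() != 3 or x.shape[1] != 1:
        raise ValueError(f"expected mono audio, got shape {tuple(x.shape)}")
    return x


if __name__ == "__main__":
    wav = torch.zeros(2, 24000, 1)  # the shape observed in the test above
    print(to_encoder_input(wav).shape)
```

With something like this in place, line 33 would call self.encoder(to_encoder_input(x)) instead of unsqueezing unconditionally, so inputs that are already [batch, samples] keep working.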

Thanks @yangdongchao .

listener17 · 2023-05-06T08:26:54Z

@rishikksh20 I guess similar issues also exist for the other shared codecs?

yangdongchao · 2023-05-06T09:38:46Z

Thanks for your help. I cleaned up some code before pushing to GitHub, so some errors may exist.

rishikksh20 · 2023-05-06T09:40:28Z

As this repo is still a work in progress, some minor bugs are understandable. My focus is currently on HiFi-Codec, as that is what I am testing.
But yes, there are some minor bugs in the other codecs' code that are simple to correct. One of them is in Encodec_16k_320:

AcademiCodec/Encodec_16k_320/test.py

Line 119 in 3ee7baf

compressed = soundstream.encode(wav)

On lines 119 and 121, it should be

compressed = model.encode(wav)
print('after compression:',compressed.shape)
out = model.decode(compressed)

as

AcademiCodec/Encodec_16k_320/test.py

Line 108 in 3ee7baf

model = SoundStream(n_filters=32, D=256, ratios=[6, 5, 4, 2])

the SoundStream instance is assigned to the variable model, not soundstream.

rishikksh20 mentioned this issue May 6, 2023

Remove minor bugs from HiFi-Codec code #8

Merged

rishikksh20 changed the title ~~Some minor missing dependency~~ Some minor bugs inside Hifi-Codec code May 6, 2023
