Are Encodec_24k_32d and Encodec_16k_320 actually SoundStream? #21

yt605155624 · 2023-06-12T04:27:33Z

Hi, dongchao
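I've recently been surveying the AudioLM family of papers and found that your reproduction of SoundStorm is fairly complete, so I plan to build on it further (at the moment https://github.com/yangdongchao/SoundStorm only has S2, not S1). Then I came across the AcademiCodec repository. Looking at test.py and the training script main3_ddp.py in Encodec_24k_32d and Encodec_16k_320, I found that the model they load is SoundStream: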

AcademiCodec/Encodec_16k_320/main3_ddp.py

Line 10 in d03142b

from net3 import SoundStream

So are these two directories essentially SoundStream models, with only Encodec_24k_240d being an actual EnCodec model?
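One way to check whether two model classes differ only in their names (for example SoundStream in Encodec_24k_32d/net3.py versus Encodec in Encodec_24k_240d/model.py) is to compare their parsed syntax trees after renaming both classes to a common placeholder. Below is a minimal sketch using only Python's standard `ast` module; the helper names `class_def` and `same_except_name` are hypothetical and not part of AcademiCodec. It compares code structure only: comments and formatting are ignored, while docstrings and default values still count. Identical model code also says nothing about which losses or training strategy the training script uses.

```python
import ast


def class_def(source: str, class_name: str) -> ast.ClassDef:
    """Return the first class named `class_name` found anywhere in `source`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return node
    raise ValueError(f"class {class_name!r} not found")


def _normalized_dump(cls: ast.ClassDef, old_name: str) -> str:
    # Rename the class and any references to it inside its body
    # (e.g. super(SoundStream, self).__init__()) to a shared placeholder.
    cls.name = "_Model"
    for node in ast.walk(cls):
        if isinstance(node, ast.Name) and node.id == old_name:
            node.id = "_Model"
    # ast.dump omits line/column attributes by default, so layout and
    # comments do not affect the comparison.
    return ast.dump(cls)


def same_except_name(src_a: str, name_a: str, src_b: str, name_b: str) -> bool:
    """True if the two class definitions are identical apart from their names."""
    a = _normalized_dump(class_def(src_a, name_a), name_a)
    b = _normalized_dump(class_def(src_b, name_b), name_b)
    return a == b
```

Usage would look like `same_except_name(Path("Encodec_24k_32d/net3.py").read_text(), "SoundStream", Path("Encodec_24k_240d/model.py").read_text(), "Encodec")` with `from pathlib import Path`, run from the repository root.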

yt605155624 · 2023-06-12T04:42:13Z

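I also looked at the SoundStream class in https://github.com/yangdongchao/AcademiCodec/blob/master/Encodec_24k_32d/net3.py and the Encodec class in https://github.com/yangdongchao/AcademiCodec/blob/master/Encodec_24k_240d/model.py, and found that the two are identical (apart from the class name). So I'd like to know whether these directories are really EnCodec or SoundStream, because according to the paper, EnCodec: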

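Adds the waveform reconstruction loss that SoundStream had ignored
Achieves better quality than SoundStream, but is slower than SoundStream (Lyra v2)
repo: https://github.com/facebookresearch/encodec
Additionally trains a small Transformer-based language model, aiming for faster-than-real-time end-to-end compression/decompression on a single CPU core; within a single time step it ignores the potential mutual information between codebooks, which speeds up inference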
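The waveform reconstruction loss mentioned above is a time-domain term. In the EnCodec paper it is an L1 distance between the input and reconstructed waveforms, used alongside a frequency-domain (multi-scale mel-spectrogram) loss. Below is a minimal sketch of only the time-domain term on plain Python lists; the function name `waveform_l1_loss` is hypothetical and the code is not taken from AcademiCodec or the official encodec repo.

```python
from typing import Sequence


def waveform_l1_loss(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean absolute error between a reference and a reconstructed waveform.

    Sketch of a time-domain term in the spirit of EnCodec's
    l_t(x, x_hat) = ||x - x_hat||_1; averaging over samples instead of
    summing is an assumed normalization.
    """
    if len(x) != len(x_hat):
        raise ValueError("waveforms must have the same number of samples")
    if len(x) == 0:
        raise ValueError("waveforms must be non-empty")
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)
```

In an actual training loop this would normally be computed on tensors (for example with PyTorch's `torch.nn.functional.l1_loss`, which also averages by default) and added, with a weight, to the spectral, adversarial and quantizer loss terms.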

Yet the repository README says that you consider SoundStream to perform better than EnCodec.

So I'd like to confirm: when these directories were trained, did they use the SoundStream training strategy or the EnCodec one?

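P.S. I saw that the README says you hope more developers will contribute. I also noticed that the code in this repository has a lot of redundancy (for example, the multiple SoundStream and Encodec implementations could be merged into one). I happen to have some open-source experience, so once the logic is sorted out, I'd like to help co-develop the yangdongchao/AcademiCodec and yangdongchao/SoundStorm repositories.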

yt605155624 mentioned this issue Jun 13, 2023

refactor repo #24

Merged
