Syncnet loss does not converge #146

kavita-gsphk · 2024-05-13T12:04:33Z

I am training SyncNet on the AVSpeech dataset with train_syncnet_sam.py. My training loss is stuck at 0.69 even after 500k steps. The learning rate and batch size are 5e-5 and 64, respectively.

I have gone through all the issues, but I haven't found any workable solution. If anyone has any suggestions, it will be a great help.

For preprocessing, I followed all the steps suggested here except the video split part. The average video length is 7.1 s (videos range from 0 to 15 s), and the total length of the training dataset is roughly 30.5 hours.

waptak · 2024-05-24T03:27:15Z

I have the same issue; the loss always stays around 0.69.
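
A note on the number itself: 0.69 is approximately ln 2 ≈ 0.693, which is the binary cross-entropy of a classifier that outputs 0.5 for every sample. Assuming the training script scores audio-video pairs with a binary cross-entropy loss (an assumption; the loss code is not shown in this thread), a loss pinned at 0.69 suggests the model has not yet learned to separate synced pairs from unsynced ones. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
import math


def binary_cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for a single sample.

    p: predicted probability that the pair is in sync (0 < p < 1)
    y: ground-truth label, 1 for in sync and 0 for out of sync
    """
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# A model that always predicts 0.5 gets the same loss on either label,
# so the average over a balanced batch is also that value.
chance_loss = (binary_cross_entropy(0.5, 1) + binary_cross_entropy(0.5, 0)) / 2

print(f"chance-level loss: {chance_loss:.4f}")  # 0.6931
print(f"ln(2):             {math.log(2):.4f}")  # 0.6931
```

If this assumption holds, any loss meaningfully below 0.69 means the model is doing better than a coin flip.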

linqiu0-0 · 2024-05-25T23:40:25Z

How long does it take to train 500k steps?

openalldoors · 2024-05-27T01:51:58Z

You need more high-quality training data. I got the loss down to 0.37.

linqiu0-0 · 2024-05-27T02:03:50Z

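> You need more high-quality training data. I got the loss down to 0.37.

May I ask how much data you used, how many batches you trained for, and roughly how long it took?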

openalldoors · 2024-05-27T02:45:29Z

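About 20,000 video files, each under 5 seconds, trained for 360k steps. Partway through, I modified the training set and added some high-quality training data. According to the author, my training set is probably still far from sufficient. I'll keep trying anyway; after all, training models is black magic.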

linqiu0-0 · 2024-05-27T02:47:50Z

Thank you very much! What is your batch size? And yes, training models is black magic 😂

openalldoors · 2024-05-27T02:48:40Z

16

linqiu0-0 · 2024-05-27T10:25:35Z

#30 I think your loss progress looks quite similar to the one in that issue. Success should be just around the corner.

kavita-gsphk · 2024-05-28T20:27:49Z

@linqiu0-0 it took around 4.6 days to finish 500k steps
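
For anyone estimating their own run, the figures reported in this thread (4.6 days, 500k steps, batch size 64) can be turned into a rough throughput. This is only back-of-the-envelope arithmetic on those reported numbers and says nothing about the hardware used, which is not stated here:

```python
# Figures reported by kavita-gsphk in this thread.
days = 4.6
steps = 500_000
batch_size = 64

seconds = days * 24 * 3600
seconds_per_step = seconds / steps
samples_seen = steps * batch_size  # samples drawn, counting repeats

print(f"{seconds_per_step:.2f} s/step")        # about 0.79 s/step
print(f"{1 / seconds_per_step:.2f} steps/s")   # about 1.26 steps/s
print(f"{samples_seen:,} samples drawn")       # 32,000,000
```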

kavita-gsphk · 2024-05-28T20:33:15Z

@openalldoors, which dataset are you using for high quality? So, are you saying if I include a more high-quality dataset, then the loss will converge?

openalldoors · 2024-05-29T01:32:04Z

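> @openalldoors, which dataset are you using for high quality? So, are you saying if I include a more high-quality dataset, then the loss will converge?

1. You can filter suitable videos from Bilibili and process them yourself; this takes a lot of time.
2. Adding more high-quality data can significantly speed up loss convergence, provided that your dataset really is high quality and not just what you assume to be high quality.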

1129571 · 2024-06-03T01:17:01Z

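> > @openalldoors, which dataset are you using for high quality? So, are you saying if I include a more high-quality dataset, then the loss will converge?
>
> 1. You can filter suitable videos from Bilibili and process them yourself; this takes a lot of time.
> 2. Adding more high-quality data can significantly speed up loss convergence, provided that your dataset really is high quality and not just what you assume to be high quality.

Could you tell me what your criteria for a high-quality dataset are, and what learning rate you use? Much appreciated.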

openalldoors · 2024-06-03T02:45:57Z

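Is the video bitrate high enough? Is the audio in sync? More importantly, is there a face in every frame of the video, is it the same person throughout, and are there multiple people?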
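
The face-related part of this checklist can be sketched as a simple per-clip filter. This is only an illustration: `face_counts` is a hypothetical list holding the number of faces some face detector found in each frame, the clip names and values are made up, and the check does not verify that the face belongs to the same person across frames.

```python
from typing import Sequence


def has_single_face_throughout(face_counts: Sequence[int]) -> bool:
    """Return True only if every frame contains exactly one detected face.

    face_counts: hypothetical per-frame face counts from any face detector.
    An empty clip is rejected.
    """
    return len(face_counts) > 0 and all(n == 1 for n in face_counts)


# Made-up detector output for three clips, for illustration only.
clips = {
    "clip_a.mp4": [1, 1, 1, 1],  # one face in every frame: keep
    "clip_b.mp4": [1, 0, 1, 1],  # face missing in one frame: reject
    "clip_c.mp4": [1, 2, 2, 1],  # a second person appears: reject
}

kept = [name for name, counts in clips.items() if has_single_face_throughout(counts)]
print(kept)  # ['clip_a.mp4']
```

Checking that it is the same person in every frame would need an additional identity comparison, which is outside this sketch.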

1129571 · 2024-06-03T02:59:26Z

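> Is the video bitrate high enough? Is the audio in sync? More importantly, is there a face in every frame of the video, is it the same person throughout, and are there multiple people?

Thanks. I have checked the bitrate and the faces; everything is 1080p. I skipped the audio-video sync check with syncnet_python. I got past the 0.69 plateau by lowering the learning rate, but training is now very slow: it took 1.6M steps to reach about 0.44, and it seems to be trending toward overfitting. After reading your reply, I suspect the problem is dataset quality. Could you share your steps for checking audio-video sync?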

openalldoors · 2024-06-03T03:13:59Z

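Don't skip the SyncNet audio-video check unless you are 100% sure of your data. Try looking at the eval log: if the eval data is abnormal, the loss will be abnormally high, significantly greater than 1 (greater than 6 is also possible), and you will need to investigate.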

jibingyangsf · 2024-06-03T06:27:23Z

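> About 20,000 video files, each under 5 seconds, trained for 360k steps. Partway through, I modified the training set and added some high-quality training data. According to the author, my training set is probably still far from sufficient. I'll keep trying anyway; after all, training models is black magic.

May I ask: can the author's source code train at 384 directly, without adjusting the network architecture or the loss function?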

jibingyangsf · 2024-06-03T06:29:25Z

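> > Is the video bitrate high enough? Is the audio in sync? More importantly, is there a face in every frame of the video, is it the same person throughout, and are there multiple people?
>
> Thanks. I have checked the bitrate and the faces; everything is 1080p. I skipped the audio-video sync check with syncnet_python. I got past the 0.69 plateau by lowering the learning rate, but training is now very slow: it took 1.6M steps to reach about 0.44, and it seems to be trending toward overfitting. After reading your reply, I suspect the problem is dataset quality. Could you share your steps for checking audio-video sync?

Just run the clips through syncnet_python, an open-source project; an AV offset of 0 means the clip is in sync. I also have a question: are you sure the author's source code can train at 288, 384, or 512 without modification? Isn't it said that both the network architecture and the loss function need to differ from the 96*96 version? Do you know anything about this?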
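
Once each clip has an AV offset, the "offset 0 means in sync" rule can be applied as a filter. The sketch below assumes the offsets have already been collected into a plain dictionary; that dictionary, the clip names, and the helper `select_synced_clips` are hypothetical and are not part of syncnet_python.

```python
from typing import Dict, List


def select_synced_clips(offsets: Dict[str, int], max_abs_offset: int = 0) -> List[str]:
    """Keep clips whose AV offset (in frames) is within the allowed tolerance.

    offsets: hypothetical mapping from clip name to measured AV offset.
    max_abs_offset: 0 keeps only clips reported as exactly in sync.
    """
    return sorted(name for name, off in offsets.items() if abs(off) <= max_abs_offset)


# Made-up offsets, for illustration only.
offsets = {"clip_a.mp4": 0, "clip_b.mp4": -3, "clip_c.mp4": 1, "clip_d.mp4": 0}

synced = select_synced_clips(offsets)
print(synced)  # ['clip_a.mp4', 'clip_d.mp4']
```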

1129571 · 2024-06-03T06:50:30Z

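> Don't skip the SyncNet audio-video check unless you are 100% sure of your data. Try looking at the eval log: if the eval data is abnormal, the loss will be abnormally high, significantly greater than 1 (greater than 6 is also possible), and you will need to investigate.

Thanks for sharing your experience. I am currently at a train loss of 0.42, with eval fluctuating between 0.43 and 0.45. Because I have limited GPU memory and training is slow, it is still hard to judge.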

openalldoors · 2024-06-03T07:10:35Z

You need to look at the eval output for each individual sample; the mean alone will not reveal the problem.
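
To illustrate why the mean can hide a bad sample, here is a minimal sketch with made-up per-sample eval losses. The values, the clip names, and the helper `flag_abnormal_samples` are hypothetical; the threshold of 1.0 follows the "significantly greater than 1" rule of thumb mentioned earlier in this thread.

```python
from statistics import mean
from typing import Dict, List, Tuple


def flag_abnormal_samples(losses: Dict[str, float], threshold: float = 1.0) -> List[Tuple[str, float]]:
    """Return (name, loss) pairs whose per-sample loss exceeds the threshold."""
    return [(name, loss) for name, loss in losses.items() if loss > threshold]


# Made-up eval losses: 19 ordinary samples and one abnormal one.
eval_losses = {f"clip_{i:03d}": 0.40 for i in range(19)}
eval_losses["clip_019"] = 1.50

mean_loss = mean(eval_losses.values())
flagged = flag_abnormal_samples(eval_losses)

print(f"mean eval loss: {mean_loss:.3f}")  # 0.455, looks unremarkable
print(flagged)                             # [('clip_019', 1.5)]
```

In this made-up case the mean sits at 0.455, yet one sample is far above the threshold and would be worth inspecting.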
