Training doesn't work on custom datasets #575

Open
pouvoirdasha opened this issue Jul 19, 2023 · 12 comments

Comments

@pouvoirdasha

Hi! First of all, thank you for your work!
I have tried to train the model on a custom dataset that I created following your instructions. I have tried both formats, .mp4 files and folders of .png frames, and neither works. I get several errors when I try to run it:

  • Caught ValueError in DataLoader worker process 0
  • ValueError: 'a' cannot be empty unless no samples are taken

The problem seems to be raised from torch. Any ideas how I can solve it?

[screenshot]

@pouvoirdasha
Author

Okay, I also tried training on the mgif dataset used in your paper, and here is what I got:
[screenshot]

@AliaksandrSiarohin
Owner

AliaksandrSiarohin commented Jul 19, 2023

For mgif, everything looks OK to me; you probably just need to wait longer. It's hard to say, though: you should check with nvidia-smi whether the GPU is actually being used.
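For reference, the nvidia-smi check can be scripted while run.py is training. A minimal sketch, assuming nvidia-smi is on PATH; gpu_usage and parse_gpu_query are hypothetical helpers, not part of this repository. It asks nvidia-smi for per-GPU utilization and memory use; utilization near 0% during training would suggest the GPU is idle (for example, training fell back to the CPU or is waiting on data loading).

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

# Standard nvidia-smi query fields: GPU utilization (%) and memory used (MiB).
QUERY = "utilization.gpu,memory.used"


def parse_gpu_query(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'util, mem' lines (csv, noheader, nounits) into (util %, MiB) tuples.

    Assumes numeric fields; some GPUs report [N/A], which would raise ValueError.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        util, mem = (field.strip() for field in line.split(","))
        rows.append((int(util), int(mem)))
    return rows


def gpu_usage() -> Optional[List[Tuple[int, int]]]:
    """Return per-GPU (utilization %, memory MiB), or None if nvidia-smi is unusable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # e.g. driver not loaded; treat as "no GPU information available"
        return None
    return parse_gpu_query(result.stdout)
```

Call gpu_usage() from a second terminal or a Python shell while training runs; it returns None if nvidia-smi is missing or fails. Running plain nvidia-smi by hand shows the same information.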
For a custom dataset, your file naming is probably incorrect, or you should disable id_sampling in the config; see:

id_sampling: True
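A minimal sketch of the second suggestion, assuming the training config is a YAML file containing a line such as id_sampling: True (as linked above). It does a plain-text substitution rather than parsing YAML, so comments and formatting are kept. disable_id_sampling and the example keys other than id_sampling are hypothetical; editing the line by hand works just as well.

```python
import re

# Hypothetical excerpt of a training config; keys other than id_sampling
# are assumptions for illustration.
EXAMPLE_CONFIG = """\
dataset_params:
  root_dir: data/my-dataset
  frame_shape: [256, 256, 3]
  id_sampling: True
"""


def disable_id_sampling(yaml_text: str) -> str:
    """Set every 'id_sampling: True' line to False, leaving everything else untouched."""
    # [ \t]* (not \s*) so the match never spans line breaks.
    return re.sub(r"^([ \t]*id_sampling:[ \t]*)True\b", r"\1False",
                  yaml_text, flags=re.MULTILINE)


patched = disable_id_sampling(EXAMPLE_CONFIG)
print(patched)
```

Write patched back to your config (or to a copy of it) before starting run.py.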

@fjesikfjdskl

This problem can be solved by looking at the file name in line 103 of frames_dataset. Can you tell me what the training steps are for your custom dataset?
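One plausible reading of the "file name" hint, hedged: with id_sampling enabled, the upstream FramesDataset appears to take each training entry's name up to the first '#' as a video id, and then to pick a random file matching <id>*.mp4 with np.random.choice. If that glob matches nothing (for example because the entries are folders of .png frames), NumPy raises "'a' cannot be empty unless no samples are taken" inside the DataLoader worker, which matches the error above. The stdlib sketch below emulates that lookup to list the ids that would fail; it mirrors the behaviour described here rather than the repository code, and find_unsampleable_ids is a hypothetical name.

```python
import glob
import os
from typing import List


def find_unsampleable_ids(train_dir: str) -> List[str]:
    """Return ids for which an id_sampling-style lookup finds no .mp4 file.

    Assumed behaviour: the id is the entry name up to the first '#', and each
    id must match at least one '<id>*.mp4' file directly inside train_dir.
    """
    ids = {os.path.basename(entry).split("#")[0] for entry in os.listdir(train_dir)}
    missing = []
    for video_id in sorted(ids):
        # Escape so unusual characters in names are matched literally.
        pattern = os.path.join(glob.escape(train_dir), glob.escape(video_id) + "*.mp4")
        if not glob.glob(pattern):
            # Choosing randomly from this empty list is what would raise ValueError.
            missing.append(video_id)
    return missing
```

Under this emulation, a plain name without '#' such as clip.mp4 also fails: its id is clip.mp4 and the pattern clip.mp4*.mp4 matches nothing. If any ids are listed, either rename the videos to the <id>#<clip>.mp4 form or disable id_sampling.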

@fjesikfjdskl

@AliaksandrSiarohin Hello, can you confirm that training on a custom dataset only requires placing .mp4 or .png files in the folder, modifying the configuration file, and then running run.py? I ran a training test on a few hundred images placed in the price folder and got the log.txt result shown below. Is this result normal?
[screenshot of log.txt]

@pouvoirdasha
Author

pouvoirdasha commented Jul 20, 2023

Hi! Thanks for your replies. In the config file I have specified 30 epochs, 2 repeats and [15, 25] as the epoch milestones. My dataset of videos is split into two folders, train and test; each contains one folder per video (video1, ..., video_n) with the video split frame by frame into roughly 900 PNG files. I have tried different values for epochs, repeats, etc., but it doesn't seem to change anything.

I use NVIDIA GeForce RTX 3090 GPUs to train the model, and even for mgif I haven't got any results after about an hour of training (log.txt is empty).
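A rough way to see why log.txt can stay empty for a long time: if, as appears to be the case in the upstream training loop, losses are written to log.txt only at the end of each epoch, and num_repeats makes one epoch pass over every training video num_repeats times, then nothing is logged until the whole first epoch finishes. A back-of-the-envelope sketch, assuming one sample per video per repeat and a DataLoader that drops the incomplete last batch; the numbers in the example are placeholders, not values from this thread.

```python
def iterations_per_epoch(num_videos: int, num_repeats: int, batch_size: int) -> int:
    """Optimizer steps per epoch under the stated assumptions.

    Each epoch draws num_repeats samples per training video, and the last
    incomplete batch is dropped (drop_last=True).
    """
    samples_per_epoch = num_videos * num_repeats
    return samples_per_epoch // batch_size


# Placeholder values for illustration only.
print(iterations_per_epoch(num_videos=100, num_repeats=2, batch_size=8))  # 25
```

Multiplying the result by a measured time per step gives a rough estimate of when the first line should appear in log.txt.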

@fjesikfjdskl

@pouvoirdasha How do you decide which videos go into the test set and which into the training set? I put all the training images in one folder and ran python run.py.
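On choosing the split: if the dataset root has no train subfolder, the upstream FramesDataset appears to fall back to a random 80/20 split of the entries in the root folder. To control the split yourself, you can create train/ and test/ folders beforehand. A minimal stdlib sketch, assuming every entry in the source folder (an .mp4 file or a folder of .png frames) is one video; split_videos, the 0.2 ratio and the seed are illustrative choices, not repository settings.

```python
import os
import random
import shutil
from typing import Tuple


def split_videos(src_dir: str, dst_dir: str, test_fraction: float = 0.2,
                 seed: int = 0) -> Tuple[int, int]:
    """Copy each video in src_dir into dst_dir/train or dst_dir/test.

    A video is either a file (e.g. .mp4) or a folder of frames. dst_dir should
    be outside src_dir. Returns (number of train videos, number of test videos).
    """
    videos = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(videos)  # reproducible shuffle
    num_test = round(len(videos) * test_fraction)
    splits = {"test": videos[:num_test], "train": videos[num_test:]}
    for split, names in splits.items():
        out_dir = os.path.join(dst_dir, split)
        os.makedirs(out_dir, exist_ok=True)
        for name in names:
            src = os.path.join(src_dir, name)
            if os.path.isdir(src):
                shutil.copytree(src, os.path.join(out_dir, name))  # frame folder
            else:
                shutil.copy2(src, os.path.join(out_dir, name))  # single video file
    return len(splits["train"]), len(splits["test"])
```

Splitting by whole video (rather than by frame) keeps frames of the same clip from appearing in both sets.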

@bajiua

bajiua commented Oct 12, 2023

Hello, I had the same problem when training on the mgif dataset. Have you solved it, and if so, how? I would very much appreciate a reply.

@bajiua

bajiua commented Oct 12, 2023

Hello, you said this problem could be solved by looking at the file name in line 103 of frames_dataset. How should I modify it? I would appreciate a reply.

@G-force78

In line 70:

if os.path.exists(os.path.join(root_dir, 'train')):

change the path to your training directory:

if os.path.exists(os.path.join(root_dir, 'yourpathhere/train')):
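Instead of editing the path inside frames_dataset.py, it may be simpler to set root_dir in the config to the folder that directly contains train/ and test/, since the check quoted above looks for root_dir/train. A minimal sketch of that check, assuming (as the upstream code appears to do) that a train folder also requires a test folder next to it; describe_layout is a hypothetical helper, useful for confirming which branch a given root_dir would take.

```python
import os


def describe_layout(root_dir: str) -> str:
    """Report which split mode a FramesDataset-style loader would pick for root_dir.

    Mirrors the train/ check quoted above plus the assumed test/ requirement.
    """
    if not os.path.isdir(root_dir):
        return f"{root_dir!r} does not exist"
    if os.path.exists(os.path.join(root_dir, "train")):
        if not os.path.exists(os.path.join(root_dir, "test")):
            return "train/ found but test/ missing"
        return "predefined train/test split"
    return "no train/ folder: random train/test split of the entries in root_dir"
```

Run it on the exact root_dir string from your YAML file; a relative path is resolved against the directory you launch run.py from.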

@bajiua

bajiua commented Oct 16, 2023

Thank you. After changing it, I also updated the path in the YAML file, but it still didn't work. Is there anything else I need to modify?

@G-force78

I'm not sure, sorry. I couldn't get training to work at all: no matter what I did, it would not recognise frames_dataset as a module. If you like, you can try swapping in the files I've modified here, where I solved some of the problems. Good luck!
https://github.com/G-force78/articulated-animation

@bajiua

bajiua commented Oct 17, 2023

Thank you. Did you also encounter the same problem as me in first-order-try? Are the changes you made in articulated animation already runnable?
