Something went wrong when I "Training with multiple seen domains" #21

7749546 · 2020-07-28T09:44:07Z

The loss is NaN. I tried reducing the learning rate, but it didn't help. Could you please give me some advice?

hytseng0509 · 2020-07-29T09:34:36Z

Which PyTorch version are you using?

7749546 · 2020-07-29T13:12:09Z

Thank you for your reply. My torch version is 1.2.0. However, this command runs without problems:

python3 train_baseline.py --method relationnet_softmax --dataset multi --testset cars --name multi_cars_ori_relationnet_softmax --warmup baseline --train_aug

The result seems to be OK. It is shown below:

7749546 · 2020-07-29T13:29:52Z

So, given the result above, should I upgrade torch to a version above 1.3.0? If I run into the original problem again after changing the selected model, should I consider upgrading torch?

ContestantsD · 2022-03-19T10:21:52Z

Have you resolved this problem? I am facing it too.

04556338896 · 2022-04-11T12:56:59Z

Hi, did you solve the problem? I am training the model on my own dataset, and my loss is NaN too.

ContestantsD · 2022-04-11T13:02:29Z

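I solved this problem. In my case it happened because I deliberately used CUDA 10.2, but my 30-series GPU does not support CUDA 10, so the loss became NaN. As far as I remember, the actual fix was to change the order in which the model parameters are loaded. The author's code is not written very rigorously.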
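For anyone who wants to check for this mismatch before training, here is a rough sketch. It assumes a PyTorch build recent enough to have `torch.cuda.get_arch_list()`; on older builds the helper just returns None. The function names `arch_supported` and `describe_gpu_support` are made up for this example, and the check is a simplification: it only looks at compiled `sm_XY` kernels with the same major version and ignores PTX entries. RTX 30-series cards report compute capability 8.6.

```python
def arch_supported(capability, arch_list):
    """Return True if some compiled sm_XY kernel in arch_list can run on a GPU
    with the given (major, minor) compute capability.

    Simplified rule: same major version and kernel minor <= device minor.
    PTX entries such as "compute_80" are ignored on purpose.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue
        digits = arch[len("sm_"):]
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip unusual entries such as "sm_90a"
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def describe_gpu_support():
    """Query the local PyTorch install.

    Returns (capability, arch_list, supported), or None if torch, CUDA,
    or torch.cuda.get_arch_list is not available.
    """
    try:
        import torch
    except ImportError:
        return None
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is None or not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(0)
    arch_list = get_arch_list()
    return capability, arch_list, arch_supported(capability, arch_list)


if __name__ == "__main__":
    print(describe_gpu_support())
```

If the last element of the printed tuple is False, the installed build probably has no kernels for the card, which would match the situation described above.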

04556338896 · 2022-04-11T13:14:04Z

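I am also using a 30-series GPU with CUDA 10.2. The first few epochs produce a loss, and after that everything is NaN. This happens with both protonet and relationnet. Do you remember which model parameters you changed? Thanks!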
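To pin down when the loss first goes bad, a small guard like the one below can help. This is only a sketch with a hypothetical helper name (`first_non_finite`); it works on plain floats, so in a PyTorch loop you would collect `loss.item()` at each step. It is not part of this repository.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


if __name__ == "__main__":
    # Made-up loss values, only to show how the helper is called.
    history = [2.31, 1.87, float("nan"), float("nan")]
    print(first_non_finite(history))
```

Once the failing step is known, PyTorch's `torch.autograd.set_detect_anomaly(True)` may help locate the operation that produced the NaN, at the cost of slower training.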

ContestantsD · 2022-04-11T13:37:06Z

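Switch to a torch build for CUDA 11, then run it and fix things according to the errors. I forget exactly where, but it will eventually get stuck at a place where a model is loaded. Adjusting the order there is enough.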

04556338896 · 2022-04-11T13:41:08Z

Thanks!

sx1999 · 2023-03-30T02:05:40Z

Hello, the dataset links in the code are no longer valid. Could you please share your datasets with me? Thanks!
