Negative loss #5

Open
EmreOzkose opened this issue Oct 22, 2023 · 4 comments

@EmreOzkose

Hi, thanks for sharing this great work.

I am trying to train UniVL and I am getting a negative loss. Is that okay? Have you ever observed this issue? I am using a small batch size (12). The default number of epochs in the code was 1, but I set it to 40 as in the paper.

[Screenshot: training log showing negative loss values]

@EmreOzkose (Author)

I see it in the paper now: the CMLM and CMFM losses are negative.
[Screenshot: figure from the paper showing negative CMLM and CMFM loss values]
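
For context, here is a minimal sketch of one way a contrastive/NCE-style term can dip below zero: if the positive pair's score is compared against a denominator built only from negatives, the ratio can exceed 1 once the positive dominates, so the negative log drops below zero. This is a generic illustration under that assumption, not UniVL's exact loss code, and the function name is made up for the example.

```python
# Minimal sketch (NOT UniVL's exact loss): one way an NCE-style contrastive
# term can become negative. If the positive score is compared against a
# denominator that sums only the negatives, the ratio can exceed 1 once the
# positive dominates, so -log(ratio) goes below zero.
import torch

def nce_loss_excluding_positive(sim, pos_idx):
    """sim: (num_candidates,) similarity scores; pos_idx: index of the positive."""
    pos = sim[pos_idx]
    neg = torch.cat([sim[:pos_idx], sim[pos_idx + 1:]])
    # -log( exp(pos) / sum_j exp(neg_j) )
    return -(pos - torch.logsumexp(neg, dim=0))

sim = torch.tensor([5.0, 0.1, -0.3, 0.2])  # positive already well separated from the negatives
print(nce_loss_excluding_positive(sim, pos_idx=0).item())  # ~ -3.88, i.e. negative
```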

@EmreOzkose (Author)

I am training UniVL on YouCook2. When I change only batch-size=12, epoch=40, and num_thread_reader=8, I get

2023-10-22 15:06:46,979:INFO: Epoch: 1/40, Step: 20/814, Lr: 0.00009939, Loss: 1.866647
2023-10-22 15:07:03,972:INFO: Epoch: 1/40, Step: 40/814, Lr: 0.00009877, Loss: 3.736308
2023-10-22 15:07:20,059:INFO: Epoch: 1/40, Step: 60/814, Lr: 0.00009816, Loss: 4.082280
2023-10-22 15:07:37,250:INFO: Epoch: 1/40, Step: 80/814, Lr: 0.00009754, Loss: 3.955996
2023-10-22 15:07:54,521:INFO: Epoch: 1/40, Step: 100/814, Lr: 0.00009693, Loss: 3.716820
2023-10-22 15:08:10,713:INFO: Epoch: 1/40, Step: 120/814, Lr: 0.00009632, Loss: 3.448043
2023-10-22 15:08:27,875:INFO: Epoch: 1/40, Step: 140/814, Lr: 0.00009570, Loss: 3.177194
2023-10-22 15:08:45,059:INFO: Epoch: 1/40, Step: 160/814, Lr: 0.00009509, Loss: 2.960228
2023-10-22 15:09:01,251:INFO: Epoch: 1/40, Step: 180/814, Lr: 0.00009447, Loss: 2.960163
2023-10-22 15:09:18,414:INFO: Epoch: 1/40, Step: 200/814, Lr: 0.00009386, Loss: 2.978737
2023-10-22 15:09:35,416:INFO: Epoch: 1/40, Step: 220/814, Lr: 0.00009325, Loss: 2.989994
2023-10-22 15:09:51,538:INFO: Epoch: 1/40, Step: 240/814, Lr: 0.00009263, Loss: 3.486469
2023-10-22 15:10:08,708:INFO: Epoch: 1/40, Step: 260/814, Lr: 0.00009202, Loss: 3.878906
2023-10-22 15:10:25,854:INFO: Epoch: 1/40, Step: 280/814, Lr: 0.00009140, Loss: 4.187306
2023-10-22 15:10:41,975:INFO: Epoch: 1/40, Step: 300/814, Lr: 0.00009079, Loss: 4.434147
2023-10-22 15:10:59,252:INFO: Epoch: 1/40, Step: 320/814, Lr: 0.00009018, Loss: 4.645205
2023-10-22 15:11:16,365:INFO: Epoch: 1/40, Step: 340/814, Lr: 0.00008956, Loss: 4.827587
2023-10-22 15:11:32,559:INFO: Epoch: 1/40, Step: 360/814, Lr: 0.00008895, Loss: 4.988645
2023-10-22 15:11:49,671:INFO: Epoch: 1/40, Step: 380/814, Lr: 0.00008833, Loss: 5.133381
....

and

Epoch1: R@1: 0.0003 - R@5: 0.0024 - R@10: 0.0047 - Median R: 1420.0
Epoch2: R@1: 0.0009 - R@5: 0.0033 - R@10: 0.0047 - Median R: 1250.0
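
(For reference: R@K and Median R are the standard text-to-video retrieval metrics computed from a similarity matrix. Below is a minimal sketch of how they are usually computed, assuming the ground-truth video for query i sits at column i; this is a generic illustration, not necessarily this repo's evaluation code.)

```python
# Minimal sketch (generic, not necessarily this repo's eval code): R@K and
# Median R from a text-to-video similarity matrix with ground truth on the diagonal.
import numpy as np

def retrieval_metrics(sim):
    """sim: (num_queries, num_videos) similarity matrix."""
    order = np.argsort(-sim, axis=1)                                # best match first
    ranks = np.where(order == np.arange(len(sim))[:, None])[1] + 1  # 1-based rank of the GT video
    return {"R@1": np.mean(ranks <= 1), "R@5": np.mean(ranks <= 5),
            "R@10": np.mean(ranks <= 10), "Median R": np.median(ranks)}

print(retrieval_metrics(np.random.randn(3000, 3000)))  # random scores: R@1 near 0, Median R near 1500
```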

Is this expected behavior? num_thread_reader affects the training a lot. For example, if I set num_thread_reader to 0, I get:

2023-10-22 17:06:43,930:INFO: Epoch: 1/40, Step: 20/814, Lr: 0.00009939, Loss: 1.779154
2023-10-22 17:07:01,318:INFO: Epoch: 1/40, Step: 40/814, Lr: 0.00009877, Loss: 2.946439
2023-10-22 17:07:17,514:INFO: Epoch: 1/40, Step: 60/814, Lr: 0.00009816, Loss: 3.340093
2023-10-22 17:07:34,819:INFO: Epoch: 1/40, Step: 80/814, Lr: 0.00009754, Loss: 3.044905
2023-10-22 17:07:52,372:INFO: Epoch: 1/40, Step: 100/814, Lr: 0.00009693, Loss: 1.260845
2023-10-22 17:08:08,849:INFO: Epoch: 1/40, Step: 120/814, Lr: 0.00009632, Loss: -0.255153
2023-10-22 17:08:26,377:INFO: Epoch: 1/40, Step: 140/814, Lr: 0.00009570, Loss: -1.452571
2023-10-22 17:08:44,079:INFO: Epoch: 1/40, Step: 160/814, Lr: 0.00009509, Loss: -2.369938
2023-10-22 17:09:00,729:INFO: Epoch: 1/40, Step: 180/814, Lr: 0.00009447, Loss: -3.111302
2023-10-22 17:09:18,303:INFO: Epoch: 1/40, Step: 200/814, Lr: 0.00009386, Loss: -3.725561
2023-10-22 17:09:35,800:INFO: Epoch: 1/40, Step: 220/814, Lr: 0.00009325, Loss: -4.231674
2023-10-22 17:09:52,260:INFO: Epoch: 1/40, Step: 240/814, Lr: 0.00009263, Loss: -4.650835
2023-10-22 17:10:09,855:INFO: Epoch: 1/40, Step: 260/814, Lr: 0.00009202, Loss: -4.995438
2023-10-22 17:10:27,374:INFO: Epoch: 1/40, Step: 280/814, Lr: 0.00009140, Loss: -5.271745
2023-10-22 17:10:43,840:INFO: Epoch: 1/40, Step: 300/814, Lr: 0.00009079, Loss: -5.477592
2023-10-22 17:11:01,326:INFO: Epoch: 1/40, Step: 320/814, Lr: 0.00009018, Loss: -5.562934
2023-10-22 17:11:18,837:INFO: Epoch: 1/40, Step: 340/814, Lr: 0.00008956, Loss: -5.491091
2023-10-22 17:11:35,302:INFO: Epoch: 1/40, Step: 360/814, Lr: 0.00008895, Loss: -4.861334
2023-10-22 17:11:52,936:INFO: Epoch: 1/40, Step: 380/814, Lr: 0.00008833, Loss: -4.227199
2023-10-22 17:12:10,386:INFO: Epoch: 1/40, Step: 400/814, Lr: 0.00008772, Loss: -3.648329
2023-10-22 17:12:26,914:INFO: Epoch: 1/40, Step: 420/814, Lr: 0.00008710, Loss: -3.130327

@ikodoh (Contributor) commented Oct 23, 2023

First, yes, we also observed that the loss takes negative values, and it does not affect performance.
I'm also surprised that num_thread_reader affects training this much. I wonder whether you changed only num_thread_reader and kept the other parameters identical. If so, I recommend running the model with the default settings.

@EmreOzkose (Author)

I searched a bit and came across this topic. I think the num_thread_reader behavior is expected. Unfortunately, I don't have hardware that can fit a batch size of 128 :).
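
For what it's worth, the sensitivity to num_thread_reader may simply reflect how PyTorch DataLoader workers draw their random seeds: with num_workers=0 the dataset's random operations (e.g. random masking) use the main-process RNG, while each worker process otherwise derives its own seed, so the stream of random augmentations differs between the two settings. Below is a minimal sketch of pinning that randomness, following the standard PyTorch reproducibility recipe; it is a generic example, not this repo's exact dataloader code.

```python
# Minimal sketch (generic PyTorch, not this repo's exact setup): make shuffling
# and per-worker randomness reproducible regardless of the number of workers.
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Each worker derives a deterministic seed from the DataLoader's base seed.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)  # fixes the shuffling order independently of num_workers

dataset = TensorDataset(torch.arange(100).float())
loader = DataLoader(dataset, batch_size=12, shuffle=True, num_workers=8,
                    worker_init_fn=seed_worker, generator=g)
```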
