Inconsistency in training loss (300W-LP) and testing loss (AFLW2000). What should be the convergence criterion and when to save best model? #131

GKG1312 commented Mar 6, 2024

Hi @natanielruiz @tfygg,
I am retraining the model on the 300W-LP dataset with the filtered filename list. There is high fluctuation in the training loss, as already reported in issues #6 and #10, even though for some iterations the losses are quite low:

Epoch [25/25], Iter [600/3825] Losses: Yaw 2.5382, Pitch 25.2214, Roll 18.6293
Epoch [25/25], Iter [700/3825] Losses: Yaw 3.4427, Pitch 56.4101, Roll 60.4185
Epoch [25/25], Iter [800/3825] Losses: Yaw 3.8120, Pitch 10.9580, Roll 12.5700
Epoch [25/25], Iter [900/3825] Losses: Yaw 6.2587, Pitch 36.2516, Roll 29.5404
Epoch [25/25], Iter [1000/3825] Losses: Yaw 3.9143, Pitch 13.5918, Roll 11.6238
Epoch [25/25], Iter [1100/3825] Losses: Yaw 2.8406, Pitch 16.2069, Roll 11.7216
Epoch [25/25], Iter [1200/3825] Losses: Yaw 3.1640, Pitch 6.9615, Roll 3.9374
Epoch [25/25], Iter [1300/3825] Losses: Yaw 4.6969, Pitch 8.0815, Roll 9.0429
Epoch [25/25], Iter [1400/3825] Losses: Yaw 3.1008, Pitch 6.8233, Roll 4.4145
Epoch [25/25], Iter [1500/3825] Losses: Yaw 3.5320, Pitch 53.3095, Roll 41.4802
Epoch [25/25], Iter [1600/3825] Losses: Yaw 3.7685, Pitch 7.2890, Roll 8.7627
Epoch [25/25], Iter [1700/3825] Losses: Yaw 3.2166, Pitch 19.6407, Roll 12.9610
Epoch [25/25], Iter [1800/3825] Losses: Yaw 3.6263, Pitch 6.8446, Roll 5.8751
Epoch [25/25], Iter [1900/3825] Losses: Yaw 3.7254, Pitch 12.2385, Roll 9.0497
Epoch [25/25], Iter [2000/3825] Losses: Yaw 4.3334, Pitch 10.8476, Roll 4.3712
Epoch [25/25], Iter [2100/3825] Losses: Yaw 4.8823, Pitch 13.0971, Roll 17.6704
Epoch [25/25], Iter [2200/3825] Losses: Yaw 2.9647, Pitch 5.1831, Roll 5.9912
Epoch [25/25], Iter [2300/3825] Losses: Yaw 2.6243, Pitch 20.3848, Roll 10.7074
Epoch [25/25], Iter [2400/3825] Losses: Yaw 4.3780, Pitch 16.6918, Roll 10.1041
Epoch [25/25], Iter [2500/3825] Losses: Yaw 2.6419, Pitch 29.8599, Roll 23.3731
Epoch [25/25], Iter [2600/3825] Losses: Yaw 3.0582, Pitch 23.6246, Roll 15.0430
Epoch [25/25], Iter [2700/3825] Losses: Yaw 4.5449, Pitch 11.4036, Roll 9.0669
Epoch [25/25], Iter [2800/3825] Losses: Yaw 3.3777, Pitch 6.4258, Roll 4.7266
Epoch [25/25], Iter [2900/3825] Losses: Yaw 4.5212, Pitch 8.0623, Roll 5.5993
Epoch [25/25], Iter [3000/3825] Losses: Yaw 3.5405, Pitch 11.6594, Roll 9.8117
Epoch [25/25], Iter [3100/3825] Losses: Yaw 2.8780, Pitch 10.0156, Roll 9.4295
Epoch [25/25], Iter [3200/3825] Losses: Yaw 3.9240, Pitch 8.4466, Roll 4.5813
Epoch [25/25], Iter [3300/3825] Losses: Yaw 4.6378, Pitch 8.8315, Roll 8.9284

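Given how much the per-iteration losses jump around, would a smoothed running average be a more sensible convergence signal than the raw values? Something like the sketch below is what I have in mind (the EMA decay of 0.98 is just an assumption I picked):

```python
class LossEMA:
    """Exponential moving average of a noisy per-iteration loss."""

    def __init__(self, decay=0.98):
        self.decay = decay
        self.value = None

    def update(self, loss):
        loss = float(loss)
        # First observation initializes the average; afterwards blend it in.
        self.value = loss if self.value is None else (
            self.decay * self.value + (1 - self.decay) * loss)
        return self.value

# Inside the training loop, e.g.:
#   smoothed_pitch = ema_pitch.update(loss_pitch.item())
# and treat training as converged once the smoothed values stop improving.
```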
While testing the model on AFLW2000, I am getting a noticeably higher error in Yaw:
Test error in degrees of the model on the 1969 test images. Yaw: 13.6368, Pitch: 7.7751, Roll: 6.1729
I am saving the model from the best iteration, i.e., the one where all three errors are at their minimum.
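To make explicit what "best iteration" means for me: I reduce the three errors to one scalar and keep the snapshot with the lowest score, roughly like the simplified sketch below, where `evaluate_model` is a placeholder for my evaluation loop:

```python
import torch

def maybe_save_best(model, best_score, evaluate_model):
    # `evaluate_model` is a placeholder; it is assumed to return
    # (yaw_mae, pitch_mae, roll_mae) in degrees on the held-out set.
    yaw, pitch, roll = evaluate_model(model)
    score = yaw + pitch + roll  # one scalar so "all errors minimum" is well-defined
    if score < best_score:
        torch.save(model.state_dict(), 'best_hopenet.pth')
        best_score = score
    return best_score
```

I call this once per epoch, starting with `best_score = float('inf')`.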

Can you help me figure out the reason? This is the command I am using to train the model (I am continuing training from a saved snapshot):
train_hopenet.py --data_dir ".\300W_LP" --filename_list "300W_LP_filename_filtered.txt" --snapshot "Pruned_Hopenet_0.5.pth" --batch_size 32 --dataset "Pose_300W_LP" --num_epochs 25 --alpha 1 --output_string "prunedReTrain_0.5_1st" --lr 0.00001

Note: just an observation, but the Yaw error is lower than the others during training, whereas at test time on AFLW2000 the Yaw error is the largest. Is this because of the dataset? Did you observe anything like that when you trained and tested your model?
