
What is the lowest loss value that can be reached? #9

Closed
fengjian0106 opened this issue Nov 16, 2016 · 16 comments

@fengjian0106

hi, I have trained a yolo-small model to step 4648, but most of the loss values are greater than 1.0, and the test results are not very good. I want to know how low the loss value can go, and could you please share some key training parameters, e.g. learning rate, training time, final loss value, and so on.

I train the model on an iMac (4 GHz Intel Core i7, 16 GB memory), in CPU mode.

thank you!

@thtrieu (Owner) commented Nov 16, 2016

What batch size are you using? Without the batch size, the step number doesn't say anything about how far along you are. According to the author of YOLO, he used a pretty powerful machine, and the training has two stages, with the first stage (training the convolutional layers with average pooling) taking about a week. So be patient if you're not far from the beginning.
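For a back-of-the-envelope sense of progress, multiply steps by batch size to get images seen, then divide by the dataset size for epochs. A tiny sketch with illustrative numbers (the batch size of 16 and the VOC2007-sized dataset are assumptions, not values from this thread):

```python
# Translate a step count into images seen and approximate epochs.
# All numbers below are illustrative assumptions.
steps = 4648          # the step count reported above
batch_size = 16       # substitute whatever you pass to the trainer
dataset_size = 5011   # e.g. VOC2007 trainval; substitute your own

images_seen = steps * batch_size     # 74368 images
epochs = images_seen / dataset_size  # ~14.8 epochs
print(images_seen, round(epochs, 1))
```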

Training a deep net is more of an art than a science. My suggestion is to first train your model on a small dataset to see whether the model can overfit the training set; if it can't, there's a problem to solve before proceeding. Note that due to the data augmentation built into the code, you can't really reach a loss of 0.0.

I've trained a few configs with this code, and the loss can shrink nicely from > 10.0 down to around 0.5 or below (the parameters C, B, S are not relevant, since the loss is averaged across the output tensor). I usually start with the default learning rate of 1e-5 and a batch size of 16 or even 8 to speed up the loss decrease at first, until it stops decreasing and seems unstable.

Then I decrease the learning rate to 1e-6 and increase the batch size to 32 or 64 whenever the loss gets stuck (and testing still does not give good results). You can also switch to an adaptive learning rate algorithm (e.g. Adadelta, Adam, etc.) if you're familiar with them, by editing ./yolo/train.py/yolo_loss()

You can also look at the learning rate policy the YOLO author used, inside .cfg files.
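A minimal TensorFlow 1.x sketch of that manual schedule (the boundary step, optimizer choice, and function name are illustrative assumptions, not code from this repo):

```python
import tensorflow as tf  # TF 1.x API

# Illustrative sketch: start at 1e-5, drop to 1e-6 once the loss
# plateaus. The boundary step below is an assumed value.
def make_train_op(loss, global_step):
    lr = tf.train.piecewise_constant(
        global_step,
        boundaries=[20000],   # roughly where the loss got stuck
        values=[1e-5, 1e-6])  # learning rates before/after that point
    # Swap in an adaptive optimizer here if you prefer, e.g.:
    # opt = tf.train.AdamOptimizer(lr)
    opt = tf.train.RMSPropOptimizer(lr)
    return opt.minimize(loss, global_step=global_step)
```

The batch-size increase happens on the data-feeding side, so it isn't shown here.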

Best of luck

@KentChun33333 commented Nov 16, 2016

@thtrieu What a nice suggestion!

I also encountered similar issues and found that pre-trained weights can be a real help. Moreover, the quality and quantity of the data itself is really important, especially when training a YOLO-style network; otherwise it is just too hard to converge well...

I am still struggling with this.

@fengjian0106 (Author)

@thtrieu thank you~

In my first round of training, the batch size was 12. I get your point about being patient.

My final goal is to find bounding boxes for objects that are not in ImageNet, so I am training without a pre-trained model.

Thanks again!

@thtrieu thtrieu closed this as completed Nov 17, 2016
@thtrieu (Owner) commented Dec 26, 2016

Just a friendly ping. I've finished training a YOLO for 4 classes; if you are interested, I will write some notes about the training process.

@fengjian0106 (Author)

@thtrieu Yes, I am looking forward to it.

@thtrieu thtrieu reopened this Jan 10, 2017
@thtrieu (Owner) commented Jan 10, 2017

I have updated the code many times since then, which affects the scaling of the loss value, but the mechanism is the same. Here are my notes:

  1. You should really re-use trained weights; this is a supported feature in darkflow. Taking the first 2 or 3 layers from the original YOLO would be good.

  2. Before training, run fine-tuning on some trained models to see the loss value. These are converged values, so your goal is to get down to around these numbers (approximately 1.5 ~ 1.7).

  3. Make sure you can overfit a very small training dataset before going further. This confirms the training logic is working (see the sketch after this list).

  4. When stuck at a loss value, try overfitting a very small training set again. If you are able to get the loss down, your model is underfitting, so consider two options: 1. increase the size of the layers, 2. increase the depth. The latter is usually better in terms of generalization and speed.

  5. Occasionally visualize the predictions and see what kinds of mistakes the model is making. In my case it was predicting almost all classes as person, due to heavily skewed data. As I gradually increased the weight of the class term in the loss objective, this mistake became less severe. Note that replicating other classes' data to achieve balance results in an unnatural distribution of training data, so I would advise against it.
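For note 3, here is a small sketch of that sanity check (the paths, file formats, and pair count are hypothetical; point your trainer at the resulting folder and confirm the loss falls toward the converged values above):

```python
import random
import shutil
from pathlib import Path

# Hypothetical layout: JPEG images plus VOC-style XML annotations.
IMAGES = Path("data/images")
ANNOTATIONS = Path("data/annotations")
TINY = Path("data/tiny_overfit")

def make_tiny_subset(n=8, seed=0):
    """Copy n image/annotation pairs into a tiny dataset.
    If training on these few samples cannot drive the loss close
    to its floor, fix the pipeline before training at scale."""
    random.seed(seed)
    (TINY / "images").mkdir(parents=True, exist_ok=True)
    (TINY / "annotations").mkdir(parents=True, exist_ok=True)
    for img in random.sample(sorted(IMAGES.glob("*.jpg")), n):
        shutil.copy(img, TINY / "images" / img.name)
        xml = ANNOTATIONS / (img.stem + ".xml")
        shutil.copy(xml, TINY / "annotations" / xml.name)

if __name__ == "__main__":
    make_tiny_subset()
```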

Good luck, I'd love to hear updates on your training.

@MisayaZ commented Feb 7, 2017

@thtrieu I ran fine-tuning on the tiny-yolo-voc model, but the loss value is approximately 6, not 1.5~1.7.

@thtrieu (Owner) commented Feb 7, 2017

I don't have much experience with YOLOv2; maybe @ryansun1900 does.

Here is why YOLOv2's loss is much higher than v1's:

  • In v2, there are 13 x 13 x 5 = 845 proposal bounding boxes, each with its own confidence (objectness) and conditional class probability terms.
  • In v1, there are only 7 x 7 x 2 = 98 proposal bounding boxes, and the two boxes in each grid cell share the same conditional class probability terms.

So the output volume of v2 is much larger than v1's (21125 vs 1470), and so is the loss.
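A quick arithmetic check of those numbers, assuming the 20 VOC classes:

```python
# YOLOv2: each of the 5 anchors per cell predicts
# x, y, w, h, objectness plus its own class probabilities.
S2, B2, C = 13, 5, 20
v2_boxes = S2 * S2 * B2              # 845 proposal boxes
v2_volume = v2_boxes * (5 + C)       # 845 * 25 = 21125

# YOLOv1: boxes in a cell share the class probabilities.
S1, B1 = 7, 2
v1_boxes = S1 * S1 * B1              # 98 proposal boxes
v1_volume = S1 * S1 * (B1 * 5 + C)   # 49 * 30 = 1470

print(v2_boxes, v2_volume, v1_boxes, v1_volume)  # 845 21125 98 1470
```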

@ryansun1900 (Contributor)

So far, I don't have much experience with training on large data either. But thtrieu's explanation is correct: the loss implementation differs between YOLOv1 and YOLOv2, so I think the loss difference is reasonable.

@thtrieu thtrieu closed this as completed Feb 17, 2017
@ghost commented Apr 11, 2017

thanks for the good tips :)

@Shameendra commented Sep 13, 2019

Hi,

> 4. When stuck at a loss value, try overfitting a very small training set again. If you are able to get the loss down, your model is underfitting, so consider two options: 1. increase the size of the layers, 2. increase the depth. The latter is usually better in terms of generalization and speed.

@thtrieu can you please explain what you mean by increasing the depth? How do we do that? By changing something in the cfg file? I am training for 9 classes with YOLOv2 and have created a cfg file called yolov2-tiny-9c.cfg. So should I make the changes in this file or in the original yolov2-tiny.cfg file?

@CdAB63 commented Jun 14, 2020

I'm training a model for 1 class with yolov3-tiny.cfg. The training set is 6800 JPEGs containing from 1 to 24 objects each, normalized to 720 pixels in height but of variable width. Batch size 24, subdivisions 2, network input size 512x512, learning rate 0.0015, max batches 450000. Although mAP is high (about 98%), the average loss is still above 0.5. I guess the model is fully trained at iteration 31500, because beyond that point mAP is stable at 0.98 (98%).

My doubt is: is the model overfitting because it does not generalize well, or does it not generalize well because the average loss is still high?

[Screenshot: training chart of average loss and mAP]

@luthfi07

> I'm training a model for 1 class with yolov3-tiny.cfg. [...] Although mAP is high (about 98%), the average loss is still above 0.5.

Hey, can you tell me how to print a chart like this while training the model?

@gitgurra

> Hey, can you tell me how to print a chart like this while training the model?

I think he's using AlexeyAB's repo, which has GUI support.

@NayabZahra

> Just a friendly ping. I've finished training a YOLO for 4 classes; if you are interested, I will write some notes about the training process.

I want to see the complete loss function computation, as I am having trouble understanding it.

@krkrman commented May 5, 2022


Do not include the dont_show parameter in the training command; that flag is what suppresses the chart window.
