Adding training and validation accuracy to the training process #264

borasy · 2017-05-30T08:06:25Z

During training:
step 1 - loss 240.92623901367188 - moving ave loss 240.92623901367188
step 2 - loss 241.2866668701172 - moving ave loss 240.96228179931643
step 3 - loss 239.79562377929688 - moving ave loss 240.84561599731447

How do I add training accuracy and validation accuracy?

step 1 - loss 240.92623901367188 - moving ave loss 240.92623901367188 - train 0.221
step 2 - loss 241.2866668701172 - moving ave loss 240.96228179931643 - train 0.222
step 3 - loss 239.79562377929688 - moving ave loss 240.84561599731447 - train 0.223
Finished 1 Epoch, validation 0.210

Costyv95 · 2017-06-13T08:41:40Z

There are two ways to do that: split the dataset into training and validation sets inside the code, or pass two separate datasets, one for training and one for validation, when you call the flow module.

Either way, you need to add some new parameters in defaults.py, modify the functions _batch, parse and shuffle in data.py (in both the yolo and yolov2 folders), and modify the train() method in flow.py. In train() you only have to run another batch (every iteration, or once every N iterations) using the same TensorFlow session, but without fetching train_op, so the weights are not modified. You can also add another tf.summary.FileWriter for validation so that you can visualize the validation loss graph in TensorBoard.

I personally chose to pass two separate datasets. It was pretty straightforward. I hope that was clear enough.
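
For the first option (splitting inside the code), a minimal sketch could look like the one below. The helper name `split_annotations`, the validation fraction, the fixed seed and the file names are assumptions for illustration; none of them come from darkflow or from the files shared in this thread.

```python
import random

def split_annotations(files, val_fraction=0.1, seed=0):
    """Split a list of annotation file names into (train, val) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = sorted(files)               # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical file names, for illustration only.
files = ["img_{:03d}.xml".format(i) for i in range(20)]
train_files, val_files = split_annotations(files, val_fraction=0.2)
print(len(train_files), len(val_files))
```

Keeping the split seeded means the same images stay in the validation set across runs, so validation losses from different runs remain comparable.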

agjayant · 2017-06-14T06:46:27Z

@Costyv95 Can you share your code with the added parameters and the changes you suggested?

Costyv95 · 2017-06-14T13:37:44Z

Yes, no problem. I will upload the files here. If you have any questions, just ask.

diff.zip

crazylyf · 2017-07-04T11:50:10Z

@Costyv95 Does the validation set contribute to the gradient update in your implementation?

crazylyf · 2017-07-04T12:16:20Z

I got it: validation samples do not contribute to the gradient update.

Costyv95 · 2017-07-04T12:35:50Z

Yes, validation is only for a preview of the model results outside the training set.

Costyv95 · 2017-07-04T12:38:03Z

Hi, sorry, I didn't notice the last mail. Yes, validation is only for a preview of the model results outside the training set.

dream-will · 2017-07-06T06:19:07Z

@Costyv95 I just want to know how to run it after modifying the original code. Thanks very much!

dream-will · 2017-07-06T07:02:47Z

@Costyv95 I ran it like this: ./flow --model cfg/yolo.cfg --train --dataset "/home/thinkjoy/lwl/modify-darkflow-master/data/VOCdevkit/VOC2007/JPEGImages" --annotation "/home/thinkjoy/lwl/modify-darkflow-master/data/VOCdevkit/VOC2007/Annotations" --gpu 1.0
Here is the error (I modified the code the same way you did):
File "/home/thinkjoy/lwl/modify-darkflow-master/darkflow/net/flow.py", line 82, in train
feed_dict[self.learning_rate] = lr
AttributeError: 'TFNet' object has no attribute 'learning_rate'

Costyv95 · 2017-07-06T09:09:19Z

This happens because the code I gave you includes some modifications for an adaptive learning rate, and there is one more change you have to make. You can find it here: 124d55d

You should also add the --val_dataset and --val_annotation arguments to get a validation loss.

dream-will · 2017-07-07T07:24:51Z

@Costyv95 Can we control how many steps pass between validations? I think validating after every training step wastes training time. Thanks!

dream-will · 2017-07-07T07:33:12Z

@Costyv95 Also, have you managed to add accuracy to the validation output?

Costyv95 · 2017-07-07T08:48:36Z

@dream-will To validate once every N steps, add an argument (val_steps) in defaults.py, and in the train method in flow.py wrap the code that follows the "#validation time" comment in an if statement like this:

```python
# validation time
# int() guards against val_steps being parsed as a string
if i % int(self.FLAGS.val_steps) == 0:
    # Fetch the next mini-batch from the validation set
    (x_batch, datum) = next(val_batches)
    feed_dict = {
        loss_ph[key]: datum[key]
            for key in loss_ph}
    feed_dict[self.inp] = x_batch
    feed_dict.update(self.feed)
    feed_dict[self.learning_rate] = lr

    # train_op is not fetched, so the weights are left unchanged
    fetches = [loss_op, self.summary_op]
    fetched = self.sess.run(fetches, feed_dict)
    loss = fetched[0]

    # Exponential moving average of the validation loss
    if loss_mva_valid is None: loss_mva_valid = loss
    loss_mva_valid = .9 * loss_mva_valid + .1 * loss

    self.val_writer.add_summary(fetched[1], step_now)

    form = 'VALIDATION step {} - loss {} - moving ave loss {}'
    self.say(form.format(step_now, loss, loss_mva_valid))
```

In the defaults.py just add this line:

self.define('val_steps', '1', 'evaluate validation loss every #val_steps iterations')

I don't quite get the second question about adding the accuracy.
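
The every-N-steps check and the moving-average update above can be tried out without TensorFlow. The sketch below is only an illustration: the helper names `update_moving_average` and `validation_steps` and the loss values are made up, and the `int()` cast is there because `val_steps` is defined above with the string default '1'.

```python
def update_moving_average(previous, value, momentum=0.9):
    """Exponential moving average; the first value seeds the average."""
    if previous is None:
        return value
    return momentum * previous + (1.0 - momentum) * value

def validation_steps(total_steps, val_steps):
    """Return the step indices at which a validation batch would be run."""
    val_steps = int(val_steps)  # the flag may arrive as a string such as '5'
    return [i for i in range(total_steps) if i % val_steps == 0]

losses = [240.0, 238.0, 236.0]  # made-up validation losses
ave = None
for loss in losses:
    ave = update_moving_average(ave, loss)
print(validation_steps(20, '5'), ave)
```

A larger val_steps makes validation cheaper but also makes the moving average react more slowly, because it is updated less often.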

dream-will · 2017-07-07T08:57:01Z

@Costyv95 Thanks for your answer. The second question just means: when validating, can we get the validation accuracy in addition to the validation loss?

Costyv95 · 2017-07-07T09:10:55Z

@dream-will For that you have to implement a custom accuracy method yourself, one that compares the ground-truth (GT) bboxes with the predicted bboxes (to get the predicted bboxes, see the code used in prediction). I don't see a reason for that, though, because the loss is enough. Be aware that the validation loss you see is computed on a single random mini-batch from the validation set, but it represents the test loss very well when the validation dataset is big enough.
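
As a rough illustration of such a comparison, here is a sketch that matches predicted boxes to GT boxes by intersection over union (IoU). It is not darkflow code: the function names, the (xmin, ymin, xmax, ymax) box format and the 0.5 threshold are assumptions, class labels are ignored, and the score is closer to recall than to a full detection metric. It needs Python 3.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def matched_fraction(gt_boxes, pred_boxes, threshold=0.5):
    """Fraction of ground-truth boxes matched by a distinct prediction."""
    if not gt_boxes:
        return 0.0
    unused = list(pred_boxes)
    hits = 0
    for gt in gt_boxes:
        # Greedy matching: take the unused prediction with the highest IoU
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= threshold:
            hits += 1
            unused.remove(best)  # each prediction may match only one GT box
    return hits / len(gt_boxes)

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(matched_fraction(gt, pred))
```

Averaging this score over the whole validation set, rather than over one mini-batch, would give a more stable number.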

dream-will · 2017-07-07T11:17:37Z

@Costyv95 ok,thanks

alvinxiii · 2018-02-23T05:58:38Z

Hi @Costyv95. I'm having a problem getting the validation loss values printed. I modified all the files following your instructions and code. These are the errors:
File "flow", line 6, in <module>
cliHandler(sys.argv)
File "/home/alxe/ML/darkflow/darkflow/cli.py", line 26, in cliHandler
tfnet = TFNet(FLAGS)
File "/home/alxe/ML/darkflow/darkflow/net/build.py", line 64, in __init__
self.framework = create_framework(*args)
File "/home/alxe/ML/darkflow/darkflow/net/framework.py", line 59, in create_framework
return this(meta, FLAGS)
File "/home/alxe/ML/darkflow/darkflow/net/framework.py", line 15, in __init__
self.constructor(meta, FLAGS)
File "/home/alxe/ML/darkflow/darkflow/net/yolo/__init__.py", line 20, in constructor
misc.labels(meta, FLAGS) #We're not loading from a .pb so we do need to load the labels
File "/home/alxe/ML/darkflow/darkflow/net/yolo/misc.py", line 36, in labels
with open(file, 'r') as f:
TypeError: coercing to Unicode: need string or buffer, NoneType found

Costyv95 · 2018-02-23T11:07:23Z

Can you print the value of file variable ?

alvinxiii · 2018-02-23T11:13:13Z

@Costyv95 no I can't. This is what I run:
python flow --model cfg/tiny-yolo-voc-1c.cfg --train --dataset train/images --annotation train/annotations --load bin/yolo.weights --gpu 1.0 --epoch 300

Costyv95 · 2018-02-23T11:16:50Z

By the "file variable" I meant the variable used at line 36 in misc.py; I can't really tell what's wrong with your code.
Don't you have a --val_dataset argument? How did you implement the change? Did you split the dataset inside the code, or did you add the --val_dataset argument?

khanh101 · 2018-02-27T04:52:20Z

@Costyv95 Hi, I copied and pasted your files from diff.zip, then tried to train with this command:

flow --train --model ./coke/yolo-coke-2c.cfg --annotation ./coke/train/annotations --dataset ./coke/train/images --gpu 1.0 --batch 8 --save 1000 --val_dataset ./coke/validation/images --val_annotation ./coke/validation/annotations

But I still got an error:
```
[nkhanh@localhost khanh]$ ./run_coke.sh

Parsing ./coke/yolo-coke-2c.cfg
Loading None ...
Finished in 0.0001392364501953125s
Traceback (most recent call last):
  File "/usr/local/bin/flow", line 6, in <module>
    cliHandler(sys.argv)
  File "/usr/local/lib64/python3.6/site-packages/darkflow/cli.py", line 26, in cliHandler
    tfnet = TFNet(FLAGS)
  File "/usr/local/lib64/python3.6/site-packages/darkflow/net/build.py", line 64, in __init__
    self.framework = create_framework(*args)
  File "/usr/local/lib64/python3.6/site-packages/darkflow/net/framework.py", line 59, in create_framework
    return this(meta, FLAGS)
  File "/usr/local/lib64/python3.6/site-packages/darkflow/net/framework.py", line 15, in __init__
    self.constructor(meta, FLAGS)
  File "/usr/local/lib64/python3.6/site-packages/darkflow/net/yolo/__init__.py", line 20, in constructor
    misc.labels(meta, FLAGS) #We're not loading from a .pb so we do need to load the labels
  File "/usr/local/lib64/python3.6/site-packages/darkflow/net/yolo/misc.py", line 36, in labels
    with open(file, 'r') as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
```

galaktyk · 2018-03-26T05:16:52Z

@khanh1412
In misc.py, at line 29, set file to your custom labels file:
file = 'labels.txt'

P.S. This is only a temporary solution.

jplnasa5 · 2018-05-05T07:58:22Z

@Costyv95
How should I understand "remove '[' and ']'"?

Enter training ...
Traceback (most recent call last):
File "flow", line 6, in <module>
cliHandler(sys.argv)
File "/Users/sisyphus/darkflow/darkflow/cli.py", line 33, in cliHandler
print('Enter training ...'); tfnet.train()
File "/Users/sisyphus/darkflow/darkflow/net/flow.py", line 54, in train
arg_steps = self.FLAGS.steps[1:-1] # remove '[' and ']'
TypeError: 'NoneType' object is not subscriptable
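
The slice in the traceback can be read as follows: `steps[1:-1]` drops the first and last characters of a string, so a value such as '[5000,10000]' becomes '5000,10000'. The TypeError means `self.FLAGS.steps` is None rather than a string, which presumably happens when the `steps` flag is not defined in defaults.py. A minimal sketch, where the helper `parse_steps` and the example value are assumptions for illustration and not code from the modified flow.py:

```python
def parse_steps(raw):
    """Turn a string like '[5000,10000]' into [5000, 10000]."""
    if raw is None:
        # Slicing None is what raises
        # "TypeError: 'NoneType' object is not subscriptable".
        raise ValueError("steps flag is not set; expected something like '[5000,10000]'")
    inner = raw[1:-1]  # remove '[' and ']'
    return [int(part) for part in inner.split(',') if part.strip()]

print(parse_steps('[5000,10000]'))
```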


pansu7 · 2019-03-05T06:51:19Z

hi @Costyv95

Where should I add the tf.summary.FileWriter for validation, so that I can visualize the validation loss graph in TensorBoard?

thanks

jiansfoggy · 2019-03-17T03:33:56Z

@Costyv95 I tried your zip file, diff.zip. But the terminal tells me that --val_dataset is an invalid argument. Do I need to change other files?

Physicing · 2019-05-02T16:03:05Z

> @Costyv95 I tried your zip file, diff.zip. But the terminal tells me that --val_dataset is an invalid argument. Do I need to change other files?

You should replace all the files, including the yolo-data and yolov2-data ones. Copy each of those into its matching folder and rename it to just "data", so that it replaces the existing file there.

HSHunterR · 2019-05-25T09:11:18Z

@KhanhHH
Add the line “self.define('labels', 'labels.txt', 'path to labels file')” to "def setDefaults(self):" in "darkflow\defaults.py"; then you can use "--labels xxx.txt" as before.
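
The tracebacks earlier in this thread end with open(file, 'r') receiving None. One plausible explanation, assuming the flag handler behaves like a dict whose undefined flags read as None, is that the replaced defaults.py no longer defines labels. The class below is a simplified stand-in written for illustration (the name `FlagsSketch` is made up); it is not darkflow's actual handler.

```python
class FlagsSketch(dict):
    """Simplified stand-in for a dict-based flag handler (not darkflow's code)."""
    __getattr__ = dict.get  # undefined flags read as None instead of raising

    def define(self, name, default, description):
        # Register a flag with its default value; the description is unused here
        self[name] = default

flags = FlagsSketch()
print(flags.labels)  # None: open(None, 'r') would raise a TypeError
flags.define('labels', 'labels.txt', 'path to labels file')
print(flags.labels)  # labels.txt
```

The same reasoning would apply to any other flag that the modified code reads but defaults.py does not define.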

jiansfoggy · 2019-05-25T17:15:16Z

Thank you so much!!!!!!!!


HSHunterR · 2019-05-26T04:27:34Z

@Costyv95
Hello, I want to know how to set the path "gs://bucket_hand_detection_2" in "darkflow\defaults.py". My Python (3.7) can't find this path and throws an error. Also, what does "bucket" represent?

attanzion · 2019-07-03T10:36:46Z

@Costyv95
> Hello, I want to know how to set the path "gs://bucket_hand_detection_2" in "darkflow\defaults.py". My Python (3.7) can't find this path and throws an error. Also, what does "bucket" represent?

Same error here, but the checkpoints are saved normally, so I don't know what this error is. @Costyv95

SlavaKeshkov · 2019-07-03T14:51:50Z

thanks @Costyv95!

akmeraki · 2019-07-09T17:42:58Z

Hi @Costyv95, YOLO trains and outputs the validation loss, but after 1000 steps it throws an error: FileNotFoundError: [Errno 2] No such file or directory: 'gsutil': 'gsutil'.

zhe0503 · 2019-10-25T11:57:47Z

@akmeraki Hello, I ran into the same problem. Did you find a solution for this error?

UsernameIsName-bot · 2020-01-06T17:50:43Z

@zhe0503 @akmeraki I hope it's not too late, but all I did was run "pip install gsutil" and it solved the problem!
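
The FileNotFoundError suggests that the modified code calls a `gsutil` executable when it saves, presumably to copy checkpoints to the gs:// bucket mentioned earlier in this thread. A small check like the one below shows whether the tool is on PATH before a long training run; the helper name `tool_available` is made up for illustration.

```python
import shutil

def tool_available(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None

if not tool_available('gsutil'):
    print("gsutil not found on PATH; install it or skip the upload step")
```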

LaiqueTauseef · 2020-03-30T11:39:15Z

Hey guys.
What would I need to do if I want to get the accuracy of the whole trained model?
For instance, I am training my model and I stop the training at some point. Now I have the last saved checkpoint and I want to calculate the accuracy up to that checkpoint.
The files in the ckpt folder are named as follows:

checkpoint
yolo-new-50.data-00000-of-00001
yolo-new-50.index
yolo-new-50.meta
yolo-new-50.profile

I would appreciate the help guys.

mfaramarzi · 2020-04-30T22:03:46Z

@Costyv95
I followed your kind instructions carefully but it seems that train.py does not recognize --val_... arguments. Would you please help me? Error is as below:
ERROR - Invalid argument: --val_dataset

mfaramarzi · 2020-04-30T22:39:20Z

> This happens because the code I gave you includes some modifications for an adaptive learning rate, and there is one more change you have to make. You can find it here: 124d55d
>
> You should also add the --val_dataset and --val_annotation arguments to get a validation loss.

It doesn't work for me. I get the error below:
ERROR - Invalid argument: --val_dataset

gangooteli mentioned this issue Dec 18, 2017

Cross validation #314

Closed

Ridhwanluthra mentioned this issue Dec 30, 2017

train loss and validation loss tending to 0 #494

Closed

samuelefiorini added a commit to samuelefiorini/darkflow that referenced this issue Oct 11, 2018

Few patches: - it is now possible to specify a validation set (thtrie…

ce77d6d

…u#264) - the png -> jpg bug is solved - the wrong usage of glob is solved
