Help needed!!! #359

hiren-2911 · 2020-10-18T08:05:25Z

Please help.

Hello @pierluigiferrari, I am trying to train the SSD300 model, but I am not getting good results.
The predictor outputs too many bounding boxes, all with the same confidence level.
I really want to learn about this model and train it on my own custom dataset.
I read all the previous issues.
There were two similar issues about the same problem with the SSD7 model, but they haven't been answered yet.
Please help me understand where the actual problem is.

Ezzysci · 2020-10-28T14:24:14Z

I have the same problem. I get too many bounding boxes with the confidence level of 0.09 and no bounding boxes when I set it to 0.5. I hope someone can help. I am very lost.

hiren-2911 · 2020-10-29T06:09:10Z

Hey!
I solved the problem.
Make sure you use the latest port (i.e. ssd-keras_0.9.0) with:
keras version 2.1.0
tensorflow-gpu==1.15
If you run it on a newer Keras version you will get an error about image_dim_ordering(); change the check to image_data_format() == 'channels_last' (the old 'tf' ordering; the old 'th' ordering corresponds to 'channels_first').
Train it for around 500 epochs if your dataset is quite small.
And yeah, it works well!
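
For anyone hitting the same error, the rename can be sketched without importing Keras at all. The helper name `legacy_to_data_format` is made up for illustration and is not part of ssd_keras or Keras; the only thing assumed is that older Keras backends exposed `image_dim_ordering()` returning `'tf'` or `'th'`, while newer ones expose `image_data_format()` returning `'channels_last'` or `'channels_first'`.

```python
# Hypothetical helper (not part of ssd_keras or Keras): maps the legacy
# dim-ordering strings to the newer data-format strings.
_LEGACY_TO_NEW = {
    "tf": "channels_last",   # (batch, height, width, channels)
    "th": "channels_first",  # (batch, channels, height, width)
}


def legacy_to_data_format(dim_ordering):
    """Translate a legacy ordering string to the newer data-format string."""
    try:
        return _LEGACY_TO_NEW[dim_ordering]
    except KeyError:
        raise ValueError("unknown dim ordering: %r" % (dim_ordering,))


# An old check such as `K.image_dim_ordering() == 'tf'` would therefore
# become `K.image_data_format() == 'channels_last'`.
print(legacy_to_data_format("tf"))
```

So when porting, each comparison against `'tf'` or `'th'` needs its string swapped as well as the function name.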

hiren-2911 · 2020-10-29T06:10:00Z

If you are still facing the problem, please feel free to post it and I will try to solve it.

Ezzysci · 2020-10-29T15:05:59Z

I had this problem before and I solved it. I haven't tried the 500 epochs. I see, though, that he made some changes to his code in a commit in March. Did you implement those changes? They are here: https://github.com/IntranelConsulting/ssd_keras_tf2/commits/master/keras_loss_function

Ezzysci · 2020-10-29T15:20:33Z

Okay, I see, I don't have tensorflow-gpu, I'll get the one specific version you mentioned. Thank you for the help! I greatly appreciate it. Still curious about those commits he made but it looks like it's for the TF2.2.0rc. Thanks for the input Hiren.

hiren-2911 · 2020-10-30T05:41:35Z

No, I haven't seen those commits, but version 0.9.0 works fine.
If you want, I can send you the working model with the significant changes I made.
The model will run even if you don't use tensorflow-gpu, but training will be very slow.
So I prefer tensorflow-gpu==1.15 for this port!

Ezzysci · 2020-11-07T01:41:15Z

Hi Hiren, this is a nightmare for me. I keep getting this warning:

WARNING:tensorflow:Gradients do not exist for variables ['conv1_1/bias:0', 'conv1_2/bias:0', 'conv2_1/bias:0', 'conv2_2/bias:0', 'conv3_1/bias:0', 'conv3_2/bias:0', 'conv3_3/bias:0', 'conv4_1/bias:0', 'conv4_2/bias:0', 'conv4_3/bias:0', 'conv5_1/bias:0', 'conv5_2/bias:0', 'conv5_3/bias:0', 'fc6/bias:0', 'fc7/bias:0', 'conv6_1/bias:0', 'conv6_2/bias:0', 'conv7_1/bias:0', 'conv7_2/bias:0', 'conv8_1/bias:0', 'conv8_2/bias:0', 'conv9_1/bias:0', 'conv4_3_norm/conv4_3_norm_gamma:0', 'conv9_2/bias:0', 'conv4_3_norm_mbox_conf/bias:0', 'fc7_mbox_conf/bias:0', 'conv6_2_mbox_conf/bias:0', 'conv7_2_mbox_conf/bias:0', 'conv8_2_mbox_conf/bias:0', 'conv9_2_mbox_conf/bias:0', 'conv4_3_norm_mbox_loc/bias:0', 'fc7_mbox_loc/bias:0', 'conv6_2_mbox_loc/bias:0', 'conv7_2_mbox_loc/bias:0', 'conv8_2_mbox_loc/bias:0', 'conv9_2_mbox_loc/bias:0'] when minimizing the loss.

This suggests that no training is happening. I guess I will have to use the same versions you have. I get no predictions at all. Do you get this warning?

Ezzysci · 2020-11-07T01:42:02Z

@hiren-2911 I'm sorry I didn't reply earlier, I was busy with a lot of work. I just came back to this project today. My apologies.

hiren-2911 · 2020-11-07T13:21:06Z

@Ezzysci
It's okay!

Yeah, I used to get those warnings earlier, but when I started using the versions I mentioned, the warnings vanished.

Ezzysci · 2020-11-10T01:43:48Z

@hiren-2911 which tensorflow version are you using? I know you're using tensorflow-gpu 1.15, but which tensorflow are you using? I'm getting errors now after I downgraded keras.

Ezzysci · 2020-11-10T01:50:50Z

I'm using python 3.8 and the latest tensorflow. I downgraded the keras version and it no longer works. It looks like it's not compatible with the latest tensorflow.

hiren-2911 · 2020-11-10T04:58:11Z

@Ezzysci as I mentioned, use tensorflow-gpu==1.15

DoraemonSlayer69 · 2020-11-27T19:58:01Z

WARNING:tensorflow:Gradients do not exist for variables ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'conv2d_2/kernel:0', 'conv2d_2/bias:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0', 'conv2d_4/kernel:0', 'conv2d_4/bias:0', 'conv2d_5/kernel:0', 'conv2d_5/bias:0', 'classes1/bias:0', 'classes2/bias:0', 'classes3/bias:0', 'classes4/bias:0', 'classes5/bias:0', 'classes6/bias:0', 'boxes1/bias:0', 'boxes2/bias:0', 'boxes3/bias:0', 'boxes4/bias:0', 'boxes5/bias:0', 'boxes6/bias:0'] when minimizing the loss.

I still get this issue, but I'm using tensorflow-gpu 2.3 and keras 2.1.
And if I use tensorflow-1.1.5, I get an error in a tensor operation in the anchor boxes file.
Does anyone know how to fix this?

Ezzysci · 2020-11-27T20:02:00Z

Where did you get tensorflow-1.1.5? I cannot install it with pip. What is the error in the tensor operation in the anchor boxes file, and which line of code is it at? The 2.3 error, as far as I have analyzed, has to do with eager execution and with y_true being empty when it is first computed in model.fit. I'm still trying to solve it.

DoraemonSlayer69 · 2020-11-27T21:09:03Z

Yeah, I tried disabling eager execution in TensorFlow, but now I get a different error:
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Index out of range using input dim 0; input has only 0 dims
[[{{node loss_3/predictions_loss/strided_slice_6}}]]
[[loss_3/predictions_loss/cond/else/_50/mul/_1349]]
(1) Invalid argument: Index out of range using input dim 0; input has only 0 dims
[[{{node loss_3/predictions_loss/strided_slice_6}}]]
0 successful operations.
0 derived errors ignored.

And using TF 1.1.5, I get the error when doing an np.linspace operation at line 198.

As the input to model.fit_generator I am feeding two generators, each of which yields the images as X and the y_true labels as a NumPy array of the box coordinates and classes.

DoraemonSlayer69 · 2020-11-27T21:11:57Z

> Where did you get tensorflow-1.1.5? I cannot install it with pip. What is the error in the tensor operation in the anchor boxes file, and which line of code is it at? The 2.3 error, as far as I have analyzed, has to do with eager execution and with y_true being empty when it is first computed in model.fit. I'm still trying to solve it.

You can get tensorflow 1.1.5 by just running pip install tensorflow-gpu==1.1.5

Ezzysci · 2020-11-27T21:16:47Z

Yes, I get the same error. If you do the hours of troubleshooting I did, you get this:
y_pred shape (None, 8732, 23)
y_true shape (None, None, None)

This is the feed. Slicing that y_true gives an error. My theory to fix this is to force it to be the same shape as y_pred but I don't know how to do it. I'm a noob. I haven't used 1.1.5 because hiren said it was 1.15 and I couldn't find it. This is how inexperienced I am. In either case, I wanted to port it to 2.0 so that I am on the latest and greatest. This was supposed to be one implementation and I was going to use another one with it. The project is pretty much falling apart, sadly. I'm still going to try to solve that indexing slicing issue by finding a way to set the y_true to the same shape as y_pred since this is the given for the function.

Ezzysci · 2020-11-27T21:23:47Z

Cannot convert a partially known TensorShape to a Tensor: This is what I get as the error when I try to either reshape or set the shape. It really is sad that there's little help on this.

DoraemonSlayer69 · 2020-11-27T21:33:22Z

Oh, thanks man. I'm a noob too.
But how did you extract the y_pred and y_true shapes?

Ezzysci · 2020-11-27T21:39:33Z

The issue occurs in the loss function, this is the reason it's not getting trained. My interpretation is that it is not receiving any value for y_true; while in tensorflow 1.0 it would leave a placeholder for that tensor and then fill it with the values. I turned off eager execution and then I did a print function in the compute loss function. This gives me the input to the loss function. Eager Execution I think is used when fitting the model although it's turned off for tensorflow.
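
A minimal sketch of that debugging step, for anyone who wants to reproduce it: wrap the loss function so it prints the shapes of whatever it receives. `log_shapes`, `FakeTensor` and `dummy_loss` are made-up names for illustration; in practice the wrapped function would be the real loss passed to `model.compile`. With eager execution off, the print runs when the graph is built rather than on every step.

```python
import functools


def log_shapes(loss_fn):
    """Wrap a loss function so it prints the shapes of its two inputs."""
    @functools.wraps(loss_fn)
    def wrapper(y_true, y_pred):
        # getattr keeps this working for objects without a .shape attribute.
        print("y_true shape", getattr(y_true, "shape", None))
        print("y_pred shape", getattr(y_pred, "shape", None))
        return loss_fn(y_true, y_pred)
    return wrapper


class FakeTensor:
    """Stand-in for a tensor, used only to demonstrate the wrapper."""
    def __init__(self, shape):
        self.shape = shape


@log_shapes
def dummy_loss(y_true, y_pred):
    # Placeholder; a real loss would compute something from its inputs.
    return 0.0


dummy_loss(FakeTensor((None, None, None)), FakeTensor((None, 8732, 23)))
```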

DoraemonSlayer69 · 2020-11-27T21:51:31Z

Yeah, it's turned on by default; if you want, you can turn it off.

DoraemonSlayer69 · 2020-11-27T22:22:28Z

Bro, I fixed the error. For some reason, if I pass the images and bounding boxes as NumPy arrays instead of getting them from the generators, I no longer get those "Gradients do not exist" warnings.

import math

# Pull a single batch from each generator and pass it as plain arrays.
temp = next(train_gen)
temp1 = next(validation_gen)

x_train = temp[0]
y_train = temp[1]

x_val = temp1[0]
y_val = temp1[1]
initial_epoch = 0
final_epoch = 1
steps_per_epoch = 10

history = ssd_model.fit(x_train, y_train,
                        steps_per_epoch=steps_per_epoch,
                        epochs=final_epoch,
                        validation_data=(x_val, y_val),
                        validation_steps=math.ceil(validation_dataset_size / batch_size),
                        initial_epoch=initial_epoch)

Tensor("IteratorGetNext:1", shape=(None, 68194, 14), dtype=float32)
Tensor("functional_1/predictions/concat:0", shape=(None, 68194, 14), dtype=float32)
Tensor("IteratorGetNext:1", shape=(None, 68194, 14), dtype=float32)
Tensor("functional_1/predictions/concat:0", shape=(None, 68194, 14), dtype=float32)
6/10 [=================>............] - ETA: 3s - loss: 1753.3718WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 10 batches). You may need to use the repeat() function when building your dataset.
Tensor("IteratorGetNext:1", shape=(1, 68194, 14), dtype=float32)
Tensor("functional_1/predictions/concat:0", shape=(1, 68194, 14), dtype=float32)
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 101 batches). You may need to use the repeat() function when building your dataset.
6/10 [=================>............] - 9s 2s/step - loss: 1753.3718 - val_loss: 1625.2026

Since I passed just one batch and the step counts were set wrong, it halted abruptly, but yeah, it works now.
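
The "ran out of data" warning in the log is about the step count being larger than the number of batches the data can supply. A rough sketch of how the step count could be derived instead of hard-coded; `steps_for` is a made-up helper and the sizes are invented for illustration. When passing NumPy arrays to `fit`, another option is to leave `steps_per_epoch` out and only set `batch_size`.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to cover num_samples once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_samples / batch_size)


# Hypothetical sizes, for illustration only.
print(steps_for(num_samples=6, batch_size=1))     # 6
print(steps_for(num_samples=100, batch_size=32))  # 4
```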

Ezzysci · 2020-11-27T22:27:21Z

so now, you'll just use next
temp = next(train_gen)
temp1 = next(validation_gen)

without the indices.

DoraemonSlayer69 · 2020-11-27T22:36:33Z

Yeah, but we have to get the whole dataset using the next() function, so I think we have to keep calling it until the number of filenames is reached, and then feed the resulting x and y train arrays.
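
Something along these lines, as a sketch only. `collect_batches` and `toy_generator` are made-up names; the toy generator stands in for `train_gen` and is assumed to yield `[images, labels]` batches forever. Holding the whole dataset in memory this way will only work for small datasets.

```python
import math


def collect_batches(generator, dataset_size, batch_size):
    """Pull enough batches from an endless generator to cover the dataset once."""
    num_batches = math.ceil(dataset_size / batch_size)
    images, labels = [], []
    for _ in range(num_batches):
        batch_x, batch_y = next(generator)[:2]
        images.extend(batch_x)
        labels.extend(batch_y)
    # Trim in case more samples than dataset_size were collected.
    return images[:dataset_size], labels[:dataset_size]


def toy_generator(batch_size):
    """Endless stand-in for train_gen: yields [images, labels] lists."""
    i = 0
    while True:
        xs = [(i + k) % 5 for k in range(batch_size)]
        yield [xs, [x * 10 for x in xs]]
        i += batch_size


x_train, y_train = collect_batches(toy_generator(2), dataset_size=5, batch_size=2)
print(x_train, y_train)
```

With real image batches the two lists would still need converting to arrays (for example with `np.asarray`) before being passed to `fit`.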

DoraemonSlayer69 · 2020-11-28T05:45:19Z

> so now, you'll just use next
> temp = next(train_gen)
> temp1 = next(validation_gen)
>
> without the indices.

I finally found a better fix for the missing-gradients problem.
The reason is that model.fit_generator expects the generator to return a tuple of (images, labels), but the generate function in the obj_detection_2d data generator file returns a list. Just make that function return a tuple and you won't get any of these errors; you can then use the generators as usual with model.fit_generator and it will work.

Ezzysci · 2020-11-28T05:47:33Z

makes perfect sense. I tried to make it return a tensor before and it didn't work. I will try to turn it into a tuple, is there a straightforward way to do this?

DoraemonSlayer69 · 2020-11-28T05:53:02Z

Yeah.
Just search for the variable ret in the generate function.
Since ret is a list, right before the yield ret statement just do ret = tuple(ret).
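
If you would rather not edit the generator file, the same idea can be sketched as a wrapper around the generator. `as_tuples` and `list_batches` are made-up names for illustration; `list_batches` only stands in for the real data generator. As far as I understand, TF2's Keras treats a yielded list as inputs only, with no targets, which would explain the empty y_true seen in the loss.

```python
def as_tuples(generator):
    """Re-yield every batch from `generator` as a tuple instead of a list."""
    for batch in generator:
        yield tuple(batch)


def list_batches():
    """Stand-in for the data generator: yields [images, labels] lists."""
    yield [["img0", "img1"], ["label0", "label1"]]


batch = next(as_tuples(list_batches()))
print(type(batch).__name__, len(batch))
```

The wrapped generator can then be passed wherever the original one was used.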

DoraemonSlayer69 · 2020-11-28T05:54:01Z

You might get a resource exhausted error, in which case just fiddle with the batch size, but the missing-gradients problem won't happen.

Ezzysci · 2020-11-29T16:30:39Z

@DoraemonSlayer69 Thank you for the input. I really appreciate the help.

DoraemonSlayer69 · 2020-11-29T19:32:55Z

Cool man.
Although now I'm getting an empty list in my y_pred_decoded, even though the validation loss is in single digits.

Ezzysci · 2020-11-29T19:46:27Z

Did you fully port to tf2.0 in every file? https://github.com/IntranelConsulting/ssd_keras_tf2/commits/master

DoraemonSlayer69 · 2020-11-29T20:27:16Z

Yeah. From that I am using only the loss function, so I updated only that.
I am not using the decode detections layer.

Ezzysci · 2020-11-29T20:53:19Z

The y_pred function will need to be ported to 2.0 as well. Can you show me what your code is for y_pred_decoded?

DoraemonSlayer69 · 2020-11-29T21:11:10Z

Sure, it's the file ssd_output_decoder.
In that I'm using decode_detections_fast.

Ezzysci · 2020-11-29T21:16:22Z

Well, you will get an empty array because the confidence threshold is 0.5, which is quite high. After six to ten iterations you should be able to see it populate. I trained mine for 100 and started seeing it populate after about 15 iterations.
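
To check whether the model is predicting anything at all, it can help to lower the threshold temporarily and count what survives. A rough sketch with invented numbers; `filter_by_confidence` is a made-up helper, and the box layout `[class_id, confidence, xmin, ymin, xmax, ymax]` is an assumption about the decoded output, so adjust the index if yours differs.

```python
def filter_by_confidence(boxes, threshold):
    """Keep only boxes whose confidence (index 1) is at least `threshold`."""
    return [box for box in boxes if box[1] >= threshold]


# Made-up detections in the assumed layout
# [class_id, confidence, xmin, ymin, xmax, ymax].
detections = [
    [1, 0.09, 10, 10, 50, 50],
    [1, 0.35, 12, 11, 52, 49],
    [2, 0.72, 100, 80, 160, 140],
]

for threshold in (0.5, 0.3, 0.05):
    kept = filter_by_confidence(detections, threshold)
    print(threshold, len(kept))
```

If nothing survives even at a very low threshold, the problem is more likely in training than in decoding.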

DoraemonSlayer69 · 2020-11-29T21:19:52Z

Oh, what I did was train the model for 5 to 10 epochs, since the loss wasn't changing much. Then I called the predict function on that model to get y_pred and passed it to decode_detections_fast.

Ezzysci · 2020-11-29T21:21:08Z

Train it for 100 epochs and 100 steps.

DoraemonSlayer69 · 2020-11-29T21:23:10Z

Oh ok, will do man.
Although the training time is really bad, since I'm training on the WIDER FACE dataset.

zarif101 · 2020-11-29T22:12:01Z

I just want to give a big thanks to both of you (DoraemonSlayer69 and Ezzysci) because I was able to get it working thanks to your suggestions.

DoraemonSlayer69 · 2020-11-29T22:19:32Z

Cool man.
Hopefully I get mine working after training for 100 epochs.

stephentandjiria · 2020-12-11T12:12:07Z

Thank you @DoraemonSlayer69 @Ezzysci, I got it working because of you two!

stale · 2020-12-19T08:42:55Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

around-star · 2020-12-19T11:09:04Z

Hey @DoraemonSlayer69 and @Ezzysci, I am working on pedestrian detection using the PennFudanPed dataset and having issues with the bounding box predictions, which sometimes have negative coordinates. By the way, I made some changes to the network: I use MobileNetV2 as the base network, the number of classes is 2, and the aspect ratios are 0.5 and 1.0/3.0 across all the prediction layers. I even increased the value of alpha in the loss function to 4 to prioritize the localization loss, but it doesn't seem to work. Can you guys help me with this?

stale bot added the stale label Dec 19, 2020

stale bot removed the stale label Dec 19, 2020

anashad mentioned this issue Feb 15, 2021

WARNING:tensorflow:Gradients do not exist for variables ['conv4_3/bias:0',...] when minimizing the loss. #374

Open
