Some questions regarding the paper #70

Closed
CodeJjang opened this issue Jun 1, 2020 · 4 comments
Comments

@CodeJjang

I really enjoyed reading the paper, and the results look really promising.
I'd love to ask about several small details I didn't understand, though:

  1. Why isn't the Soft IoU layer trained jointly with the RetinaNet losses, but instead trained separately, after training the RetinaNet with its regular losses? Did this approach give better results?
  2. Can you please elaborate on the intuition for why this layer would help us detect objects better in cluttered scenes?
    You wrote in the paper:

... a bounding box which partially overlaps an object can still have a high objectness score, c, signifying high confidence that the object appears in the bounding box. For the same detection, we expect c_iou to be low, due to the partial overlap.

Ok, I understand it would penalize detections with smaller IoU, making the boxes tighter and more sensitive to occlusions, as you said.
However, we already have the smooth L1 regression loss, which should push the network toward tight boxes, so how does this log loss mixing the IoU with the predicted c_iou score achieve your goal? (See the sketch after this list for how I read that loss.)

  3. In the ablation experiments, the basic RetinaNet outperformed the Base+NMS version, which to my understanding is effectively the same model, and it's mentioned that this might be due to a better implementation in the RetinaNet you tested with.
    The gap there is huge; did you try implementing your architecture on top of that specific RetinaNet framework? I'd expect the Base+NMS version to then be equal, and perhaps the final results would rise significantly as well.
    May I know which RetinaNet implementation you used in the ablation tests? I guess it wasn't the base Keras implementation this architecture is built on...

  4. Lastly, isn't it strange that Faster R-CNN got such a low score?
    Did you try adjusting its anchors to smaller object sizes?
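
For reference, here is how I read the loss I'm asking about in point 2: a minimal NumPy sketch (not the repo's code) of a log loss that pushes the predicted c_iou toward the box's actual IoU with its matched ground truth, so a partial overlap gets a low c_iou even when the objectness score c is high.

```python
import numpy as np

def soft_iou_log_loss(pred_c_iou, target_iou, eps=1e-7):
    """Cross-entropy between the predicted c_iou and the box's actual IoU
    with its matched ground truth (soft targets in [0, 1], not 0/1 labels)."""
    p = np.clip(pred_c_iou, eps, 1.0 - eps)
    t = np.clip(target_iou, 0.0, 1.0)
    return -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

# A box that only partially overlaps an object: objectness c may be high,
# but a confident c_iou prediction (0.9) against a true IoU of 0.3 is
# penalized much harder than an honest prediction of 0.3.
print(soft_iou_log_loss(np.array([0.9]), np.array([0.3])))  # ~1.64
print(soft_iou_log_loss(np.array([0.3]), np.array([0.3])))  # ~0.61
```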

Thanks!

@eg4000
Owner

eg4000 commented Jun 3, 2020

Hi,

  1. It was easier to first train the classification and detection heads, and then train the IoU head on the space of valid detections (detections with hard scores greater than a predefined threshold). Training everything together can be tricky because, at the beginning, you can't extract those valid detections. It is certainly possible, but it requires proper balancing of the trained components and additional research. (A small illustrative sketch of this staging appears after this list.)

  2. Some boxes visually look like correct localizations, but their overlap with the GT is small or even zero. This happens a lot in retail environments. You can see some examples in the presentation. It is true that the standard heads already attempt to learn this; however, in such complex scenarios the object/non-object space is not always visually separable (see slide 19, for example), so the learned overlap rate contributes additional information. We also found it advantageous for identifying false positives. For instance, in slides 24-26 (cars in parking lots), some of the valid boxes are actually bushes or shadows. They got high hard scores but can be eliminated thanks to their low soft scores. Here are some examples of the soft scores after the EM-merger: (image: soft_scores)

  3. I'm sorry, but I can't track down the version of the original implementation. The thing is that we significantly improved the model between the submission date and the time the paper was accepted, so we published the new version of the code, which in fact produces better results for both the baseline and the full method. In practice, you can view RetinaNet and Base+NMS as the same experiment.

  4. We did adjust the anchors, but we used the original Faster R-CNN (which was still considered a popular baseline for comparison when we started this research...). The results will definitely get much better with Mask R-CNN or Cascade R-CNN. Our method can be implemented on those models as well.
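
For illustration, a minimal sketch of the staging described in point 1; the threshold value and the small iou helper below are placeholders for the example, not code from this repository.

```python
import numpy as np

def iou(box, gt):
    """IoU of one [x1, y1, x2, y2] box with one ground-truth box."""
    x1, y1 = np.maximum(box[:2], gt[:2])
    x2, y2 = np.minimum(box[2:], gt[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box) + area(gt) - inter + 1e-7)

HARD_SCORE_THRESHOLD = 0.1  # stand-in for the "predefined threshold"

# Stage 1 has already trained the classification/regression heads, so they can
# be run to get boxes and hard scores; here these are just toy arrays.
detections = np.array([[10, 10, 50, 50], [60, 60, 90, 90]], dtype=float)
hard_scores = np.array([0.8, 0.05])
gt_box = np.array([12, 12, 52, 52], dtype=float)

# Stage 2: keep only the "valid" detections and use their true overlap with
# the matched ground truth as the regression target for the IoU head.
valid = hard_scores > HARD_SCORE_THRESHOLD
iou_targets = np.array([iou(d, gt_box) for d in detections[valid]])
print(valid, iou_targets)  # [ True False] [~0.82]
```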

Regards,
Eran.

@CodeJjang
Author

CodeJjang commented Jun 3, 2020

@eg4000 Thanks, I got everything.
However, I must say that regarding point 2, the soft score isn't always lower for out-of-dataset objects... For instance:
(image: soft_score_fail)

Here we can clearly see that the soft score is way higher than the hard score.

Also, any chance you could please publish your recently achieved scores?
Are they coming from the weights in #9, or are they related to the CRF mechanism shown in the above paper but not present in the code?

@eg4000
Owner

eg4000 commented Jun 3, 2020

We managed to get better-scaled soft scores in past experiments. See #60, #8.
It might have learned the wrong scale from the distribution of overlap rates in the training samples.
In any case, you shouldn't compare the hard score with the soft score, but rather (normalized) soft scores with other soft scores (a toy example of that normalization follows).
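
For illustration, a minimal sketch of that idea; the per-image max normalization here is an assumption chosen for the example, not the exact scheme used in the repo.

```python
import numpy as np

def normalize_soft_scores(soft_scores):
    """Rescale one image's soft (IoU) scores so they are compared with each
    other rather than with the hard objectness scores."""
    max_score = soft_scores.max()
    return soft_scores if max_score <= 0 else soft_scores / max_score

soft = np.array([0.31, 0.12, 0.27, 0.05])
print(normalize_soft_scores(soft))  # ranking is preserved; scale is relative
```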

The latest mAP was ~52%. You can compute it here. The CRF is a separate paper.
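
If it helps, here is a generic sketch for computing COCO-style mAP with pycocotools, assuming the detections have been exported to COCO-format JSON; the repo's own evaluation may differ in detail, and the file names are placeholders.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder file names -- replace with your ground-truth and detection files.
coco_gt = COCO("annotations_val.json")
coco_dt = coco_gt.loadRes("detections_val.json")  # [{image_id, category_id, bbox, score}, ...]

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP@[.50:.95], AP@.50, AP@.75, etc.
```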

Regards,
Eran.

@CodeJjang
Author

Thanks
