Inconsistent result of coco metrics and eval side-by-side. #11190

Open · 3 tasks done · 3zhang opened this issue Apr 16, 2024 · 0 comments
Labels: models:research (models that come under research directory), type:bug (Bug in the code)
3zhang commented Apr 16, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://www.kaggle.com/datasets/marcosgabriel/photovoltaic-system-thermography

2. Describe the bug

I trained a Mask R-CNN model (fine-tuned from mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8) on the dataset above. Training completed successfully, but the evaluation results are confusing: AP@IoU=0.50 and AP@IoU=0.75 are very high (~1.0), yet the side-by-side image results suggest they should not be that high.
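(Editorial note, not part of the original report: one way to double-check these numbers outside the Object Detection API is to re-run the COCO evaluation directly with pycocotools, assuming the groundtruth and detections have been exported to COCO-format JSON files. The file names below are placeholders.)

# Hypothetical sanity check with pycocotools; groundtruth.json and
# detections.json are placeholder names for COCO-format exports.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("groundtruth.json")            # groundtruth annotations
coco_dt = coco_gt.loadRes("detections.json")  # detection results

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")  # use "segm" for masks
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the same AP/AR table as in section 2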

Evaluate annotation type bbox
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.815
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.815
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.040
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.399
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.854
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.854
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type segm
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.836
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.836
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.042
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.409
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.874
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.874
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

[Attached image: side-by-side groundtruth vs. detection visualization]

3. Steps to reproduce

eval config:
eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 2
  include_metrics_per_category: false
}

eval_input_reader: {
  label_map_path: "/content/labelmap.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/tf_record/sample_val.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}
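(Editorial note, not part of the original report: since the config relies on load_instance_masks and PNG_MASKS, a quick sketch like the following could confirm that the eval TFRecord actually contains encoded instance masks; the feature key checked is the one conventionally written by the Object Detection API dataset tools.)

# Hypothetical check: inspect the first example in the eval TFRecord to
# confirm that instance masks were written ('image/object/mask' key).
import tensorflow as tf

path = "/content/tf_record/sample_val.record"
for raw in tf.data.TFRecordDataset(path).take(1):
    example = tf.train.Example.FromString(raw.numpy())
    keys = sorted(example.features.feature.keys())
    print(keys)  # expect 'image/object/mask' when PNG masks were encoded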

4. Expected behavior

From a rough manual calculation, the precision is 0.99382716049 and the recall is 0.92528735632. I know this is not AP, but it clearly shows that AP should not be ~1.
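(Editorial note, not part of the original report: one set of counts consistent with those exact figures, purely hypothetical since the actual tallies are not stated, would be 161 true positives, 1 false positive, and 13 false negatives.)

# Hypothetical TP/FP/FN counts that reproduce the reported precision/recall;
# the real counts are not given in the issue.
tp, fp, fn = 161, 1, 13
precision = tp / (tp + fp)  # 161/162 = 0.99382716049...
recall = tp / (tp + fn)     # 161/174 = 0.92528735632...
print(precision, recall)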

5. Additional context

6. System information

Evaluation was done on Google Colab.
