Inconsistent result of coco metrics and eval side-by-side. #11190

Open · 3 tasks done · 3zhang opened this issue Apr 16, 2024 · 0 comments
Labels: models:research (models that come under research directory), type:bug (Bug in the code)
3zhang commented Apr 16, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://www.kaggle.com/datasets/marcosgabriel/photovoltaic-system-thermography

2. Describe the bug

I trained a Mask R-CNN model (fine-tuned from mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8) on the dataset above. Training completed successfully, but the evaluation results are confusing: AP@IoU=0.50 and AP@IoU=0.75 are very high (~1.0), yet the side-by-side image results suggest they should not be that high.
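(Editorial note, not part of the original report: one way to double-check these numbers outside the Object Detection API is to re-run the COCO evaluation directly with pycocotools, assuming the groundtruth and detections have been exported to COCO-format JSON files. The file names below are placeholders.)

# Hypothetical sanity check with pycocotools; groundtruth.json and
# detections.json are placeholder names for COCO-format exports.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("groundtruth.json")            # groundtruth annotations
coco_dt = coco_gt.loadRes("detections.json")  # detection results

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")  # use "segm" for masks
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the same AP/AR table as in section 2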

Evaluate annotation type bbox
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.815
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.815
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.040
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.399
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.854
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.854
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type segm
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.836
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.836
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.042
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.409
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.874
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.874
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

[Attached image: side-by-side groundtruth vs. detection visualization]

3. Steps to reproduce

eval config:
eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 2
  include_metrics_per_category: false
}

eval_input_reader: {
  label_map_path: "/content/labelmap.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/tf_record/sample_val.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}
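(Editorial note, not part of the original report: since the config relies on load_instance_masks and PNG_MASKS, a quick sketch like the following could confirm that the eval TFRecord actually contains encoded instance masks; the feature key checked is the one conventionally written by the Object Detection API dataset tools.)

# Hypothetical check: inspect the first example in the eval TFRecord to
# confirm that instance masks were written ('image/object/mask' key).
import tensorflow as tf

path = "/content/tf_record/sample_val.record"
for raw in tf.data.TFRecordDataset(path).take(1):
    example = tf.train.Example.FromString(raw.numpy())
    keys = sorted(example.features.feature.keys())
    print(keys)  # expect 'image/object/mask' when PNG masks were encoded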

4. Expected behavior

From a rough manual calculation, the precision is 0.99382716049 and the recall is 0.92528735632. I know this is not AP, but it clearly shows that AP should not be ~1.
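(Editorial note, not part of the original report: one set of counts consistent with those exact figures, purely hypothetical since the actual tallies are not stated, would be 161 true positives, 1 false positive, and 13 false negatives.)

# Hypothetical TP/FP/FN counts that reproduce the reported precision/recall;
# the real counts are not given in the issue.
tp, fp, fn = 161, 1, 13
precision = tp / (tp + fp)  # 161/162 = 0.99382716049...
recall = tp / (tp + fn)     # 161/174 = 0.92528735632...
print(precision, recall)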

5. Additional context

6. System information

Evaluation was done on Google Colab.
