Example Commands

02-2020 0.49 pAUDC, 0.64 processing time

$ python obj_detect_tracking.py \
 --model_path obj_coco_resnet50_partial_tfv1.14_1280x720_rpn300.pb \
 --video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
 --version 2 --is_coco_model --use_partial_classes  --frame_gap 8 \
 --is_load_from_pb --get_tracking \
 --tracking_objs Person,Vehicle --min_confidence 0.85 \
 --resnet50 --rpn_test_post_nms_topk 300 --max_size 1280 --short_edge_size 720 \
 --use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
 --max_cosine_distance 0.5 --nn_budget 5

This is for processing AVI videos. For MP4 videos, run without --use_lijun_video_loader. Add --log_time_and_gpu to get GPU utilization and a timing profile.
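The tracking flags above (--nn_budget, --max_cosine_distance, --max_iou_distance) come from deep-SORT's association step. Below is a minimal numpy sketch of what they control, assuming they keep their meaning from the reference deep_sort code: --nn_budget caps how many recent appearance features each track stores, --max_cosine_distance gates appearance matches, and --max_iou_distance gates IoU-based matches, expressed as 1 - IoU. `NearestNeighborCosineMetric` and `iou_distance` are hypothetical names, not this repo's API, and boxes are assumed to be (x1, y1, x2, y2).

```python
import numpy as np


class NearestNeighborCosineMetric:
    """Appearance gallery per track, in the spirit of deep-SORT.

    Keeps at most `nn_budget` recent features per track and scores each
    detection by its smallest cosine distance to any stored feature.
    """

    def __init__(self, max_cosine_distance=0.5, nn_budget=5):
        self.max_cosine_distance = max_cosine_distance
        self.nn_budget = nn_budget  # assumed to be a positive int
        self.samples = {}  # track_id -> list of L2-normalized features

    def add(self, track_id, feature):
        feat = np.asarray(feature, dtype=np.float64)
        gallery = self.samples.setdefault(track_id, [])
        gallery.append(feat / np.linalg.norm(feat))
        del gallery[:-self.nn_budget]  # drop the oldest features beyond the budget

    def distance(self, track_id, det_features):
        """Min cosine distance from each detection to the track's gallery."""
        dets = np.asarray(det_features, dtype=np.float64)
        dets = dets / np.linalg.norm(dets, axis=1, keepdims=True)
        gallery = np.stack(self.samples[track_id])  # (n_stored, dim)
        return (1.0 - gallery @ dets.T).min(axis=0)  # (n_dets,)

    def gate(self, track_id, det_features):
        """True where a detection is close enough in appearance to match."""
        return self.distance(track_id, det_features) <= self.max_cosine_distance


def iou_distance(box, candidates):
    """1 - IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    b = np.asarray(box, dtype=np.float64)
    c = np.asarray(candidates, dtype=np.float64)
    w = np.clip(np.minimum(b[2], c[:, 2]) - np.maximum(b[0], c[:, 0]), 0, None)
    h = np.clip(np.minimum(b[3], c[:, 3]) - np.maximum(b[1], c[:, 1]), 0, None)
    inter = w * h
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    area_c = (c[:, 2] - c[:, 0]) * (c[:, 3] - c[:, 1])
    return 1.0 - inter / (area_b + area_c - inter)
```

In the reference deep-SORT, the appearance gate is used in the matching cascade for confirmed tracks, and the IoU distance (thresholded at --max_iou_distance) is used in a second matching pass for the tracks left over.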

05-2020, added EfficientDet

EfficientDet-D7 (CVPR 2020) is reported to be more than 12 mAP better on COCO than the ResNet-50 FPN model we used.

I have made the following changes based on the code from early May:

Added multi-level ROI align on the final detection boxes, since deep-SORT tracking needs the FPN box features. Because one-stage object detection models predict boxes at every feature level, I added a level-index variable that records each box's feature level, so that the final boxes can be efficiently traced back to their original feature maps and their features cropped.
Similar to the MaskRCNN model, I modified EfficientDet to run NMS on only a subset of the COCO classes (currently we only care about person and vehicle), which saves computation.
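The two changes above can be sketched in numpy as follows, assuming (x1, y1, x2, y2) boxes and plain greedy per-class NMS. The sketch applies a pre-NMS top-k cut (analogous in spirit to --efficientdet_max_detection_topk), drops unwanted classes before NMS, and carries a per-box level index through NMS so the final boxes can be grouped by FPN level for feature cropping. All function names and class ids here are hypothetical; this is not the repository's TensorFlow implementation.

```python
import numpy as np


def nms(boxes, scores, iou_thresh):
    """Greedy NMS on (N, 4) x1, y1, x2, y2 boxes; returns kept indices."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # suppress heavy overlaps
    return np.array(keep, dtype=np.int64)


def partial_class_nms(boxes, scores, classes, levels, keep_classes,
                      topk=5000, iou_thresh=0.5):
    """Top-k, keep only selected classes, per-class NMS; carry each box's level."""
    # 1. Pre-select the top-k candidates pooled from all feature levels.
    top = np.argsort(-scores)[:topk]
    # 2. Drop classes we do not care about before paying for NMS.
    top = top[np.isin(classes[top], keep_classes)]
    kept = []
    for c in keep_classes:
        idx = top[classes[top] == c]
        if idx.size:
            kept.append(idx[nms(boxes[idx], scores[idx], iou_thresh)])
    kept = np.concatenate(kept) if kept else np.zeros(0, dtype=np.int64)
    # 3. The level index survives NMS, so each final box knows its source level.
    return boxes[kept], scores[kept], classes[kept], levels[kept]


def group_by_level(levels):
    """Map feature level -> indices of final boxes, for per-level feature cropping."""
    return {int(lv): np.flatnonzero(levels == lv) for lv in np.unique(levels)}
```

Grouping by level means the feature crop can run once per feature map instead of once per box.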

Example command [d0 model from early May]:

$ python obj_detect_tracking.py \
 --model_path efficientdet-d0 \
 --efficientdet_modelname efficientdet-d0 --is_efficientdet \
 --efficientdet_max_detection_topk 5000 \
 --video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
 --version 2 --is_coco_model --use_partial_classes  --frame_gap 8 \
 --get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
 --max_size 1280 --short_edge_size 720 \
 --use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
 --max_cosine_distance 0.5 --nn_budget 5

This is for processing AVI videos. I have tested it with pyav==6.2.0. Install it with:

$ sudo apt-get install -y \
    libavformat-dev libavcodec-dev libavdevice-dev \
    libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
$ sudo pip install av==6.2.0

For MP4 videos, run without --use_lijun_video_loader. Add --log_time_and_gpu to get GPU utilization and a timing profile.
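A minimal sketch of reading an AVI with PyAV and subsampling frames, assuming the AVI loader is PyAV-based and that --frame_gap N means detection runs on every N-th frame. Both helpers are hypothetical and are not the repo's video loader; PyAV is imported inside the reader so the sampling helper works without it.

```python
def sample_frames(frames, frame_gap):
    """Yield (frame_index, frame) for every `frame_gap`-th frame."""
    for idx, frame in enumerate(frames):
        if idx % frame_gap == 0:
            yield idx, frame


def read_video_frames(path):
    """Decode the first video stream with PyAV and yield BGR numpy arrays."""
    import av  # requires PyAV (e.g. av==6.2.0, installed as shown above)

    container = av.open(path)
    try:
        for frame in container.decode(video=0):
            yield frame.to_ndarray(format="bgr24")
    finally:
        container.close()
```

Usage would look like `for idx, img in sample_frames(read_video_frames("videos/example.avi"), 8): ...`, where the path is a placeholder.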

Example command with a partial frozen graph [d0-TFv1.15] (slightly faster):

$ python obj_detect_tracking.py \
 --model_path efficientd0_tfv1.15_1280x720.pb --is_load_from_pb \
 --efficientdet_modelname efficientdet-d0 --is_efficientdet \
 --efficientdet_max_detection_topk 5000 \
 --video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
 --version 2 --is_coco_model --use_partial_classes  --frame_gap 8 \
 --get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
 --max_size 1280 --short_edge_size 720 \
 --use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
 --max_cosine_distance 0.5 --nn_budget 5

[05/04/2020] Tried to optimize the frozen model with TensorRT by:

$ python tensorrt_optimize_tf1.15.py efficientd0_tfv1.15_1280x720.pb \
efficientd0_tfv1.15_1280x720_trt_fp16.pb --precision_mode FP16

But it does not work:

2020-05-04 22:11:48.850233: F tensorflow/core/framework/op_kernel.cc:875] Check failed: mutable_output(index) == nullptr (0x7f82d4244ff0 vs. nullptr)
Aborted (core dumped)

Run object detection and visualization on images. This can be used to reproduce the tutorial output of the official EfficientDet repo:

$ python obj_detect_imgs.py --model_path efficientdet-d0 \
--efficientdet_modelname efficientdet-d0 --is_efficientdet \
--img_lst imgs.lst --out_dir test_d0_json \
--visualize --vis_path test_d0_vis --vis_thres 0.4 \
--max_size 1920 --short_edge_size 1080 \
--efficientdet_max_detection_topk 5000
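The --short_edge_size and --max_size pair used throughout these commands sets the input resolution. Below is a sketch, assuming the common two-sided resize rule of Mask R-CNN style detectors: scale the short edge to --short_edge_size, then shrink further if the long edge would exceed --max_size, keeping the aspect ratio. `resize_shape` is a hypothetical helper, not taken from this repo.

```python
def resize_shape(h, w, short_edge_size, max_size):
    """Target (h, w): short edge -> short_edge_size, long edge capped at max_size."""
    scale = short_edge_size / min(h, w)
    new_h, new_w = h * scale, w * scale
    if max(new_h, new_w) > max_size:
        # The long edge would be too big: rescale so it equals max_size.
        scale = max_size / max(new_h, new_w)
        new_h, new_w = new_h * scale, new_w * scale
    return int(round(new_h)), int(round(new_w))
```

Under this rule a 1920x1080 frame maps to 1280x720 with the video settings (720/1280), and a 1280x720 image maps to 1920x1080 with the image settings above (1080/1920).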

10-2020, comparing EfficientDet with MaskRCNN on video datasets

VIRAT

| Models | COCO-validation-AP (80 classes) | VIRAT Person-Val-AP | VIRAT Vehicle-Val-AP | VIRAT Bike-Val-AP |
|---|---|---|---|---|
| MaskRCNN, R50-FPN | 0.389 | 0.374 | 0.943 | 0.367 |
| MaskRCNN, R101-FPN | 0.407 | 0.378 | 0.947 | 0.399 |
| EfficientDet-d2 | 0.425 | 0.371 | 0.949 | 0.293 |
| EfficientDet-d6 | 0.513 | 0.422 | 0.947 | 0.355 |

AVA-Kinetics

| Models | COCO-validation-AP (80 classes) | AVA-Kinetics Train-Person-AP | AVA-Kinetics Val-Person-AP |
|---|---|---|---|
| MaskRCNN, R101-FPN | 0.407 | 0.664 | 0.682 |
| EfficientDet-d2 | 0.425 | 0.650 | 0.680 |
| EfficientDet-d6 | 0.513 | 0.623 | 0.658 |

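VIRAT consists mostly of small person boxes, while AVA-Kinetics has much bigger ones. So it seems EfficientDet-d6 is better at detecting small persons (0.422 vs. 0.378 Person-Val-AP for MaskRCNN-R101-FPN on VIRAT) but not bigger ones (0.658 vs. 0.682 Val-Person-AP on AVA-Kinetics). However, EfficientDet-d6 takes about 2.4x the inference time of MaskRCNN-R101-FPN.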
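To make the small-vs-large comparison explicit, a small sketch that computes absolute Person AP differences against MaskRCNN-R101-FPN from the values in the tables above. Only the R101 baseline and the two EfficientDet rows are included; the dictionary layout and the `ap_gain` helper are illustrative.

```python
# Person AP values copied from the VIRAT and AVA-Kinetics tables.
PERSON_AP = {
    "VIRAT Person-Val": {
        "MaskRCNN-R101-FPN": 0.378, "EfficientDet-d2": 0.371, "EfficientDet-d6": 0.422,
    },
    "AVA-Kinetics Val-Person": {
        "MaskRCNN-R101-FPN": 0.682, "EfficientDet-d2": 0.680, "EfficientDet-d6": 0.658,
    },
}


def ap_gain(dataset, model, baseline="MaskRCNN-R101-FPN"):
    """Absolute AP difference of `model` over `baseline` (positive means better)."""
    return round(PERSON_AP[dataset][model] - PERSON_AP[dataset][baseline], 3)


for dataset in PERSON_AP:
    print(dataset, ap_gain(dataset, "EfficientDet-d2"), ap_gain(dataset, "EfficientDet-d6"))
```

The sign flip for EfficientDet-d6 between the two datasets is what supports the small-person reading; EfficientDet-d2 is slightly below the baseline on both.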
