This repository has been archived by the owner on Jun 5, 2024. It is now read-only.

COMMANDS.md

Example Commands

02-2020 0.49 pAUDC, 0.64 processing time

$ python obj_detect_tracking.py \
 --model_path obj_coco_resnet50_partial_tfv1.14_1280x720_rpn300.pb \
 --video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
 --version 2 --is_coco_model --use_partial_classes  --frame_gap 8 \
 --is_load_from_pb --get_tracking \
 --tracking_objs Person,Vehicle --min_confidence 0.85 \
 --resnet50 --rpn_test_post_nms_topk 300 --max_size 1280 --short_edge_size 720 \
 --use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
 --max_cosine_distance 0.5 --nn_budget 5

This command is for processing AVI videos. For MP4 videos, run without --use_lijun_video_loader. Add --log_time_and_gpu to log GPU utilization and a time profile.
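As a rough illustration of the `--frame_gap 8` flag in the command above, here is a minimal sketch that assumes the flag means "run detection on every 8th frame". That reading, and the helper name `sample_frames`, are assumptions for illustration and are not taken from `obj_detect_tracking.py`.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame], frame_gap: int = 8) -> Iterator[Tuple[int, Frame]]:
    """Yield (frame_index, frame) for every `frame_gap`-th frame, starting at frame 0.

    Hypothetical helper: assumes --frame_gap N keeps one frame out of every N.
    """
    if frame_gap < 1:
        raise ValueError("frame_gap must be >= 1")
    for idx, frame in enumerate(frames):
        if idx % frame_gap == 0:
            yield idx, frame


# range(30) stands in for 30 decoded video frames.
picked = [idx for idx, _ in sample_frames(range(30), frame_gap=8)]
print(picked)  # [0, 8, 16, 24]
```

Under this assumption, a larger gap trades tracking granularity for a roughly proportional cut in detector calls.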

05-2020, added EfficientDet

EfficientDet-D7 (CVPR 2020) is reported to score more than 12 mAP higher on COCO than the ResNet-50 FPN model we used.

I have made the following changes based on the code from early May:

  • Added multi-level ROI align on the final detection boxes, since deep-SORT tracking needs the FPN box features. One-stage object detection models predict boxes at every feature level, so I added a level index variable that records each box's feature level. At the end, each box can be efficiently traced back to its original feature map to crop its features.
  • As with the MaskRCNN model, I modified EfficientDet to run NMS on only a subset of the COCO classes (currently we only care about person and vehicle), which saves computation.

Example command [d0 model from early May]:

$ python obj_detect_tracking.py \
 --model_path efficientdet-d0 \
 --efficientdet_modelname efficientdet-d0 --is_efficientdet \
 --efficientdet_max_detection_topk 5000 \
 --video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
 --version 2 --is_coco_model --use_partial_classes  --frame_gap 8 \
 --get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
 --max_size 1280 --short_edge_size 720 \
 --use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
 --max_cosine_distance 0.5 --nn_budget 5

This is for processing AVI videos. I have tested it with pyav==6.2.0, which can be installed with:

$ sudo apt-get install -y \
    libavformat-dev libavcodec-dev libavdevice-dev \
    libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
$ sudo pip install av==6.2.0

For MP4 videos, run without --use_lijun_video_loader. Add --log_time_and_gpu to log GPU utilization and a time profile.
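For readers unfamiliar with the tracking flags used in these commands, the sketch below shows how `--max_cosine_distance` and `--nn_budget` are conventionally used in Deep SORT style appearance matching: each track keeps its most recent `nn_budget` appearance features, and a detection is a candidate match only if its smallest cosine distance to that gallery is within the threshold. The class and function names are hypothetical, and this is an assumption about how the flags behave here, not code from this repository.

```python
import math
from collections import deque


def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class TrackGallery:
    """Hypothetical per-track store that keeps only the latest `nn_budget` features."""

    def __init__(self, nn_budget=5):
        self.features = deque(maxlen=nn_budget)  # oldest features are dropped first

    def add(self, feature):
        self.features.append(feature)

    def distance(self, feature):
        # Nearest-neighbour distance: the best match over the stored features.
        return min(cosine_distance(f, feature) for f in self.features)


def is_match(gallery, feature, max_cosine_distance=0.5):
    """A detection is a candidate for this track only within the distance gate."""
    return gallery.distance(feature) <= max_cosine_distance


gallery = TrackGallery(nn_budget=2)
for feat in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    gallery.add(feat)  # with a budget of 2, the first feature is evicted

print(len(gallery.features))            # 2
print(is_match(gallery, [1.0, 0.0]))    # True  (distance to [1, 1] is about 0.29)
print(is_match(gallery, [-1.0, 0.0]))   # False (nearest distance is 1.0)
```

A small budget such as 5 keeps matching cheap and biased toward a track's recent appearance.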

Example command with a partial frozen graph [d0-TFv1.15] (slightly faster):

$ python obj_detect_tracking.py \
 --model_path efficientd0_tfv1.15_1280x720.pb --is_load_from_pb \
 --efficientdet_modelname efficientdet-d0 --is_efficientdet \
 --efficientdet_max_detection_topk 5000 \
 --video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
 --version 2 --is_coco_model --use_partial_classes  --frame_gap 8 \
 --get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
 --max_size 1280 --short_edge_size 720 \
 --use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
 --max_cosine_distance 0.5 --nn_budget 5

[05/04/2020] Tried to optimize the frozen model with TensorRT by:

$ python tensorrt_optimize_tf1.15.py efficientd0_tfv1.15_1280x720.pb \
efficientd0_tfv1.15_1280x720_trt_fp16.pb --precision_mode FP16

But the optimization fails with:

2020-05-04 22:11:48.850233: F tensorflow/core/framework/op_kernel.cc:875] Check failed: mutable_output(index) == nullptr (0x7f82d4244ff0 vs. nullptr)
Aborted (core dumped)

Run object detection and visualization on images. This can be used to reproduce the output of the official repo's tutorial:

$ python obj_detect_imgs.py --model_path efficientdet-d0 \
--efficientdet_modelname efficientdet-d0 --is_efficientdet \
--img_lst imgs.lst --out_dir test_d0_json \
--visualize --vis_path test_d0_vis --vis_thres 0.4 \
--max_size 1920 --short_edge_size 1080 \
--efficientdet_max_detection_topk 5000

10-2020, comparing EfficientDet with MaskRCNN on video datasets

  1. VIRAT

| Models | COCO-validation-AP-80classes | VIRAT Person-Val-AP | VIRAT Vehicle-Val-AP | VIRAT Bike-Val-AP |
|---|---|---|---|---|
| MaskRCNN, R50-FPN | 0.389 | 0.374 | 0.943 | 0.367 |
| MaskRCNN, R101-FPN | 0.407 | 0.378 | 0.947 | 0.399 |
| EfficientDet-d2 | 0.425 | 0.371 | 0.949 | 0.293 |
| EfficientDet-d6 | 0.513 | 0.422 | 0.947 | 0.355 |

  2. AVA-Kinetics

| Models | COCO-validation-AP-80classes | AVA-Kinetics Train-Person-AP | AVA-Kinetics Val-Person-AP |
|---|---|---|---|
| MaskRCNN, R101-FPN | 0.407 | 0.664 | 0.682 |
| EfficientDet-d2 | 0.425 | 0.650 | 0.680 |
| EfficientDet-d6 | 0.513 | 0.623 | 0.658 |

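VIRAT consists mostly of small person boxes, while AVA-Kinetics has much bigger ones. EfficientDet-d6 beats MaskRCNN-R101-FPN on VIRAT person AP (0.422 vs. 0.378) but trails it on AVA-Kinetics (0.658 vs. 0.682 on the validation set), so it seems EfficientDet is slightly better at detecting small persons. However, EfficientDet-d6 takes about 2.4x the inference time of MaskRCNN-R101-FPN.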
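To make the comparison concrete, the person-AP gaps against MaskRCNN R101-FPN can be computed directly from the validation numbers in the two tables above. The dictionary layout and the `ap_gap` helper are only for illustration.

```python
# Person AP on the validation sets, copied from the tables above.
person_ap = {
    "VIRAT val": {
        "MaskRCNN R101-FPN": 0.378,
        "EfficientDet-d2": 0.371,
        "EfficientDet-d6": 0.422,
    },
    "AVA-Kinetics val": {
        "MaskRCNN R101-FPN": 0.682,
        "EfficientDet-d2": 0.680,
        "EfficientDet-d6": 0.658,
    },
}


def ap_gap(dataset, model, baseline="MaskRCNN R101-FPN"):
    """AP difference of `model` relative to the baseline (positive means better)."""
    scores = person_ap[dataset]
    return round(scores[model] - scores[baseline], 3)


for dataset in person_ap:
    for model in ("EfficientDet-d2", "EfficientDet-d6"):
        print(f"{dataset:18s} {model:16s} {ap_gap(dataset, model):+.3f}")
```

Only d6 shows a clear gain on the small-box dataset (VIRAT); d2 is roughly on par with the baseline on both datasets.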