
zhanghengdev/awesome-video-object-detection


Awesome Video-Object-Detection

Intro

This is a list of awesome articles about object detection from video.

Datasets

ImageNet VID Challenge

VisDrone Challenge

Paper list

2016

Seq-NMS for Video Object Detection

[Arxiv]

  • Date: Feb 2016
  • Motivation: Smoothing the final bounding box predictions across time.
  • Summary: Constructing a temporal graph that links overlapping bounding box detections in adjacent frames, and using dynamic programming to select the bounding box sequences with the highest overall detection score.
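The linking and dynamic-programming step can be sketched in Python. This is a minimal sketch, not the paper's code: it assumes boxes in adjacent frames are linked when their IoU exceeds 0.5, finds the single highest-scoring sequence, and rescores it by averaging. The full method also suppresses overlapping boxes in each frame and repeats the selection; those steps are omitted. The names `iou`, `best_sequence`, and `rescore_average` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_sequence(boxes: List[List[Box]], scores: List[List[float]],
                  link_iou: float = 0.5) -> Tuple[List[Tuple[int, int]], float]:
    """Return the (frame, detection) path with the highest summed score, and that sum."""
    # acc[t][j]: best summed score of any sequence ending at detection j of frame t
    acc = [list(s) for s in scores]
    back: List[List[Optional[int]]] = [[None] * len(s) for s in scores]
    for t in range(1, len(boxes)):
        for j, bj in enumerate(boxes[t]):
            for i, bi in enumerate(boxes[t - 1]):
                # only boxes that overlap enough in adjacent frames are linked
                if iou(bi, bj) > link_iou and acc[t - 1][i] + scores[t][j] > acc[t][j]:
                    acc[t][j] = acc[t - 1][i] + scores[t][j]
                    back[t][j] = i
    # the best sequence may end in any frame
    total, t, j = max(((v, t, j) for t, row in enumerate(acc) for j, v in enumerate(row)),
                      default=(float("-inf"), -1, None))
    path = []
    while j is not None:  # follow back-pointers to the start of the sequence
        path.append((t, j))
        j = back[t][j]
        t -= 1
    path.reverse()
    return path, total


def rescore_average(path, scores) -> float:
    """Average score along a sequence; every box on it would receive this score."""
    vals = [scores[t][j] for t, j in path]
    return sum(vals) / len(vals)
```

On a toy input where a weak 0.4 detection sits between two strong overlapping detections (0.9 and 0.8), the selected sequence spans all three frames and averaging lifts the weak box to 0.7, while an isolated 0.95 box elsewhere is not part of the sequence.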

T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos

[Arxiv] [Code]

  • Date: Apr 2016
  • Summary: Using a pipeline that first predicts optical flow, then propagates image-level predictions along the flow, and finally applies a tracking algorithm to select temporally consistent, high-confidence detections.
  • Performance: 73.8% mAP on ImageNet VID validation.
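The flow-based propagation step can be illustrated with a small NumPy sketch. It assumes a dense flow field of shape (H, W, 2) holding (dx, dy) displacements from frame t to frame t+1, and moves each box by the mean flow vector inside it; keeping the original score for the copied box is an assumption of this sketch. The names `propagate_box` and `propagate_frame` are hypothetical, and the flow estimator itself is out of scope.

```python
import numpy as np


def propagate_box(box, flow):
    """Shift box (x1, y1, x2, y2) by the mean flow vector inside it.

    flow has shape (H, W, 2) and holds (dx, dy) displacements from frame t to t+1.
    """
    h, w = flow.shape[:2]
    x1, y1, x2, y2 = box
    # integer pixel window covering the box, clipped to the flow field
    c1, r1 = max(0, int(np.floor(x1))), max(0, int(np.floor(y1)))
    c2, r2 = min(w, int(np.ceil(x2))), min(h, int(np.ceil(y2)))
    if c2 <= c1 or r2 <= r1:
        return box  # no flow available inside the box; keep it in place
    dx, dy = flow[r1:r2, c1:c2].reshape(-1, 2).mean(axis=0).tolist()
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def propagate_frame(dets, flow):
    """Copy (box, score) detections of frame t into frame t+1, keeping their scores."""
    return [(propagate_box(box, flow), score) for box, score in dets]
```

The propagated boxes would then be added to the detections already present in frame t+1 before the tracking and rescoring stage.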

Object Detection from Video Tubelets with Convolutional Neural Networks

[Arxiv] [Code]

  • Date: Apr 2016

Deep Feature Flow for Video Recognition

[Arxiv] [Code]

  • Date: Nov 2016
  • Performance: 73.0% mAP on ImageNet VID validation at 29 fps on a Titan X GPU.
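Deep Feature Flow gets its speed by running the expensive feature network only on sparse key frames and propagating key-frame features to the other frames along an optical flow field. A minimal NumPy sketch of that bilinear warping step follows, assuming features of shape (C, H, W) and a flow field of shape (H, W, 2) pointing from the current frame to the key frame. The optional `scale` argument stands in for the per-position scale field that DFF also predicts; `warp_features` is a hypothetical name, and the flow network is not shown.

```python
import numpy as np


def warp_features(key_feat, flow, scale=None):
    """Bilinearly sample key-frame features at p + flow(p) for every position p.

    key_feat: (C, H, W) features computed on the key frame.
    flow: (H, W, 2) displacement (dx, dy) from the current frame to the key frame.
    scale: optional (H, W) or (C, H, W) multiplicative modulation.
    """
    c, h, w = key_feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # sampling locations in the key frame, clamped to the feature map
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = sx - x0, sy - y0
    # weighted sum of the four neighbouring feature vectors
    out = (key_feat[:, y0, x0] * (1 - wx) * (1 - wy)
           + key_feat[:, y0, x1] * wx * (1 - wy)
           + key_feat[:, y1, x0] * (1 - wx) * wy
           + key_feat[:, y1, x1] * wx * wy)
    if scale is not None:
        out = out * scale
    return out
```

Because warping is far cheaper than the feature network, the per-frame cost on non-key frames is dominated by the flow estimation.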

2017

Object Detection in Videos with Tubelet Proposal Networks

[Arxiv]

  • Date: Feb 2017

Flow-Guided Feature Aggregation for Video Object Detection

[Arxiv] [Code]

  • Date: Mar 2017
  • Motivation: Producing powerful spatiotemporal features.
  • Performance: 76.3% mAP at 1.4 fps or 78.4% (combined with Seq-NMS) at 1.1 fps on ImageNet VID validation on a Titan X GPU.
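Flow-guided aggregation warps features from nearby frames to the current frame and averages them with adaptive, per-position weights. The NumPy sketch below shows only the weighting step. It assumes the features are already warped and that each has an embedding (produced in FGFA by a small embedding sub-network, here simply an input); weights are the exponentiated cosine similarity between each warped embedding and the reference embedding, normalized over frames. The reference frame's own feature can be one of the K inputs. `aggregate` is a hypothetical name.

```python
import numpy as np


def aggregate(warped_feats, warped_embeds, ref_embed, eps=1e-8):
    """Adaptive weighted average of warped neighbour features.

    warped_feats: (K, C, H, W) features of K frames, already warped to the reference frame.
    warped_embeds: (K, D, H, W) embeddings of those warped features.
    ref_embed: (D, H, W) embedding of the reference frame.
    Returns the (C, H, W) aggregated feature.
    """
    # per-position cosine similarity between each frame and the reference
    dot = (warped_embeds * ref_embed[None]).sum(axis=1)  # (K, H, W)
    norms = np.linalg.norm(warped_embeds, axis=1) * np.linalg.norm(ref_embed, axis=0)[None]
    cos = dot / (norms + eps)
    # exponentiate and normalize over the K frames at every position
    weights = np.exp(cos)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[:, None] * warped_feats).sum(axis=0)
```

Frames whose warped embeddings disagree with the reference (for example because the flow was wrong there) receive lower weight at those positions.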

Detect to Track and Track to Detect

[Arxiv] [Summary] [Code]

  • Date: Oct 2017
  • Motivation: Smoothing the final bounding box predictions across time.
  • Summary: Proposing a ConvNet architecture that solves detection and tracking problems jointly and applying a Viterbi algorithm to link the detections across time.
  • Performance: 79.8% mAP on ImageNet VID validation.
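Joint detection and tracking in D&T relies on correlation features that compare the feature maps of two frames over a range of local displacements; the network regresses box displacements between frames from them. Below is a NumPy sketch of such a local correlation, assuming features of shape (C, H, W), zero padding at the borders, and normalization by the channel count. `local_correlation` and `max_disp` are hypothetical names, and the Viterbi linking step is not shown.

```python
import numpy as np


def local_correlation(feat_a, feat_b, max_disp=2):
    """Correlate feat_a(x) with feat_b(x + d) for every displacement |dx|, |dy| <= max_disp.

    feat_a, feat_b: (C, H, W) feature maps of two frames.
    Returns ((2 * max_disp + 1) ** 2, H, W); channel k corresponds to the k-th (dy, dx)
    in row-major order. Positions whose x + d falls outside the map correlate with zeros.
    """
    c, h, w = feat_a.shape
    pad = np.pad(feat_b, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    out = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # shifted[:, y, x] == feat_b[:, y + dy, x + dx] (or 0 outside the map)
            shifted = pad[:, max_disp + dy:max_disp + dy + h, max_disp + dx:max_disp + dx + w]
            out.append((feat_a * shifted).sum(axis=0) / c)
    return np.stack(out)
```

If the second frame's features are the first frame's shifted one pixel to the right, the channel for (dy, dx) = (0, +1) reproduces each position's squared feature norm divided by C.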

Towards High Performance Video Object Detection

[Arxiv]

  • Date: Nov 2017
  • Motivation: Producing powerful spatiotemporal features.
  • Performance: 78.6% mAP on ImageNet VID validation at 13 fps on a Titan X GPU.

Video Object Detection with an Aligned Spatial-Temporal Memory

[Arxiv] [Summary] [Code] [Demo]

  • Date: Dec 2017
  • Motivation: Producing powerful spatiotemporal features.
  • Performance: 80.5% mAP on ImageNet VID validation.

2018

Object Detection in Videos by High Quality Object Linking

[Arxiv]

  • Date: Jan 2018

Towards High Performance Video Object Detection for Mobiles

[Arxiv]

  • Date: Apr 2018
  • Motivation: Producing powerful spatiotemporal features.
  • Performance: 60.2% mAP on ImageNet VID validation at 25.6 fps on mobile devices.

Optimizing Video Object Detection via a Scale-Time Lattice

[Arxiv] [Summary] [Code]

  • Date: Apr 2018
  • Performance: 79.4% mAP at 20 fps or 79.0% at 62 fps on ImageNet VID validation on a Titan X GPU.

Object Detection in Video with Spatiotemporal Sampling Networks

[Arxiv] [Summary]

  • Date: Mar 2018
  • Motivation: Producing powerful spatiotemporal features.
  • Performance: 78.9% mAP or 80.4% (combined with Seq-NMS) on ImageNet VID validation.

Fully Motion-Aware Network for Video Object Detection

[Paper] [Summary]

  • Date: Sep 2018
  • Motivation: Producing powerful spatiotemporal features.
  • Performance: 78.1% mAP or 80.3% (combined with Seq-NMS) on ImageNet VID validation.

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection

[Arxiv] [Summary]

  • Date: Nov 2018
  • Motivation: Smoothing the final bounding box predictions across time.
  • Performance: 83.5% mAP on ImageNet VID validation when combined with FGFA and Deformable ConvNets v2.

2019

AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling

[Arxiv]

  • Date: Feb 2019
  • Motivation: Adaptively rescaling the input image resolution to improve both the accuracy and the speed of video object detection.
  • Performance: 75.5% mAP on ImageNet VID validation, with multi-scale training at 4 scales (600, 480, 360, 240).
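The adaptive-scaling idea can be sketched as an inference loop in which the detector's features on one frame determine the input scale of the next. In this sketch `detect` and `predict_scale` are hypothetical callables standing in for the detector and the learned scale regressor, and clamping the prediction to the range of training scales is an assumption, not necessarily what AdaScale does.

```python
SCALES = (600, 480, 360, 240)  # training scales listed for AdaScale


def adaptive_scale_loop(frames, detect, predict_scale, init_scale=600, scales=SCALES):
    """Run a detector over a video, choosing each frame's input scale from the previous frame.

    detect(frame, scale) -> (detections, features) and predict_scale(features) -> float
    are hypothetical stand-ins for the detector and the scale regressor.
    Returns a list of (scale_used, detections) per frame.
    """
    lo, hi = min(scales), max(scales)
    scale = init_scale
    results = []
    for frame in frames:
        dets, feats = detect(frame, scale)
        results.append((scale, dets))
        # the scale predicted from frame t's features is used for frame t + 1,
        # clamped to the range covered during training (an assumption)
        scale = min(hi, max(lo, int(round(predict_scale(feats)))))
    return results
```

Reusing the current frame's prediction for the next frame relies on consecutive frames being similar, so the regressor adds little overhead beyond one extra head on features the detector already computes.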

Improving Video Object Detection by Seq-Bbox Matching

[pdf]

  • Date: Feb 2019
  • Motivation: Smoothing the final bounding box predictions across time (box-level method).
  • Performance: 80.9% mAP (offline detection) and 78.2% mAP (online detection), both at 38 fps on a Titan X GPU.
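A box-level linker of this kind can be sketched in plain Python: boxes in consecutive frames are matched into tubelets by IoU, and each tubelet's detections are rescored with the tubelet's mean score. The sketch uses greedy, highest-IoU-first matching with a 0.5 threshold as a simplifying assumption; it does not reproduce the paper's exact matching criterion or any further tubelet-level processing, and the names `link_tubelets` and `rescore` are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tubelets(frames, thr=0.5):
    """frames[t] is a list of (box, score). Returns tubelets as lists of (t, box, score)."""
    tubelets = []
    active = []  # tubelets that received a box in the previous frame
    for t, dets in enumerate(frames):
        # all (IoU, tubelet, detection) candidates, best overlap first
        pairs = sorted(
            ((iou(tubelets[k][-1][1], box), k, j)
             for k in active for j, (box, _) in enumerate(dets)),
            reverse=True,
        )
        used_k, used_j, next_active = set(), set(), []
        for overlap, k, j in pairs:
            if overlap < thr:
                break  # remaining pairs overlap even less
            if k in used_k or j in used_j:
                continue  # greedy one-to-one matching
            tubelets[k].append((t, dets[j][0], dets[j][1]))
            used_k.add(k)
            used_j.add(j)
            next_active.append(k)
        # unmatched detections start new tubelets
        for j, (box, score) in enumerate(dets):
            if j not in used_j:
                tubelets.append([(t, box, score)])
                next_active.append(len(tubelets) - 1)
        active = next_active
    return tubelets


def rescore(tubelet):
    """Give every box in a tubelet the tubelet's mean score."""
    mean = sum(score for _, _, score in tubelet) / len(tubelet)
    return [(t, box, mean) for t, box, _ in tubelet]
```

An online variant can only rescore with the frames seen so far, whereas an offline pass can average over the whole tubelet, which is consistent with the offline setting reporting the higher mAP.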

Comparison table

| Paper | Date | Base detector | Backbone | Tracking? | Optical flow? | Online? | mAP (%) | FPS (Titan X) |
|---|---|---|---|---|---|---|---|---|
| Seq-NMS | Feb 2016 | R-FCN | ResNet101 | no | no | no | 76.8 | 2.3 |
| T-CNN | Apr 2016 | R-CNN | DeepIDNet+CRAFT | yes | no | no | 73.8 | - |
| DFF | Nov 2016 | R-FCN | ResNet101 | no | yes | yes | 73.0 | 29 |
| TPN | Feb 2017 | TPN | GoogLeNet | yes | no | no | 68.4 | - |
| FGFA | Mar 2017 | R-FCN | ResNet101 | no | yes | yes | 76.3 | 1.4 |
| FGFA + Seq-NMS | 29 Mar 2017 | R-FCN | ResNet101 | no | yes | no | 78.4 | 1.14 |
| D&T | Oct 2017 | R-FCN (15 anchors) | ResNet101 | yes | no | no | 79.8 | 7.09 |
| STMN | Dec 2017 | R-FCN | ResNet101 | no | no | no | 80.5 | - |
| Scale-time-lattice | 16 Apr 2018 | Faster R-CNN (15 anchors) | ResNet101 | no | no | no | 79.6 | 20 |
| Scale-time-lattice | Apr 2018 | Faster R-CNN (15 anchors) | ResNet101 | no | no | no | 79.0 | 62 |
| SSN (per-frame baseline for STSN) | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 76.0 | - |
| STSN | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 78.9 | - |
| STSN + Seq-NMS | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | no | 80.4 | - |
| MANet | Sep 2018 | R-FCN | ResNet101 | no | yes | yes | 78.1 | 5 |
| MANet + Seq-NMS | Sep 2018 | R-FCN | ResNet101 | no | yes | no | 80.3 | - |
| Tracklet-Conditioned Detection | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 78.1 | - |
| Tracklet-Conditioned Detection + DCNv2 | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 82.0 | - |
| Tracklet-Conditioned Detection + DCNv2 + FGFA | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 83.5 | - |
| Seq-Bbox Matching | Feb 2019 | YOLOv3 | Darknet-53 | no | no | no | 80.9 | 38 |
| Seq-Bbox Matching | Feb 2019 | YOLOv3 | Darknet-53 | no | no | yes | 78.2 | 38 |
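Several rows report the same method with and without Seq-NMS post-processing, so the accuracy gain from Seq-NMS can be read off directly. A tiny Python check using values copied from the comparison table:

```python
# (mAP without, mAP with Seq-NMS) pairs copied from the comparison table
pairs = {
    "FGFA": (76.3, 78.4),
    "STSN": (78.9, 80.4),
    "MANet": (78.1, 80.3),
}
# round to one decimal to undo floating-point noise in the subtraction
gains = {name: round(after - before, 1) for name, (before, after) in pairs.items()}
print(gains)  # {'FGFA': 2.1, 'STSN': 1.5, 'MANet': 2.2}
```

The table also shows the cost of this gain: every "+ Seq-NMS" row is marked as not online, and for FGFA the speed drops from 1.4 to 1.14 fps.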
