Awesome Video-Object-Detection

This is a list of awesome articles about object detection from video.

Datasets

ImageNet VID Challenge

VisDrone Challenge

Site: http://aiskyeye.com/

Paper list

2016

Seq-NMS for Video Object Detection

[Arxiv]

Date: Feb 2016
Motivation: Smoothing the final bounding box predictions across time.
Summary: Constructing a temporal graph from overlapping bounding box detections across the adjacent frames, and using dynamic programming to select bounding box sequences with the highest overall detection score.
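
The sequence-selection step described in this summary can be sketched as a small dynamic program. This is a minimal illustration and not the authors' implementation: the names `iou` and `best_sequence`, the `(box, score)` detection format, and the 0.5 IoU threshold are all assumptions made for the example. The rescoring and suppression steps that follow sequence selection are omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, detection score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_sequence(
    frames: List[List[Detection]], iou_thresh: float = 0.5
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total score, [(frame index, detection index), ...]) for the
    highest-scoring chain of overlapping boxes in adjacent frames."""
    # best[t][i]: best total score of any chain ending at detection i of frame t
    best = [[score for _, score in dets] for dets in frames]
    # prev[t][i]: index of the linked detection in frame t - 1, or -1 if none
    prev = [[-1] * len(dets) for dets in frames]

    for t in range(1, len(frames)):
        for i, (box_i, score_i) in enumerate(frames[t]):
            for j, (box_j, _) in enumerate(frames[t - 1]):
                # An edge of the temporal graph: the boxes overlap enough
                if iou(box_i, box_j) > iou_thresh:
                    candidate = best[t - 1][j] + score_i
                    if candidate > best[t][i]:
                        best[t][i] = candidate
                        prev[t][i] = j

    # Pick the chain end with the highest accumulated score
    end = max(
        ((best[t][i], t, i)
         for t in range(len(frames))
         for i in range(len(frames[t]))),
        default=None,
    )
    if end is None:
        return 0.0, []

    total, t, i = end
    path = []
    while i != -1:  # walk the back-pointers to recover the chain
        path.append((t, i))
        i = prev[t][i]
        t -= 1
    path.reverse()
    return total, path
```

Calling `best_sequence(frames)` on a list of per-frame detections returns the accumulated score and the chain of `(frame, detection)` indices, which a box-level post-processing step could then use to rescore the linked boxes.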

T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos

[Arxiv] [Code]

Date: Apr 2016
Summary: Using a video object detection pipeline that involves predicting optical flow first, then propagating image level predictions according to the flow, and finally using a tracking algorithm to select temporally consistent high confidence detections.
Performance: 73.8% mAP on ImageNet VID validation.
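
As a rough illustration of propagating an image-level prediction along the optical flow, the sketch below shifts a box by the mean flow vector over the pixels it covers. It is one simple way to do this, not the T-CNN code: the function name `propagate_box` and the dense-flow layout `flow[y][x] = (dx, dy)` are assumptions made for the example, and the flow field is taken as given.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Flow = List[List[Tuple[float, float]]]    # flow[y][x] = (dx, dy), assumed layout


def propagate_box(box: Box, flow: Flow) -> Box:
    """Shift a box into the next frame by the mean flow inside it."""
    height, width = len(flow), len(flow[0])
    x1, y1, x2, y2 = box
    # Integer pixel ranges covered by the box, clamped to the image
    xs = range(max(0, int(x1)), min(width, int(x2)))
    ys = range(max(0, int(y1)), min(height, int(y2)))
    n = len(xs) * len(ys)
    if n == 0:
        return box  # box lies outside the flow field, leave it unchanged
    dx = sum(flow[y][x][0] for y in ys for x in xs) / n
    dy = sum(flow[y][x][1] for y in ys for x in xs) / n
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

The propagated box keeps its original score in this sketch; a real pipeline would typically also decide how to merge propagated boxes with the detections already present in the target frame.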

Object Detection from Video Tubelets with Convolutional Neural Networks

[Arxiv] [Code]

Date: Apr 2016

Deep Feature Flow for Video Recognition

[Arxiv] [Code]

Date: Nov 2016
Performance: 73.0% mAP on ImageNet VID validation at 29 fps on a Titan X GPU.

2017

Object Detection in Videos with Tubelet Proposal Networks

[Arxiv]

Date: Feb 2017

Flow-Guided Feature Aggregation for Video Object Detection

[Arxiv] [Code]

Date: Mar 2017
Motivation: Producing powerful spatiotemporal features.
Performance: 76.3% mAP at 1.4 fps or 78.4% (combined with Seq-NMS) at 1.1 fps on ImageNet VID validation on a Titan X GPU.

Detect to Track and Track to Detect

[Arxiv] [Summary] [Code]

Date: Oct 2017
Motivation: Smoothing the final bounding box predictions across time.
Summary: Proposing a ConvNet architecture that solves detection and tracking problems jointly and applying a Viterbi algorithm to link the detections across time.
Performance: 79.8% mAP on ImageNet VID validation.

Towards High Performance Video Object Detection

[Arxiv]

Date: Nov 2017
Motivation: Producing powerful spatiotemporal features.
Performance: 78.6% mAP on ImageNet VID validation at 13 fps on a Titan X GPU.

Video Object Detection with an Aligned Spatial-Temporal Memory

[Arxiv] [Summary] [Code] [Demo]

Date: Dec 2017
Motivation: Producing powerful spatiotemporal features.
Performance: 80.5% mAP on ImageNet VID validation.

2018

Object Detection in Videos by High Quality Object Linking

[Arxiv]

Date: Jan 2018

Towards High Performance Video Object Detection for Mobiles

[Arxiv]

Date: Apr 2018
Motivation: Producing powerful spatiotemporal features.
Performance: 60.2% mAP on ImageNet VID validation at 25.6 fps on mobiles.

Optimizing Video Object Detection via a Scale-Time Lattice

[Arxiv] [Summary] [Code]

Date: Apr 2018
Performance: 79.6% mAP at 20 fps or 79.0% at 62 fps on ImageNet VID validation on a Titan X GPU.

Object Detection in Video with Spatiotemporal Sampling Networks

[Arxiv] [Summary]

Date: Mar 2018
Motivation: Producing powerful spatiotemporal features.
Performance: 78.9% mAP or 80.4% (combined with Seq-NMS) on ImageNet VID validation.

Fully Motion-Aware Network for Video Object Detection

[Paper] [Summary]

Date: Sep 2018
Motivation: Producing powerful spatiotemporal features.
Performance: 78.1% mAP or 80.3% (combined with Seq-NMS) on ImageNet VID validation.

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection

[Arxiv] [Summary]

Date: Nov 2018
Motivation: Smoothing the final bounding box predictions across time.
Performance: 83.5% mAP with FGFA and Deformable ConvNets v2 on ImageNet VID validation.

2019

AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling

[arXiv]

Date: Feb 2019
Motivation: Adaptively rescaling the input image resolution to improve both accuracy and speed for video object detection.
Performance: 75.5% mAP on ImageNet VID validation with multi-scale training at 4 scales (600, 480, 360, 240).

Improving Video Object Detection by Seq-Bbox Matching

[pdf]

Date: Feb 2019
Motivation: Smoothing the final bounding box predictions across time (box-level method).
Performance: 80.9% mAP (offline detection) and 78.2% mAP (online detection), both at 38 fps on a Titan X GPU.

Comparison table

Paper	Date	Base detector	Backbone	Tracking?	Optical flow?	Online?	mAP(%)	FPS (Titan X)
Seq-NMS	Feb 2016	R-FCN	ResNet101	no	no	no	76.8	2.3
T-CNN	Apr 2016	RCNN	DeepIDNet+CRAFT	yes	no	no	73.8	-
DFF	Nov 2016	R-FCN	ResNet101	no	yes	yes	73.0	29
TPN	Feb 2017	TPN	GoogLeNet	yes	no	no	68.4	-
FGFA	Mar 2017	R-FCN	ResNet101	no	yes	yes	76.3	1.4
FGFA + Seq-NMS	Mar 2017	R-FCN	ResNet101	no	yes	no	78.4	1.14
D&T	Oct 2017	R-FCN (15 anchors)	ResNet101	yes	no	no	79.8	7.09
STMN	Dec 2017	R-FCN	ResNet101	no	no	no	80.5	-
Scale-time-lattice	Apr 2018	Faster RCNN (15 anchors)	ResNet101	no	no	no	79.6	20
Scale-time-lattice	Apr 2018	Faster RCNN (15 anchors)	ResNet101	no	no	no	79.0	62
SSN (per-frame baseline for STSN)	Mar 2018	R-FCN	Deformable ResNet101	no	no	yes	76.0	-
STSN	Mar 2018	R-FCN	Deformable ResNet101	no	no	yes	78.9	-
STSN+Seq-NMS	Mar 2018	R-FCN	Deformable ResNet101	no	no	no	80.4	-
MANet	Sep 2018	R-FCN	ResNet101	no	yes	yes	78.1	5
MANet+Seq-NMS	Sep 2018	R-FCN	ResNet101	no	yes	no	80.3	-
Tracklet-Conditioned Detection	Nov 2018	R-FCN	ResNet101	yes	no	yes	78.1	-
Tracklet-Conditioned Detection+DCNv2	Nov 2018	R-FCN	ResNet101	yes	no	yes	82.0	-
Tracklet-Conditioned Detection+DCNv2+FGFA	Nov 2018	R-FCN	ResNet101	yes	no	yes	83.5	-
Seq-Bbox Matching	Feb 2019	YOLOv3	darknet53	no	no	no	80.9	38
Seq-Bbox Matching	Feb 2019	YOLOv3	darknet53	no	no	yes	78.2	38

zhanghengdev/awesome-video-object-detection
