Detect to Track and Track to Detect

This project is loosely based on this paper. Here is a (probably non-exhaustive) list of differences between this implementation and the paper:

  • The paper samples adjacent frames (a stride of 1) to pass through the model. This implementation instead determines the stride for each example pair by sampling from a discrete Laplacian distribution (see the stride-sampling sketch after this list).
  • The paper suggests sampling at most 2k images per class and at most 10 frames per video, to keep dominant classes and very long videos from overwhelming training. This implementation instead uniformly samples a class (or video), then uniformly samples from the images containing that class (or the frames of that video). This addresses the same problem while maximizing sample diversity within classes and videos (see the sampling sketch after this list).
  • Training follows the approximate joint training scheme rather than an alternating training scheme (see the training-step sketch after this list). See this paper for a summary of training schemes commonly used for two-stage networks. I am very interested in whether a differentiable RoI-Warping layer would significantly improve the performance of this training scheme, so that may be implemented at some point.
  • Losses contributed by anchors that cross the image boundary are masked out before the backward pass, so they contribute no gradients (see the masking sketch after this list).
  • Focal loss is used instead of binary cross-entropy loss with online hard example mining (see the focal-loss sketch after this list).
  • This implementation uses a slightly modified version of the Viterbi algorithm for tubelet linking. It differs from the version used in the paper in that, at each time step, an additional "path" is added that begins at that time step and contains only a single state. This accommodates tubelets that begin in the middle of the sequence. See detect_to_track/viterbi.py for additional details, and the sketch after this list.
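
A rough illustration of the stride sampling described above (the center, scale, and truncation range below are illustrative assumptions, not values taken from this repository):

```python
import numpy as np

def sample_stride(max_stride: int = 9, scale: float = 3.0) -> int:
    """Sample a frame stride from a truncated discrete Laplacian distribution.

    Small strides (near-adjacent frames) are most likely, but larger temporal
    gaps are occasionally sampled as well.
    """
    strides = np.arange(1, max_stride + 1)
    # unnormalized Laplacian pmf centered at stride 1
    weights = np.exp(-np.abs(strides - 1) / scale)
    return int(np.random.choice(strides, p=weights / weights.sum()))
```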
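
A minimal sketch of the two-stage balanced sampling: pick a class uniformly, then pick uniformly among images containing that class. The `images_by_class` layout is a hypothetical data structure; the same scheme applies to videos and their frames.

```python
import random

def sample_image_for_class(images_by_class: dict) -> str:
    """Uniformly pick a class, then uniformly pick an image containing that class.

    `images_by_class` maps class id -> list of image ids (an assumed layout).
    """
    cls = random.choice(list(images_by_class))
    return random.choice(images_by_class[cls])
```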
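
A sketch of what an approximate-joint-training step looks like, assuming hypothetical `model.rpn` and `model.detection_head` callables (not this repository's actual API): both losses share a single backward pass, while proposal coordinates are detached so no gradient flows through them.

```python
import torch

def approximate_joint_step(model, frames, targets, optimizer):
    rpn_loss, proposals = model.rpn(frames, targets)
    # Proposal coordinates are treated as fixed inputs: detaching them keeps
    # the detection-head gradient from flowing back through the box coordinates.
    det_loss = model.detection_head(frames, proposals.detach(), targets)
    optimizer.zero_grad()
    (rpn_loss + det_loss).backward()  # single backward pass over both stages
    optimizer.step()
```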
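
A sketch of the boundary-anchor masking, assuming an (x1, y1, x2, y2) anchor layout (an assumption, not necessarily this repository's convention):

```python
import torch

def mask_boundary_anchor_losses(per_anchor_loss: torch.Tensor,
                                anchors: torch.Tensor,
                                image_hw: tuple) -> torch.Tensor:
    """Zero out losses from anchors that cross the image boundary.

    `anchors` is (N, 4) in (x1, y1, x2, y2) format and `per_anchor_loss` is (N,).
    Masking before the backward pass means those anchors contribute no gradients.
    """
    h, w = image_hw
    inside = (
        (anchors[:, 0] >= 0) & (anchors[:, 1] >= 0)
        & (anchors[:, 2] <= w) & (anchors[:, 3] <= h)
    ).to(per_anchor_loss.dtype)
    return (per_anchor_loss * inside).sum() / inside.sum().clamp(min=1)
```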
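
For reference, a standard binary focal loss; the alpha/gamma defaults below are the values commonly used in the literature, not necessarily the ones used here:

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: down-weights easy examples so that hard-example
    mining is unnecessary. `targets` is a float tensor of 0/1 labels."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```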
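
A minimal sketch of the modified Viterbi idea (the actual implementation lives in detect_to_track/viterbi.py; the score layout below is an assumption): at each frame the recurrence may either extend an existing path or start a fresh single-state path, so the best tubelet may begin mid-sequence.

```python
import numpy as np

def best_tubelet(unary, pairwise):
    """unary[t][i]: detection score of box i in frame t.
    pairwise[t][i][j]: link score between box i in frame t and box j in frame t+1.
    Returns the best-scoring (frame, box) path, which may start at any frame.
    """
    T = len(unary)
    best = [np.asarray(unary[0], dtype=float)]       # best score of a path ending at (t, i)
    back = [np.full(len(unary[0]), -1)]              # -1 marks "path starts here"
    for t in range(1, T):
        scores = best[t - 1][:, None] + np.asarray(pairwise[t - 1])
        extended = scores.max(axis=0) + np.asarray(unary[t])
        fresh = np.asarray(unary[t], dtype=float)    # new single-state path at frame t
        start_here = fresh > extended
        best.append(np.where(start_here, fresh, extended))
        back.append(np.where(start_here, -1, scores.argmax(axis=0)))
    t = int(np.argmax([b.max() for b in best]))      # the best path may also end early
    i = int(best[t].argmax())
    path = [(t, i)]
    while t > 0 and back[t][i] >= 0:
        t, i = t - 1, int(back[t][i])
        path.append((t, i))
    return path[::-1]
```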

This project depends on my assorted collection of machine learning utilities, which can be found here. The library is very immature, so please pay close attention to the version number specified in this project's requirements.txt.

The following operations are currently implemented only in CUDA (not on the CPU), so this project requires an NVIDIA GPU to run.

  • ROIPool
  • PSROIPool
  • PointwiseCorrelation