Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Sauradip Nag¹,²,⁺ · Xiatian Zhu¹,³ · Yi-Zhe Song¹,² · Tao Xiang¹,²

¹CVSSP, University of Surrey, UK ²iFlyTek-Surrey Joint Research Center on Artificial Intelligence, UK
³Surrey Institute for People-Centred Artificial Intelligence, UK

⁺corresponding author

Accepted to ECCV 2022

Paper | Project Page

Updates

(June, 2022) We released SPOT training and inference code for the ActivityNet v1.3 dataset.
(June, 2022) SPOT is accepted by ECCV 2022.

Summary

First single-stage, proposal-free framework for the Semi-Supervised Temporal Action Detection (SS-TAD) task.
Being single-stage, it does not suffer from the notorious Proposal Error Propagation problem.
Proposes a novel pretext task for action detection based on the notion of Random Foreground.
Proposes a novel Boundary Refinement strategy based on contrastive learning.
With just 10% of the videos labeled, SPOT outperforms the majority of existing TAD approaches.
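The Summary lists a Boundary Refinement strategy based on contrastive learning. The exact formulation is in the paper; as a hedged illustration of the underlying mechanism only, the sketch below implements a standard InfoNCE loss on snippet embeddings. Which snippets act as anchor, positive and negatives is our assumption: here, anchor and positive come from inside the same predicted action, and negatives are background snippets just beyond its boundaries. `cosine`, `info_nce` and the temperature `tau` are hypothetical names and values, not the repository's API.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; the epsilon guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    tau: float = 0.1,
) -> float:
    """InfoNCE loss for one anchor: -log softmax over [positive, *negatives],
    using temperature-scaled cosine similarities as logits."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Usage (assumed sampling): anchor and positive are embeddings of two snippets
# inside one predicted action; negatives are background snippets just outside it.
anchor, positive = [0.9, 0.1], [0.8, 0.2]
negatives = [[0.1, 0.9], [0.2, 0.8]]
loss = info_nce(anchor, positive, negatives)
```

Minimizing this loss pulls snippets of the same action together and pushes boundary-adjacent background away, which is the kind of signal that can sharpen where a predicted segment starts and ends.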

Abstract

Existing temporal action detection (TAD) methods rely on a large amount of training data with segment-level annotations. Collecting and annotating such a training set is thus highly expensive and unscalable. Semi-supervised TAD (SS-TAD) alleviates this problem by leveraging unlabeled videos freely available at scale. However, SS-TAD is also a much more challenging problem than supervised TAD, and consequently much under-studied. Prior SS-TAD methods directly combine an existing proposal-based TAD method and a semi-supervised learning (SSL) method. Due to their sequential localization (e.g., proposal generation) and classification design, they are prone to proposal error propagation. To overcome this limitation, in this work we propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT) with a parallel localization (mask generation) and classification architecture. This design effectively eliminates the dependence between localization and classification by cutting off the route for error propagation between them. We further introduce an interaction mechanism between classification and localization for prediction refinement, and a new pretext task for self-supervised model pre-training. Extensive experiments on two standard benchmarks show that SPOT outperforms state-of-the-art alternatives, often by a large margin.
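To make the parallel localization (mask) and classification design more concrete, here is a minimal sketch of how outputs of two independent heads could be turned into detections without any proposal stage. The tensor layout is our assumption, not necessarily the repository's: for T snippets, the classification head gives per-snippet class probabilities, and the mask head gives, for each snippet t, a length-T mask over the whole video describing the action instance that snippet belongs to. `decode_segments` and both thresholds are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int, int, float]  # (start, end_exclusive, class_id, score)


def decode_segments(
    cls_probs: Sequence[Sequence[float]],
    masks: Sequence[Sequence[float]],
    cls_thresh: float = 0.5,
    mask_thresh: float = 0.5,
) -> List[Segment]:
    """Hypothetical proposal-free decoding.

    cls_probs[t][c]: probability that snippet t belongs to action class c.
    masks[t][i]:     probability that snippet i lies in the same action
                     instance as snippet t (one length-T mask per snippet).
    """
    best: Dict[Tuple[int, int, int], float] = {}
    for t, probs in enumerate(cls_probs):
        cls_id = max(range(len(probs)), key=lambda c: probs[c])
        score = probs[cls_id]
        if score < cls_thresh:
            continue  # snippet treated as background
        fg = [i for i, m in enumerate(masks[t]) if m >= mask_thresh]
        if not fg:
            continue  # mask head found no foreground for this snippet
        # Span from first to last foreground position (gaps are bridged).
        key = (fg[0], fg[-1] + 1, cls_id)
        # Several snippets of one instance decode the same span: keep the top score.
        best[key] = max(best.get(key, 0.0), score)
    return sorted(
        ((s, e, c, sc) for (s, e, c), sc in best.items()),
        key=lambda seg: seg[3],
        reverse=True,
    )
```

In this sketch the mask head never reads the class scores and vice versa; the two outputs meet only in this final step, so an error in one head is not fed into the other as input.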

Architecture

Getting Started

Requirements

Python 3.7
PyTorch == 1.9.0 (make sure your PyTorch version is at least 1.8)
NVIDIA GPU
Kornia
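Since PyTorch 1.8 or later is required, a quick pre-flight check can save a failed run. The sketch below compares version strings using only the standard library; `version_tuple`, `torch_is_supported` and `MIN_TORCH` are our own names, and the parser simply drops local build suffixes such as `+cu111`.

```python
import re
from typing import Tuple

MIN_TORCH = (1, 8)


def version_tuple(version: str) -> Tuple[int, ...]:
    """'1.9.0+cu111' -> (1, 9, 0); stops at the first non-numeric component."""
    core = version.split("+")[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def torch_is_supported(version: str) -> bool:
    """Tuple comparison, so '1.10' correctly counts as newer than '1.8'."""
    return version_tuple(version)[:2] >= MIN_TORCH


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = torch_is_supported(torch.__version__)
        print(f"torch {torch.__version__}: {'OK' if ok else 'too old (need >= 1.8)'}")
        print("CUDA available:", torch.cuda.is_available())
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" < "1.8.0", which would wrongly reject newer releases.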

Environment Setup

We recommend creating a Conda environment and then installing the requirements:

pip3 install -r requirements.txt

Download Features

Download the video features and update the video and output paths in the config/anet.yaml file. Currently, only the ActivityNet v1.3 config is available; we plan to release the code for the THUMOS14 dataset soon.

Dataset	Feature Backbone	Pre-Training	Link
ActivityNet	TSN	Kinetics-400	Google Drive
THUMOS	TSN	Kinetics-400	Google Drive
ActivityNet	I3D	Kinetics-400	Google Drive
THUMOS	I3D	Kinetics-400	Google Drive

Model Training

To train SPOT from scratch, run the following command. Training configurations can be adjusted in the config/anet.yaml file. Training covers both the pre-training and the fine-tuning stages.

python spot_train.py
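The pre-training stage relies on the Random Foreground pretext task mentioned in the Summary. We have not reproduced the paper's exact recipe; the sketch below shows one plausible way to build such a training sample under our own assumptions: a video is a list of per-snippet feature vectors, a randomly sized contiguous span from a second (donor) video is pasted in as pseudo-foreground, and the pretext target is the binary mask of the pasted span, which a mask head could be trained to predict. `make_random_foreground` and its arguments are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Feature = List[float]


def make_random_foreground(
    video: Sequence[Sequence[float]],
    donor: Sequence[Sequence[float]],
    min_len: int,
    max_len: int,
    rng: random.Random,
) -> Tuple[List[Feature], List[int]]:
    """Hypothetical pretext sample builder.

    Pastes a random contiguous span of `donor` snippets into a copy of
    `video` and returns (augmented_features, mask), where mask[t] == 1
    marks the pasted pseudo-foreground snippets.
    """
    T = len(video)
    if not 1 <= min_len <= max_len <= min(T, len(donor)):
        raise ValueError("span length bounds must fit inside both videos")
    length = rng.randint(min_len, max_len)     # inclusive on both ends
    dst = rng.randint(0, T - length)           # where the span lands in `video`
    src = rng.randint(0, len(donor) - length)  # where it is cut from `donor`
    augmented = [list(f) for f in video]       # copy; the input is not modified
    mask = [0] * T
    for i in range(length):
        augmented[dst + i] = list(donor[src + i])
        mask[dst + i] = 1
    return augmented, mask


# Usage: 10 snippets of 2-d features; `target_mask` is the pretext target.
rng = random.Random(0)
video = [[0.0, 0.0] for _ in range(10)]
donor = [[1.0, 1.0] for _ in range(10)]
features, target_mask = make_random_foreground(video, donor, 2, 4, rng)
```

Because the target is generated automatically, such samples can be drawn from unlabeled videos, which is what makes a pretext task useful in the semi-supervised setting.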

Model Inference

We provide a pretrained checkpoint for I3D features on ActivityNet v1.3. It is available at the Link.

After downloading the checkpoint, set its path in the config/anet.yaml file. Inference can then be performed with the following command:

python spot_inference.py
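ActivityNet detection results are commonly stored in the ActivityNet submission format: a JSON object whose `results` field maps each video ID to a list of detections of the form `{"label": ..., "score": ..., "segment": [start, end]}`, with times in seconds. Assuming the inference output follows that format (we have not verified which file the script writes or its exact layout), the helpers below load such a file and keep the top-scoring detections per video. All function names are hypothetical.

```python
import json
from typing import Dict, List


def top_k_per_video(submission: dict, k: int = 3) -> Dict[str, List[dict]]:
    """Return the k highest-scoring detections for each video in an
    ActivityNet-style submission: {"results": {video_id: [detection, ...]}},
    where each detection is {"label": str, "score": float, "segment": [start, end]}."""
    return {
        video_id: sorted(dets, key=lambda d: d["score"], reverse=True)[:k]
        for video_id, dets in submission["results"].items()
    }


def load_top_detections(path: str, k: int = 3) -> Dict[str, List[dict]]:
    """Read a result file from disk and keep the top-k detections per video."""
    with open(path, encoding="utf-8") as f:
        return top_k_per_video(json.load(f), k)


def describe(detection: dict) -> str:
    """One-line, human-readable summary of a detection (times in seconds)."""
    start, end = detection["segment"]
    return f"{detection['label']}: {start:.1f}-{end:.1f}s (score {detection['score']:.3f})"
```

For example, `load_top_detections("path/to/results.json")` followed by `describe(...)` on each entry prints a quick per-video summary; the path is a placeholder for wherever your config writes results.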

Model Evaluation

To evaluate the SPOT model, run the following command.

python eval.py
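ActivityNet v1.3 detection is conventionally scored by mAP at temporal IoU (tIoU) thresholds 0.5, 0.75 and 0.95, plus the average mAP over thresholds 0.50:0.05:0.95. We have not checked exactly which metrics `eval.py` reports; the sketch below only shows the tIoU computation underlying such metrics and the thresholds at which a single prediction would count as a match. The names are ours, and the sketch ignores class labels and duplicate detections, which a full mAP evaluation must handle.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection-over-union of two [start, end] intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# 0.50, 0.55, ..., 0.95 (10 thresholds), rounded to avoid float drift.
TIOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred: Interval, gts: Sequence[Interval]) -> List[float]:
    """Thresholds at which `pred` overlaps some ground truth enough to be a
    true positive (class labels and duplicate suppression are ignored)."""
    best = max((temporal_iou(pred, g) for g in gts), default=0.0)
    return [t for t in TIOU_THRESHOLDS if best >= t]
```

For instance, a prediction of (0, 10) s against a ground truth of (0, 5) s has tIoU 0.5, so it counts as correct only at the loosest threshold; this is why average mAP rewards precise boundaries.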

Performance

Qualitative Results

TO-DO Checklist

Support for THUMOS14 dataset
Enable multi-GPU training

Acknowledgement

This code repository borrows some parts of SSTAP and BMN. We thank the authors for open-sourcing their code and for clarifying our doubts.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{nag2022semi,
  title={Semi-Supervised Temporal Action Detection with Proposal-Free Masking},
  author={Nag, Sauradip and Zhu, Xiatian and Song, Yi-Zhe and Xiang, Tao},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
