
[CVPR 2022] MS-TCT

[Paper Link]

In this repository, we provide an implementation of "MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection" on the Charades dataset (localization setting, i.e., Charades_v1_localize). To train and evaluate MS-TCT, follow the steps below. For MultiTHUMOS, you can follow the training process here.

Prepare the I3D feature

Like previous works (e.g., TGM, PDAN), MS-TCT is built on top of pre-trained I3D features, so feature extraction is needed before training the network.

  1. Please download the Charades dataset (24 fps version) from this link.
  2. Follow this repository to extract the snippet-level I3D features (a minimal sketch of this step follows the list).
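
Feature extraction itself follows the referenced repository; the snippet below is only a minimal sketch of the idea, assuming a pre-trained I3D backbone i3d that maps a fixed-length clip to one feature vector. All names, the snippet length, and the stride here are illustrative assumptions, not the repository's exact settings.

  import numpy as np
  import torch

  def extract_snippet_features(i3d, frames, snippet_len=16, stride=16):
      """Slide a fixed-length window over a video and keep one feature per snippet.

      frames: assumed (T, H, W, 3) uint8 array of a 24-fps video.
      """
      i3d.eval()
      feats = []
      with torch.no_grad():
          for start in range(0, len(frames) - snippet_len + 1, stride):
              clip = torch.from_numpy(frames[start:start + snippet_len]).float() / 255.0
              clip = clip.permute(3, 0, 1, 2).unsqueeze(0)  # (1, 3, L, H, W)
              feats.append(i3d(clip).squeeze().cpu().numpy())
      return np.stack(feats)  # (num_snippets, feature_dim)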

Dependencies

Please install the following dependencies to train MS-TCT correctly:

  • pytorch 1.9
  • python 3.8
  • timm 0.4.12
  • pickle5
  • scikit-learn
  • numpy
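
Assuming a pip-based environment, the versions above can be installed along these lines (the package names are the standard PyPI ones; the exact CUDA build of torch is left to the reader):

  pip install torch==1.9.0 timm==0.4.12 pickle5 scikit-learn numpy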

Quick Start

  1. Change rgb_root in train.py to the extracted feature path.
  2. Use ./run_MSTCT_Charades.sh for training on Charades-RGB. The best logits will be saved automatically in ./save_logit.
  3. Use python Evaluation.py -pkl_path /best_logit_path/ to evaluate the model with per-frame mAP and the action-conditional metrics (a sketch of the mAP computation follows this list).
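
For reference, per-frame mAP is conceptually the class-wise average precision over all frames, averaged over the classes that occur. The snippet below is a hypothetical sketch of that computation, not the repository's Evaluation.py; probs and labels are assumed (num_frames, num_classes) arrays of scores and binary ground truth.

  import numpy as np
  from sklearn.metrics import average_precision_score

  def frame_map(probs, labels):
      """Mean of per-class average precision, computed over all frames."""
      aps = [average_precision_score(labels[:, c], probs[:, c])
             for c in range(labels.shape[1]) if labels[:, c].any()]
      return float(np.mean(aps))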

Remarks

  • The network implementation is in the ./MSTCT/ folder.
  • RGB and optical flow follow the same training process. The logits of the two modalities can be added for two-stream performance (i.e., late fusion; see the sketch after this list). Note that we mainly focus on the pure RGB results in the paper.
  • In practice, we trained MS-TCT on a Tesla V100 GPU to reduce computation time, but as MS-TCT is not large, a GTX 1080 Ti is sufficient for running the network.
  • For the evaluation metrics: the standard frame-mAP follows Superevent, and the action-conditional metrics follow MLAD.
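
As a concrete illustration of the logit-level fusion mentioned above, here is a minimal sketch assuming each saved logit file is a pickled dictionary mapping a video id to a (T, num_classes) array; the file names and dictionary layout are assumptions, not the repository's exact format.

  import pickle

  def late_fuse(rgb_pkl, flow_pkl, out_pkl):
      """Average per-video logits from the RGB and optical-flow streams."""
      with open(rgb_pkl, 'rb') as f:
          rgb = pickle.load(f)
      with open(flow_pkl, 'rb') as f:
          flow = pickle.load(f)
      fused = {vid: (rgb[vid] + flow[vid]) / 2.0 for vid in rgb}
      with open(out_pkl, 'wb') as f:
          pickle.dump(fused, f)

Under this assumed format, the fused file could then be passed to Evaluation.py in the same way as a single-stream logit file.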

Reference

If you find our repo or paper useful, please cite us as

  @inproceedings{dai2022mstct,
    title={{MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection}},
    author={Dai, Rui and Das, Srijan and Kahatapitiya, Kumara and Ryoo, Michael and Bremond, Francois},
    booktitle={CVPR},
    year={2022}
  }

Contact: rui.dai@inria.fr
