Bilinear Attentional Transforms (BAT) for Video Classification

This is the official code for "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms", applied to video classification on Kinetics.

Pretrained models

Here we provide some of the pretrained models.

| Method | Backbone | Input Frames | Top-1 Acc | Link |
| --- | --- | --- | --- | --- |
| C2D | ResNet-50 | 8 | 72.0% | GoogleDrive / BaiduYun (Access Code: r0i2) |
| I3D | ResNet-50 | 8 | 72.7% | GoogleDrive / BaiduYun (Access Code: dnwv) |
| C2D + 2D-BAT | ResNet-50 | 8 | 74.6% | GoogleDrive / BaiduYun (Access Code: inb0) |
| I3D + 2D-BAT | ResNet-50 | 8 | 75.1% | GoogleDrive / BaiduYun (Access Code: q8d8) |
| C2D + 3D-BAT | ResNet-50 | 8 | 75.5% | GoogleDrive / BaiduYun (Access Code: rnrg) |

Quick start

Requirements

  • Install Lintel
  • pip install -r requirements.txt

Data preparation

  1. Download Kinetics-400 via the official scripts.
  2. Generate the training and validation list files. Each line of a list file has the form:
video_path frame_num label
video_path frame_num label
...
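
The list-file format above can be produced and checked with a few lines of Python. The following is a minimal sketch, not the repository's own tooling: `VideoRecord`, `format_list_file`, and `parse_list_file` are hypothetical names, the label is assumed to be an integer class index, and the frame counts are assumed to be already known (obtaining them requires a video reader, which is left out here).

```python
from typing import List, NamedTuple


class VideoRecord(NamedTuple):
    path: str       # path to the video file
    frame_num: int  # number of frames in the video
    label: int      # class index (assumed to be an integer)


def format_list_file(records: List[VideoRecord]) -> str:
    """Render records as 'video_path frame_num label' lines."""
    return "\n".join(f"{r.path} {r.frame_num} {r.label}" for r in records) + "\n"


def parse_list_file(text: str) -> List[VideoRecord]:
    """Parse list-file text back into records, skipping blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so a path containing spaces stays intact.
        path, frame_num, label = line.rsplit(maxsplit=2)
        records.append(VideoRecord(path, int(frame_num), int(label)))
    return records
```

Round-tripping a generated file through `parse_list_file` is a cheap sanity check before starting a long training run. Whether the repository's own loader tolerates spaces in paths is not stated here, so avoiding them is the safer choice.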

Training

To train a model, run main.py with the desired model architecture and other hyperparameters:

python main.py \
    /PATH/TO/TRAIN_LIST \
    /PATH/TO/VAL_LIST \
    --read_mode video \
    --resume /PATH/TO/IMAGENET_PRETRAINED/MODEL --soft_resume \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --num_segments 1 --seq_length 8 --sample_rate 8 \
    --lr 0.01 --lr_steps 40 80 --epochs 100 \
    --eval-freq 5 --save-freq 5 -b 64 -j 48 --dropout 0.5

More training scripts can be found in the scripts directory. The ImageNet-pretrained models can be downloaded from GoogleDrive / BaiduYun (Access Code: 1r48).
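
To make the `--seq_length 8 --sample_rate 8` setting concrete, here is a minimal sketch of how a clip's frame indices could be chosen. It assumes that `--seq_length` is the number of frames per clip and `--sample_rate` is the temporal stride between them; the repository's actual sampling code may differ, and `sample_clip_indices` is a hypothetical helper.

```python
import random
from typing import List, Optional


def sample_clip_indices(frame_num: int, seq_length: int = 8, sample_rate: int = 8,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Pick seq_length frame indices spaced sample_rate frames apart."""
    # Number of raw frames covered from the first to the last sampled frame.
    span = (seq_length - 1) * sample_rate + 1
    max_start = max(frame_num - span, 0)
    # Random start when an rng is given (training), centered start otherwise.
    start = rng.randint(0, max_start) if rng is not None else max_start // 2
    # Clamp to the last frame so videos shorter than the span still work.
    return [min(start + i * sample_rate, frame_num - 1) for i in range(seq_length)]
```

Under these assumptions, 8 frames at stride 8 cover 57 raw frames, so a 300-frame video leaves plenty of room for random temporal jitter during training.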

Testing

Fully convolutional inference (recommended):

python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 10 --test_crops 3 --seq_length 8 --sample_rate 8 \
    -j 16
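
With `--test_segments 10 --test_crops 3`, each video is evaluated on 10 × 3 = 30 views. A common way to turn per-view predictions into one video-level prediction is to average the softmax scores over all views. The sketch below illustrates that idea in plain Python; that test_models.py aggregates this way is an assumption, and `aggregate_views` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one view's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average per-view softmax scores into one score vector per video.

    view_logits has one row per view (segments x crops) and one column per class.
    """
    probs = [softmax(v) for v in view_logits]
    n = len(probs)
    # zip(*probs) iterates class by class across all views.
    return [sum(col) / n for col in zip(*probs)]
```

The predicted class is then the index of the largest averaged score. The same function applies unchanged to any other number of views.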

10 crops and 25 segments:

python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 25 --seq_length 8 --sample_rate 8 \
    -j 16

Other applications of BAT

Citation

If you find this work or code helpful in your research, please cite:

@InProceedings{Chi_2020_CVPR,
  author = {Chi, Lu and Yuan, Zehuan and Mu, Yadong and Wang, Changhu},
  title = {Non-Local Neural Networks With Grouped Bilinear Attentional Transforms},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}

About

This is an official implementation of video classification for our CVPR 2020 paper "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms".
