PySlowFast Model Zoo and Baselines

Kinetics

We provide the original pretrained models from Caffe2 for the heavy models (testing a Caffe2-pretrained model in PyTorch may give a small difference in performance):

| architecture | depth | pretrain | frame length x sample rate | top1 | top5 | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C2D | R50 | Train From Scratch | 8 x 8 | 67.2 | 87.8 | link | Kinetics/c2/C2D_NOPOOL_8x8_R50 |
| I3D | R50 | Train From Scratch | 8 x 8 | 73.5 | 90.8 | link | Kinetics/c2/I3D_8x8_R50 |
| I3D NLN | R50 | Train From Scratch | 8 x 8 | 74.0 | 91.1 | link | Kinetics/c2/I3D_NLN_8x8_R50 |
| Slow | R50 | Train From Scratch | 4 x 16 | 72.7 | 90.3 | link | Kinetics/c2/SLOW_4x16_R50 |
| Slow | R50 | Train From Scratch | 8 x 8 | 74.8 | 91.6 | link | Kinetics/c2/SLOW_8x8_R50 |
| SlowFast | R50 | Train From Scratch | 4 x 16 | 75.6 | 92.0 | link | Kinetics/c2/SLOWFAST_4x16_R50 |
| SlowFast | R50 | Train From Scratch | 8 x 8 | 77.0 | 92.6 | link | Kinetics/c2/SLOWFAST_8x8_R50 |
| SlowFast | R101 | Train From Scratch | 8 x 8 | 78.0 | 93.3 | link | Kinetics/c2/SLOWFAST_8x8_R101_101_101 |
| SlowFast | R101 | Train From Scratch | 16 x 8 | 78.9 | 93.5 | link | Kinetics/c2/SLOWFAST_16x8_R101_50_50 |
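
To evaluate one of the Caffe2 checkpoints listed above, the config name has to be turned into a command line. The following is a minimal sketch, not the definitive procedure. It assumes that the config column is a path relative to a `configs/` directory with a `.yaml` suffix, that `tools/run_net.py` is the entry point, and that the `TRAIN.ENABLE`, `TEST.CHECKPOINT_FILE_PATH`, and `TEST.CHECKPOINT_TYPE` overrides behave as in the repository's getting-started instructions. The helper `build_test_command` and the checkpoint path are hypothetical.

```python
from pathlib import PurePosixPath


def build_test_command(config_name, checkpoint_path, configs_root="configs"):
    """Build an argument list for testing a Caffe2-pretrained checkpoint.

    Assumption: `config_name` is an entry from the config column above and
    maps to `<configs_root>/<config_name>.yaml`.
    """
    cfg_file = PurePosixPath(configs_root) / f"{config_name}.yaml"
    return [
        "python", "tools/run_net.py",
        "--cfg", str(cfg_file),
        # Skip training and only run the test loop.
        "TRAIN.ENABLE", "False",
        # Placeholder path to the downloaded model file.
        "TEST.CHECKPOINT_FILE_PATH", checkpoint_path,
        # Assumed override that selects the Caffe2 checkpoint loader.
        "TEST.CHECKPOINT_TYPE", "caffe2",
    ]


cmd = build_test_command(
    "Kinetics/c2/SLOWFAST_8x8_R50",
    "/path/to/downloaded_checkpoint.pkl",  # hypothetical location
)
print(" ".join(cmd))
```

The printed line can be pasted into a shell from the repository root; dataset paths and GPU counts would still need to be supplied as further overrides.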

AVA

| architecture | depth | Pretrain Model | frame length x sample rate | mAP | AVA version | model |
| --- | --- | --- | --- | --- | --- | --- |
| Slow | R50 | Kinetics 400 | 4 x 16 | 19.5 | 2.2 | link |
| SlowFast | R101 | Kinetics 600 | 8 x 8 | 28.2 | 2.1 | link |
| SlowFast | R101 | Kinetics 600 | 8 x 8 | 29.1 | 2.2 | link |
| SlowFast | R101 | Kinetics 600 | 16 x 8 | 29.4 | 2.2 | link |
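
The "frame length x sample rate" column used in the tables can be read as T x tau: T frames are sampled with a temporal stride of tau, so a clip covers roughly T * tau raw frames. The sketch below illustrates that arithmetic only. The function names are hypothetical, and the actual decoder in the codebase may add jitter or handle clip boundaries differently.

```python
def sampled_frame_indices(frame_length, sample_rate, start=0):
    """Raw-video indices of the frames fed to the model.

    frame_length: number of sampled frames (T).
    sample_rate: temporal stride between sampled frames (tau).
    """
    return [start + i * sample_rate for i in range(frame_length)]


def raw_clip_span(frame_length, sample_rate):
    """Approximate number of raw frames one clip covers (T * tau)."""
    return frame_length * sample_rate


# The three settings that appear in the tables.
for t, tau in [(4, 16), (8, 8), (16, 8)]:
    print(f"{t} x {tau}: span {raw_clip_span(t, tau)} frames, "
          f"indices {sampled_frame_indices(t, tau)}")
```

Under this reading, 4 x 16 and 8 x 8 cover the same 64-frame window at different temporal densities, while 16 x 8 covers 128 frames.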

Multigrid Training

Update (June 2020): We provide reimplemented models from the paper "A Multigrid Method for Efficiently Training Video Models". The multigrid method trains about 3-6x faster than the original training on multiple datasets. See projects/multigrid for more information. Models, results, and example config files are listed below.

Kinetics:

| architecture | depth | pretrain | frame length x sample rate | training | top1 | top5 | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SlowFast | R50 | Train From Scratch | 8 x 8 | Standard | 76.8 | 92.7 | link | Kinetics/SLOWFAST_8x8_R50_stepwise |
| SlowFast | R50 | Train From Scratch | 8 x 8 | Multigrid | 76.6 | 92.7 | link | Kinetics/SLOWFAST_8x8_R50_stepwise_multigrid |

(Here we use a stepwise learning rate schedule.)

Something-Something V2:

| architecture | depth | pretrain | frame length x sample rate | training | top1 | top5 | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Standard | 63.0 | 88.5 | link | SSv2/SLOWFAST_16x8_R50 |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Multigrid | 63.5 | 88.7 | link | SSv2/SLOWFAST_16x8_R50_multigrid |

Charades:

| architecture | depth | pretrain | frame length x sample rate | training | mAP | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Standard | 38.9 | link | SSv2/SLOWFAST_16x8_R50 |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Multigrid | 38.6 | link | SSv2/SLOWFAST_16x8_R50_multigrid |
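
To put the Standard and Multigrid rows side by side, the headline metric of each dataset can be differenced directly. The short sketch below uses only the numbers reported in the three multigrid tables (top1 for Kinetics and Something-Something V2, mAP for Charades). The helper name `multigrid_delta` is hypothetical.

```python
# (dataset, metric, standard, multigrid), copied from the tables above.
RESULTS = [
    ("Kinetics", "top1", 76.8, 76.6),
    ("Something-Something V2", "top1", 63.0, 63.5),
    ("Charades", "mAP", 38.9, 38.6),
]


def multigrid_delta(standard, multigrid):
    """Multigrid minus standard, rounded to the tables' one-decimal precision."""
    return round(multigrid - standard, 1)


deltas = {name: multigrid_delta(std, mg) for name, _, std, mg in RESULTS}

for name, metric, std, mg in RESULTS:
    print(f"{name}: {metric} {std} -> {mg} ({deltas[name]:+.1f})")
```

On these three datasets the difference between the two training schedules stays within half a point in either direction.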

ImageNet

We also release an ImageNet-pretrained model in case fine-tuning from ImageNet pretraining is preferred. The reported accuracy is obtained by center-crop testing on the validation set.

| architecture | depth | Top1 | Top5 | model |
| --- | --- | --- | --- | --- |
| ResNet | R50 | 23.6 | 6.8 | link |