
Skeleton-based Human Action Recognition

This repository provides the implementation of the baseline method ST-GCN [1], its extension 2s-AGCN [2], and our proposed methods TA-GCN [3], PST-GCN [4], ST-BLN [5], and PST-BLN [6] for skeleton-based human action recognition. Our proposed methods are built on top of ST-GCN to make it more efficient in terms of the number of model parameters and floating-point operations.

The ST-BLN and PST-BLN methods are also evaluated on the landmark-based facial expression recognition task in our paper [6], and that implementation can be found in FER_PSTBLN_MCD.

This implementation is adapted from the OpenMMLAB toolbox and the 2s-AGCN repository.

This project is funded by the OpenDR European project, and the implementations are also integrated into the OpenDR toolkit, which will be publicly available soon.

Data Preparation

  • Download the raw data from NTU-RGB+D [7] and Skeleton-Kinetics [8], then put it under the data directory:

     -data\  
       -kinetics_raw\  
         -kinetics_train\
           ...
         -kinetics_val\
           ...
         -kinetics_train_label.json
          -kinetics_val_label.json
       -nturgbd_raw\  
         -nturgb+d_skeletons\
           ...
         -samples_with_missing_skeletons.txt
    
  • Preprocess the data with (a quick sanity check on the generated files is sketched after this list):

    python data_gen/ntu_gendata.py

    python data_gen/kinetics-gendata.py

  • Generate the bone data (the joint-to-bone computation is sketched below) with:

    python data_gen/gen_bone_data.py
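
After preprocessing, each split should contain a joint-data array and a matching label file. Below is a minimal sanity check on those files; the exact file names and paths are assumptions based on common layouts for this code base and may differ in your setup:

    import pickle

    import numpy as np

    # Assumed output locations; adjust to wherever the generators wrote their files.
    data = np.load('./data/ntu/xview/val_data_joint.npy')
    with open('./data/ntu/xview/val_label.pkl', 'rb') as f:
        sample_names, labels = pickle.load(f)

    # Expected layout: (samples, 3 coordinates, 300 frames, 25 joints, 2 bodies)
    print(data.shape, len(labels))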
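Conceptually, gen_bone_data.py represents each bone as the vector from a parent joint to its child in the skeleton's kinematic tree. A minimal sketch of that computation for NTU-RGB+D, assuming joint data of shape (N, C, T, V, M) and the dataset's standard 25-joint skeleton:

    import numpy as np

    # NTU RGB+D (child, parent) joint pairs, 1-indexed; joint 21 (spine) is the
    # root, so its bone vector stays zero.
    NTU_PAIRS = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6),
                 (8, 7), (9, 21), (10, 9), (11, 10), (12, 11), (13, 1),
                 (14, 13), (15, 14), (16, 15), (17, 1), (18, 17), (19, 18),
                 (20, 19), (22, 23), (23, 8), (24, 25), (25, 12)]

    def joints_to_bones(joints):
        """joints: array of shape (N, C, T, V, M) -> bone data of the same shape."""
        bones = np.zeros_like(joints)
        for child, parent in NTU_PAIRS:
            # Each bone is the offset of the child joint from its parent joint.
            bones[:, :, :, child - 1, :] = (
                joints[:, :, :, child - 1, :] - joints[:, :, :, parent - 1, :])
        return bones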

Training & Testing

Modify the config files according to your experimental setup and run the following scripts:

`python main.py --config ./config/nturgbd-cross-view/stgcn/train_joint_stgcn.yaml`

`python main.py --config ./config/nturgbd-cross-view/stgcn/train_bone_stgcn.yaml`

To ensemble the results of the joint and bone streams, first run the tests to generate the scores of the softmax layer:

`python main.py --config ./config/nturgbd-cross-view/stgcn/test_joint_stgcn.yaml`

`python main.py --config ./config/nturgbd-cross-view/stgcn/test_bone_stgcn.yaml`

Then combine the generated scores with:

`python ensemble.py --datasets ntu/xview`
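
ensemble.py fuses the two streams at score level. Here is a minimal sketch of that fusion, assuming each test run saved a pickle mapping sample names to softmax score vectors and that entries are stored in label order; all file paths below are assumptions, not the repository's actual layout:

    import pickle

    import numpy as np

    # Assumed score/label locations; adjust to your work_dir layout.
    with open('./data/ntu/xview/val_label.pkl', 'rb') as f:
        _, labels = pickle.load(f)
    with open('./work_dir/ntu/xview/stgcn_joint/score.pkl', 'rb') as f:
        joint_scores = list(pickle.load(f).items())
    with open('./work_dir/ntu/xview/stgcn_bone/score.pkl', 'rb') as f:
        bone_scores = list(pickle.load(f).items())

    correct = 0
    for i, label in enumerate(labels):
        # Element-wise sum of the joint-stream and bone-stream softmax scores.
        fused = joint_scores[i][1] + bone_scores[i][1]
        correct += int(np.argmax(fused) == int(label))
    print(f'ensemble top-1 accuracy: {correct / len(labels):.4f}')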

Shell scripts for training and testing each of the methods are also provided. For example, to train the ST-GCN method, run:

`sh run_stgcn.sh`

Demo

All the aforementioned methods are also integrated into the OpenDR toolkit, along with a webcam demo. In this demo, we use Lightweight OpenPose [10], which is integrated into the toolkit as well, to extract a skeleton from each input frame, and we then feed a sequence of 300 skeletons to a pre-trained ST-GCN-based model from the toolkit; a minimal sketch of this loop follows the demo video below.

Demo video: demo.mp4
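
The sketch below illustrates the demo's sliding-window structure. The two helper functions are hypothetical stand-ins for the toolkit's pose estimator and action recognizer (the real OpenDR API may differ), stubbed out so the sketch stays self-contained:

    from collections import deque

    import cv2
    import numpy as np

    SEQ_LEN = 300  # number of skeleton frames per classified clip

    def estimate_skeleton(frame):
        # Placeholder for the toolkit's Lightweight OpenPose wrapper [10];
        # returns dummy (C=3, V=18, M=1) joints here.
        return np.zeros((3, 18, 1), dtype=np.float32)

    def classify_clip(clip):
        # Placeholder for a pre-trained ST-GCN-based recognizer; clip has
        # shape (C, T, V, M), matching the training data layout.
        return 'unknown'

    buffer = deque(maxlen=SEQ_LEN)
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        buffer.append(estimate_skeleton(frame))
        if len(buffer) == SEQ_LEN:
            # Stack the sliding window into (C, T, V, M) and classify it.
            print(classify_clip(np.stack(buffer, axis=1)))
    cap.release()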

Citation

Please cite the following papers if you use any of the methods implemented in this repository in your research.

TA-GCN
@inproceedings{heidari2021tagcn,
      title={Temporal attention-augmented graph convolutional network for efficient skeleton-based human action recognition},
      author={Heidari, Negar and Iosifidis, Alexandros},
      booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
      pages={7907--7914},
      year={2021},
      organization={IEEE}
}
PST-GCN
@inproceedings{heidari2021pstgcn,
      title={Progressive Spatio-Temporal Graph Convolutional Network for Skeleton-Based Human Action Recognition},
      author={Heidari, Negar and Iosifidis, Alexandros},
      booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
      pages={3220--3224},
      year={2021},
      organization={IEEE}
}
ST-BLN
@inproceedings{heidari2021stbln,
      title={On the spatial attention in spatio-temporal graph convolutional networks for skeleton-based human action recognition},
      author={Heidari, Negar and Iosifidis, Alexandros},
      booktitle={2021 International Joint Conference on Neural Networks (IJCNN)},
      pages={1--7},
      year={2021},
      organization={IEEE}
}
PST-BLN
@article{heidari2021pstbln,
      title={Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation},
      author={Heidari, Negar and Iosifidis, Alexandros},
      journal={arXiv preprint arXiv:2106.04332},
      year={2021}
}

Acknowledgement

This work was supported by the European Union’s Horizon 2020 Research and Innovation Action Program under Grant 871449 (OpenDR).

Contact

For any questions, feel free to contact: negar.heidari@ece.au.dk

References

[1] Yan, S., Xiong, Y., & Lin, D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 32, No. 1).

[2] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Heidari, N., & Iosifidis, A. (2021). Temporal attention-augmented graph convolutional network for efficient skeleton-based human action recognition. In 2020 25th International Conference on Pattern Recognition (ICPR) (pp. 7907-7914). IEEE.

[4] Heidari, N., & Iosifidis, A. (2021). Progressive spatio-temporal graph convolutional network for skeleton-based human action recognition. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3220-3224). IEEE.

[5] Heidari, N., & Iosifidis, A. (2020). On the spatial attention in spatio-temporal graph convolutional networks for skeleton-based human action recognition. arXiv preprint arXiv:2011.03833.

[6] Heidari, N., & Iosifidis, A. (2021). Progressive spatio-temporal bilinear network with Monte Carlo dropout for landmark-based facial expression recognition with uncertainty estimation. arXiv preprint arXiv:2106.04332.

[7] Shahroudy, A., Liu, J., Ng, T. T., & Wang, G. (2016). NTU RGB+D: A large scale dataset for 3D human activity analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1010-1019).

[8] Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., ... & Zisserman, A. (2017). The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950.

[9] Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7291-7299).

[10] Osokin, D. (2018). Real-time 2D multi-person pose estimation on CPU: Lightweight OpenPose. arXiv preprint arXiv:1811.12004.
