
Fast 3D Human Pose Estimation

Introduction

This is a PyTorch implementation of a method based on Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation, applied to stereo images to reconstruct human poses in the 3D world. We also compare it with a naive approach based on Simple Baselines for Human Pose Estimation and Tracking, which consists of an encoder-decoder structure and predicts 2D poses from both views. We evaluate their performance using the Mean Per Joint Position Error (MPJPE) metric in both 2D and 3D scenarios. Additionally, we employ data augmentation techniques, such as masking out a small block on the human in each image (following Cutout and Hide-and-Seek), to improve the accuracy of the models.
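
For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth joint positions, averaged over joints. A minimal sketch (the tensor shapes here are assumptions, not this repo's exact API):

import torch

def mpjpe(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # pred, gt: (batch, num_joints, dims), with dims = 2 for 2D or 3 for 3D
    # Euclidean distance per joint, averaged over joints and batch
    return torch.norm(pred - gt, dim=-1).mean()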

Contribution

  • We implement and extend the method Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation from scratch with slight modifications and apply it to stereo reconstruction tasks.
  • We find that random masking data augmentation strategies can ease self-occlusion and improve MPJPE to some extent (see the sketch after this list).
  • We experiment with different tricks (different loss functions, gradient clipping) to further improve accuracy and stabilize the training process.
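
A minimal sketch of the Cutout-style random masking mentioned above, assuming HWC image arrays; the function name and block size are illustrative, not this repo's actual implementation:

import numpy as np

def random_mask(img: np.ndarray, size: int = 32) -> np.ndarray:
    # Zero out one random square block on the image (Cutout-style augmentation)
    h, w = img.shape[:2]
    y = np.random.randint(0, max(1, h - size))
    x = np.random.randint(0, max(1, w - size))
    out = img.copy()
    out[y:y + size, x:x + size] = 0
    return out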

Dataset

We pretrain our model on the MPII dataset, which includes around 25K images containing over 40K people with annotated body joints. We then fine-tune on the stereo data from the MADS dataset, which consists of martial arts actions (Tai-chi and Karate), dancing actions (hip-hop and jazz), and sports actions (basketball, volleyball, football, rugby, tennis, and badminton). Two martial arts masters, two dancers, and an athlete performed these actions while being recorded with either multiple cameras or a stereo depth camera.

Please download the data and arrange it into this pattern:

Your_WorkingSpace/
├── ...
├── ...
└── data/
    ├── MADS_depth
    └── MADS_multiview

Then run the following command to extract the training/validation data:

$ python extract_data.py

Train

CDRNET

Run the following command to train CDRNET:

$ python train_cdr.py

Note: You need the backbone weights (see the "Weights" section below) before training CDRNET.
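
A hypothetical example of how the pretrained backbone checkpoint could be loaded before training; the attribute name `backbone` and the checkpoint layout are assumptions, so keys may need remapping:

import torch

# Load the MPII-pretrained backbone checkpoint (path from the "Weights" section)
checkpoint = torch.load("weights/mpii_256_101/latest.pth", map_location="cpu")
model.backbone.load_state_dict(checkpoint)  # `model` is your CDRNET instance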

Backbone

Run the following command to train your customized ResNet backbone:

$ python train.py
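
The baseline backbone follows the Simple Baselines design: a ResNet encoder followed by deconvolution layers that output one heatmap per joint. A minimal sketch, assuming ResNet-50 and three deconv stages (the actual depth and layer sizes in train.py may differ):

import torch.nn as nn
import torchvision

class PoseResNet(nn.Module):
    # Simple Baselines-style network: ResNet encoder + deconv decoder -> heatmaps
    def __init__(self, num_joints: int = 16):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])  # (B, 2048, H/32, W/32)
        layers, in_ch = [], 2048
        for _ in range(3):  # three 2x upsampling stages
            layers += [
                nn.ConvTranspose2d(in_ch, 256, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(256),
                nn.ReLU(inplace=True),
            ]
            in_ch = 256
        self.decoder = nn.Sequential(*layers)
        self.head = nn.Conv2d(256, num_joints, kernel_size=1)  # one heatmap per joint

    def forward(self, x):
        return self.head(self.decoder(self.encoder(x)))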

Inference

Run the following command after extracting the data:

$ bash scripts/inference.sh
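
At inference time, CDRNet-style methods lift the 2D detections from both views to a 3D pose via DLT triangulation (made differentiable in the paper and in the DiffDLT reference). A minimal NumPy sketch of classic two-view DLT, not this repo's exact code:

import numpy as np

def triangulate_dlt(P1: np.ndarray, P2: np.ndarray,
                    pt1: np.ndarray, pt2: np.ndarray) -> np.ndarray:
    # P1, P2: (3, 4) camera projection matrices; pt1, pt2: (2,) pixel coordinates
    # Build the homogeneous system A @ X = 0 and solve via SVD
    A = np.stack([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean 3D point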

Weights

You can download the weights via the link.

You should place the weights under the following structure to run inference.

Your_WorkingSpace/
├── ...
├── ...
└── weights/
    ├── mads_3d_256_101/
    │   └── best.pth
    └── mpii_256_101/
        └── latest.pth

Results

Best:

[Image: HipHop_best]

[Image: Sports_best]

Baseline:

[Image: HipHop_base]

[Image: Sports_base]

References

  • CDRNet
  • DiffDLT
  • learnable-triangulation-pytorch
  • R-YOLOv4
