MODEL ZOO

MSCOCO dataset

Model	Backbone	Detector	Input Size	AP	Speed	Download	Config	Training Log
Simple Baseline	ResNet50	YOLOv3	256x192	70.6	2.94 iter/s	model	cfg	log
Fast Pose	ResNet50	YOLOv3	256x192	72.0	3.54 iter/s	model	cfg	log
Fast Pose (DUC)	ResNet50 - unshuffle	YOLOv3	256x192	72.4	2.91 iter/s	model	cfg	log
HRNet	HRNet-W32	YOLOv3	256x192	72.5	2.13 iter/s	model	cfg	log
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	72.8	2.94 iter/s	model	cfg	log
Fast Pose (DUC)	ResNet152	YOLOv3	256x192	73.3	1.62 iter/s	model	cfg	log

Notes

All models are trained on the COCO keypoint train 2017 images that contain at least one human with keypoint annotations (64115 images).
The evaluation is done on COCO keypoint val 2017 (5000 images).
Flip test is used by default.
Speed is measured on a single TITAN XP GPU, with batch_size=64 in each iteration.
The speed test uses offline human detection results.
FastPose is our own network design. Paper coming soon!
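
The flip test mentioned in the notes above is commonly done by running the network on the image and on its horizontally mirrored copy, mirroring the second set of heatmaps back, swapping the left/right keypoint channels, and averaging. The sketch below illustrates that idea under stated assumptions and is not the repository's implementation: `predict`, `flip_heatmaps`, and `flip_test` are hypothetical names, and the index pairs are the usual left/right pairs of the 17 COCO keypoints.

```python
import numpy as np

# Left/right index pairs of the 17 COCO keypoints (eyes, ears, shoulders,
# elbows, wrists, hips, knees, ankles). Index 0 (nose) has no mirror partner.
COCO_FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8),
                   (9, 10), (11, 12), (13, 14), (15, 16)]


def flip_heatmaps(heatmaps, pairs=COCO_FLIP_PAIRS):
    """Mirror (K, H, W) heatmaps horizontally and swap left/right channels."""
    flipped = heatmaps[:, :, ::-1].copy()
    for left, right in pairs:
        flipped[[left, right]] = flipped[[right, left]]
    return flipped


def flip_test(predict, image):
    """Average the prediction on `image` with the one on its mirrored copy.

    `predict` is a hypothetical callable that maps an (H, W, C) image
    to (K, h, w) heatmaps.
    """
    original = predict(image)
    mirrored = predict(image[:, ::-1])  # flip along the width axis
    return 0.5 * (original + flip_heatmaps(mirrored))
```

Because the network runs twice per input, a flip test roughly doubles the inference cost compared with a single forward pass.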

Halpe dataset (26 keypoints)

Model	Backbone	Detector	Input Size	AP	Speed	Download	Config
Fast Pose	ResNet50	YOLOv3	256x192	-	13.12 iter/s	Google Baidu	cfg

For example, you can run with:

python scripts/demo_inference.py --cfg configs/halpe_26/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/halpe26_fast_res50_256x192.pth --indir examples/demo/ --save_img

Notes

This model is trained on the first 26 keypoints of the Halpe Full-body dataset (without face and hand keypoints).
The speed is tested on COCO val2017 on a single NVIDIA GeForce RTX 3090 GPU, with batch_size=64 in each iteration and offline YOLOv3 human detection results.
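
The Speed column is reported in iterations per second, so comparing it with per-image figures requires multiplying by the batch size. The helper below is a small illustrative sketch, assuming each iteration processes exactly one batch of 64 inputs as stated in the notes; `inputs_per_second` is a hypothetical name, not part of the repository.

```python
def inputs_per_second(iters_per_second: float, batch_size: int = 64) -> float:
    """Convert a reported iter/s figure into network inputs processed per second."""
    return iters_per_second * batch_size


# Halpe 26-keypoint Fast Pose row: 13.12 iter/s at batch_size=64.
print(round(inputs_per_second(13.12), 2))  # 839.68
```

Speeds measured on different GPUs (TITAN XP for the MSCOCO table, RTX 3090 for the others) are not directly comparable even after this conversion.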

Multi Domain Models (Strongly Recommended)

Model	Backbone	Detector	Input Size	Loss Type	AP	Speed	Download	Config	#keypoints
Fast Pose	ResNet50	YOLOv3	256x192	Symmetric Integral	50.1	16.28 iter/s	Google Baidu(code: d0wi)	cfg	136
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	Combined (10 hand weight)	49.8	10.35 iter/s	Google Baidu(code: app1)	cfg	136
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	Combined	-	13.88 iter/s	Google Baidu(code: 6kwr)	cfg	68 (no face)
Fast Pose (DCN)	ResNet50 - dcn	-	256x192	Symmetric Integral	-	30.20 iter/s	Google Baidu(code: nwxx)	cfg	21 (single hand)

For the most accurate wholebody pose estimation, you can run with:

python scripts/demo_inference.py --cfg configs/halpe_coco_wholebody_136/resnet/256x192_res50_lr1e-3_2x-dcn-combined.yaml --checkpoint pretrained_models/multi_domain_fast50_dcn_combined_256x192.pth --indir examples/demo/ --save_img

Alternatively, you can run the following (this version is a little faster and more accurate on body keypoints, but its performance on hand keypoints is worse):

python scripts/demo_inference.py --cfg configs/halpe_coco_wholebody_136/resnet/256x192_res50_lr1e-3_2x-regression.yaml --checkpoint pretrained_models/multi_domain_fast50_regression_256x192.pth --indir examples/demo/ --save_img

Notes

The above models are trained on multiple datasets, so they perform well on in-the-wild images.
'Combined (10 hand weight)' means that the loss differs between hand and body keypoints, with the hand keypoints weighted ten times more heavily.

Halpe dataset (136 keypoints)

Model	Backbone	Detector	Input Size	Loss Type	AP	Speed	Download	Config
Fast Pose	ResNet50	YOLOv3	256x192	Heatmap	41.7	4.37 iter/s	Google Baidu(code: y8a0)	cfg
Fast Pose	ResNet50	YOLOv3	256x192	Symmetric Integral	44.1	16.50 iter/s	Google Baidu(code: 9e4z)	cfg
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	Symmetric Integral	46.2	16.58 iter/s	Google Baidu(code: 0yyf)	cfg
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	Combined	45.4	10.07 iter/s	Google Baidu(code: hln3)	cfg
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	Combined (10 hand weight)	47.2	10.07 iter/s	Google Baidu(code: jkyc)	cfg
Fast Pose (DUC)	ResNet152	YOLOv3	256x192	Symmetric Integral	45.1	16.17 iter/s	Google Baidu(code: gaxj)	cfg

For example, you can run with:

python scripts/demo_inference.py --cfg configs/halpe_136/resnet/256x192_res50_lr1e-3_2x-regression.yaml --checkpoint pretrained_models/halpe136_fast50_regression_256x192.pth --indir examples/demo/ --save_img

Notes

All of the above models are trained only on the Halpe Full-body dataset.
The APs are tested under Halpe's criterion, with flip test on.
Combined loss means we use heatmap loss (MSE loss) on body and foot keypoints and symmetric integral loss (L1 joint regression loss) on face and hand keypoints.
There are two FastPose-DCN models with combined loss. The second one puts ten times the weight on hand keypoints, so it is more accurate on hand keypoints but less accurate on the other keypoints.
The speed is tested on COCO val2017 on a single NVIDIA GeForce RTX 3090 GPU, with batch_size=64 in each iteration and offline YOLOv3 human detection results.
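
The combined loss described in the notes above can be illustrated roughly as follows. This is a simplified sketch, not the training code: it assumes the Halpe Full-body index layout of 0-25 for body and foot, 26-93 for face, and 94-135 for hands, it takes already-decoded coordinates in place of the integral (soft-argmax) step, and the way the three terms are summed is an assumption. `combined_loss` and `hand_weight` are hypothetical names.

```python
import numpy as np

# Assumed Halpe Full-body index layout (an assumption, check it against the dataset docs).
BODY_FOOT = slice(0, 26)   # body and foot keypoints
FACE = slice(26, 94)       # face keypoints
HAND = slice(94, 136)      # left and right hand keypoints


def combined_loss(pred_heatmaps, gt_heatmaps, pred_coords, gt_coords,
                  hand_weight=1.0):
    """MSE heatmap loss on body/foot plus L1 coordinate loss on face/hand.

    pred_heatmaps, gt_heatmaps: (136, H, W) arrays.
    pred_coords, gt_coords:     (136, 2) arrays of keypoint coordinates.
    hand_weight:                multiplier on the hand term (10.0 would mimic
                                the "10 hand weight" variant).
    """
    heatmap_term = np.mean((pred_heatmaps[BODY_FOOT] - gt_heatmaps[BODY_FOOT]) ** 2)
    face_term = np.mean(np.abs(pred_coords[FACE] - gt_coords[FACE]))
    hand_term = np.mean(np.abs(pred_coords[HAND] - gt_coords[HAND]))
    return heatmap_term + face_term + hand_weight * hand_term
```

Raising `hand_weight` makes hand errors dominate the total, which matches the trade-off noted above: better hand accuracy at some cost to the other keypoints.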

COCO WholeBody dataset (133 keypoints)

Model	Backbone	Detector	Input Size	Loss Type	AP	Speed	Download	Config
Fast Pose	ResNet50	YOLOv3	256x192	Symmetric Integral	55.4	17.42 iter/s	Google Baidu(code: nw03)	cfg
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	Symmetric Integral	57.7	16.70 iter/s	Google Baidu(code: dq9k)	cfg
Fast Pose	ResNet50	YOLOv3	256x192	Combined	57.8	10.28 iter/s	Google Baidu(code: 7a56)	cfg
Fast Pose (DCN)	ResNet50 - dcn	YOLOv3	256x192	Combined	58.2	10.22 iter/s	Google Baidu(code: 99ee)	cfg
Fast Pose (DUC)	ResNet152	YOLOv3	256x192	Symmetric Integral	56.9	15.72 iter/s	Google Baidu(code: jw3u)	cfg

Notes

All of the above models are trained only on the COCO WholeBody dataset.
The APs are tested under COCO WholeBody's criterion, with flip test on.
The speed is tested on COCO val2017 on a single NVIDIA GeForce RTX 3090 GPU, with batch_size=64 in each iteration and offline YOLOv3 human detection results.

Notes

The Multi Domain Models are strongly recommended because they are more accurate and flexible.
The Multi Domain Models are trained with multi-domain knowledge distillation (MDKD, see our paper for more details).
Their APs are tested under Halpe's criterion, with flip test on.
If you want to use the single hand model, you should give the rough bounding box of a single hand instead of that of a whole person.
The speed is tested on COCO val2017 on a single NVIDIA GeForce RTX 3090 GPU, with batch_size=64 in each iteration and offline YOLOv3 human detection results.

3D Human Pose & Shape Estimation

Model	Backbone	Input Size	PA-MPJPE (3DPW)	PA-MPJPE (Human3.6M)	Download	Config
HybrIK	ResNet34	256x256	45.3	36.3	model	cfg
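
PA-MPJPE in the table above is the mean per-joint position error computed after a Procrustes (similarity) alignment of the predicted joints to the ground truth, so lower is better. The sketch below shows one common way to compute it with NumPy for a single pose; it is an illustration under that standard definition, not the evaluation script used for these numbers, and `pa_mpjpe` is a hypothetical name.

```python
import numpy as np


def pa_mpjpe(pred, gt):
    """Mean per-joint error after similarity alignment of `pred` to `gt`.

    pred, gt: (J, 3) arrays of joint positions in the same unit (e.g. mm).
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    u, s, vt = np.linalg.svd(p.T @ g)
    d = np.ones(3)
    d[-1] = np.sign(np.linalg.det(u @ vt))  # avoid reflections
    rot = u @ np.diag(d) @ vt               # row-vector convention: p @ rot ~ g

    # Optimal isotropic scale, then map the prediction into the gt frame.
    scale = (s * d).sum() / (p ** 2).sum()
    aligned = scale * p @ rot + mu_g

    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Because rotation, scale, and translation are factored out, this metric measures the accuracy of the pose shape itself rather than its global placement.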
