
This is the official repo for our paper: POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities, presented at MICCAI 2023.


POV-Surgery

A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities

26th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2023 (Oral)

This is the official code release for POV-Surgery at MICCAI 2023. See the project page for the dataset, report, demo, and statistics. Check out the POV-Surgery YouTube videos below for more details.

Video Description (with audio): LongVideo
Overview Video: ShortVideo

Components


  • Synthetic data generation pipeline
  • POV-Surgery dataset utilities
  • Fine-tuning demo code

Dataset Usage

Please download POV_Surgery_data.zip from the POV-Surgery project page, unzip it, and place it in a location of your choice. Please remember that if you wish to download and use our dataset, compliance with the licensing conditions is mandatory. Our proposed dataset contains 53 egocentric RGB-D sequences with 88k frames and accurate 2D/3D hand-object pose annotations.
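For example, assuming the archive was downloaded to the current directory and /path/to/POV_Surgery_data is your chosen location (both paths are hypothetical, not prescribed by the repo), the extraction could look like:

unzip POV_Surgery_data.zip -d /path/to/POV_Surgery_data

Here's a teaser of our dataset: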

RGB-D and Annotation: LongVideo
Dataset Overview: ShortVideo

Project structure

Please register at SMPL-X and MANO to use their dependencies, and read and accept their licenses to use the SMPL-X and MANO models. There are different versions of manopth; we have already included a MANO implementation in our repo. Then please download data.zip from the POV-Surgery project page, unzip it, and put it in the POV_Surgery folder. We have prepared all the required dependencies, and the final structure should look like this:

    POV_Surgery
    ├── data
    │   ├── sim_room
    │   │   ├── room_sim.obj
    │   │   ├── room_sim.obj.mtl
    │   │   └── textured_output.jpg
    │   └── bodymodel
    │       ├── smplx_to_smpl.pkl
    │       ├── ...
    │       ├── mano
    │       │   └── MANO_RIGHT.pkl
    │       └── body_models
    │           ├── smpl
    │           └── smplx
    ├── grasp_generation
    ├── grasp_refinement
    ├── pose_fusion
    ├── pre_rendering
    ├── blender_rendering
    ├── HandOccNet_ft
    └── vis_data
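
A quick way to sanity-check that the files landed in the right place (the paths are taken from the tree above):

ls POV_Surgery/data/bodymodel/mano/MANO_RIGHT.pkl POV_Surgery/data/sim_room/room_sim.obj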

Environment

We recommend creating a Python 3.8 environment with conda.
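A minimal sketch, assuming an environment named pov_surgery (the name is our choice, not prescribed by the repo):

conda create -n pov_surgery python=3.8
conda activate pov_surgery

Then install the PyTorch and torchvision builds that suit your operating system. For example, if you are using CUDA 11.8, you could use: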

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Then you should install a pytorch3d build that matches your Python, PyTorch, and CUDA versions. An example (for Python 3.8, PyTorch 1.11, CUDA 11.3) could be:

pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
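To pick a matching wheel, you can first check the installed PyTorch and CUDA versions (a quick sanity check, not part of the original instructions):

python -c "import torch; print(torch.__version__, torch.version.cuda)"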

Then install the remaining dependencies to finish the environment setup by running requirements.sh:

sh requirements.sh

You can also refer to the Colab demo for hints on setting up the environment.

Contact Information

If you have questions, feel free to contact:

Rui Wang: ruiwang46@ethz.ch

Acknowledgement

  • This work is part of a research project that has been financially supported by Accenture LLP. Siwei Zhang is funded by Microsoft Mixed Reality & AI Zurich Lab PhD scholarship. The authors would like to thank PD Dr. Michaela Kolbe for providing the simulation facilities.
  • The authors would like to thank David Botta, Dr. Kerrin Weiss, Isabelle Hofmann, Manuel Koch, and Marc Wittwer for their participation in data capture, and Dr. Julian Wolf, Tobias Stauffer, and Prof. Dr. Siyu Tang for the enlightening discussions.

License

Software Copyright License for non-commercial scientific research purposes. Please read carefully the terms and conditions and any accompanying documentation before you download and/or use the MANO model, data and software, (the "Model & Software"), including 3D meshes, blend weights, blend shapes, software, scripts, and animations. By downloading and/or using the Model & Software (including downloading, cloning, installing, and any other use of this github repository), you acknowledge that you have read these terms and conditions, understand them, and agree to be bound by them. If you do not agree with these terms and conditions, you must not download and/or use the Model & Software. Any infringement of the terms of this agreement will automatically terminate your rights under this License.

Citation

Wang, R., Ktistakis, S., Zhang, S., Meboldt, M., Lohmeyer, Q. (2023). POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Springer, Cham. https://doi.org/10.1007/978-3-031-43996-4_42

BibTeX

@inproceedings{wang2023pov,
  title={POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities},
  author={Wang, Rui and Ktistakis, Sophokles and Zhang, Siwei and Meboldt, Mirko and Lohmeyer, Quentin},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={440--450},
  year={2023}
}
