
EgoExoLearn

This repository contains the data and benchmark code of the following paper:

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Presented by OpenGVLab in Shanghai AI Lab

Paper · Project Page · HuggingFace · Video

🔥 News

📣 Overview

(Figure: overall structure of EgoExoLearn.)

We propose EgoExoLearn, a dataset that emulates the human demonstration-following process, in which individuals record egocentric videos as they execute tasks guided by exocentric-view demonstration videos. Focusing on potential applications in daily assistance and professional support, EgoExoLearn contains egocentric and demonstration video data spanning 120 hours, captured in daily-life scenarios and specialized laboratories. Along with the videos, we record high-quality gaze data and provide detailed multimodal annotations, forming a playground for modeling the human ability to bridge asynchronous procedural actions across different viewpoints.

🎓 Benchmarks

(Figure: benchmarks overview.)

Please visit each subfolder for code and annotations. More updates coming soon.

We design benchmarks for 1) cross-view association, 2) cross-view action understanding (action segmentation, action anticipation, action planning), 3) cross-view referenced skill assessment, and 4) cross-view referenced video captioning. Each benchmark is meticulously defined, annotated, and supported by baseline implementations. In addition, we explore the role of gaze in these tasks for the first time. We hope our dataset can serve as a resource for future work on bridging asynchronous procedural actions in ego- and exo-centric perspectives, thereby inspiring the design of AI agents adept at learning from real-world human demonstrations and mapping procedural actions onto robot-centric views.

📑 Data access

Option 1: Google Drive links

Videos (320p, mp4)

Gaze (processed, npy)

CLIP features (5 fps)

I3D RGB features

CLIP features of gaze cropped videos

I3D RGB features of gaze cropped videos
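
As a rough illustration, the downloaded gaze files and pre-extracted features can be inspected with NumPy. The file names, directory layout, and array shapes in the sketch below are assumptions made for illustration only; please consult the benchmark subfolders and annotations for the exact format.

```python
# Minimal sketch, assuming illustrative file names and array layouts
# (not taken from the repository documentation).
import numpy as np

# Load processed gaze for one egocentric video; we assume each .npy stores
# per-frame gaze records (e.g., frame index plus gaze coordinates).
# Check the actual files to confirm the layout.
gaze = np.load("gaze/ego_video_0001.npy")
print("gaze array shape:", gaze.shape)

# Load pre-extracted CLIP features (5 fps) for the same video; we assume a
# 2-D array of shape (num_frames, feature_dim).
clip_feats = np.load("clip_features_5fps/ego_video_0001.npy")
print("CLIP features shape:", clip_feats.shape)
```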

Option 2: BaiduYun link

EgoExoLearn

Extraction code: tm1g

✒️ Citation

If you find our repo useful for your research, please consider citing our paper:

 @InProceedings{huang2024egoexolearn,
     title={EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World},
     author={Huang, Yifei and Chen, Guo and Xu, Jilan and Zhang, Mingfang and Yang, Lijin and Pei, Baoqi and Zhang, Hongjie and Dong, Lu and Wang, Yali and Wang, Limin and Qiao, Yu},
     booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
     year={2024}
 }

♥️ Acknowledgement

Led by Shanghai AI Laboratory, Nanjing University, and Shenzhen Institute of Advanced Technology, this project was jointly carried out by researchers from multiple institutions, including The University of Tokyo, Fudan University, Zhejiang University, and University of Science and Technology of China.

📬 Primary contact: Yifei Huang ( hyf at iis.u-tokyo.ac.jp )
