TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
CVPR 2025
Hongxiang Zhao*
Xingchen Liu*
Mutian Xu
Yiming Hao
Weikai Chen
Xiaoguang Han§
CUHKSZ GAP-Lab
*Indicates Equal Contribution §Indicates Corresponding Author
📖 Project Page | 📄 Paper Link | 🎥 Dataset Form
We introduce TASTE-Rob: 1) a dataset of 100,856 task-oriented hand-object interaction videos, and 2) a three-stage pose-refinement video generation pipeline. With these contributions, TASTE-Rob generates realistic interactions and opens up the possibility of transferring them to robots.
If you find our work useful in your research, please consider citing:
@InProceedings{Zhao_2025_CVPR,
author = {Zhao, Hongxiang and Liu, Xingchen and Xu, Mutian and Hao, Yiming and Chen, Weikai and Han, Xiaoguang},
title = {TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {27683-27693}
}
- [6/7/2025] TASTE-Rob Dataset Download Tool has been released now!!!
- [5/3/2025] TASTE-Rob Dataset has been released now!!!
- [3/14/2025] TASTE-Rob has been released on arXiv!!!
- [2/27/2025] 🎉🎉🎉TASTE-Rob has been accepted by CVPR 2025!!!🎉🎉🎉
- Paper Released.
- Dataset will be released before 05/05/2025.
- Source Code and Pretrained Weights.
TASTE-Rob contains 100,856 task-oriented ego-centric hand-object interaction videos across different environments. We provide a OneDrive link to download the full data. Please fill out this form, and we will send the download link and password to your e-mail soon.
We split the full data by SingleHand/DoubleHand and by environment; the total size is about 1.55 TB.
|-- TASTE_Rob
    |-- SingleHand # stores videos with single-hand interaction captions
        |-- Bathroom
            |-- 50254.mp4
            |-- 50255.mp4
            |-- 50256.mp4
            |-- ...
        |-- Bedroom
        |-- Dinning
        |-- DressingTable
        |-- Kitchen
        |-- Office
    |-- DoubleHand # stores videos with double-hand interaction captions
        |-- Bathroom
        |-- Dinning
        |-- Kitchen
        |-- Office
    |-- captions.xlsx # stores captions
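After downloading, it can be useful to check how many videos landed in each folder. The following is a minimal sketch, assuming the archive has been extracted into the layout shown above (hand type, then scene, then `.mp4` files); the function name `count_videos` is hypothetical and not part of the released tools.

```python
from collections import Counter
from pathlib import Path


def count_videos(root):
    """Count .mp4 files per (hand type, scene) under a TASTE_Rob-style root.

    Assumes the layout <root>/<HandType>/<Scene>/<id>.mp4, as in the tree above.
    """
    counts = Counter()
    for video in Path(root).glob("*/*/*.mp4"):
        scene_dir = video.parent
        hand_type = scene_dir.parent.name  # e.g. "SingleHand" or "DoubleHand"
        counts[(hand_type, scene_dir.name)] += 1
    return counts
```

Calling `count_videos("TASTE_Rob")` returns a `Counter` keyed by pairs such as `("SingleHand", "Bathroom")`; files that sit outside the two-level folder structure, such as captions.xlsx, are ignored.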
In captions.xlsx, the sheet Single-Hand stores single-hand interaction captions, and the sheet Double-Hand stores double-hand interaction captions. Each sheet has three attributes: id, scene and caption. You can search this Excel file for the ids of the videos you want.
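The search over captions can also be scripted. The following is a minimal sketch, assuming each sheet has been loaded into a list of dictionaries with the keys id, scene and caption; the helper `find_videos` is hypothetical, and the sample rows below are made up for illustration and are not taken from the dataset.

```python
def find_videos(rows, scene=None, keyword=None):
    """Return the ids of rows matching an optional scene and caption keyword."""
    matches = []
    for row in rows:
        if scene is not None and row["scene"] != scene:
            continue
        # Case-insensitive substring match on the caption text.
        if keyword is not None and keyword.lower() not in str(row["caption"]).lower():
            continue
        matches.append(row["id"])
    return matches


# Made-up rows for illustration only. Real rows could be loaded with, e.g.,
# pandas.read_excel("captions.xlsx", sheet_name="Single-Hand").to_dict("records")
# (requires pandas and openpyxl); the exact scene strings in the sheet may differ.
sample_rows = [
    {"id": 1, "scene": "Kitchen", "caption": "Pick up the cup"},
    {"id": 2, "scene": "Office", "caption": "Open the laptop"},
]
print(find_videos(sample_rows, scene="Kitchen", keyword="cup"))
```

Both filters are optional, so the same helper can list every id in a scene or every caption that mentions a given object.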
You can use download_tool_taste_rob.py to download zip files or mp4 files, as follows:
python download_tool_taste_rob.py \
--file_list downlist.txt \
--url {our_given_url} \
--download_folder {local_path_of_downloaded_files} \
--force True
To download successfully, replace our_given_url and local_path_of_downloaded_files with your own values, and edit downlist.txt, which stores the paths of the files you want to download.
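The file list can be generated rather than typed by hand. The following is a minimal sketch, assuming downlist.txt holds one relative path per line in the form HandType/Scene/id.mp4, mirroring the folder layout of the dataset. That format is an assumption, as is the helper name `write_downlist`; check the download tool for the exact path format it expects.

```python
from pathlib import Path


def write_downlist(entries, out_path="downlist.txt"):
    """Write one relative video path per line and return the lines written.

    entries: iterable of (hand_type, scene, video_id) tuples.
    Assumed line format: <hand_type>/<scene>/<video_id>.mp4
    """
    lines = [f"{hand}/{scene}/{vid}.mp4" for hand, scene, vid in entries]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, `write_downlist([("SingleHand", "Bathroom", 50254)])` writes the single line `SingleHand/Bathroom/50254.mp4` to downlist.txt in the current directory.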
The data is released under the TASTE-Rob Terms of Use.
Copyright (c) 2025