OpenFusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Kashu Yamazaki · Taisei Hanyu · Khoa Vo · Thang Pham · Minh Tran
Gianfranco Doretto · Anh Nguyen · Ngan Le

TL;DR: Open-Fusion builds an open-vocabulary, queryable 3D scene from a sequence of posed RGB-D images in real time.

Getting Started 🏁

System Requirements

  • Ubuntu 20.04
  • 10 GB+ VRAM (~5 GB for SEEM and ~2.5 GB for TSDF); large scenes may require more memory
  • Azure Kinect, Intel T265 (for real-world data)

Environment Setup

Please build a Docker image from the provided Dockerfile. Be sure to export the following environment variables (REGISTRY_NAME and IMAGE_NAME), as the tools/*.sh scripts rely on them:

export REGISTRY_NAME=<your-registry-name>
export IMAGE_NAME=<your-image-name>
docker build -t $REGISTRY_NAME/$IMAGE_NAME -f docker/Dockerfile .
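Once the image is built, a typical way to start a container is with GPU access and the repository mounted. The following is a sketch only: the /workspace mount point and the flag set are assumptions (standard Docker/NVIDIA runtime options), not taken from the Dockerfile. The command is echoed rather than executed so you can inspect it before running it:

```shell
# Sketch: compose a docker run command with GPU access and the repo mounted.
# /workspace and the flags below are assumptions, not from the Dockerfile.
REGISTRY_NAME=${REGISTRY_NAME:-my-registry}
IMAGE_NAME=${IMAGE_NAME:-openfusion}
RUN_CMD="docker run --gpus all -it --rm -v $(pwd):/workspace $REGISTRY_NAME/$IMAGE_NAME"
echo "$RUN_CMD"
```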

Data Preparation

ICL and Replica

You can run the following script to download the ICL and Replica datasets:

bash tools/download.sh --data icl replica

This script will create a folder ./sample and download the datasets into the folder.

ScanNet

For ScanNet, please follow the official ScanNet download instructions. Once you have the dataset downloaded, run the following script to prepare the data (example for scene scene0001_00):

python tools/prepare_scene.py --filename scene0001_00.sens --output_path sample/scannet/scene0001_00

Model Preparation

Please download the pretrained weights for SEEM from here and place them at openfusion/zoo/xdecoder_seem/checkpoints/seem_focall_v1.pt.
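The checkpoint directory can be prepared ahead of the download; a minimal sketch follows. The wget line is commented out because the actual download URL comes from the link above:

```shell
# Create the directory OpenFusion expects for the SEEM checkpoint,
# then fetch the weights into place (uncomment the wget line and
# fill in the URL from the link above).
CKPT_DIR=openfusion/zoo/xdecoder_seem/checkpoints
mkdir -p "$CKPT_DIR"
# wget -O "$CKPT_DIR/seem_focall_v1.pt" <seem-download-url>
echo "checkpoint goes to: $CKPT_DIR/seem_focall_v1.pt"
```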

Run OpenFusion

You can run OpenFusion using tools/run.sh as follows:

bash tools/run.sh --data $DATASET --scene $SCENE

Options:

  • --data: dataset to use (e.g., icl)
  • --scene: scene to use (e.g., kt0)
  • --frames: number of frames to use (default: -1)
  • --live: run with live monitor (default: False)
  • --stream: run with data stream from camera server (default: False)
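Combining the options, a fuller invocation might look like the sketch below. The dataset and scene names follow the examples above; the frame count is arbitrary, chosen for a quick test. The command is built into a variable and echoed so it can be inspected first:

```shell
# Sketch: a run.sh invocation combining several of the options above.
DATASET=icl      # dataset name, as in the --data example
SCENE=kt0        # scene name, as in the --scene example
FRAMES=500       # arbitrary frame budget for a quick test run
CMD="bash tools/run.sh --data $DATASET --scene $SCENE --frames $FRAMES --live"
echo "$CMD"
```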

If you want to run OpenFusion with a camera stream, first run the following command on the machine with the Azure Kinect and Intel T265 connected:

python deploy/server.py

Please refer to this for more details.

Acknowledgement 🙇

  • SEEM: the vision-language foundation model (VLFM) we use to extract region-based features
  • Open3D: GPU-accelerated 3D library providing the base TSDF implementation

Citation 🙏

If you find this work helpful, please consider citing our work as:

@article{kashu2023openfusion,
    title={Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation},
    author={Yamazaki, Kashu and Hanyu, Taisei and Vo, Khoa and Pham, Thang and Tran, Minh and Doretto, Gianfranco and Nguyen, Anh and Le, Ngan},
    journal={arXiv preprint arXiv:2310.03923},
    year={2023}
}

Contact 📧

Please create an issue on this repository for questions, comments, and bug reports. For other inquiries, send an email to Kashu Yamazaki.

About

[ICRA 2024 Oral] Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
