ORION Framework for Open-World Interactive Personalized Robot Navigation

Paper: Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation (ICRA 2024), https://arxiv.org/abs/2310.07968

We propose a general framework for Open-woRld Interactive persOnalized Navigation (ORION) that uses Large Language Models (LLMs) to make sequential decisions that orchestrate different modules, so the robot can search, detect, and navigate in the environment and talk with the user in natural language.

(Framework overview figure)

Installation

Git clone the project and cd into the project main directory.

  1. Install habitat-sim. We use version 0.2.2.
conda create -n <name> python=3.7 cmake=3.14.0
conda activate <name>
conda install habitat-sim==0.2.2 withbullet headless -c conda-forge -c aihabitat
  2. Install PyTorch. We use version 1.13.1+cu117, but you can change the version according to your own CUDA version.
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
  3. Install submodules
git submodule update --init --recursive
  • 3.1. Install habitat-lab. We use version 0.2.2.
cd third_party/habitat-lab/
pip install -r requirements.txt
python setup.py develop --all
  • 3.2. Install dependencies for LSeg, OpenCLIP, the OpenAI ChatGPT API, and other components.
cd <project_main_dir>
pip install -r requirements.txt
cd third_party/Grounded-Segment-Anything
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.7/    # make sure this cuda version is same as pytorch cuda version

python -m pip install -e segment_anything
python -m pip install -e GroundingDINO
  4. Finally, install the project itself
cd <project_main_dir>
python setup.py develop
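
As a quick sanity check after installation, the short Python snippet below (a minimal sketch; the version strings in the comments assume the versions recommended above) verifies that habitat-sim and PyTorch import correctly and that CUDA is visible:

# Post-install sanity check (assumes habitat-sim 0.2.2 and PyTorch 1.13.1+cu117 as above).
import habitat_sim
import torch

print("habitat-sim:", habitat_sim.__version__)    # expected: 0.2.2
print("torch:", torch.__version__)                # expected: 1.13.1+cu117 (or your CUDA build)
print("CUDA available:", torch.cuda.is_available())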

Data

Create a data directory in the project main directory. The structure of the data folder should be as follows:

.
├── data
│   ├── datasets                  # soft link to habitat-lab/data/
│   │   └── objectnav_hm3d_v2
│   ├── experiments               # experiment results save_dir
│   ├── pretrained_ckpts          # pretrained model checkpoints
│   │   ├── groundingdino_swint_ogc.pth
│   │   ├── lseg_demo_e200.ckpt
│   │   └── sam_vit_h_4b8939.pth
│   └── scene_datasets            # soft link to habitat-lab/data/scene_datasets
│       └── hm3d_v0.2
├── demos
├── orion
├── README.md
├── requirements.txt
├── scripts
├── setup.py
├── tests
└── third_party

Habitat Data

We use the Habitat HM3D dataset (https://github.com/matterport/habitat-matterport-3dresearch). Please apply through the website for access. After downloading HM3D v0.2, add a soft link at data/scene_datasets/hm3d_v0.2, following the data practice of habitat-lab. Download the objectnav_hm3d_v2 dataset and save it to data/datasets/objectnav_hm3d_v2.
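
For reference, here is a minimal Python sketch for creating the soft links; the source paths below are assumptions based on the habitat-lab data layout, so point them at wherever you actually downloaded the data:

# Sketch: create the soft links for scene and episode data.
# The source paths are assumptions; adjust them to your actual download locations.
from pathlib import Path

hm3d_src = Path("third_party/habitat-lab/data/scene_datasets/hm3d_v0.2").resolve()
hm3d_dst = Path("data/scene_datasets/hm3d_v0.2")
hm3d_dst.parent.mkdir(parents=True, exist_ok=True)
if not hm3d_dst.exists():
    hm3d_dst.symlink_to(hm3d_src, target_is_directory=True)

episodes_src = Path("third_party/habitat-lab/data/datasets/objectnav_hm3d_v2").resolve()
episodes_dst = Path("data/datasets/objectnav_hm3d_v2")
episodes_dst.parent.mkdir(parents=True, exist_ok=True)
if not episodes_dst.exists():
    episodes_dst.symlink_to(episodes_src, target_is_directory=True)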

Pre-trained Model Checkpoints

  1. Download SAM ckpt sam_vit_h_4b8939.pth to data/pretrained_ckpts/sam_vit_h_4b8939.pth.

  2. Download grounding-dino ckpt groundingdino_swint_ogc.pth to data/pretrained_ckpts/groundingdino_swint_ogc.pth.

  3. Download the LSeg ckpt to data/pretrained_ckpts/lseg_demo_e200.ckpt.
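
As a convenience, the sketch below fetches the SAM and GroundingDINO checkpoints into data/pretrained_ckpts/. The URLs are the ones published by the upstream repositories and are an assumption here, so verify them before use; the LSeg checkpoint has to be obtained separately from the LSeg release you use.

# Sketch: download the public SAM and GroundingDINO checkpoints.
# URLs are taken from the upstream repositories; double-check them before running.
import urllib.request
from pathlib import Path

ckpt_dir = Path("data/pretrained_ckpts")
ckpt_dir.mkdir(parents=True, exist_ok=True)

urls = {
    "sam_vit_h_4b8939.pth": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "groundingdino_swint_ogc.pth": "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth",
}
for name, url in urls.items():
    target = ckpt_dir / name
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
# lseg_demo_e200.ckpt must be downloaded separately.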

ChatGPT API key

We use OpenAI's ChatGPT for the chatbot, through either the Azure or the OpenAI API. Please edit orion/config/chatgpt_config.py with your own API keys.
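
The exact fields are defined in orion/config/chatgpt_config.py; purely as an illustration, a config for the two backends might look roughly like the following (all field names here are hypothetical and may differ from the real file):

# Hypothetical illustration of an API-key config; the real field names are
# defined in orion/config/chatgpt_config.py and may differ.
OPENAI_API_KEY = "sk-..."          # for the OpenAI API backend

AZURE_API_KEY = "..."              # for the Azure OpenAI backend
AZURE_API_BASE = "https://<your-resource>.openai.azure.com/"
AZURE_DEPLOYMENT_NAME = "<your-deployment>"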

Run

  1. cd <project_main_dir>. All Python scripts should be run from the project main directory.
  2. python scripts/collect_scene_fbe.py --scene_id=<scene_id> to collect RGB-D frames in Habitat scenes using frontier-based exploration. You can set args to decide which scenes to collect. Optional: use scripts/create_video.py to create a video from the collected frames.
  3. python scripts/build_vlmap.py --scene_id=<scene_id> --feature_type=lseg to build the VLMap for each scene. It takes about 30 min with LSeg and 60 min with ConceptFusion to build one scene (around 500 frames).
  4. Once you have set your OpenAI API key in orion/config/chatgpt_config.py and prepared the data and pre-trained model ckpts, you can directly run python demos/play_interactive_terminal.py to talk with ORION in the terminal.
  5. After building the VLMap or ConceptFusion map for the scenes, you can run the complete experiments:
  • python scripts/user_agent_talk_orion.py to run the ORION method
  • python scripts/user_agent_talk_cow.py to run the CoW method
  • python scripts/user_agent_talk_vlmap.py to run the VLMap method
  • python scripts/user_agent_talk_cf.py to run the ConceptFusion method

Make sure to set suitable arguments and the ChatGPT config for both the user simulator and the agent.
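
If you want to run steps 2 and 3 over several scenes in one go, a simple driver like the following can help; this is only a sketch, and the scene IDs are placeholders for your own HM3D scene IDs:

# Sketch: batch-run frame collection and VLMap building over several scenes.
# Scene IDs are placeholders; pass the same flags you would use on the command line.
import subprocess

scene_ids = ["<scene_id_1>", "<scene_id_2>"]  # replace with your HM3D scene ids
for scene_id in scene_ids:
    subprocess.run(["python", "scripts/collect_scene_fbe.py", f"--scene_id={scene_id}"], check=True)
    subprocess.run(["python", "scripts/build_vlmap.py", f"--scene_id={scene_id}", "--feature_type=lseg"], check=True)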

Demo

  1. To test the ChatGPT API, you can run python demos/play_chatgpt_api.py to talk with the chatbot. You need to set your OpenAI API key in orion/config/chatgpt_config.py first.
  2. To test the LSeg model, you can run python demos/play_lseg.py to see the LSeg model in action.
  3. To test the GroundingSAM model, you can run python demos/play_groundingSAM.py to see the GroundingSAM model in action.
  4. To test GradCAM, you can run python demos/play_gradcam.py to see the GradCAM in action.
  5. To play with keyboard control, you can run python demos/play_habitat_teleop.py to control the agent with the keyboard in the habitat-sim window, where "w/s/a/d/p" denote forward/backward/left/right/finish.
  6. To play with a Gradio GUI, you can run python demos/play_interactive_gradio.py with the ORION agent. You can interact with the agent in natural language and see the agent's responses and actions.

Troubleshooting

  1. If model downloading is very slow, it might be due to an IPv6 network issue. Try disabling IPv6:
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
  2. The LSeg model can only process images of a certain size; try resizing images to HxW = 480x640 (see the resize sketch after this list).
  3. If GroundingDINO does not install successfully, you can try to build and install it manually:
cd third_party/Grounded-Segment-Anything/GroundingDINO
python setup.py build
python setup.py install
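
For the LSeg image-size issue in item 2, here is a minimal resize sketch, assuming OpenCV is installed and the frame is an ordinary image file (the file names are placeholders):

# Resize an RGB frame to H x W = 480 x 640 before passing it to LSeg.
import cv2

img = cv2.imread("frame.png")                                           # placeholder input path
resized = cv2.resize(img, (640, 480), interpolation=cv2.INTER_LINEAR)   # cv2 expects (W, H)
cv2.imwrite("frame_480x640.png", resized)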

Citation

@inproceedings{dai2023think,
  title={Think, act, and ask: Open-world interactive personalized robot navigation},
  author={Dai, Yinpei and Peng, Run and Li, Sikai and Chai, Joyce},
  booktitle={ICRA},
  year={2024}
}
