X-Trans2Cap

[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning [arXiv Paper]

Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li*

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yuan_2022_CVPR,
    author    = {Yuan, Zhihao and Yan, Xu and Liao, Yinghong and Guo, Yao and Li, Guanbin and Cui, Shuguang and Li, Zhen},
    title     = {X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {8563-8573}
}

Prerequisites

  • Python 3.6.9 (e.g., conda create -n xtrans_env python=3.6.9)
  • PyTorch 1.7.1 (e.g., conda install pytorch==1.7.1 cudatoolkit=11.0 -c pytorch)
  • Install other common packages (numpy, transformers, etc.)

Installation

  • Clone the repository

    git clone https://github.com/CurryYuan/X-Trans2Cap.git
    
  • To use the PointNet++ visual encoder, compile its CUDA layers. Note: the compilation requires gcc 5.4 or later.

    cd lib/pointnet2
    python setup.py install
    

Data

ScanRefer

If you would like to access the ScanRefer dataset, please fill out this form. Once your request is accepted, you will receive an email with the download link.

Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the ScanNet Instructions for more details.

Download the dataset by simply executing the wget command:

wget <download_link>

Run this command to organize the ScanRefer data:

python scripts/organize_data.py
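
To confirm the files ended up where the training code expects them, a quick check like the one below can help. The paths and file names here are assumptions (based on the official ScanRefer release), so match them to the layout produced by organize_data.py:

    # Sketch: verify the organized ScanRefer annotations (paths assumed).
    import json
    import os

    DATA_ROOT = "data/scanrefer"  # hypothetical location of the organized data
    for split in ("train", "val"):
        path = os.path.join(DATA_ROOT, f"ScanRefer_filtered_{split}.json")
        with open(path) as f:
            annotations = json.load(f)
        print(f"{split}: {len(annotations)} object-description pairs")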

Processed 2D Features

You can download the processed 2D image features from OneDrive. The feature extraction code is borrowed from bottom-up-attention.pytorch.

Change the data path in lib/config.py.
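
As an illustration, the edit might look like the following. The variable names are hypothetical (modeled on ScanRefer-style configs), so mirror whatever lib/config.py actually defines:

    # lib/config.py -- illustrative sketch only; the real variable names may differ.
    from easydict import EasyDict

    CONF = EasyDict()
    CONF.PATH = EasyDict()
    CONF.PATH.DATA = "/path/to/scanrefer"          # hypothetical: organized ScanRefer data
    CONF.PATH.FEATURE_2D = "/path/to/2d_features"  # hypothetical: processed 2D features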

Training

Run this command to train the model:

python scripts/train.py --config config/xtrans_scanrefer.yaml

Run CIDEr optimization:

python scripts/train.py --config config/xtrans_scanrefer_rl.yaml

Our code also supports training on the Nr3D/Sr3D datasets. Please organize the data in the same way as ScanRefer and change the dataset argument in the config file, as sketched below.
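
For example, you can inspect (or script) that change with PyYAML. The dataset key name is taken from the note above; verify it against the actual config file:

    # Sketch: read the dataset selection from the config (key name assumed).
    import yaml

    with open("config/xtrans_scanrefer.yaml") as f:
        cfg = yaml.safe_load(f)

    print(cfg.get("dataset"))  # e.g., "ScanRefer"; set to "Nr3D" or "Sr3D" instead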

Evaluation

python scripts/eval.py --config config/xtrans_scanrefer.yaml --use_pretrained xtrans_scanrefer_rl --force
