X-Trans2Cap

[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning [arXiv Paper]

Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li*

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yuan_2022_CVPR,
    author    = {Yuan, Zhihao and Yan, Xu and Liao, Yinghong and Guo, Yao and Li, Guanbin and Cui, Shuguang and Li, Zhen},
    title     = {X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {8563-8573}
}

Prerequisites

  • Python 3.6.9 (e.g., conda create -n xtrans_env python=3.6.9)
  • PyTorch 1.7.1 (e.g., conda install pytorch==1.7.1 cudatoolkit=11.0 -c pytorch)
  • Install other common packages (numpy, transformers, etc.)

Installation

  • Clone the repository

    git clone https://github.com/CurryYuan/X-Trans2Cap.git
    
  • To use the PointNet++ visual encoder, compile its CUDA layers. Note: the compilation requires gcc 5.4 or later.

    cd lib/pointnet2
    python setup.py install
    

Data

ScanRefer

If you would like to access the ScanRefer dataset, please fill out this form. Once your request is accepted, you will receive an email with the download link.

Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the ScanNet Instructions for more details.

Download the dataset by simply executing the wget command:

wget <download_link>

Run this command to organize the ScanRefer data:

python scripts/organize_data.py
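
To confirm the files ended up where the training code expects them, a quick check like the one below can help. The paths and file names here are assumptions (based on the official ScanRefer release), so match them to the layout produced by organize_data.py:

    # Sketch: verify the organized ScanRefer annotations (paths assumed).
    import json
    import os

    DATA_ROOT = "data/scanrefer"  # hypothetical location of the organized data
    for split in ("train", "val"):
        path = os.path.join(DATA_ROOT, f"ScanRefer_filtered_{split}.json")
        with open(path) as f:
            annotations = json.load(f)
        print(f"{split}: {len(annotations)} object-description pairs")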

Processed 2D Features

You can download the processed 2D image features from OneDrive. The feature extraction code is borrowed from bottom-up-attention.pytorch.

Change the data path in lib/config.py.
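
As an illustration, the edit might look like the following. The variable names are hypothetical (modeled on ScanRefer-style configs), so mirror whatever lib/config.py actually defines:

    # lib/config.py -- illustrative sketch only; the real variable names may differ.
    from easydict import EasyDict

    CONF = EasyDict()
    CONF.PATH = EasyDict()
    CONF.PATH.DATA = "/path/to/scanrefer"          # hypothetical: organized ScanRefer data
    CONF.PATH.FEATURE_2D = "/path/to/2d_features"  # hypothetical: processed 2D features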

Training

Run this command to train the model:

python scripts/train.py --config config/xtrans_scanrefer.yaml

Run CIDEr optimization:

python scripts/train.py --config config/xtrans_scanrefer_rl.yaml

Our code also supports training on the Nr3D/Sr3D datasets. Please organize the data in the same way as ScanRefer and change the dataset argument in the config file, as sketched below.
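
For example, you can inspect (or script) that change with PyYAML. The dataset key name is taken from the note above; verify it against the actual config file:

    # Sketch: read the dataset selection from the config (key name assumed).
    import yaml

    with open("config/xtrans_scanrefer.yaml") as f:
        cfg = yaml.safe_load(f)

    print(cfg.get("dataset"))  # e.g., "ScanRefer"; set to "Nr3D" or "Sr3D" instead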

Evaluation

python scripts/eval.py --config config/xtrans_scanrefer.yaml --use_pretrained xtrans_scanrefer_rl --force
