
Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval


(Figure: BM-DETR model overview)

PyTorch Implementation of paper:

Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval

Minjoon Jung, Youwon Jang, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang

Updates

  • [Dec, 2023] Our code has been released.
  • [Nov, 2023] Our preprint has been updated on arXiv.

Requirements

To install requirements:

Please refer to here.

To install dependencies:

We recommend creating a conda environment and installing all the dependencies as follows:

# create conda env
conda create --name bm_detr python=3.9
# activate env
conda activate bm_detr
# install pytorch
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
# install other python packages
pip install tqdm ipython easydict tensorboard tabulate scikit-learn pandas
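As a quick sanity check (not part of the original instructions), you can verify that the environment sees PyTorch and CUDA:

# verify the PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"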

Released Models and Features

We provide extracted video/text features and pre-trained BM-DETR models.

These models achieve the performance reported in the table below.

Dataset      | Feat        | Split    | R@1, IoU=0.5 | R@1, IoU=0.7 | Checkpoint
Charades-STA | C3D [Link]  | test     | 55.78        | 34.27        | [Link]
Charades-STA | VGG [Link]  | test     | 58.01        | 36.91        | [Link]
Charades-STA | SF+C [Link] | test     | 59.57        | 39.17        | [Link]
QVHighlights | SF+C [Link] | val      | 61.94        | 47.16        | [Link]
Charades-CD  | I3D [Link]  | test-ood | 53.60        | 30.18        | [Link]

If you have downloaded the features and checkpoints, please check opt.json and place them under the correct paths.
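For reference, a hypothetical layout along the following lines should work; the directory and file names below are assumptions for illustration, and opt.json is the authority on the exact paths expected:

# hypothetical layout (names are assumptions; check opt.json for the real paths)
# BM-DETR/
# ├── features/                 downloaded video/text features
# └── results/
#     └── results_charades/
#         ├── opt.json          records the expected feature/checkpoint paths
#         └── model_best.ckpt   downloaded checkpoint (file name is an assumption)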

The reproduced performance may differ slightly from the performance reported in the paper.

More pre-trained models are coming soon, so stay tuned!

Training

Training can be launched by running the following command:

bash bm_detr/scripts/train_{dset_name}.sh

For Charades-STA, you can choose v_feat_type, which can be one of slowfast_clip, c3d, or vgg:

bash bm_detr/scripts/train_charades.sh {v_feat_type}
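For example, to train with the SlowFast+CLIP features:

bash bm_detr/scripts/train_charades.sh slowfast_clip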

For more model options, please check our config file bm_detr/config.py.

The checkpoints and other experiment log files will be written into results/results_{dset_name}.
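Since tensorboard is among the installed dependencies, training can likely be monitored from the results directory; this assumes the scripts write TensorBoard event files, which is not confirmed by this README:

# monitor training logs (assumes TensorBoard event files in the results dir)
tensorboard --logdir results/results_charades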

Inference

You can use the following command to run inference and check the performance of a trained model:

bash bm_detr/scripts/inference_{dset_name}.sh CHECKPOINT_FILE_PATH SPLIT_NAME  

where CHECKPOINT_FILE_PATH is the path to the saved checkpoint and SPLIT_NAME is the split name for inference.
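For example, to evaluate a Charades-STA checkpoint on the test split (the checkpoint path below is illustrative; substitute your own):

bash bm_detr/scripts/inference_charades.sh results/results_charades/model_best.ckpt test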

For QVHighlights, we automatically generate the predictions for the val and test_public splits and save the results for submission:

bash bm_detr/scripts/inference_qv.sh RESULT_PATH  

where RESULT_PATH is the path to the result_dir in our config file bm_detr/config.py.
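For example, following the results/results_{dset_name} convention above (the exact path is an assumption; match it to your result_dir):

bash bm_detr/scripts/inference_qv.sh results/results_qv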

Please check inference_{dset_name}.sh to set the right evaluation path and split name.

Acknowledgement

We used resources from MDETR and DAB-DETR. We thank the authors for making their projects open-source.

Citation

If you find our project useful in your work, please consider citing our paper:

@article{jung2023overcoming,
  title={Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval},
  author={Jung, Minjoon and Jang, Youwon and Choi, Seongho and Kim, Joochan and Kim, Jin-Hwa and Zhang, Byoung-Tak},
  journal={arXiv preprint arXiv:2306.02728},
  year={2023}
}

Contact

This project is maintained by Minjoon Jung. If you have any questions, please feel free to contact us at mjjung@bi.snu.ac.kr.
