
Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval


(Figure: BM-DETR model overview)

PyTorch Implementation of paper:

Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval

Minjoon Jung, Youwon Jang, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang

Updates

  • [Dec, 2023] Our code has been released.
  • [Nov, 2023] Our preprint has been updated on arXiv.

Requirements

To install requirements:

Please refer to here.

To install dependencies:

We recommend creating a conda environment and installing all the dependencies as follows:

# create conda env
conda create --name bm_detr python=3.9
# activate env
conda activate bm_detr
# install pytorch
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
# install other python packages
pip install tqdm ipython easydict tensorboard tabulate scikit-learn pandas
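As a quick sanity check (not part of the original instructions), you can verify that the environment sees PyTorch and CUDA:

# verify the PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"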

Released Models and Features

We provide extracted video/text features and pre-trained BM-DETR models.

These models achieve the performance reported in the table below.

Dataset      | Feat        | Split    | R@1, IoU=0.5 | R@1, IoU=0.7 | Checkpoint
Charades-STA | C3D [Link]  | test     | 55.78        | 34.27        | [Link]
Charades-STA | VGG [Link]  | test     | 58.01        | 36.91        | [Link]
Charades-STA | SF+C [Link] | test     | 59.57        | 39.17        | [Link]
QVHighlights | SF+C [Link] | val      | 61.94        | 47.16        | [Link]
Charades-CD  | I3D [Link]  | test-ood | 53.60        | 30.18        | [Link]

If you have downloaded the features and checkpoints, please check opt.json and place them under the correct paths.
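For reference, a hypothetical layout along the following lines should work; the directory and file names below are assumptions for illustration, and opt.json is the authority on the exact paths expected:

# hypothetical layout (names are assumptions; check opt.json for the real paths)
# BM-DETR/
# ├── features/                 downloaded video/text features
# └── results/
#     └── results_charades/
#         ├── opt.json          records the expected feature/checkpoint paths
#         └── model_best.ckpt   downloaded checkpoint (file name is an assumption)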

The reproduced performance may differ slightly from the performance reported in the paper.

More pre-trained models are coming soon, so stay tuned!

Training

Training can be launched by running the following command:

bash bm_detr/scripts/train_{dset_name}.sh

For Charades-STA, you can choose v_feat_type, which can be one of slowfast_clip, c3d, or vgg:

bash bm_detr/scripts/train_charades.sh {v_feat_type}
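For example, to train with the SlowFast+CLIP features:

bash bm_detr/scripts/train_charades.sh slowfast_clip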

For more model options, please check our config file bm_detr/config.py.

The checkpoints and other experiment log files will be written into results/results_{dset_name}.
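Since tensorboard is among the installed dependencies, training can likely be monitored from the results directory; this assumes the scripts write TensorBoard event files, which is not confirmed by this README:

# monitor training logs (assumes TensorBoard event files in the results dir)
tensorboard --logdir results/results_charades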

Inference

You can use the following command to run inference and check the performance of a trained model:

bash bm_detr/scripts/inference_{dset_name}.sh CHECKPOINT_FILE_PATH SPLIT_NAME  

where CHECKPOINT_FILE_PATH is the path to the saved checkpoint and SPLIT_NAME is the split name for inference.
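For example, to evaluate a Charades-STA checkpoint on the test split (the checkpoint path below is illustrative; substitute your own):

bash bm_detr/scripts/inference_charades.sh results/results_charades/model_best.ckpt test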

For QVHighlights, we automatically generate the predictions for the val and test_public splits and save the results for submission:

bash bm_detr/scripts/inference_qv.sh RESULT_PATH  

where RESULT_PATH is the path to the result_dir in our config file bm_detr/config.py.
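For example, following the results/results_{dset_name} convention above (the exact path is an assumption; match it to your result_dir):

bash bm_detr/scripts/inference_qv.sh results/results_qv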

Please check inference_{dset_name}.sh to set the right evaluation path and split name.

Acknowledgement

We used resources from MDETR and DAB-DETR. We thank the authors for making their projects open-source.

Citation

If you find our project useful in your work, please consider citing our paper:

@article{jung2023overcoming,
  title={Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval},
  author={Jung, Minjoon and Jang, Youwon and Choi, Seongho and Kim, Joochan and Kim, Jin-Hwa and Zhang, Byoung-Tak},
  journal={arXiv preprint arXiv:2306.02728},
  year={2023}
}

Contact

This project is maintained by Minjoon Jung. If you have any questions, please feel free to contact us at mjjung@bi.snu.ac.kr.
