In this repo, we provide code and pretrained models for the paper "Improving Semantic Video Retrieval models by Training with a Relevance-aware Online Mining strategy", which is under journal review. The code also covers the implementation of a preliminary version of this work, called "Learning video retrieval models with relevance-aware online mining", which was accepted for presentation at the 21st International Conference on Image Analysis and Processing (ICIAP).
Requirements: Python 3, allennlp 2.8.0, h5py 3.6.0, pandas 1.3.5, spacy 2.3.5, torch 1.7.0 (also tested with 1.8)
# clone the repository
cd ranp
export PYTHONPATH=$(pwd):${PYTHONPATH}
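The pinned dependencies listed above can be collected in a requirements file (a sketch: the file name is a convention, the pins are taken from the requirements list, and torch is pinned to the primary tested version):

```
# requirements.txt (versions from the requirements list above)
allennlp==2.8.0
h5py==3.6.0
pandas==1.3.5
spacy==2.3.5
torch==1.7.0
```

Install with `pip install -r requirements.txt`.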
Features:
Additional:
- pre-extracted annotations for EPIC-Kitchens-100 and MSR-VTT
- split folders for EPIC-Kitchens-100 and MSR-VTT
- GloVe checkpoints for EPIC-Kitchens-100 and MSR-VTT
- HowTo100M weights for EAO
To launch a training, first select a configuration file (e.g. prepare_mlmatch_configs_EK100_TBN_thrPos_hardPos.py) and execute the following:
python t2vretrieval/driver/configs/prepare_mlmatch_configs_EK100_TBN_thrPos_hardPos.py .
This will return a folder name (where the config, models, logs, etc. will be saved). Let that folder be $resdir. Then, execute the following to start a training:
python t2vretrieval/driver/multilevel_match.py $resdir/model.json $resdir/path.json --is_train --load_video_first --resume_file glove_checkpoint_path
Replace multilevel_match.py with eao_match.py to use Everything-at-once (txt-vid version) in place of HGR.
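The two steps above (prepare the config, then train) can be wrapped in a small helper. This is a sketch, not part of the repo: it assumes the prepare script prints the result folder as its last line of stdout, as suggested by "This will return a folder name":

```shell
#!/bin/sh
# Sketch: prepare a config, capture the result folder, then launch training.
# Assumption: the prepare script prints the result folder on its last stdout line.
set -e

prepare_and_train() {
  # $1: config-preparation script, $2: GloVe checkpoint to resume from
  resdir=$(python "$1" . | tail -n 1)
  python t2vretrieval/driver/multilevel_match.py \
    "$resdir/model.json" "$resdir/path.json" \
    --is_train --load_video_first --resume_file "$2"
}
```

Usage, e.g.: `prepare_and_train t2vretrieval/driver/configs/prepare_mlmatch_configs_EK100_TBN_thrPos_hardPos.py glove_checkpoint_path` (swap in eao_match.py inside the function for EAO).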
To automatically check for the best checkpoint (after a training run):
python t2vretrieval/driver/multilevel_match.py $resdir/model.json $resdir/path.json --eval_set tst
To resume one of the checkpoints provided:
python t2vretrieval/driver/multilevel_match.py $resdir/model.json $resdir/path.json --eval_set tst --resume_file checkpoint.th
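Since the two evaluation commands above differ only in the optional --resume_file flag, they can be combined in one helper (a sketch; file names taken from the commands above):

```shell
#!/bin/sh
# Sketch: evaluate on the test set, optionally resuming a given checkpoint.
set -e

evaluate() {
  # $1: result folder; $2 (optional): checkpoint file, e.g. checkpoint.th
  if [ -n "${2:-}" ]; then
    python t2vretrieval/driver/multilevel_match.py \
      "$1/model.json" "$1/path.json" --eval_set tst --resume_file "$2"
  else
    python t2vretrieval/driver/multilevel_match.py \
      "$1/model.json" "$1/path.json" --eval_set tst
  fi
}
```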
Results on EPIC-Kitchens-100:
- HGR: (35.9 nDCG, 39.5 mAP)
- HGR with Triplet-RANP: thr=0.15 (58.8 nDCG, 47.2 mAP)
- EAO: (34.5 nDCG, 35.0 mAP)
- EAO with Triplet-RANP: thr=0.10 (59.5 nDCG, 45.1 mAP)
Results on MSR-VTT:
- HGR: (26.7 nDCG)
- HGR with Triplet-RANP: thr=0.10 (35.4 nDCG)
- EAO: (24.8 nDCG)
- EAO with Triplet-RANP: thr=0.10 (34.4 nDCG)
- EAO with Triplet-RANP (+HowTo100M PT): thr=0.10 (35.6 nDCG)
We thank the authors of Chen et al. (CVPR, 2020) (github), Wray et al. (ICCV, 2019) (github), Wray et al. (CVPR, 2021) (github), Shvetsova et al. (CVPR, 2022) (github) for the release of their codebases.
If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:
@inproceedings{falcon2022learning,
  title={Learning video retrieval models with relevance-aware online mining},
  author={Falcon, Alex and Serra, Giuseppe and Lanz, Oswald},
  booktitle={International Conference on Image Analysis and Processing},
  pages={182--194},
  year={2022},
  organization={Springer}
}
MIT License