Accurate and Fast Compressed Video Captioning

✨ This is the official implementation of the ICCV 2023 paper "Accurate and Fast Compressed Video Captioning".

Introduction

In this work, we propose an end-to-end video captioning method that operates on compressed-domain information from H.264-encoded videos. The goal is to generate accurate captions for compressed videos quickly and efficiently.

By releasing this code, we hope to facilitate further research and development in compressed video processing. If you find this work useful in your own research, please consider citing our paper.

Preparation

1. Install the Requirements

To run the code, install the dependencies with the following commands:

sudo apt update && sudo apt install default-jre -y  # required by pycocoevalcap
pip3 install -r requirements.txt

Additionally, you will need to install the compressed video reader as described in the README.md of AcherStyx/Compressed-Video-Reader.

2. Prepare the Pretrained Models

Our model is based on the pretrained CLIP. You can run the following script to download the weights before training to avoid any network issues:

sudo apt update && sudo apt install aria2 -y  # install aria2
bash model_zoo/download_model.sh

This will download the CLIP model to model_zoo/clip_model. Note that this directory is hard-coded in our code.
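Because the model directory is hard-coded, a missing or empty model_zoo/clip_model is worth catching before training starts. The following is a minimal sketch of such a check; the helper `clip_weights_present` is hypothetical and not part of this repository, and it only tests that the directory contains at least one file, since the exact file names produced by the download script are not listed here.

```python
from pathlib import Path

# Directory named in the text above. The check below is an illustrative
# helper, not something shipped with the repository.
CLIP_MODEL_DIR = Path("model_zoo/clip_model")


def clip_weights_present(project_root: Path = Path(".")) -> bool:
    """Return True if the CLIP model directory exists and holds at least one file."""
    model_dir = project_root / CLIP_MODEL_DIR
    if not model_dir.is_dir():
        return False
    # Any regular file counts; specific weight file names are not assumed.
    return any(p.is_file() for p in model_dir.rglob("*"))


if __name__ == "__main__":
    if clip_weights_present():
        print(f"Found files under {CLIP_MODEL_DIR}")
    else:
        print(f"No files under {CLIP_MODEL_DIR}; run model_zoo/download_model.sh first")
```

Run it from the project root so the relative path resolves to the same location the training code expects.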

3. Prepare the Data

We conducted experiments on three video captioning datasets: MSRVTT, MSVD, and VATEX. The datasets are stored in the dataset folder under the project root. For detailed instructions on downloading and preparing the training data, please refer to dataset/README.md.

Training & Evaluation

Training is configured with YAML files, and all configurations are located in configs/compressed_video. You can use the following commands to run the experiments:

# msrvtt
python3 mm_video/run_net.py --cfg configs/compressed_video/msrvtt_captioning.yaml
# msvd
python3 mm_video/run_net.py --cfg configs/compressed_video/msvd_captioning.yaml
# vatex
python3 mm_video/run_net.py --cfg configs/compressed_video/vatex_captioning.yaml
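The three commands above differ only in the dataset name, so they can be generated from a single template. The sketch below is a hypothetical convenience wrapper (the names `build_command` and `run_all` are not part of the repository); it assumes only the script path, the `--cfg` flag, and the config file names shown in the commands above, and it defaults to a dry run that prints the commands without executing them.

```python
import subprocess
from typing import List

# Values taken from the commands shown above.
CONFIG_DIR = "configs/compressed_video"
DATASETS = ("msrvtt", "msvd", "vatex")


def build_command(dataset: str) -> List[str]:
    """Build the training command for one of the three supported datasets."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    config_path = f"{CONFIG_DIR}/{dataset}_captioning.yaml"
    return ["python3", "mm_video/run_net.py", "--cfg", config_path]


def run_all(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run them one after another."""
    for dataset in DATASETS:
        cmd = build_command(dataset)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if one experiment fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the project root to launch the experiments sequentially.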

By default, logs and results are saved to ./log/<experiment_name>/. Loss and metrics can be visualized with TensorBoard.
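To inspect the runs, it helps to know which experiments exist under the default log location and how to point TensorBoard at them. The sketch below is illustrative only: the helper names are hypothetical, and it assumes nothing about the layout inside each experiment folder beyond the ./log/<experiment_name>/ pattern described above. `--logdir` is TensorBoard's standard flag for selecting a log directory, and TensorBoard searches that directory recursively.

```python
from pathlib import Path
from typing import List

# Default log root from the text above.
LOG_ROOT = Path("log")


def list_experiments(log_root: Path = LOG_ROOT) -> List[str]:
    """Return the names of experiment folders found directly under the log root."""
    if not log_root.is_dir():
        return []
    return sorted(p.name for p in log_root.iterdir() if p.is_dir())


def tensorboard_command(log_root: Path = LOG_ROOT) -> List[str]:
    """Build a TensorBoard command that scans every experiment under the log root."""
    return ["tensorboard", "--logdir", str(log_root)]


if __name__ == "__main__":
    names = list_experiments()
    print("Experiments:", ", ".join(names) if names else "(none found)")
    print("To visualize:", " ".join(tensorboard_command()))
```

Pointing TensorBoard at the log root rather than a single experiment folder lets you compare runs side by side.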

Citation

@inproceedings{shen2023accurate,
      title={Accurate and Fast Compressed Video Captioning}, 
      author={Yaojie Shen and Xin Gu and Kai Xu and Heng Fan and Longyin Wen and Libo Zhang},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
      year={2023}
}
