ABiViRNet: Attention Bidirectional Video Recurrent Net for video captioning

This repository contains the code for building a system similar to the one described in the paper "Video Description using Bidirectional Recurrent Neural Networks", presented at the International Conference on Artificial Neural Networks (ICANN'16). With this module, you can replicate our experiments and easily deploy new models. ABiViRNet is built upon our fork of the Keras framework and has been tested with the Theano and TensorFlow backends.

Features:

Attention model over the input sequence of frames
Peeked decoder LSTM: the previously generated word is fed as an input to the current LSTM timestep
MLPs for initializing the LSTM hidden and memory state
Beam search decoding
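
To illustrate the last feature, here is a minimal, self-contained sketch of beam search decoding. It is not the code used in this repository: `beam_search`, `step_fn` and `toy_step` are hypothetical names, and the toy probability table stands in for the decoder LSTM, which would normally provide the next-word distribution. The sketch also omits refinements such as length normalization.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) is a hypothetical stand-in for the decoder: it returns a
    dict mapping each candidate next token to its probability given prefix.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by one token.
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((seq + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Fall back to unfinished hypotheses if none reached the end token.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])[0]


def toy_step(prefix):
    """Toy next-word distribution that depends only on the last token."""
    table = {
        "<s>": {"a": 0.6, "the": 0.4},
        "a": {"dog": 0.55, "cat": 0.45},
        "the": {"dog": 0.9, "cat": 0.1},
        "dog": {"</s>": 1.0},
        "cat": {"</s>": 1.0},
    }
    return table[prefix[-1]]


if __name__ == "__main__":
    print(beam_search(toy_step, "<s>", "</s>", beam_size=1))
    print(beam_search(toy_step, "<s>", "</s>", beam_size=2))
```

With `beam_size=1` the search is greedy and commits to "a" (probability 0.6), ending with a sequence of probability 0.33. With `beam_size=2` it keeps "the" alive and finds "the dog", whose probability is 0.36.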

Architecture

Requirements

ABiViRNet requires the following libraries:

Instructions:

Assuming you have a dataset and features extracted from the video frames:

Prepare data:

python data_engine/subsample_frames_features.py

python data_engine/generate_features_lists.py

python data_engine/generate_descriptions_lists.py

See data_engine/README.md for detailed information.

Prepare the inputs/outputs of your model in data_engine/prepare_data.py
Set a model configuration in config.py
Train!:

python main.py
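
As an illustration of what the frame subsampling in the data preparation step might involve, the sketch below picks a fixed number of evenly spaced frame indices from a video. This is an assumption about the general idea, not a reproduction of `data_engine/subsample_frames_features.py`; the function name `subsample_indices` and its parameters are hypothetical. See data_engine/README.md for the actual behaviour.

```python
def subsample_indices(num_frames, num_samples):
    """Pick up to num_samples evenly spaced indices from range(num_frames).

    Hypothetical helper: it assumes that subsampling means keeping a fixed
    number of uniformly spaced frames per video.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_frames <= num_samples:
        # Short video: keep every frame.
        return list(range(num_frames))
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    # Keep 4 of 10 frames.
    print(subsample_indices(10, 4))
```

The feature vectors of the selected frames would then be the ones passed on to the model, so every video contributes a bounded number of timesteps.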

Citation

If you use this code for any purpose, please cite the following paper:

Peris, Á., Bolaños, M., Radeva, P., & Casacuberta, F. (2016, September). Video description using bidirectional recurrent neural networks. In International Conference on Artificial Neural Networks (pp. 3-11). Springer International Publishing.

Bibtex version:

@inproceedings{peris2016video,
  title={Video description using bidirectional recurrent neural networks},
  author={Peris, {\'A}lvaro and Bolanos, Marc and Radeva, Petia and Casacuberta, Francisco},
  booktitle={International Conference on Artificial Neural Networks},
  pages={3--11},
  year={2016},
  organization={Springer}
}

About

This project is a joint collaboration between the Computer Vision at the University of Barcelona (CVUB) group at Universitat de Barcelona-CVC and the PRHLT Research Center at Universitat Politècnica de València.

Contact

Álvaro Peris: lvapeab@prhlt.upv.es

Marc Bolaños: marc.bolanos@ub.edu
