Learnable Pooling Methods for Video Classification

This repository is based on the starter code provided by Google AI. It contains code for training and evaluating models on the YouTube-8M dataset. A detailed table of contents and descriptions can be found in the original repository.

The repository contains the models from team "Deep Topology". Our approach was accepted to the 2nd Workshop on YouTube-8M Large-Scale Video Understanding at ECCV. The presentation is available on the ECCV workshop page.

Presentation: TBA
Paper: Link, Arxiv

Usage

In frame_level_models.py, Prototypes 1, 2 and 3 correspond to Sections 3.1, 3.2 and 3.3 of the paper. Detailed instructions for training and evaluating the models can be found in the YT8M repository. The following example training commands reproduce our results.

Prototype 1 (Attention Enhanced NetVLAD)

python train.py --train_data_pattern="<path to train .tfrecord>" --model=NetVladV1 --train_dir="<path for model checkpoints>" --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=80 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=512 --iterations=256 --learning_rate_decay=0.85

Prototype 2 (NetVLAD with Attention Based Cluster Similarities)

python train.py --train_data_pattern="<path to train .tfrecord>" --model=NetVladV2 --train_dir="<path for model checkpoints>" --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=80 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=512 --iterations=256 --learning_rate_decay=0.85
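
The Prototype 1 and Prototype 2 commands above differ only in the --model flag. As a convenience, here is a minimal sketch that assembles both invocations from one shared set of flags. The helper name build_train_command and the COMMON_FLAGS dictionary are hypothetical and not part of this repository. The flag names and values are copied from the commands above, and the two path values are placeholders that must be replaced before running.

```python
import shlex

# Flags shared by both prototypes, copied from the commands above.
# The two path values are placeholders, not real paths.
COMMON_FLAGS = {
    "train_data_pattern": "<path to train .tfrecord>",
    "train_dir": "<path for model checkpoints>",
    "frame_features": "True",
    "feature_names": "rgb,audio",
    "feature_sizes": "1024,128",
    "batch_size": "80",
    "base_learning_rate": "0.0002",
    "netvlad_cluster_size": "256",
    "netvlad_hidden_size": "512",
    "iterations": "256",
    "learning_rate_decay": "0.85",
}


def build_train_command(model, overrides=None):
    """Return the train.py invocation as an argument list (hypothetical helper)."""
    flags = dict(COMMON_FLAGS, model=model)
    # Optional per-run overrides replace the shared defaults.
    flags.update(overrides or {})
    return ["python", "train.py"] + [f"--{name}={value}" for name, value in flags.items()]


if __name__ == "__main__":
    for model in ("NetVladV1", "NetVladV2"):
        # shlex.join (Python 3.8+) quotes each argument for pasting into a shell.
        print(shlex.join(build_train_command(model)))
```

The sketch emits the flags in a different order from the commands above. Passing overrides, for example build_train_command("NetVladV1", {"batch_size": "40"}), changes a single flag while leaving the rest untouched.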

Prototype 3 (Regularized Function Approximation Approach)

TBD

Changes

1.00 (31 August 2018)
- Initial public release
2.00 (30 September 2018)
- Code cleaning
- Model usage

Citations

If you find our approaches useful, please cite our paper.

@article{kmiec2018learnable,
  title={Learnable Pooling Methods for Video Classification},
  author={Kmiec, Sebastian and Bae, Juhan and An, Ruijian},
  journal={arXiv preprint arXiv:1810.00530},
  year={2018}
}

Contributors (Alphabetical Order)

