
Visual Storytelling with Cross-Modal Rules

Code for Paper: Informative Visual Storytelling with Cross-modal Rules

In Proceedings of ACM Multimedia 2019
@inproceedings{li2019informative,
  title={Informative Visual Storytelling with Cross-modal Rules},
  author={Li, Jiacheng and Shi, Haizhou and Tang, Siliang and Wu, Fei and Zhuang, Yueting},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={2314--2322},
  year={2019},
  organization={ACM}
}
The storytelling code is adapted from https://github.com/eric-xw/AREL

Prerequisites

  • Python 2.7
  • Python 3.x
  • PyTorch 0.3
  • TensorFlow (optional, used only for the fantastic TensorBoard)
  • CUDA & cuDNN
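
As a quick sanity check of the environment (a minimal sketch, not part of this repository), you can confirm the installed PyTorch version and that CUDA/cuDNN are visible:

    # Illustrative environment check; training expects PyTorch 0.3 with CUDA available.
    import torch

    print(torch.__version__)          # expected: a 0.3.x release
    print(torch.cuda.is_available())  # True if CUDA and cuDNN are set up correctly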

Usage

1. Setup

Download the VIST dataset and organize it in the following directory structure:

vist
 |--annotations
   |--dii
   |--sis
 |--images
   |--test
   |--train
   |--val
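If you want to verify the layout before running the scripts, the following minimal sketch (not part of this repository) checks that the assumed directories exist, taking ./vist as the dataset root:

    # Hypothetical sanity check for the expected VIST directory layout.
    import os

    root = "vist"  # adjust to wherever the dataset was unpacked
    expected = [
        "annotations/dii",
        "annotations/sis",
        "images/train",
        "images/val",
        "images/test",
    ]
    for rel in expected:
        status = "ok" if os.path.isdir(os.path.join(root, rel)) else "MISSING"
        print("%-20s %s" % (rel, status))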

2. Cross-Modal Rule Mining

In the following description, [] denotes an optional argument.

  • Enter the folder rule_mining:
cd rule_mining
  • Create multi-modal transactions:
python2 create_transactions.py [mode]

mode can be 'train', 'val', or 'test'; it defaults to 'train' if not specified.

  • Find the frequent itemsets:
python3 fpgrowth_py3.py [--minsupc 3]
  • Get Cross-Modal Rules:
python3 get_rules.py [--minsupc 3] [--conf 0.6]
  • Extract semantic concepts with CMR:
python2 extract_semantics.py [4]

The option is the number of worker threads; more threads run faster. Set it according to the number of CPU cores on your machine.
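
To make the two thresholds above concrete: --minsupc is a minimum support count (the number of transactions that must contain an itemset) and --conf is a minimum rule confidence. The toy sketch below shows how these filters apply to a single candidate rule; the item names and transaction format are hypothetical and are not the actual output of create_transactions.py:

    # Toy illustration of support-count and confidence filtering for one rule.
    transactions = [
        {"vis:beach", "vis:people", "txt:sand"},
        {"vis:beach", "vis:people", "txt:ocean"},
        {"vis:beach", "txt:sand"},
        {"vis:cake", "txt:birthday"},
    ]

    def support_count(itemset):
        # Number of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    minsupc, minconf = 2, 0.6
    antecedent = frozenset(["vis:beach"])  # visual items (rule body)
    consequent = frozenset(["txt:sand"])   # textual items (rule head)

    sup = support_count(antecedent | consequent)
    conf = sup / float(support_count(antecedent))
    if sup >= minsupc and conf >= minconf:
        print("keep rule: %s -> %s (support=%d, confidence=%.2f)"
              % (sorted(antecedent), sorted(consequent), sup, conf))

Here the rule vis:beach -> txt:sand has support count 2 and confidence 2/3, so it passes both thresholds.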

3. Visual Storytelling

These scripts are adapted from AREL; we add an attention mechanism that attends to the inferred concepts (see the sketch at the end of this section). To train the VIST model:

python2 train.py --beam_size 3 [--id model_name]

To evaluate a trained model on the automatic metrics:

python2 train.py --beam_size 3 --option test --start_from_model data/save/XE/model.pth [--id score_save_path]
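
For reference, the sketch below shows the general shape of an attention module over inferred concept embeddings, as mentioned at the start of this section. It is written against the current PyTorch API with illustrative names and shapes, and is not the module used in this repository or in AREL:

    # Illustrative dot-product attention over concept embeddings (not the repo's module).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConceptAttention(nn.Module):
        def __init__(self, hidden_size, concept_size):
            super(ConceptAttention, self).__init__()
            self.proj = nn.Linear(concept_size, hidden_size)

        def forward(self, decoder_state, concept_embs):
            # decoder_state: (batch, hidden)       current decoder hidden state
            # concept_embs:  (batch, n, concept)   embeddings of the inferred concepts
            keys = self.proj(concept_embs)                        # (batch, n, hidden)
            scores = torch.bmm(keys, decoder_state.unsqueeze(2))  # (batch, n, 1)
            weights = F.softmax(scores, dim=1)                    # attention weights
            context = (weights * keys).sum(1)                     # (batch, hidden)
            return context, weights.squeeze(2)

The resulting context vector would then be combined with the usual visual features at each decoding step.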