Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

TensorFlow implementation of the paper "Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding", published at CVPR 2019.

Data Pre-Processing

First follow the instructions in ./data/readme/ for each dataset to prepare the data. Sample code for creating a data-processing instance can be found in ./data/data-process.ipynb.
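For example, you can open the sample notebook as follows (assuming Jupyter is installed in your environment):

```
# Launch the sample pre-processing notebook (requires Jupyter)
jupyter notebook ./data/data-process.ipynb
```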

Pre-Trained Models

Please download the pre-trained models from here and unpack them into ./code/models/. Note that this package also includes the visual models (pre-trained on ImageNet) and the ELMo model, both of which are required.
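A minimal unpacking sketch, assuming the download is a gzipped tarball named models.tar.gz (the actual archive name and format may differ):

```
# Unpack the pre-trained models into the expected directory
# (archive name is illustrative; use the file you actually downloaded)
mkdir -p ./code/models/
tar -xzvf models.tar.gz -C ./code/models/
```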

Training

To train a model, run ./code/train.py with the desired parameters; a sketch of a typical invocation follows.
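A minimal sketch of an invocation (the flag names here are illustrative, not the script's actual arguments; check ./code/train.py for the parameters it accepts):

```
# Hypothetical flags for illustration only; see ./code/train.py for the real ones
python ./code/train.py --dataset flickr30k --batch_size 32
```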

Dependencies

  • Python 3.6/3.7
  • TensorFlow 1.14.0

We also use the following packages, which can be installed with pip install -r requirements.txt:

  • tensorpack.dataflow
  • tensorflow_hub
  • opencv-python
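A minimal environment setup, assuming conda is available (the environment name is illustrative):

```
# Create an isolated environment (name is illustrative)
conda create -n multigrounding python=3.7
conda activate multigrounding

# Install the pinned TensorFlow version plus the remaining dependencies
pip install tensorflow==1.14.0
pip install -r requirements.txt
```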

ToDo List

  • Main Model
  • Parallel Data Pipeline
  • Distributed Training
  • Upload Pretrained Models
  • Upload Pre-Processed Data
  • Final Sanity Check

Cite

If you find this work or code helpful, or use it in any capacity, please cite:

@inproceedings{akbari2019multi,
  title={Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding},
  author={Akbari, Hassan and Karaman, Svebor and Bhargava, Surabhi and Chen, Brian and Vondrick, Carl and Chang, Shih-Fu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={12476--12486},
  year={2019}
}
