Detecting Parts of Speech from Image for Caption Generation

This repository contains the code for the paper 'Detecting Parts of Speech from Image for Caption Generation', which is currently under review. Details will be updated later. The goal of the model is to detect features related to different parts of speech (PoS) in an image; the detected features are then fed to a language model that generates the caption. The concept is shown below.

PoS CNN Model Detection

One way to check that our PoS CNN models were trained properly and can detect PoS-related features is to inspect Grad-CAM heatmaps, as shown below.
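
For readers unfamiliar with Grad-CAM, the core computation can be sketched in a few lines. This is a minimal, dependency-free illustration of the generic technique and is not taken from this repository's code: the function name `grad_cam` and the nested-list inputs are assumptions made for the example. In practice the activations and gradients would come from a convolutional layer of a trained model.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one conv layer.

    activations: feature maps as nested lists shaped [K][H][W]
    gradients:   gradients of the target score w.r.t. those maps, same shape
    Returns an [H][W] heatmap scaled to the range [0, 1].
    """
    num_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of each channel's gradient map.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j]
                         for k in range(num_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Normalize so the strongest location is 1 (skip if the map is all zeros).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: two 2x2 feature maps with hand-picked gradients.
toy_activations = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
toy_gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(toy_activations, toy_gradients)
```

In the toy input, the first channel has a positive weight and the second a negative one, so only the location where the first channel fires remains in the heatmap. The heatmap is normally upsampled to the image size and overlaid on the image.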

How To Use

1. Clone the repository

$ git clone https://github.com/philgookang/pcr.git
$ cd pcr

2. Install the required libraries

$ pip install -r requirements.txt

3. Download dataset
Our dataset download contains only the captions for the train, validation, and test splits. The images themselves must be downloaded from each dataset's official website. All captions are saved with Python's pickle module, so they can only be opened in Python.

$ wget http://pcr.philgookang.com/data.zip
$ unzip data.zip
$ rm data.zip
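
Since the captions are pickled, a short sketch of reading one back may help. The file name, the helper `load_captions`, and the structure of the sample data are all hypothetical: the example writes its own small stand-in file so that it runs on its own, and the real files in data.zip may be named and organized differently.

```python
import pickle
import tempfile
from pathlib import Path


def load_captions(path):
    """Load a pickled caption object from `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for a caption file; the actual layout of the
# files in data.zip is not documented here and may differ.
sample = [{"image": "example.jpg", "caption": "a dog runs on the grass"}]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "captions_demo.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump(sample, f)
    captions = load_captions(demo_path)

# Inspect the type and size before relying on any particular structure.
print(type(captions).__name__, len(captions))
```

To read a real file, pass its path to `load_captions` and inspect the returned object first. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.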

3.1. Prepare dataset
You only need to prepare the dataset if you are using your own custom dataset. Run only the script below that matches the dataset you are using. If you downloaded our dataset, skip this step.

$ python ready_dataset_mscoco.py
$ python ready_dataset_flickr30k.py
$ python ready_dataset_flickr8k.py

4. Download pretrained model

$ wget http://pisa.snu.ac.kr/pcr/model.zip
$ unzip model.zip
$ mv model/ ./result/
$ rm model.zip

If the link does not work, you can download the pretrained model from this Dropbox.
