vgs

The PyTorch version of the original Theano Recurrent Highway Network-based model is slow and buggy.
The alternative GRU-based model is much faster and more usable.

Flickr8k

Download the data (Flickr8K speech and image features) needed to run the model from https://drive.google.com/file/d/14OVoyKibsslVwgYxxgd-s3dbA4bHUZtf/view?usp=sharing and unpack it in the data directory.

python3 run.py > log.txt

The script runs for 25 epochs and periodically prints the value of the loss function.
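
Because the command above redirects standard output to log.txt, nothing is shown on the terminal while training runs. The following is a minimal sketch, not part of this repository, of a wrapper that shows each line on the terminal and also writes it to the log file. The helper name `run_and_log` is hypothetical, and unlike the plain redirect it also captures standard error in the log.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echoing each output line to the terminal and writing it to log_path.

    Hypothetical helper for illustration; it is not part of this repository.
    Returns the exit code of the child process.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # assumption: error messages go to the same log
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current for anyone following it
        return proc.wait()


# Usage, roughly equivalent to `python3 run.py > log.txt`:
#     run_and_log([sys.executable, "run.py"], "log.txt")
```

The log is flushed after every line, so the periodic loss values can be followed from another terminal while training is still in progress.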

There are also example experiment runs with the Synthetically spoken COCO dataset.
