HierSE

Given the difficulty of acquiring labeled examples for many fine-grained visual classes, there is increasing interest in zero-shot image tagging, which aims to tag images with novel labels for which no training examples are available. This project provides a pure Python (and thus cross-platform) implementation of two semantic-embedding-based methods for zero-shot image tagging, namely Convex Semantic Embedding (ConSE) [1] and Hierarchical Semantic Embedding (HierSE) [2].

The key idea of zero-shot learning is to introduce an intermediate layer between images and labels such that a novel label can also be represented in this layer, even when no example of this label is supplied. In ConSE and HierSE, this layer is a word2vec semantic space.

Why Hierarchical Semantic Embedding?

Make label embeddings more reliable, in particular for labels that occur relatively rarely.
Resolve semantic ambiguity by embedding a label into distinct vectors depending on its sense. Consider a label with multiple senses, e.g., mouse, which can refer to a rodent or to a computer mouse. This is a more fundamental advantage over ConSE and other semantic embedding methods, where a specific tag is always represented by the same vector, regardless of its sense.

Dependencies

numpy
pre-trained word2vec model: learned from the social tags of over 4 million Flickr images (flickr4m) using word2vec. The pre-trained model is also available on Google Drive. The original flickr4m tags can be downloaded here.

Getting started

git clone https://github.com/li-xirong/hierse
cd hierse/doit
./get_word2vec_model.sh 
cd ..
python test_all.py

Run test_all.py to see if everything is in place. For hands-on examples, please refer to scripts in the doit folder and the tutorial page.

Use HierSE and TensorFlow to tag a new image
Use HierSE and PyTorch to tag new images

Implementation

The process of projecting a novel label into the word2vec layer is implemented by the Synset2Vec and PartialSynset2Vec classes in synset2vec.py, and by the HierSynset2Vec and HierPartialSynset2Vec classes in synset2vec_hier.py. The four classes correspond to four different methods (conse, conse2, hierse, hierse2) for vectorizing a WordNet synset, determined by the phrase matching strategy (full match or partial match) and by whether the WordNet hierarchy is considered.

conse: full match + no hierarchy
conse2: partial match + no hierarchy
hierse: full match + hierarchy
hierse2: partial match + hierarchy
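To make the four variants concrete, here is a minimal sketch under loudly stated assumptions: the word2vec model is a plain dict from token to vector, multi-word entries are joined by underscores, a synset is represented by its first lemma that can be matched, partial matching averages the in-vocabulary words of a phrase, and the hierarchy is used by averaging a synset with its ancestors under a hypothetical exponentially decaying weight. The names phrase_vec, synset_vec, hier_synset_vec and the decay parameter are illustrative only; they are not the API of synset2vec.py or synset2vec_hier.py, whose weighting may differ.

```python
import numpy as np

def phrase_vec(phrase, w2v, partial=False):
    """Look up a (possibly multi-word) phrase in a dict-based word2vec table.

    Full match: the whole phrase, joined by underscores (an assumption), must be in w2v.
    Partial match (assumed behaviour): fall back to the mean of the in-vocabulary words.
    """
    words = phrase.lower().split()
    key = "_".join(words)
    if key in w2v:
        return np.asarray(w2v[key], dtype=float)
    if partial:
        known = [np.asarray(w2v[w], dtype=float) for w in words if w in w2v]
        if known:
            return np.mean(known, axis=0)
    return None

def synset_vec(lemmas, w2v, partial=False):
    """Flat (no hierarchy) synset vector: the first lemma that matches (an assumption)."""
    for lemma in lemmas:
        v = phrase_vec(lemma, w2v, partial)
        if v is not None:
            return v
    return None

def hier_synset_vec(path, w2v, partial=False, decay=0.5):
    """Weighted average over a synset and its ancestors.

    path[0] holds the lemmas of the synset itself, path[1] those of its parent, etc.
    An ancestor d levels up gets weight decay**d (a hypothetical weighting scheme).
    Ancestors with no matchable lemma are skipped.
    """
    total, weight_sum = None, 0.0
    for depth, lemmas in enumerate(path):
        v = synset_vec(lemmas, w2v, partial)
        if v is None:
            continue
        w = decay ** depth
        total = w * v if total is None else total + w * v
        weight_sum += w
    return None if total is None else total / weight_sum

# Toy example with the two senses of "mouse".
w2v = {"mouse": [1, 0, 0], "rodent": [0, 1, 0], "device": [0, 0, 1]}
rodent_sense = [["mouse"], ["rodent"]]
device_sense = [["mouse", "computer mouse"], ["device"]]
print(synset_vec(rodent_sense[0], w2v), synset_vec(device_sense[0], w2v))  # identical
print(hier_synset_vec(rodent_sense, w2v), hier_synset_vec(device_sense, w2v))  # differ
```

With these toy vectors, the flat variant gives both senses of mouse the same vector, whereas the hierarchical variant pulls each sense toward its own ancestor, which is the disambiguation effect HierSE is designed to provide.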

The Image2Vec class in im2vec.py projects an (unlabeled) image into this layer. The training label set is the 1,000 ImageNet ILSVRC12 classes (ilsvrc12_test1k). Our code assumes that the probabilistic relevance score of each training label with respect to the image has been pre-computed and stored; see the provided sample set imagenet2hop-random2k. That said, the code also works with any pre-trained (CNN) model, e.g., CaffeNet or a TensorFlow model, that can predict the 1k ILSVRC12 labels; see the tutorial page.
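ConSE [1] embeds an image as a convex combination of the semantic vectors of its top-scoring training labels, with weights given by the renormalized relevance scores. A minimal NumPy sketch of that idea follows; the function name image_to_vec and the top_k parameter are hypothetical, and Image2Vec in im2vec.py may differ in details such as the number of labels kept.

```python
import numpy as np

def image_to_vec(scores, label_vecs, top_k=10):
    """ConSE-style embedding of one image.

    scores:     (n_labels,) probabilistic relevance of each training label to the image
    label_vecs: (n_labels, dim) word2vec vectors of the training labels
    Returns the convex combination of the top_k label vectors, weighted by
    their scores renormalized to sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    label_vecs = np.asarray(label_vecs, dtype=float)
    top = np.argsort(-scores)[:top_k]          # indices of the top_k labels
    weights = scores[top] / scores[top].sum()  # renormalize so the weights sum to 1
    return weights @ label_vecs[top]

# Three training labels with one-hot toy vectors; keep only the two best.
print(image_to_vec([0.6, 0.3, 0.1], np.eye(3), top_k=2))
```

Keeping only the top labels suppresses the long tail of low, noisy scores; because the weights sum to 1, the image vector stays inside the convex hull of those label vectors.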

With both images and labels vectorized, the ZeroshotTagger class in tagger.py predicts the most likely labels.
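One common way to do this ranking is by cosine similarity between the image vector and each candidate label vector in the test label set. The sketch below assumes that choice; tag_image and its arguments are hypothetical names, not the ZeroshotTagger interface.

```python
import numpy as np

def tag_image(img_vec, label_names, label_vecs, topn=5):
    """Rank candidate labels by cosine similarity to the image vector (assumed scoring)."""
    img = np.asarray(img_vec, dtype=float)
    labs = np.asarray(label_vecs, dtype=float)
    img = img / np.linalg.norm(img)                               # unit-length image vector
    labs = labs / np.linalg.norm(labs, axis=1, keepdims=True)     # unit-length label vectors
    sims = labs @ img                                             # cosine similarities
    order = np.argsort(-sims)[:topn]                              # best first
    return [(label_names[i], float(sims[i])) for i in order]

print(tag_image([1, 0], ["a", "b", "c"], [[0, 1], [1, 0.1], [1, 1]], topn=2))
```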

Using your own data (images and labels)

The current version demonstrates ConSE and HierSE on imagenet2hop-random2k, a subset of 2k images randomly selected from the full test set of 1.3 million images. Most of the code is self-explanatory, I hope ;). Nevertheless, some coding is probably needed to run the code on new data, in particular for a new training label set (Y0) and a new test label set (Y1) other than ilsvrc12_test1k and ilsvrc12_test1k_2hop.

To perform zero-shot tagging on a test image set X:

Use an existing classification system to generate the probabilistic relevance score of each label in Y0 w.r.t. each image in X, and use txt2bin.py to store the predictions in the required binary format.
Modify and run do_label2vec.sh to vectorize Y0 and Y1.
Perform image tagging by calling zero_shot_tagging.py.
Report hit@1, hit@2, hit@5 and hit@10 using evaluate.py.
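For reference, hit@k is commonly defined as the fraction of test images for which at least one ground-truth label appears among the top k predicted labels. A minimal sketch under that assumed definition follows; evaluate.py may compute it differently, and hit_at_k and its dict-based inputs are hypothetical.

```python
def hit_at_k(predictions, ground_truth, k):
    """Fraction of images with at least one correct label among their top k predictions.

    predictions:  {image_id: [label, ...]} ranked best first
    ground_truth: {image_id: set of correct labels}
    """
    hits = sum(
        1
        for img, ranked in predictions.items()
        if set(ranked[:k]) & ground_truth.get(img, set())
    )
    return hits / len(predictions)

preds = {"x1": ["cat", "dog", "fox"], "x2": ["car", "bus", "cat"]}
truth = {"x1": {"dog"}, "x2": {"cat"}}
for k in (1, 2, 5, 10):
    print(f"hit@{k}: {hit_at_k(preds, truth, k):.2f}")
```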

References

[1] Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg Corrado, Jeffrey Dean, Zero-shot learning by convex combination of semantic embeddings, ICLR, 2014
[2] Xirong Li, Shuai Liao, Weiyu Lan, Xiaoyong Du, Gang Yang, Zero-shot image tagging by hierarchical semantic embedding, SIGIR, 2015
