
Fine-grained classification with textual cues

Implementation based on our paper: "Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features"

https://arxiv.org/pdf/2001.04732.pdf


Install

Create Conda environment

$ conda env create -f environment.yml

Activate the environment

$ conda activate finegrained

Train from scratch

python3 train.py

(Please refer to the code for the available training arguments.)

Datasets

Con-Text dataset can be downloaded from: https://staff.fnwi.uva.nl/s.karaoglu/datasetWeb/Dataset.html

Drink-Bottle dataset: https://drive.google.com/file/d/10BZN5_BGg21olZA857SMvF0TPgukmVI4/view?usp=sharing

Textual Features

The results reported in the paper were obtained using the Fisher Vector of a set of PHOCs extracted from an image. To extract the PHOCs, either of the following two repos can be used:

https://github.com/DreadPiratePsyopus/Pytorch-yolo-phoc (Pytorch)

https://github.com/lluisgomez/single-shot-str (Tensorflow)
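The exact PHOC configuration is defined inside the linked repos; for orientation, here is a minimal sketch of a standard PHOC descriptor, assuming a 36-character alphabet (a-z plus digits) and pyramid levels 1-5 — these choices are illustrative, not necessarily the ones used in this repo:

```python
def phoc(word, levels=(1, 2, 3, 4, 5),
         alphabet="abcdefghijklmnopqrstuvwxyz0123456789"):
    """Pyramidal Histogram Of Characters: for each region of each pyramid
    level, a binary histogram marking which characters fall in that region."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            hist = [0] * len(alphabet)
            for i, ch in enumerate(word):
                c_lo, c_hi = i / n, (i + 1) / n   # character's span in [0, 1]
                overlap = max(0.0, min(hi, c_hi) - max(lo, c_lo))
                # assign the character if at least half of it lies in the region
                if ch in alphabet and overlap / (c_hi - c_lo) >= 0.5:
                    hist[alphabet.index(ch)] = 1
            vec.extend(hist)
    return vec
```

With these settings the descriptor has (1+2+3+4+5) x 36 = 540 binary dimensions per word.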

Finally, the Fisher Vector computed from the obtained PHOCs is used at training/inference time.

The Fisher Vector implementation was taken from: https://gist.github.com/danoneata/9927923
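The linked gist computes the Fisher Vector under a diagonal-covariance GMM; a sketch of that computation against scikit-learn's `GaussianMixture` (assuming `covariance_type="diag"`; the repo's actual code may differ in normalization details):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(xx, gmm):
    """Fisher Vector of descriptors xx, shape (N, D), under a fitted
    diagonal-covariance GMM with K components: gradients of the average
    log-likelihood w.r.t. weights, means and variances -> K + 2*K*D dims."""
    xx = np.atleast_2d(xx)
    n = xx.shape[0]
    q = gmm.predict_proba(xx)                  # (N, K) posteriors
    q_sum = q.sum(axis=0)[:, np.newaxis] / n   # (K, 1) soft counts
    q_xx = q.T @ xx / n                        # (K, D) first-order stats
    q_xx2 = q.T @ (xx ** 2) / n                # (K, D) second-order stats
    d_pi = q_sum.ravel() - gmm.weights_
    d_mu = q_xx - q_sum * gmm.means_
    d_sigma = (-q_xx2 - q_sum * gmm.means_ ** 2
               + q_sum * gmm.covariances_ + 2 * q_xx * gmm.means_)
    return np.hstack((d_pi, d_mu.ravel(), d_sigma.ravel()))
```

One FV per image is produced by encoding all of that image's PHOCs at once, which is what makes the representation fixed-length regardless of how many words were detected.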

In the folder 'preproc' there is a script which does the following:

  1. Create a PHOC dictionary.
  2. Perform Scaling, Normalization, PCA of the PHOC dictionary.
  3. Train a GMM based on the PHOC data.
  4. Given a path of .json files containing PHOC predictions, read each file and construct the Fisher Vector used to train the model.
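The actual parameters live in the 'preproc' script; steps 2-3 above can be sketched with scikit-learn as follows (the function name, component count and PCA dimension here are illustrative, not the paper's values):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fit_phoc_gmm(phocs, n_components=16, pca_dim=32, seed=0):
    """Scale, L2-normalize and PCA-reduce a PHOC dictionary (steps 2-3),
    then fit the diagonal GMM later used to build Fisher Vectors (step 4)."""
    scaler = StandardScaler().fit(phocs)
    x = normalize(scaler.transform(phocs))    # scaling + L2 normalization
    pca = PCA(n_components=pca_dim, random_state=seed).fit(x)
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=seed).fit(pca.transform(x))
    return scaler, pca, gmm   # reuse all three when encoding new PHOCs
```

The fitted scaler, PCA and GMM must be kept and reapplied to every PHOC prediction before Fisher Vector encoding, so that training and inference features live in the same space.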

Edit the path that contains the PHOC predictions and the path where the Fisher Vectors will be saved; the latter is the path the Dataloader uses to load the textual features at training/inference time. Finally, run:

$ python2 phocs_to_FV.py

Precomputed textual features for the Drink-Bottle and Con-Text datasets used in the paper can be provided; to train/test the model on another dataset, you will have to generate the textual features yourself.

Classification Results

(Classification results figure: see the paper.)

Reference

If you found this code useful, please cite the following paper:

@inproceedings{mafla2020fine,
  title={Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features},
  author={Mafla, Andres and Dey, Sounak and Biten, Ali Furkan and Gomez, Lluis and Karatzas, Dimosthenis},
  booktitle={The IEEE Winter Conference on Applications of Computer Vision},
  pages={2950--2959},
  year={2020}
}

License

Apache License 2.0
