Visual Saliency Prediction and Effectiveness Analysis using Metrics

Final project for Gatech BMED 7610 - Quantitative Neuroscience (Fall 2019)

This repository is a fork of the original MSI-Net code (see the original readme below), used here to develop and experiment with visual saliency prediction models. The final report of this project can be found here.

The modifications to the original model include:

  • Incorporating self-attention blocks [1] in different layers of the encoder (see the sketch after this list).
  • Feature fusion at the end of the encoder.
  • Skip connections between the encoder and decoder.
  • Batch normalization.

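The self-attention blocks follow the formulation of Zhang et al. [1]. The snippet below is a minimal TensorFlow 1.x sketch of such a block, not the exact code in this repository; the function name and the 8x channel reduction for the query/key projections are illustrative assumptions.

import tensorflow as tf

def self_attention_block(x, name="self_attention"):
    """SAGAN-style self-attention over a feature map x of shape [N, H, W, C]."""
    with tf.variable_scope(name):
        channels = x.get_shape().as_list()[-1]
        batch = tf.shape(x)[0]
        positions = tf.shape(x)[1] * tf.shape(x)[2]  # H * W

        # 1x1 convolutions producing query, key, and value projections
        f = tf.layers.conv2d(x, channels // 8, 1, name="query")
        g = tf.layers.conv2d(x, channels // 8, 1, name="key")
        h = tf.layers.conv2d(x, channels, 1, name="value")

        # Flatten the spatial dimensions to [N, H*W, C']
        f_flat = tf.reshape(f, [batch, positions, channels // 8])
        g_flat = tf.reshape(g, [batch, positions, channels // 8])
        h_flat = tf.reshape(h, [batch, positions, channels])

        # Attention weights between every pair of spatial positions
        attention = tf.nn.softmax(tf.matmul(f_flat, g_flat, transpose_b=True))

        # Weighted sum of values, reshaped back to the input layout
        o = tf.reshape(tf.matmul(attention, h_flat), tf.shape(x))

        # Learnable residual weight, initialized to zero as in [1]
        gamma = tf.get_variable("gamma", shape=[], initializer=tf.zeros_initializer())
        return gamma * o + x
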
To train and evaluate the models, follow the instructions in the original readme (shown below) using main.py. Use main_sc.py for models with skip connections and main_bn.py for models with batch normalization; example invocations are given below.
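
For example, assuming the modified scripts keep the same command-line interface as main.py (only the original interface is documented below), the three variants can be trained with:

python main.py train
python main_sc.py train
python main_bn.py train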

Baselines

Other baseline saliency prediction models used in the project can be found here.

Dataset

The datasets used in the project can be found here for the MIT1003 dataset and here for the SALICON dataset.

Metrics

Code for calculating the scores of the metrics used in the project is under the metrics/ folder. Use cal_score.m to process all saliency prediction results in a given folder; an illustrative example of one such metric is sketched below.
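
For reference, the following is a minimal Python sketch of one metric commonly used for saliency evaluation, the Normalized Scanpath Saliency (NSS). It is an illustrative re-implementation, not the MATLAB code in metrics/, and the function and argument names are placeholders.

import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency: mean standardized saliency at fixated pixels."""
    saliency = np.asarray(saliency_map, dtype=np.float64)
    fixations = np.asarray(fixation_map) > 0  # binary map of human fixation locations

    # Standardize the prediction to zero mean and unit standard deviation
    saliency = (saliency - saliency.mean()) / (saliency.std() + 1e-12)

    # Average the standardized saliency over the fixated locations
    return saliency[fixations].mean()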

References:

[1] Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019, May). Self-attention generative adversarial networks. In International Conference on Machine Learning (pp. 7354-7363). PMLR.


Original readme below:

Contextual Encoder-Decoder Network for Visual Saliency Prediction

This repository contains the official TensorFlow implementation of the MSI-Net (multi-scale information network), as described in the arXiv paper Contextual Encoder-Decoder Network for Visual Saliency Prediction (2019).

Abstract: Predicting salient regions in natural images requires the detection of objects that are present in a scene. To develop robust representations for this challenging task, high-level visual features at multiple spatial scales must be extracted and augmented with contextual information. However, existing models aimed at explaining human fixation maps do not incorporate such a mechanism explicitly. Here we propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task. The architecture forms an encoder-decoder structure and includes a module with multiple convolutional layers at different dilation rates to capture multi-scale features in parallel. Moreover, we combine the resulting representations with global scene information for accurately predicting visual saliency. Our model achieves competitive results on two public saliency benchmarks and we demonstrate the effectiveness of the suggested approach on selected examples. The network is based on a lightweight image classification backbone and hence presents a suitable choice for applications with limited computational resources to estimate human fixations across complex natural scenes.
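
The multi-scale module described above can be sketched as a set of parallel dilated convolutions whose outputs are concatenated. The snippet below is a simplified TensorFlow 1.x illustration; the filter count and dilation rates are placeholder values rather than the ones used by MSI-Net, and the full model additionally fuses global scene information.

import tensorflow as tf

def multi_scale_block(x, filters=128, rates=(1, 4, 8, 12)):
    """Parallel dilated 3x3 convolutions capturing features at several scales."""
    branches = []
    for rate in rates:
        branches.append(tf.layers.conv2d(x, filters, 3, padding="same",
                                         dilation_rate=rate, activation=tf.nn.relu,
                                         name="dilated_rate_%d" % rate))
    # Fuse the multi-scale responses along the channel dimension
    return tf.concat(branches, axis=-1)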

Our results on the MIT saliency benchmark can be viewed here.

Architecture

Requirements

Package     Version
python      3.6.8
tensorflow  1.13.1
matplotlib  3.0.3
requests    2.21.0

The code was tested and is compatible with both Windows and Linux. We strongly recommend using TensorFlow with GPU acceleration, especially when training the model. Nevertheless, a slower CPU version is officially supported.

Training

The results of our paper can be reproduced by first training the MSI-Net via the following command:

python main.py train

This will start the training procedure for the SALICON dataset with the hyperparameters defined in config.py. If you want to optimize the model for CPU usage, please change the corresponding device value in the configurations file. Optionally, the dataset and download path can be specified via command line arguments:

python main.py train -d DATA -p PATH

Here, the DATA argument must be salicon, mit1003, or cat2000. It is required that the model is first trained on the SALICON dataset before fine-tuning it on either MIT1003 or CAT2000. By default, the selected saliency dataset will be downloaded to the folder data/ but you can point to a different directory via the PATH argument.
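
For example, reproducing the fine-tuning workflow on MIT1003 amounts to two consecutive runs (with the dataset downloaded to the default data/ folder):

python main.py train -d salicon
python main.py train -d mit1003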

All results are then stored under the folder results/, which contains the training history and model checkpoints. This allows you to continue training or perform inference on test instances, as described in the next section.

Testing

To test a pre-trained model on image data and produce saliency maps, execute the following command:

python main.py test -d DATA -p PATH

If no checkpoint is available from prior training, it will automatically download our pre-trained model to weights/. The DATA argument defines which network is used and must be salicon, mit1003, or cat2000. It will then resize the input images to the dimensions specified in the configurations file. Note that this might lead to excessive image padding depending on the selected dataset.

The PATH argument points to the folder where the test data is stored but can also denote a single image file directly. As for network training, the device value can be changed to CPU in the configurations file. This ensures that the model optimized for CPU will be utilized and hence improves the inference speed. All results are finally stored in the folder results/images/ with the original image dimensions.
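
For instance, to produce a saliency map for a single image with the SALICON-trained network (the image path below is only a placeholder):

python main.py test -d salicon -p /path/to/image.jpg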

Demo

A demonstration of saliency prediction in the browser is available here. It computes saliency maps based on the input from a webcam via TensorFlow.js. Since the library uses the machine's hardware, model performance is dependent on your local configuration. The buttons allow you to select the quality, ranging from very low for a version trained on low image resolution with high inference speed, to very high for a version trained on high image resolution with slow inference speed.

Contact

For questions, bug reports, and suggestions about this work, please create an issue in this repository.
