
Image Resolution Enhancement Using Multi-Step Reinforcement Learning

An MSc thesis work, 2020.

In a quest to achieve super-resolution, I've taken the novel approach of constructing a Reinforcement Learning based model using TensorFlow 2.0. But first, to better understand the problem, I studied and implemented several upsampling models, which are detailed in the sections below. My contribution also extends to creating easy-to-use notebooks for downloading and preprocessing popular datasets (BSD68, BSD500, MIT-Adobe FiveK, MS COCO, BDD100k, Cityscapes, KITTI). Based on TensorFlow's dataset implementation, I've implemented a custom, reusable, memory-efficient dataset class for handling images (only the HDF format is supported as yet).

Models

Supervised


Pre-upsampling model (SRCNN)

Reference: Image Super-Resolution Using Deep Convolutional Networks - Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang

First, the LR image is upsampled using traditional bicubic interpolation to get a coarse HR* representation. This coarse image is then fed to the model, which aims to reconstruct the missing/lost details. Fast, but the achievable results are hindered by the upsampling algorithm.

process

results

SSIM=0.70 PSNR=20.70
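The two-step pipeline above can be sketched in a few lines of NumPy. This is a toy stand-in, not the thesis code: nearest-neighbour repetition replaces bicubic interpolation, and a fixed sharpening kernel stands in for the learned convolutional layers.

```python
import numpy as np

def upsample_nearest(img, scale=2):
    # Stand-in for the bicubic interpolation step: each pixel
    # is repeated scale x scale times.
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def conv2d(img, kernel):
    # Naive 'same'-padded cross-correlation (what CNN layers compute),
    # standing in for one learned SRCNN layer.
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

lr = np.random.rand(8, 8)        # low-resolution input
coarse = upsample_nearest(lr)    # step 1: interpolate to the target size
sharpen = np.array([[0., -1., 0.],
                    [-1., 5., -1.],
                    [0., -1., 0.]])
refined = conv2d(coarse, sharpen)  # step 2: refine the coarse HR* image
```

Note that the refinement operates entirely at HR resolution, which is what makes this family of models dependent on the quality of the initial interpolation.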

Post-upsampling model (FSRCNN)

Reference: Accelerating the Super-Resolution Convolutional Neural Network - Chao Dong, Chen Change Loy, Xiaoou Tang

A model largely similar to the pre-upsampling one, with the difference that the LR image is fed to the model instead of the upsampled one. This is done in the hope of mitigating the effect of the upsampling algorithm: the convolutional layers extract features before they are potentially lost. Fast, but the achievable results are still hindered by the upsampling.

results

SSIM=0.60 PSNR=19.21
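The key difference to the pre-upsampling model is that the upsampling itself is a learned layer placed at the end of the network. A minimal NumPy sketch of such a stride-2 transposed convolution (the FSRCNN "deconvolution" tail; the kernel here is fixed rather than learned):

```python
import numpy as np

def transposed_conv2d(img, kernel, stride=2):
    # Minimal transposed convolution: each input pixel "stamps" a scaled
    # copy of the kernel onto a stride-spaced output grid.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h * stride + kh - stride, w * stride + kw - stride))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += img[i, j] * kernel
    return out

lr_features = np.random.rand(8, 8)           # features extracted at LR resolution
kernel = np.full((2, 2), 0.25)               # a (normally learned) upsampling kernel
hr = transposed_conv2d(lr_features, kernel)  # upsampling happens only at the very end
```

Because all feature extraction runs at LR resolution, this layout is also cheaper than pre-upsampling networks of comparable depth.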

Progressive upsampling model (LapSRN)

Reference: Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks - Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang

The model uses a pyramid structure. At each pyramid level, the model consists of a feature embedding sub-network for extracting non-linear features, transposed convolutional layers for upsampling feature maps and images, and a convolutional layer for predicting the sub-band residuals. As the network structure at each level is highly similar, it shares the weights of those components across pyramid levels to reduce the number of network parameters. This way the upsampling can take place in multiple steps, which makes the network capable of preserving more of the original image's features, especially for large (4x, 8x) scaling factors.

process

results

SSIM=0.70 PSNR=20.68
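The progressive, weight-sharing scheme can be sketched as a loop that reuses the same components at every pyramid level. This is a toy NumPy stand-in: nearest-neighbour repetition replaces the transposed convolutions, and a single scalar weight stands in for the shared residual-prediction sub-network.

```python
import numpy as np

def upsample2x(img):
    # Stand-in for the transposed-convolution upsampling at one pyramid level.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def shared_residual(features, weight):
    # Stand-in for the shared sub-band residual branch; in LapSRN this is
    # a convolutional sub-network whose weights are reused at every level.
    return features * weight

def progressive_upsample(lr, levels=3, weight=0.1):
    img = lr
    for _ in range(levels):                       # 2x per level -> 8x for 3 levels
        img = upsample2x(img)                     # upsample the current estimate
        img = img + shared_residual(img, weight)  # add the predicted residual
    return img

out = progressive_upsample(np.random.rand(4, 4), levels=3)
```

Because the loop reuses the same components, an 8x model costs little more in parameters than a 2x one, which is the point of the weight sharing.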

Iterative upsampling model (DBPN)

Reference: Deep Back-Projection Networks for Single Image Super-resolution - Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita

The model consists of iterative up- and downsampling layers. These layers form units that provide an error feedback mechanism for the projection errors.

process

results

SSIM=0.65 PSNR=19.22
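The error feedback mechanism of one such up-projection unit can be sketched as follows. This is a toy NumPy stand-in: a blurred nearest-neighbour upsampler and 2x2 average pooling replace the learned projection layers, chosen so the up/down round trip is lossy and the projection error is non-trivial.

```python
import numpy as np

def box_blur(img):
    # Small smoothing filter so that up followed by down loses detail,
    # giving a non-zero projection error to feed back.
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] + p[1:-1, 1:-1]) / 5.0

def up(img):
    # Stand-in up-projection (nearest-neighbour + blur).
    return box_blur(img.repeat(2, axis=0).repeat(2, axis=1))

def down(img):
    # Stand-in down-projection (2x2 average pooling).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up_projection_unit(lr):
    hr0 = up(lr)           # initial HR estimate
    lr0 = down(hr0)        # project it back to LR
    err = lr - lr0         # projection error at LR resolution
    return hr0 + up(err)   # feed the error back to correct the HR estimate

hr = up_projection_unit(np.random.rand(8, 8))
```

In the real network these units are stacked, alternating up- and down-projections, so each stage can correct the errors introduced by the previous ones.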

Adversarial


For adversarial training I've constructed a classifier discriminator model consisting of TODO layers. By incorporating the discriminator's loss into the aforementioned supervised models' (in this context the generators') training process, all models except the pre-upsampling one were able to attain a higher evaluation rating. The following images illustrate the results; the values below the images are the average SSIM and PSNR values, and their differences compared to the purely supervised training results, evaluated on the Set14 dataset.

results

|      | SRCNN         | FSRCNN        | LapSRN        | DBPN          |
|------|---------------|---------------|---------------|---------------|
| SSIM | 0.69 (-0.01)  | 0.60 (+0.00)  | 0.71 (+0.01)  | 0.65 (+0.00)  |
| PSNR | 20.55 (-0.15) | 19.40 (+0.19) | 20.98 (+0.30) | 19.53 (+0.31) |
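The way the discriminator's loss is folded into the generator's training objective can be sketched as a weighted sum. This is an illustrative NumPy sketch, not the thesis code: MSE stands in for the content loss, and the weight `adv_weight = 1e-3` is an assumed value, not the setting used in the thesis.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def generator_loss(sr, hr, d_sr, adv_weight=1e-3):
    # Supervised content loss plus a weighted adversarial term.
    # d_sr is the discriminator's estimated P(real) for the generated image;
    # adv_weight is an illustrative value, not the thesis setting.
    content = mse(sr, hr)
    adversarial = -np.log(d_sr + 1e-8)  # small when the discriminator is fooled
    return content + adv_weight * adversarial

zeros = np.zeros((4, 4))
```

The adversarial term pushes the generator towards outputs the discriminator accepts as real, which tends to favour sharper, more plausible textures than the content loss alone.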

Reinforcement


Reference: PixelRL: Fully Convolutional Network with Reinforcement Learning for Image Processing - Ryosuke Furuta, Naoto Inoue, Toshihiko Yamasaki

An effective reinforcement learning model with pixel-wise rewards, modified here for the upsampling task. In pixelRL, each pixel has its own agent, and the agent changes the pixel's value by taking an action. A policy is then learned that strives for the maximum total reward.

process

results

SSIM=0.81 PSNR=22.37
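The per-pixel agent idea can be illustrated with a toy NumPy sketch. This is not the thesis model: the action set, a known target image, and a greedy one-step policy are all simplifying assumptions standing in for the learned, fully convolutional policy.

```python
import numpy as np

ACTIONS = np.array([-1.0, 0.0, 1.0])  # each pixel-agent may decrement, keep or increment

def greedy_step(img, target):
    # Evaluate every action at every pixel; the reward is the decrease in
    # squared error, and each pixel-agent greedily picks its best action.
    candidates = img[None, :, :] + ACTIONS[:, None, None]
    rewards = (img - target) ** 2 - (candidates - target) ** 2
    best = rewards.argmax(axis=0)
    new_img = np.take_along_axis(candidates, best[None], axis=0)[0]
    return new_img, rewards.max(axis=0)

img = np.zeros((4, 4))
target = np.full((4, 4), 3.0)
for _ in range(3):               # multi-step refinement towards the target
    img, reward = greedy_step(img, target)
```

The multi-step structure is the point: each pixel is nudged by small actions over several steps rather than being predicted in one shot.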

How to use

Predict

For prediction purposes, I've included pre-trained models. To use them, all you have to do is:

  1. Instantiate a network, e.g. network = IterativeSamplingNetwork((35, 35, 1)) for grayscale or network = IterativeSamplingNetwork((35, 35, 3)) for color images
  2. Load the pre-trained state/weights: network.load_state() for grayscale, network.load_state("_color") for color images
  3. Get your images as a 4D numpy array, shaped (count, height, width, channels)
  4. Predict: predicted_images = network.predict(your_4d_numpy_image_array). The predict method supports an optional scaling_factor: int parameter. The default scaling is 2x, but most of the models support 4x and 8x as well.

Complete example:

import numpy as np
import cv2

from src.networks.supervised.pre_upsampling_network import PreUpsamplingNetwork

network = PreUpsamplingNetwork((35, 35, 3))
network.load_state("_color")

img = cv2.imread(<path to your image>, cv2.IMREAD_COLOR)
img = cv2.resize(img, (35, 35), interpolation=cv2.INTER_CUBIC)  # network works with 35x35 images
img = img[np.newaxis, :, :, :]  # add a batch dimension to get the 4D shaped array
img = img / 255.0  # normalize pixel values to [0, 1]
pred = network.predict(img, 2)  # upsample the image by 2x

# display the image
cv2.imshow("2x image", pred[0])  # pred is also a 4D array; display the first (and in this case only) image
cv2.waitKey()
cv2.destroyAllWindows()

Train

TODO