Anime-Colorization v0.2

This repository is an upgraded version of the anime colorization project I previously built with Keras. For the previous work, visit: https://github.com/dabsdamoon/Anime-Colorization


Data Used

Sources and Data Preprocessing

(1) https://en.wikipedia.org/wiki/CIELAB_color_space

(2) https://www.aces.edu/dept/fisheries/education/pond_to_plate/documents/ExplanationoftheLABColorSpace.pdf

The dataset can be obtained from: https://www.kaggle.com/mylesoneill/tagged-anime-illustrations#danbooru-metadata.zip. Since the full Danbooru image dataset is too large, only the moeimouto-faces.zip dataset has been used. Note that this time I selected only images without a background (white background) so that the model can detect facial parts more precisely. As in the previous repo, I converted each RGB image to LAB and used the L channel as input and the AB channels as output.

๋ฐ์ดํ„ฐ๋Š” ๋‹ค์Œ์˜ ๋งํฌ๋ฅผ ์ฐธ์กฐํ–ˆ์Šต๋‹ˆ๋‹ค: https://www.kaggle.com/mylesoneill/tagged-anime-illustrations#danbooru-metadata.zip. ์œ„ ๋งํฌ์— ์กด์žฌํ•˜๋Š” ๋‘ ๊ฐœ์˜ ๋ฐ์ดํ„ฐ ์ค‘ ํ•˜๋‚˜์ธ danbooru dataset์€ ์‚ฌ์ด์ฆˆ๊ฐ€ ๋„ˆ๋ฌด ํฐ ๊ด€๊ณ„๋กœ, moeimouto-face.zip ๋ฐ์ดํ„ฐ์…‹๋งŒ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, ์–ผ๊ตด์˜ ๊ฐ ๋ถ€๋ถ„๋“ค์„ ์ข€ ๋” ์ž˜ ๊ตฌ๋ถ„ํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋ฐฐ๊ฒฝ์ด ์—†๋Š” (ํ•˜์–€์ƒ‰ ๋ฐฐ๊ฒฝ) ์ด๋ฏธ์ง€๋“ค๋งŒ ๊ณจ๋ผ์„œ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด์ „ repo์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, colorization์„ ์œ„ํ•ด์„œ RGP ์ด๋ฏธ์ง€๋ฅผ LAB ์ด๋ฏธ์ง€๋กœ ๋ณ€ํ™˜ ํ›„, L channel์„ input, AB channel์„ output์œผ๋กœ ํ•˜๋Š” ๋ชจ๋ธ์„ ๊ตฌ์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.

data_preprocessing

Objective

After reviewing the previous repo, I decided to define the objective more clearly. My objective is

"To realistically color gray images!"
Note that I used GAN exclusively, since I want to color one gray image into many different color images. An example similar to my objective can be found in League of Legends, where the game sells chroma packs: an original skin scheme with different colorizations. With a regular supervised learning method, however, each grayscale image must have one deterministic colorization label in order to train the algorithm, and the trained algorithm would yield only that deterministic colorization. A GAN, on the other hand, is a semi-supervised learning method: the trained generator yields colorization results that fit the distribution of colorized images rather than one specific colorization. Thus, I tried to use GAN for this project and fed different noise vectors at test time to observe how the trained generator colorizes a gray image differently.

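The test-time procedure of feeding one gray image with many different noise vectors can be sketched like this (shapes and the noise dimension are illustrative assumptions, not values from the repo):

```python
import numpy as np

def make_noise_batch(gray_l, n_samples=25, noise_dim=100, seed=0):
    """Pair one grayscale (L-channel) image with n different noise vectors.

    gray_l: (H, W, 1) array. Returns an (n, H, W, 1) image batch and an
    (n, noise_dim) noise batch to feed the trained generator together.
    """
    rng = np.random.default_rng(seed)
    images = np.repeat(gray_l[np.newaxis], n_samples, axis=0)
    noise = rng.standard_normal((n_samples, noise_dim))
    return images, noise

imgs, z = make_noise_batch(np.zeros((64, 64, 1)))
# generator.predict([imgs, z]) would then yield 25 colorizations (hypothetical call)
```

Because each row of the noise batch differs while the image rows are identical, any variation among the 25 outputs comes from the noise alone.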

alt text

(https://na.leagueoflegends.com/en/news/champions-skins/skin-release/change-it-chroma-packs)

Can I make "CHROMA" of Anime characters?

Algorithms Used (with Reference)

DCGAN with U-Net Architecture

Architecture

(1) https://github.com/eriklindernoren/Keras-GAN/blob/master/dcgan/dcgan.py

(2) https://github.com/kongyanye/cwgan-gp/blob/master/cwgan_gp.py

Since I explained GANs in the previous repo, I'll skip the explanation here. The previous repo did not seem to reveal the true power of GANs, so I applied a GAN again for this colorization project, hoping for a better result this time. The code I referenced is from (1) and (2). Many colorization projects with GAN models use either a ResNet or a U-Net architecture for the generator. After some experiments, the U-Net architecture seemed to work better for me, so I decided to use it (since this decision is based on my own heuristics, I would be grateful for any advice either supporting or objecting to it).

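The defining feature of the U-Net generator is the skip connection from each encoder level to the matching decoder level. A framework-agnostic NumPy sketch of one down/up level (the real model would use learned convolutions instead of pooling/repeating):

```python
import numpy as np

def down(x):
    """2x2 average pooling over (H, W, C): stand-in for one encoder level."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up(x):
    """Nearest-neighbor 2x upsampling: stand-in for one decoder level."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_pass(x):
    """One down/up level with the skip connection that defines U-Net:
    the encoder feature map is concatenated onto the decoder feature map,
    so fine spatial detail from the input reaches the output directly."""
    skip = x                      # kept for the skip connection
    bottleneck = down(x)          # spatial resolution halved
    decoded = up(bottleneck)      # resolution restored
    return np.concatenate([decoded, skip], axis=-1)  # channels doubled

l_channel = np.random.rand(32, 32, 1)   # grayscale input
out = unet_pass(l_channel)              # (32, 32, 2): upsampled + skip
```

This is why U-Net suits colorization: edges and facial-part boundaries from the L channel survive the bottleneck via the skip path, so the decoder only has to predict color, not re-draw structure.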

alt text

alt text

Result

Below are inputs (grayscale) and outputs (colored) of the generator trained with GAN (epoch = 8,192):

์•„๋ž˜์˜ ์ด๋ฏธ์ง€๋“ค์€ GAN ๋ชจ๋ธ๋กœ ๋งŒ๋“  generator์˜ input(ํ‘๋ฐฑ ์ด๋ฏธ์ง€)์™€ output(์ฑ„์ƒ‰ ์ด๋ฏธ์ง€) ์ž…๋‹ˆ๋‹ค (epoch = 8,192):

BEFORE

grayscale

AFTER

colored

Also, as I said before, I tested one grayscale image with 25 different noises. Here, I prepared two different grayscale images: one from the training dataset and one not in the training dataset (for the latter I used "Taylor" from BrownDust, a mobile game that I'm currently playing):


ONE IN TRAINING DATASET
ORIGINAL

original_within

GRAYSCALE

grayscale_within

COLORED

colored_within

ONE NOT IN TRAINING DATASET
ORIGINAL

original_notin

GRAYSCALE

grayscale_notin

COLORED

colored_notin

Well, not as well-colored as a "chroma", but at least the generator gave me several different colorization results that look reasonable to me. For example, the generator detects facial parts (eyes, hair, mouth, etc.) and colorizes them differently. It is also interesting that the colorization of Taylor, the out-of-distribution example, seems better than the in-distribution example. Next, I'm going to apply a different GAN model called WGAN-GP.

์ œ๊ฐ€ ์›ํ•˜๋˜ chroma ๊ธ‰์˜ ํ€„๋ฆฌํ‹ฐ๋Š” ์•„๋‹ˆ์ง€๋งŒ... ๋ญ ์ ์–ด๋„ generator๊ฐ€ ์ดํ•ด๊ฐ€ ๋˜๋Š” ๋ฒ”์œ„์˜ ๋‹ค์–‘ํ•œ ์ƒ‰์น  ๊ฒฐ๊ณผ๋ฌผ์„ ๋‚ด์ฃผ์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ๋งŒ์กฑํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์–ผ๊ตด ๋ถ€๋ถ„๋ถ€๋ถ„(๋ˆˆ, ๋จธ๋ฆฌ, ์ž… ๋“ฑ)์„ ๊ฐ๊ฐ ๋‹ค๋ฅด๊ฒŒ ์ƒ‰์น ํ•˜๋Š” ๊ฒŒ ์ธ์ƒ์ ์ด๋„ค์š”. ๋˜ํ•œ, training ๋ถ„ํฌ์— ์†ํ•ด์žˆ์ง€ ์•Š์€ BrownDust์˜ ํ…Œ์ผ๋Ÿฌ ์ด๋ฏธ์ง€๋ฅผ ๋ถ„ํฌ์— ํฌํ•จ๋œ ์• ๋‹ˆ๋ฉ”์ด์…˜ ์บ๋ฆญํ„ฐ ์ด๋ฏธ์ง€๋ณด๋‹ค ๋” ์ž˜ ์ƒ‰์น  ๊ฒƒ ๊ฐ™๋‹ค๋Š” ๋Š๋‚Œ์ด ๋“ค์–ด ํฅ๋ฏธ๋กœ์› ์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ์—๋Š” ๋‹ค๋ฅธ algorithm์ธ WGAN-GP์„ ํ•œ๋ฒˆ ์‚ฌ์šฉํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค.

WGAN-GP with U-Net Architecture

Brief Explanation about the Concept of WGAN-GP

(1) https://arxiv.org/abs/1701.07875

(2) https://vincentherrmann.github.io/blog/wasserstein/

(3) https://arxiv.org/pdf/1704.00028.pdf

As I mentioned in the previous repo, WGAN (Wasserstein GAN) is one of the newer GAN variants, introduced by Arjovsky et al. (2017) (1); it applies the Wasserstein loss instead of the KL and JS divergences used as the distance in the original GAN's loss function. To satisfy the Lipschitz constraint on the discriminator, a constraint that must hold in order to compute the WGAN loss, (1) introduces the technique of clipping the discriminator's weights (for more information, visit (2), since I personally think it is so far the best explanation I've read of the relationship between WGAN and the Lipschitz constraint).


Wasserstein Distance

alt text

However, WGAN also has some problems, such as capacity underuse and exploding/vanishing gradients: the discriminator will clearly not be optimal if its weight values are clipped to a fixed range. WGAN is also quite sensitive to the clipping value, so exploding/vanishing gradients can easily occur. Thus, (3) introduced the gradient penalty technique, which directly constrains the gradient norm of the discriminator's output with respect to its input (this is a very brief summary of WGAN-GP, so I recommend reading (3) to fully understand it). An interesting point is that the discriminator in WGAN-GP does not use a BatchNormalization layer, since batch normalization creates correlation among the inputs of a layer; such correlation would change the gradient norm of the discriminator's output with respect to each individual input.

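The gradient penalty itself is a small computation: sample points on straight lines between real and fake batches, measure the critic's gradient norm there, and penalize its distance from 1. A self-contained NumPy sketch with a toy critic (finite differences stand in for the framework's automatic differentiation):

```python
import numpy as np

def grad_norms(critic, x, h=1e-5):
    """Finite-difference gradient norm of the critic at each sample in x."""
    grads = np.zeros_like(x)
    for j in range(x.shape[1]):
        step = np.zeros(x.shape[1]); step[j] = h
        grads[:, j] = (critic(x + step) - critic(x - step)) / (2 * h)
    return np.linalg.norm(grads, axis=1)

def gradient_penalty(critic, real, fake, lam=10.0, seed=0):
    """WGAN-GP penalty: sample points on lines between real and fake batches
    and push the critic's gradient norm there toward 1."""
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=(real.shape[0], 1))   # per-sample mixing weight
    interpolated = eps * real + (1 - eps) * fake
    return lam * np.mean((grad_norms(critic, interpolated) - 1.0) ** 2)

# Toy critic D(x) = sum(x): its gradient is all-ones, so its norm is sqrt(d)
critic = lambda x: x.sum(axis=1)
real = np.zeros((8, 4)); fake = np.ones((8, 4))
penalty = gradient_penalty(critic, real, fake)   # 10 * (sqrt(4) - 1)^2 = 10
```

The penalty is added to the critic's Wasserstein loss; because it is evaluated per interpolated sample, any layer that mixes information across the batch (like batch normalization) would distort exactly the quantity being penalized, which is the reason stated above for dropping BatchNormalization.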

Difference between WGAN weight-clipping and gradient penalty (https://arxiv.org/pdf/1704.00028.pdf)

WGAN-GP

Result

Below are inputs (grayscale) and outputs (colored) of the generator trained with WGAN-GP (epoch = 2,048):

์•„๋ž˜์˜ ์ด๋ฏธ์ง€๋“ค์€ WGAN-GP ๋ชจ๋ธ๋กœ ๋งŒ๋“  generator์˜ input(ํ‘๋ฐฑ ์ด๋ฏธ์ง€)์™€ output(์ฑ„์ƒ‰ ์ด๋ฏธ์ง€) ์ž…๋‹ˆ๋‹ค (epoch = 2,048):

BEFORE

grayscale

AFTER

colored

Also, just as I did for GAN, I tested one grayscale image with 25 different noises:


ONE IN TRAINING DATASET
ORIGINAL

original_within

GRAYSCALE

grayscale_within

COLORED

colored_within

ONE NOT IN TRAINING DATASET
ORIGINAL

original_notin

GRAYSCALE

grayscale_notin

COLORED

colored_notin

It seems to me that the quality of colorization gets better with WGAN-GP, but I cannot quantify how much the result improved over the GAN result. Still, it was worthwhile to run the WGAN-GP code and get a comparably decent colorization result.


Conclusion and Future Plan

So far, I've done colorization of grayscale images into color images. After training the algorithms many times with different parameters, WGAN-GP generally seems to produce better results than GAN. However, WGAN-GP is quite slow, and sometimes GAN also produces seemingly better results! It is also ambiguous to define "better colorization", so I've learned that professional knowledge about colorization is needed for this project as well. There is also some code that needs to be improved: for example, to apply RandomWeightedAverage, I used global parameters (batch_size, img_shape_d), which must be changed manually whenever the batch size changes :(

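One way to remove those globals is to infer the batch size and image shape from the inputs themselves. A minimal NumPy sketch of the idea (in a Keras custom layer the equivalent would be reading the batch dimension from the input tensor at run time; this standalone class is my own illustration, not the repo's code):

```python
import numpy as np

class RandomWeightedAverage:
    """Random convex combination of real and fake batches, as used to build
    the interpolated samples for the WGAN-GP penalty. Instead of reading a
    global batch_size / img_shape_d, it infers both from the inputs."""

    def __init__(self, rng=None):
        self.rng = rng or np.random.default_rng()

    def __call__(self, real, fake):
        # One weight per sample, broadcast over all image dimensions
        shape = (real.shape[0],) + (1,) * (real.ndim - 1)
        alpha = self.rng.uniform(size=shape)
        return alpha * real + (1 - alpha) * fake

avg = RandomWeightedAverage(np.random.default_rng(0))
mix = avg(np.zeros((4, 32, 32, 2)), np.ones((4, 32, 32, 2)))
mix2 = avg(np.zeros((7, 8, 8, 2)), np.ones((7, 8, 8, 2)))  # any batch size works
```

Since the mixing weight's shape is derived from each call's inputs, changing the batch size or image shape no longer requires editing any global value.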

Acknowledgement and P.S

Special thanks to the AI Research Lab in Neowiz Play Studio (http://neowizplaystudio.com/ko/) for allowing me to use its resources for this project. If you need the white-background dataset, please send an e-mail to the address given below under contact information.


Contact Information

facebook: https://www.facebook.com/dabin.moon.7

email: dabsdamoon@neowiz.com
