dnwjddl/PaperReview

PaperReview_v1

computer vision paper review

Image Classification

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021 : Open review)

  • Applies the Transformer, a model widely used in NLP, to vision tasks.
  • Uses the Transformer almost unchanged for image classification and matches or exceeds SoTA models on ImageNet, ImageNet-ReaL, CIFAR-100, and VTAB.

https://arxiv.org/pdf/2010.11929.pdf
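
The "16x16 words" in the title refers to cutting the image into fixed-size patches and treating each patch as one token. The following is a minimal sketch of that patching step only, assuming a single-channel image stored as a list of rows; the function name `image_to_patches` is hypothetical, and the learned linear projection, position embeddings, and class token from the paper are left out.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in raster order (left to right, top to bottom);
    each patch is a flat list of patch_size * patch_size pixel values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[top + dy][left + dx]
                     for dy in range(patch_size)
                     for dx in range(patch_size)]
            patches.append(patch)
    return patches


# Toy example: a 4x4 image cut into 2x2 patches gives a sequence of 4 tokens.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_patches = image_to_patches(toy_image, patch_size=2)
print(len(toy_patches), toy_patches[0])
```

With the same function, a 224x224 image and 16x16 patches yield a sequence of 14 * 14 = 196 tokens.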

Spatially Attentive Output Layer for Image Classification (Accepted at CVPR 2020, Kakao Brain)

  • A typical CNN (Convolutional Neural Network) applies GAP (Global Average Pooling) followed by a fully connected layer to produce the output logits.
  • This spatial aggregation step restricts how much location-specific information the output layer can use.
  • To use location-specific output information explicitly, the paper proposes a new spatial output layer on top of the existing convolutional feature maps.

https://arxiv.org/pdf/2004.07570.pdf
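
To make the contrast concrete, the sketch below compares a GAP + FC head with a head that computes logits at every location and then aggregates them with spatial attention weights. This is a simplified illustration and not the paper's exact layer: the function names are hypothetical, the attention scores are passed in as given rather than learned, and the feature map is assumed to be flattened into a list of per-location feature vectors.

```python
import math


def linear(features, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gap_fc_logits(feature_map, weights, bias):
    """Conventional head: average over locations, then classify once."""
    count = len(feature_map)
    channels = len(feature_map[0])
    pooled = [sum(loc[c] for loc in feature_map) / count for c in range(channels)]
    return linear(pooled, weights, bias)


def spatial_weighted_logits(feature_map, weights, bias, attention_scores):
    """Spatial head: classify every location, then take an attention-weighted sum."""
    attention = softmax(attention_scores)
    per_location = [linear(loc, weights, bias) for loc in feature_map]
    return [sum(a * logits[k] for a, logits in zip(attention, per_location))
            for k in range(len(bias))]


# Four locations with two channels each, and a two-class linear classifier.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [[1.0, -1.0], [0.5, 2.0]]
bias = [0.0, 0.1]
print(gap_fc_logits(feature_map, weights, bias))
print(spatial_weighted_logits(feature_map, weights, bias, [3.0, 0.0, 0.0, 0.0]))
```

In this simplified linear setting, uniform attention reproduces the GAP + FC result; the heads differ only once the attention weights favor some locations over others.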

GAN (Generative Adversarial Network)

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)

  • Unsupervised Image-to-Image Translation
  • Adds an attention module so that translation between two domains concentrates on the regions that differ most.
  • Proposes AdaLIN (Adaptive Layer-Instance Normalization), a normalization technique in which the network learns by itself how strongly to transform, depending on the dataset.

https://arxiv.org/pdf/1907.10830.pdf
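
AdaLIN mixes instance-normalized and layer-normalized activations with a learned ratio rho, then applies a scale gamma and shift beta. Below is a minimal sketch of that mixing, assuming features are given as a list of channels (each a flat list of activations) and that `gamma`, `beta`, and `rho` are supplied per channel by the caller; the helper names are hypothetical and the population variance used here is an assumption, not necessarily what the official code uses.

```python
import math


def _mean_var(values):
    """Mean and population variance of a flat list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _normalize(values, mean, var, eps):
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def adalin(features, gamma, beta, rho, eps=1e-5):
    """Blend instance norm and layer norm per channel, then scale and shift."""
    all_values = [v for channel in features for v in channel]
    layer_mean, layer_var = _mean_var(all_values)  # statistics over all channels
    output = []
    for c, channel in enumerate(features):
        ch_mean, ch_var = _mean_var(channel)       # statistics over this channel only
        inst = _normalize(channel, ch_mean, ch_var, eps)
        layer = _normalize(channel, layer_mean, layer_var, eps)
        ratio = min(1.0, max(0.0, rho[c]))         # keep the mixing ratio in [0, 1]
        output.append([gamma[c] * (ratio * i + (1.0 - ratio) * l) + beta[c]
                       for i, l in zip(inst, layer)])
    return output


features = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
instance_like = adalin(features, gamma=[1.0, 1.0], beta=[0.0, 0.0], rho=[1.0, 1.0])
layer_like = adalin(features, gamma=[1.0, 1.0], beta=[0.0, 0.0], rho=[0.0, 0.0])
print(instance_like[0])
print(layer_like[0])
```

With rho at 1 each channel is normalized on its own (instance norm); with rho at 0 the channels share statistics (layer norm), so the differences in scale between channels are preserved.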

StarGAN v2: Diverse Image Synthesis for Multiple Domains (CVPR 2020)

  • The original StarGAN generates images for multiple domains with a single model.
  • v2 adds two things at once: a single image from one domain can be translated into many diverse images of the target domain, and multiple target domains are supported.

https://arxiv.org/abs/1912.01865

Editing in Style: Uncovering the Local Semantics of GANs (CVPR 2020)

  • Makes it possible to edit a specific part of an object (a localized semantic part) when generating an image.

Object Detection

DETR: End-to-End Object Detection with Transformers [Facebook AI][ECCV 2020]

https://arxiv.org/abs/2005.12872

  • Presents a new approach that treats object detection as a direct set prediction problem.
  • Removes NMS and anchor generation, giving an end-to-end object detection method built on a Transformer.
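
Set prediction needs a one-to-one assignment between predictions and ground-truth objects before a loss can be computed. The sketch below finds the minimum-cost assignment by brute force over permutations, which is a stand-in chosen for readability; the paper uses the Hungarian algorithm for this step. The function name `match_predictions` and the toy cost matrix are hypothetical, and at least one ground-truth object is assumed.

```python
from itertools import permutations


def match_predictions(cost):
    """Return the cheapest one-to-one assignment of predictions to targets.

    cost[i][j] is the cost of assigning prediction i to ground-truth object j.
    Assumes there are at least as many predictions as targets. Brute force,
    so only suitable for tiny examples.
    """
    num_preds, num_targets = len(cost), len(cost[0])
    best_total, best_choice = float("inf"), None
    for pred_indices in permutations(range(num_preds), num_targets):
        total = sum(cost[pred][target] for target, pred in enumerate(pred_indices))
        if total < best_total:
            best_total, best_choice = total, pred_indices
    assignment = {target: pred for target, pred in enumerate(best_choice)}
    return assignment, best_total


# Three predictions, two ground-truth objects.
toy_cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
assignment, total_cost = match_predictions(toy_cost)
print(assignment, total_cost)
```

Here target 0 is matched to prediction 1 and target 1 to prediction 0; prediction 2 stays unmatched and would be trained toward a "no object" class. Because every target gets exactly one prediction, duplicate detections are penalized during training rather than filtered afterwards with NMS.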

EfficientDet: Scalable and Efficient Object Detection [Google Brain][CVPR 2020]

https://arxiv.org/abs/1911.09070v4

  • Written by the Google Brain team that includes the EfficientNet authors. Where EfficientNet targeted image classification, EfficientDet targets object detection.
  • By applying BiFPN and model scaling, it achieved the highest accuracy on the COCO dataset, and matches the accuracy of prior work with far fewer FLOPs.
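
One ingredient of BiFPN is weighted feature fusion: each input feature gets a learnable non-negative weight, and the weights are normalized by their sum instead of by a softmax. A minimal sketch of that "fast normalized fusion" follows, assuming the input features have already been resized to the same shape and flattened into lists; the function name is hypothetical and the convolution applied after fusion is omitted.

```python
def fast_normalized_fusion(inputs, raw_weights, eps=1e-4):
    """Fuse same-shaped feature lists with normalized non-negative weights.

    Each raw weight is clamped at zero (ReLU), then divided by the sum of all
    clamped weights plus a small eps for numerical stability.
    """
    weights = [max(0.0, w) for w in raw_weights]
    norm = eps + sum(weights)
    length = len(inputs[0])
    return [sum(w * feature[i] for w, feature in zip(weights, inputs)) / norm
            for i in range(length)]


low_level = [1.0, 1.0]
high_level = [3.0, 3.0]
print(fast_normalized_fusion([low_level, high_level], raw_weights=[1.0, 1.0]))
print(fast_normalized_fusion([low_level, high_level], raw_weights=[1.0, -2.0]))
```

Equal weights give roughly the average of the inputs, and a negative raw weight is clamped to zero so that input is ignored.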

SNIPER: Efficient Multi-scale Training

paper

PaperReview_v2

Generative Pretraining from Pixels

paper

  • Brings GPT, which performed well in NLP, to pixel prediction.
  • Just as NLP feeds a sentence to the model as a single sequence, this paper flattens an image's pixels into a single sequence and feeds that sequence to a transformer.
  • Does not quite reach SoTA.
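
The flattening step can be sketched in a few lines. This example assumes raster-order flattening of a small single-channel image and builds (context, next pixel) pairs for autoregressive prediction; the function names are hypothetical, and the paper's resolution and color reduction steps are not shown.

```python
def pixels_to_sequence(image):
    """Flatten an image (list of rows) into one sequence in raster order."""
    return [pixel for row in image for pixel in row]


def next_pixel_pairs(sequence):
    """Pair every prefix of the sequence with the pixel that follows it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


toy_image = [[1, 2], [3, 4]]
toy_sequence = pixels_to_sequence(toy_image)
for context, target in next_pixel_pairs(toy_sequence):
    print(context, "->", target)
```

Each pair plays the same role as (previous words, next word) in language modeling.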

NMT by Jointly Learning To Align And Translate

paper

  • The paper that first proposed attention.
  • Alignment (= attention) tells the model which source words to focus on.
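
As a rough illustration of alignment, the sketch below turns a list of alignment scores into weights with a softmax and uses them to build a context vector as a weighted sum of encoder states. The scores are taken as given here; in the paper they come from a small learned alignment model that compares the decoder state with each encoder state. The function name and the toy values are hypothetical.

```python
import math


def attention_context(scores, encoder_states):
    """Return (alignment weights, context vector) for one decoding step."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Three source words, each encoded as a 2-dimensional state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context([2.0, 0.0, 0.0], encoder_states)
print(weights)
print(context)
```

The first source word has the highest score, so it receives the largest weight and dominates the context vector.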

Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

paper

  • Takes two images, one as the source and the other as the target, selects a subset of the objects in the source image, and pastes them onto the target image, producing new and challenging training images.
  • The code is highly portable, so the augmentation is easy to apply when using other models. Across several experiments it performed well for object detection, instance segmentation, semantic segmentation, and self-supervised learning.
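
The core paste operation can be sketched as a masked overwrite. This example assumes same-sized single-channel images stored as lists of rows and a binary mask marking the selected source objects; the function name is hypothetical, and resizing, flipping, and updating the target's annotations for occluded objects are left out.

```python
def copy_paste(source, source_mask, target):
    """Overwrite target pixels with source pixels wherever the mask is set."""
    return [[s if m else t for s, m, t in zip(s_row, m_row, t_row)]
            for s_row, m_row, t_row in zip(source, source_mask, target)]


source = [[9, 9, 9],
          [9, 9, 9]]
source_mask = [[0, 1, 1],
               [0, 0, 1]]  # 1 marks pixels of the selected source objects
target = [[1, 2, 3],
          [4, 5, 6]]
augmented = copy_paste(source, source_mask, target)
print(augmented)
```

Pixels under the mask come from the source and everything else is kept from the target, so pasted objects can partly cover objects that were already in the target image.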

Human Mesh

3D human mesh recovery

  • SMPL: A Skinned Multi-Person Linear Model, ACM Trans. Graphics (Proc. SIGGRAPH Asia), 2015
  • Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image, ECCV 2016
  • End-to-end Recovery of Human Shape and Pose, CVPR 2018
  • VIBE: Video Inference for Human Body Pose and Shape Estimation, CVPR 2020
  • End-to-End Human Pose and Mesh Reconstruction with Transformers, CVPR 2021

Models and Light-weight models

  • Mask R-CNN, ICCV 2017
  • Focal Loss for Dense Object Detection, ICCV 2017 (RetinaNet)
  • YOLACT: Real-time Instance Segmentation, ICCV 2019
  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv 2017
  • Convolutional Neural Networks with Low-rank Regularization, ICLR 2016

Generative Modeling

Generative Modeling

  • Generative Adversarial Nets, NIPS 2014
  • Auto-Encoding Variational Bayes, arXiv 2014
  • Density estimation using Real NVP, ICLR 2017
  • Neural Ordinary Differential Equations, NeurIPS 2018 (continuous normalizing flow, CNF)
  • Large Scale GAN Training for High Fidelity Natural Image Synthesis, ICLR 2019
  • Denoising Diffusion Probabilistic Models, NeurIPS 2020
  • (Optional) Glow: Generative Flow with Invertible 1x1 Convolutions, NeurIPS 2018
  • (Optional) Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021
  • (Optional) How to Train Your Energy-Based Models, arXiv 2021
  • (Optional) Wasserstein Generative Adversarial Networks, ICML 2017
  • (Optional) f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, NIPS 2016
  • (Optional) Triple Generative Adversarial Nets, NIPS 2017

Conditional generative modeling

  • Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017
  • Semantic Image Synthesis with Spatially-Adaptive Normalization, CVPR 2019
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models, ICCV 2019
  • (Optional) Vid2Game: Controllable Characters Extracted from Real-World Videos, ICLR 2020

Perceptual metric

  • On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation, arXiv 2021
