Read and review various papers in the field of Vision and Vision-Language.


Paper-Review

Junyeong Son (junyeong_son@korea.ac.kr), M.S. student at the DSBA Lab (advisor: Prof. Pilsung Kang), Department of Industrial and Management Engineering, Korea University

  • I read and summarize papers in my areas of interest within AI.
  • If anything in a review needs correction, or if you have questions, please contact me by email.
  • The GitHub links may not point to the official code.
  • The [YOUTUBE] links include my review videos on the DSBA Lab YouTube channel.

Research Interests

  • Vision-Language Pretrained Model
  • Lightweight Image Captioning Model
  • Parameter-Efficient Vision-Language Model Fine-Tuning (Adapter)

Vision-Language Pretrained Model

  1. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (NeurIPS 2021 Spotlight) [PAPER] [GITHUB] [REVIEW]
  2. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (ICLR 2022) [PAPER]
  3. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (ICML 2022) [PAPER] [GITHUB]
  4. Flamingo: a Visual Language Model for Few-Shot Learning (NeurIPS 2022) [PAPER] [REVIEW] [GITHUB]
  5. CoCa: Contrastive Captioners are Image-Text Foundation Models (2022) [PAPER] [REVIEW] [YOUTUBE]
  6. mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022) [PAPER] [GITHUB]
  7. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022) [PAPER] [GITHUB] [REVIEW]
  8. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (2023) [PAPER] [GITHUB]

Lightweight Image Captioning Model

  1. SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation (CVPR 2023) [PAPER] [REVIEW] [GITHUB]
  2. EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension (CVPR 2024) [PAPER] [REVIEW] [GITHUB]

Parameter-Efficient Fine-Tuning (Adapter)

  1. VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [PAPER] [REVIEW]

Survey

  1. Deep Industrial Image Anomaly Detection: A Survey (2023) [PAPER] [GITHUB]
  2. Self-Supervised Anomaly Detection: A Survey and Outlook (2022) [PAPER] [REVIEW]
  3. Vision-Language Models for Vision Tasks: A Survey (2023) [PAPER]
  4. A Survey of Efficient Fine-Tuning Methods for Vision-Language Models: Prompt and Adapter [PAPER]
