Research papers

This repository houses my personal summaries and notes on a variety of academic papers/blogs I have read. These summaries are intended to provide a brief overview of the papers' main points, methodologies, findings, and implications, thereby serving as quick references for myself and anyone interested.

Diffusion Papers

1. Denoising Diffusion Probabilistic Models, Ho et al.

2. Denoising Diffusion Implicit Models, Song et al.

  • Presents DDIMs, which are implicit probabilistic models that can produce high-quality samples 10x to 50x faster (in about 50 steps) than DDPMs
  • Generalizes DDPMs via a class of non-Markovian diffusion processes that lead to "short" generative Markov chains able to simulate image generation in a small number of steps
  • The training objective in DDIM matches DDPM's, so any pretrained DDPM model can be used with DDIM or other generative processes that produce images in fewer steps (see the scheduler-swap sketch after this entry)
    Summary notes Archive link Github repo
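
    A minimal sketch of that last point (my own example, assuming the Hugging Face `diffusers` library, a CUDA GPU, and the public `runwayml/stable-diffusion-v1-5` checkpoint): swap the pretrained pipeline's default scheduler for DDIM and sample in ~50 steps.

```python
# Minimal sketch: reuse a DDPM-trained model with a DDIM sampler.
# Checkpoint ID is just one public example, not the notes' reference implementation.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Keep the pretrained noise schedule; only the sampling procedure changes.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# ~50 deterministic DDIM steps instead of ~1000 ancestral DDPM steps.
image = pipe("a photo of an astronaut riding a horse", num_inference_steps=50).images[0]
image.save("ddim_sample.png")
```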

3. Prompt-to-Prompt Image Editing with Cross Attention Control, Hertz et al.

  • Introduces a textual editing method to semantically edit images in pre-trained text-conditioned diffusion models via Prompt-to-Prompt manipulations
  • The approach edits the image while preserving its original composition and addressing the content of the new prompt.
  • The key idea is that one can edit images by injecting the cross-attention maps during the diffusion process, controlling which pixels attend to which tokens of the prompt text during which diffusion steps (a toy illustration follows this entry).
    Summary notes Archive link Github repo
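
    A toy, self-contained PyTorch sketch of the attention-injection idea (random tensors, illustrative shapes only; this is not the authors' implementation):

```python
# Toy illustration of cross-attention map injection (Prompt-to-Prompt idea):
# compute the attention map from the *source* prompt, then reuse it with the
# *edited* prompt's values, so layout is preserved while content changes.
import torch
import torch.nn.functional as F

d = 64                          # attention head dimension
q = torch.randn(1, 4096, d)     # queries from 64x64 image latents
k_src = torch.randn(1, 77, d)   # keys from the source prompt tokens
v_edit = torch.randn(1, 77, d)  # values from the edited prompt tokens

# "Which pixels attend to which tokens" is decided by the source prompt's map.
attn_src = F.softmax(q @ k_src.transpose(-1, -2) / d**0.5, dim=-1)  # (1, 4096, 77)
out = attn_src @ v_edit                                             # (1, 4096, d)
print(out.shape)
```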

4. Null-text Inversion for Editing Real Images using Guided Diffusion Models, Mokady et al.

  • Introduces an accurate inversion scheme for real input images, enabling intuitive and versatile text-based image modification without tuning model weights.
  • It achieves near-perfect reconstruction while retaining the rich text-guided editing capabilities of the original model
  • The approach consists of two novel ideas: pivotal inversion (using the DDIM inversion trajectory as the anchor noise vector) and null-text optimization (optimizing only the null-text embeddings); a schematic sketch follows this entry
    Summary notes Archive link
    Paper walkthrough video: Original author Github repo
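
    The schematic below is my own toy rendering of those two ideas, with trivial placeholder functions standing in for the real Stable Diffusion UNet and DDIM update (names and constants are illustrative, not the authors' code):

```python
# Schematic sketch of null-text optimization around a pivotal DDIM trajectory.
import torch

def eps_model(z_t, t, text_emb):
    # Placeholder noise predictor (real method: the frozen SD UNet).
    return 0.1 * z_t + 0.01 * text_emb.mean()

def ddim_step(z_t, eps, t):
    # Placeholder deterministic DDIM update.
    return z_t - 0.01 * eps

T = 50
pivot = [torch.randn(1, 4, 64, 64) for _ in range(T + 1)]  # DDIM-inversion latents of the real image (the "pivot")
cond = torch.randn(1, 77, 768)                             # prompt embedding (kept frozen)
null = torch.zeros(1, 77, 768, requires_grad=True)         # only the null-text embedding is optimized
                                                            # (the paper uses one per timestep; one is kept here for brevity)
opt = torch.optim.Adam([null], lr=1e-2)
guidance = 7.5

z = pivot[T]
for t in reversed(range(T)):
    for _ in range(10):  # a few optimization steps per timestep
        eps = eps_model(z, t, null) + guidance * (eps_model(z, t, cond) - eps_model(z, t, null))
        loss = (ddim_step(z, eps, t) - pivot[t]).pow(2).mean()  # stay close to the pivotal trajectory
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        eps = eps_model(z, t, null) + guidance * (eps_model(z, t, cond) - eps_model(z, t, null))
        z = ddim_step(z, eps, t)
```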

5. Adding Conditional Control to Text-to-Image Diffusion Models, Lvmin Zhang and Maneesh Agrawala et al.

  • Adds extra control to pre-trained large diffusion models, such as Stable Diffusion, by accepting input visual conditions such as edge maps, segmentation masks, depth maps, etc.
  • Learns task-specific conditions in an end-to-end way
  • Training is as fast as fine-tuning a diffusion model, and for small datasets (<50k samples) it can be trained to produce robust results even on desktop-grade personal GPUs.
  • Multiple ControlNets can be combined at inference time to apply several visual conditions at once (see the usage sketch after this entry)
    Summary notes Archive link Github repo
    HF usage example Controlnet SD1.5 1.0 and 1.1 ckpts Controlnet SDXL ckpts
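
    A minimal usage sketch with the Hugging Face `diffusers` library (public Canny ControlNet and SD 1.5 checkpoints used as an example; the input image here is a synthetic placeholder):

```python
# Minimal sketch: condition Stable Diffusion on a Canny edge map via ControlNet.
# Assumes `diffusers`, `opencv-python`, and a CUDA GPU.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Build the visual condition: a Canny edge map (toy input: a white disc).
src = np.zeros((512, 512, 3), dtype=np.uint8)
cv2.circle(src, (256, 256), 150, (255, 255, 255), -1)
edges = cv2.Canny(cv2.cvtColor(src, cv2.COLOR_RGB2GRAY), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a futuristic city at night", image=canny_image, num_inference_steps=30).images[0]
image.save("controlnet_sample.png")

# Several ControlNets can be combined by passing lists of models, condition
# images, and per-condition strengths, e.g.:
#   controlnet=[canny_net, depth_net]
#   pipe(prompt, image=[canny_image, depth_image],
#        controlnet_conditioning_scale=[0.7, 0.5])
```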

6. DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion, Karras et al.

  • An image-and-pose conditioned diffusion method based upon Stable Diffusion to turn fashion photographs into realistic, animated videos
  • Introduces a pose conditioning approach that greatly improves temporal consistency across frames
  • Uses a CLIP image encoder and a VAE encoder, instead of the text encoder, which increases output fidelity to the conditioning image
    Summary notes Archive link Github repo

7. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, Podell et al.

8. ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models, He et al.

  • Directly sampling an image at a resolution beyond the training image sizes of pre-trained diffusion models usually results in severe object repetition issues and unreasonable object structures.
  • The paper explores the use of pre-trained diffusion models to generate images at resolutions higher than the models were trained on, specifically targeting the generation of images with arbitrary aspect ratios and higher resolution.
  • Summary notes Archive link
    Project page Github repo

9. Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models, Gandikota et al.

10. ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs, Shah et al.

  • ZipLoRA seamlessly merges independently trained style and subject LoRAs, thus generating any subject in any style with sufficiently powerful diffusion models like SDXL.
  • It offers a streamlined, cheap, and hyperparameter-free solution for simultaneous subject and style personalization, unlocking a new level of creative controllability for diffusion models (a toy merging sketch follows this entry).
  • Summary notes Archive link Project page
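
    The sketch below is only a simplified, plain-PyTorch illustration of merging two low-rank (LoRA) weight updates column-wise (toy dimensions, no training loss); it is not the authors' implementation and omits the optimization that makes ZipLoRA work:

```python
# Toy illustration of merging two LoRA weight updates column-wise.
# Dimensions and values are arbitrary; ZipLoRA additionally *learns* the
# per-column merger coefficients so the two LoRAs interfere less.
import torch

d_out, d_in, r = 320, 320, 4

# Two independently trained low-rank updates, each factored as delta_W = B @ A.
B_style, A_style = torch.randn(d_out, r), torch.randn(r, d_in)
B_subj,  A_subj  = torch.randn(d_out, r), torch.randn(r, d_in)

delta_style = B_style @ A_style   # (d_out, d_in)
delta_subj  = B_subj  @ A_subj    # (d_out, d_in)

# One merger coefficient per input column of each update (learnable in the paper).
m_style = torch.ones(d_in)
m_subj  = torch.ones(d_in)

merged_delta = delta_style * m_style + delta_subj * m_subj  # column-wise blend
W_merged = torch.randn(d_out, d_in) + merged_delta           # toy "base weight" + merged update
print(W_merged.shape)
```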

11. DemoFusion: Democratising High-Resolution Image Generation With No $$$, Du et al.

Transformers Papers

1. Attention Is All You Need, Vaswani et al.

GANs Papers

1. Barbershop: GAN-based Image Compositing using Segmentation Masks

  • Proposes a novel solution to image blending, particularly for the problem of hairstyle transfer, based on GAN-inversion
  • Introduces a latent space for image blending that is better at preserving detail and encoding spatial information
  • Explains a new GAN-embedding algorithm that can slightly modify images to conform to a common segmentation mask
    Summary notes Archive link Github repo
