Research papers

This repository houses my personal summaries and notes on a variety of academic papers/blogs I have read. These summaries are intended to provide a brief overview of the papers' main points, methodologies, findings, and implications, thereby serving as quick references for myself and anyone interested.

Diffusion Papers

1. Denoising Diffusion Probabilistic Models, Ho et al.

2. Denoising Diffusion Implicit Models, Song et al.

  • Presents DDIMs, which are implicit probabilistic models that can produce high-quality samples 10x to 50x faster (in about 50 steps) than DDPMs
  • Generalizes DDPMs via a class of non-Markovian diffusion processes that lead to "short" generative Markov chains able to simulate image generation in a small number of steps
  • The training objective in DDIM matches DDPM's, so any pretrained DDPM model can be used with DDIM or other generative processes that produce images in fewer steps (see the scheduler-swap sketch after this entry)
    Summary notes Archive link Github repo
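
    A minimal sketch of that last point (my own example, assuming the Hugging Face `diffusers` library, a CUDA GPU, and the public `runwayml/stable-diffusion-v1-5` checkpoint): swap the pretrained pipeline's default scheduler for DDIM and sample in ~50 steps.

```python
# Minimal sketch: reuse a DDPM-trained model with a DDIM sampler.
# Checkpoint ID is just one public example, not the notes' reference implementation.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Keep the pretrained noise schedule; only the sampling procedure changes.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# ~50 deterministic DDIM steps instead of ~1000 ancestral DDPM steps.
image = pipe("a photo of an astronaut riding a horse", num_inference_steps=50).images[0]
image.save("ddim_sample.png")
```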

3. Prompt-to-Prompt Image Editing with Cross Attention Control, Hertz et al.

  • Introduces a textual editing method to semantically edit images in pre-trained text-conditioned diffusion models via Prompt-to-Prompt manipulations
  • The approach edits the image while preserving its original composition and addressing the content of the new prompt.
  • The key idea is that one can edit images by injecting the cross-attention maps during the diffusion process, controlling which pixels attend to which tokens of the prompt text during which diffusion steps (a toy illustration follows this entry).
    Summary notes Archive link Github repo
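
    A toy, self-contained PyTorch sketch of the attention-injection idea (random tensors, illustrative shapes only; this is not the authors' implementation):

```python
# Toy illustration of cross-attention map injection (Prompt-to-Prompt idea):
# compute the attention map from the *source* prompt, then reuse it with the
# *edited* prompt's values, so layout is preserved while content changes.
import torch
import torch.nn.functional as F

d = 64                          # attention head dimension
q = torch.randn(1, 4096, d)     # queries from 64x64 image latents
k_src = torch.randn(1, 77, d)   # keys from the source prompt tokens
v_edit = torch.randn(1, 77, d)  # values from the edited prompt tokens

# "Which pixels attend to which tokens" is decided by the source prompt's map.
attn_src = F.softmax(q @ k_src.transpose(-1, -2) / d**0.5, dim=-1)  # (1, 4096, 77)
out = attn_src @ v_edit                                             # (1, 4096, d)
print(out.shape)
```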

4. Null-text Inversion for Editing Real Images using Guided Diffusion Models, Mokady et al.

  • Introduces an accurate inversion scheme for real input images, enabling intuitive and versatile text-based image modification without tuning model weights.
  • It achieves near-perfect reconstruction while retaining the rich text-guided editing capabilities of the original model
  • The approach consists of two novel ideas: pivotal inversion (using the DDIM inversion trajectory as the anchor noise vector) and null-text optimization (optimizing only the null-text embeddings); a schematic sketch follows this entry
    Summary notes Archive link
    Paper walkthrough video: Original author Github repo
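
    The schematic below is my own toy rendering of those two ideas, with trivial placeholder functions standing in for the real Stable Diffusion UNet and DDIM update (names and constants are illustrative, not the authors' code):

```python
# Schematic sketch of null-text optimization around a pivotal DDIM trajectory.
import torch

def eps_model(z_t, t, text_emb):
    # Placeholder noise predictor (real method: the frozen SD UNet).
    return 0.1 * z_t + 0.01 * text_emb.mean()

def ddim_step(z_t, eps, t):
    # Placeholder deterministic DDIM update.
    return z_t - 0.01 * eps

T = 50
pivot = [torch.randn(1, 4, 64, 64) for _ in range(T + 1)]  # DDIM-inversion latents of the real image (the "pivot")
cond = torch.randn(1, 77, 768)                             # prompt embedding (kept frozen)
null = torch.zeros(1, 77, 768, requires_grad=True)         # only the null-text embedding is optimized
                                                            # (the paper uses one per timestep; one is kept here for brevity)
opt = torch.optim.Adam([null], lr=1e-2)
guidance = 7.5

z = pivot[T]
for t in reversed(range(T)):
    for _ in range(10):  # a few optimization steps per timestep
        eps = eps_model(z, t, null) + guidance * (eps_model(z, t, cond) - eps_model(z, t, null))
        loss = (ddim_step(z, eps, t) - pivot[t]).pow(2).mean()  # stay close to the pivotal trajectory
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        eps = eps_model(z, t, null) + guidance * (eps_model(z, t, cond) - eps_model(z, t, null))
        z = ddim_step(z, eps, t)
```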

5. Adding Conditional Control to Text-to-Image Diffusion Models, Lvmin Zhang and Maneesh Agrawala et al.

  • Adds extra control to pre-trained large diffusion models, such as Stable Diffusion, by accepting input visual conditions such as edge maps, segmentation masks, depth maps, etc.
  • Learns task-specific conditions in an end-to-end way
  • Training is as fast as fine-tuning a diffusion model, and for small datasets (<50k samples) it can be trained to produce robust results even on desktop-grade personal GPUs.
  • Multiple ControlNets can be combined at inference time to apply several visual conditions at once (see the usage sketch after this entry)
    Summary notes Archive link Github repo
    HF usage example Controlnet SD1.5 1.0 and 1.1 ckpts Controlnet SDXL ckpts
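
    A minimal usage sketch with the Hugging Face `diffusers` library (public Canny ControlNet and SD 1.5 checkpoints used as an example; the input image here is a synthetic placeholder):

```python
# Minimal sketch: condition Stable Diffusion on a Canny edge map via ControlNet.
# Assumes `diffusers`, `opencv-python`, and a CUDA GPU.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Build the visual condition: a Canny edge map (toy input: a white disc).
src = np.zeros((512, 512, 3), dtype=np.uint8)
cv2.circle(src, (256, 256), 150, (255, 255, 255), -1)
edges = cv2.Canny(cv2.cvtColor(src, cv2.COLOR_RGB2GRAY), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a futuristic city at night", image=canny_image, num_inference_steps=30).images[0]
image.save("controlnet_sample.png")

# Several ControlNets can be combined by passing lists of models, condition
# images, and per-condition strengths, e.g.:
#   controlnet=[canny_net, depth_net]
#   pipe(prompt, image=[canny_image, depth_image],
#        controlnet_conditioning_scale=[0.7, 0.5])
```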

6. DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion, Karras et al.

  • An image-and-pose conditioned diffusion method based upon Stable Diffusion to turn fashion photographs into realistic, animated videos
  • Introduces a pose conditioning approach that greatly improves temporal consistency across frames
  • Uses a CLIP image encoder and a VAE encoder, instead of the text encoder, which increases output fidelity to the conditioning image
    Summary notes Archive link Github repo

7. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, Podell et al.

8. ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models, He et al.

  • Directly sampling an image at a resolution beyond the training image sizes of pre-trained diffusion models usually results in severe object repetition issues and unreasonable object structures.
  • The paper explores the use of pre-trained diffusion models to generate images at resolutions higher than the models were trained on, specifically targeting the generation of images with arbitrary aspect ratios and higher resolution.
  • Summary notes Archive link
    Project page Github repo

9. Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models, Gandikota et al.

10. ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs, Shah et al.

  • ZipLoRA seamlessly merges independently trained style and subject LoRAs, thus generating any subject in any style with sufficiently powerful diffusion models like SDXL.
  • It offers a streamlined, cheap, and hyperparameter-free solution for simultaneous subject and style personalization, unlocking a new level of creative controllability for diffusion models (a toy merging sketch follows this entry).
  • Summary notes Archive link Project page
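
    The sketch below is only a simplified, plain-PyTorch illustration of merging two low-rank (LoRA) weight updates column-wise (toy dimensions, no training loss); it is not the authors' implementation and omits the optimization that makes ZipLoRA work:

```python
# Toy illustration of merging two LoRA weight updates column-wise.
# Dimensions and values are arbitrary; ZipLoRA additionally *learns* the
# per-column merger coefficients so the two LoRAs interfere less.
import torch

d_out, d_in, r = 320, 320, 4

# Two independently trained low-rank updates, each factored as delta_W = B @ A.
B_style, A_style = torch.randn(d_out, r), torch.randn(r, d_in)
B_subj,  A_subj  = torch.randn(d_out, r), torch.randn(r, d_in)

delta_style = B_style @ A_style   # (d_out, d_in)
delta_subj  = B_subj  @ A_subj    # (d_out, d_in)

# One merger coefficient per input column of each update (learnable in the paper).
m_style = torch.ones(d_in)
m_subj  = torch.ones(d_in)

merged_delta = delta_style * m_style + delta_subj * m_subj  # column-wise blend
W_merged = torch.randn(d_out, d_in) + merged_delta           # toy "base weight" + merged update
print(W_merged.shape)
```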

11. DemoFusion: Democratising High-Resolution Image Generation With No $$$, Du et al.

Transformers Papers

1. Attention Is All You Need, Vaswani et al.

GANs Papers

1. Barbershop: GAN-based Image Compositing using Segmentation Masks

  • Proposes a novel solution to image blending, particularly for the problem of hairstyle transfer, based on GAN-inversion
  • Introduces a latent space for image blending that is better at preserving detail and encoding spatial information
  • Explains a new GAN-embedding algorithm that can slightly modify images to conform to a common segmentation mask
    Summary notes Archive link Github repo
