AlonzoLeeeooo/awesome-video-generation
A Collection of Video Generation Studies

This GitHub repository summarizes papers and resources related to the video generation task.

If you have any suggestions about this repository, please feel free to open a new issue or pull request.

Recent news about this GitHub repo is listed below.

  • [Apr. 26th] Added a new direction: Personalized Video Generation.
  • [Mar. 28th] The official AAAI 2024 paper list is released! Official PDFs and BibTeX references are updated accordingly.

Contents

To-Do Lists

  • Latest Papers
    • Update CVPR 2024 Papers
      • Update PDFs and References of ⚠️ Papers
      • Update Published Versions of References
    • Update AAAI 2024 Papers
      • Update PDFs and References of ⚠️ Papers
      • Update Published Versions of References
    • Update ICLR 2024 Papers
    • Update NeurIPS 2023 Papers
  • Previously Published Papers
    • Update Previous CVPR papers
    • Update Previous ICCV papers
    • Update Previous ECCV papers
    • Update Previous NeurIPS papers
    • Update Previous ICLR papers
    • Update Previous AAAI papers
    • Update Previous ACM MM papers
  • Regular Maintenance of Preprint arXiv Papers and Missed Papers

<🎯Back to Top>

Products

| Name | Organization | Year | Research Paper | Website | Specialties |
|---|---|---|---|---|---|
| Sora | OpenAI | 2024 | link | link | - |
| Lumiere | Google | 2024 | link | link | - |
| VideoPoet | Google | 2023 | - | link | - |
| W.A.L.T | Google | 2023 | link | link | - |
| Gen-2 | Runway | 2023 | - | link | - |
| Gen-1 | Runway | 2023 | - | link | - |
| Animate Anyone | Alibaba | 2023 | link | link | - |
| Outfit Anyone | Alibaba | 2023 | - | link | - |
| Stable Video | StabilityAI | 2023 | link | link | - |
| Pixeling | HiDream.ai | 2023 | - | link | - |
| DomoAI | DomoAI | 2023 | - | link | - |
| Emu | Meta | 2023 | link | link | - |
| Genmo | Genmo | 2023 | - | link | - |
| NeverEnds | NeverEnds | 2023 | - | link | - |
| Moonvalley | Moonvalley | 2023 | - | link | - |
| Morph Studio | Morph | 2023 | - | link | - |
| Pika | Pika | 2023 | - | link | - |
| PixelDance | ByteDance | 2023 | link | link | - |

<🎯Back to Top>

Papers

Survey Papers

  • Year 2024
  • arXiv
    • Video Diffusion Models: A Survey [Paper]
  • Year 2023
  • arXiv
    • A Survey on Video Diffusion Models [Paper]

Text-to-Video Generation

  • Year 2024
    • CVPR
      • Vlogger: Make Your Dream A Vlog [Paper] [Code]
      • Make Pixels Dance: High-Dynamic Video Generation [Paper] [Project] [Demo]
      • VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [Paper] [Code] [Project]
      • GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation [Paper] [Project]
      • SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper] [Code] [Project]
      • MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [Paper] [Project] [Video]
      • Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [Paper] [Project]
      • PEEKABOO: Interactive Video Generation via Masked-Diffusion [Paper] [Code] [Project] [Demo]
      • EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [Paper] [Code] [Project]
      • A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [Paper] [Code] [Project]
      • BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [Paper] [Project]
      • Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [Paper] [Project]
      • Animate Anyone: Consistent and Controllable Image-to-video Synthesis for Character Animation [Paper] [Code] [Project]
      • MotionDirector: Motion Customization of Text-to-Video Diffusion Models [Paper] [Code]
      • ⚠️ Simple but Effective Text-to-Video Generation with Grid Diffusion Models [Paper]
      • ⚠️ Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation [Paper]
      • ⚠️ DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [Paper]
    • ICLR
      • VDT: General-purpose Video Diffusion Transformers via Mask Modeling [Paper] [Code] [Project]
      • VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation [Paper]
    • AAAI
      • Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [Paper] [Code] [Project]
      • E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [Paper]
      • ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [Paper] [Code] [Project]
      • F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [Paper]
    • arXiv
    • Others
      • Sora: Video Generation Models as World Simulators [Paper]
  • Year 2023
  • Year 2022
  • Year 2021

Image-to-Video Generation

  • Year 2024

  • Year 2023

    • CVPR
      • Conditional Image-to-Video Generation with Latent Flow Diffusion Models [Paper] [Code]
    • arXiv
  • Year 2022

    • CVPR
      • Make It Move: Controllable Image-to-Video Generation with Text Descriptions [Paper] [Code]
  • Year 2021

    • ICCV
      • Click to Move: Controlling Video Generation with Sparse Motion [Paper] [Code]

<🎯Back to Top>

Audio-to-Video Generation

  • Year 2024
    • AAAI
      • Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation [Paper] [Code]
  • Year 2023
    • CVPR
      • MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation [Paper] [Code]

<🎯Back to Top>

Personalized Video Generation

  • Year 2024
  • Year 2023
    • arXiv
      • FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention [Paper] [Code] [Demo]

<🎯Back to Top>

Video Editing

  • Year 2024
  • Year 2023
    • arXiv
      • Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [Paper]

<🎯Back to Top>

Datasets

  • [arXiv 2012] UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild [Paper] [Dataset]
  • [ICCV 2019] FaceForensics++: Learning to Detect Manipulated Facial Images [Paper] [Code]
  • [NeurIPS 2019] TaiChi-HD: First Order Motion Model for Image Animation [Paper] [Dataset]
  • [ECCV 2020] SkyTimeLapse: DTVNet: Dynamic Time-lapse Video Generation via Single Still Image [Paper] [Code]
  • [ICCV 2021] WebVid-10M: Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [Paper] [Dataset] [Code] [Project]
  • [ECCV 2022] ROS: Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining [Paper] [Code] [Dataset]
  • [arXiv 2023] HD-VG-130M: VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-video Generation [Paper] [Dataset]
  • [ICLR 2024] InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation [Paper] [Dataset]
  • [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers [Paper] [Dataset] [Project]
  • [arXiv 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models [Paper] [Dataset]

<🎯Back to Top>

Q&A

  • Q: What is the conference ordering of this paper list?
    • This paper list is organized according to the following sequence:
      • CVPR
      • ICCV
      • ECCV
      • NeurIPS
      • ICLR
      • AAAI
      • ACM MM
      • SIGGRAPH
      • arXiv
      • Others
  • Q: What does Others refer to?
    • Some studies (e.g., Sora) do not publish their technical reports on arXiv; instead, they release blog posts on their official websites. The Others category refers to such studies.

<🎯Back to Top>

References

The reference.bib file summarizes BibTeX references of up-to-date video generation papers, widely used datasets, and toolkits. Based on the original references, I have made the following modifications so that they render cleanly in LaTeX manuscripts:

  • References are normally named in the form of author-etal-year-nickname. In particular, references of datasets and toolkits are named directly by nickname, e.g., imagenet.
  • In each reference, all conference/journal names are converted into abbreviations, e.g., Computer Vision and Pattern Recognition -> CVPR.
  • The url, doi, publisher, organization, editor, and series fields are removed from all references.
  • The pages field is added to references where it is missing.
  • All paper titles are in title case. In addition, an extra pair of braces {} is added so that the title case also survives in templates that lowercase titles.

If you need a different reference format, you can retrieve the original references by searching the paper titles on DBLP or Google Scholar.
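As a sketch of the conventions above (this entry is hypothetical and not copied from reference.bib; the author names, pages, and key are illustrative), a cleaned reference might look like:

```bibtex
% Hypothetical entry illustrating the conventions:
% - key in author-etal-year-nickname form
% - venue abbreviated (NeurIPS instead of the full name)
% - url/doi/publisher/editor/series fields removed
% - pages field present
% - extra braces around the title to preserve title case
@inproceedings{ho-etal-2022-video,
  author    = {Ho, Jonathan and Salimans, Tim and Gritsenko, Alexey and others},
  title     = {{Video Diffusion Models}},
  booktitle = {NeurIPS},
  year      = {2022},
  pages     = {1--10},
}
```

Without the extra braces, templates that lowercase titles would render "Video Diffusion Models" as "Video diffusion models"; the inner {} protects the original capitalization.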

<🎯Back to Top>

Star History

Star History Chart

<🎯Back to Top>