Pokora

Pokora is a video creation platform that combines existing video clips with AI-generated clips using Stable Diffusion, in a native SwiftUI interface, completely local with no internet access necessary. Pokora takes the frames of an input movie and runs image-to-image processing on them with a Stable Diffusion model. Check out ml-stable-diffusion for the latest CoreML model changes and for how to convert models.
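
For a sense of what that per-frame pass looks like, here is a minimal sketch against the apple/ml-stable-diffusion Swift package. It is not Pokora's actual code: the function name and step count are illustrative, while startingImage, strength, and seed are the package's image-to-image configuration fields.

```swift
import CoreGraphics
import CoreML
import StableDiffusion

// Hypothetical helper, not Pokora's internal API: run one frame through
// image-to-image. The stepCount here is an arbitrary illustrative value.
func stylize(frame: CGImage,
             pipeline: StableDiffusionPipeline,
             prompt: String,
             strength: Float,
             seed: UInt32) throws -> CGImage? {
    var config = StableDiffusionPipeline.Configuration(prompt: prompt)
    config.startingImage = frame   // image-to-image: start from the video frame
    config.strength = strength     // how far the model may drift from the frame
    config.seed = seed
    config.stepCount = 25
    // generateImages returns [CGImage?]; we only asked for one image.
    return try pipeline.generateImages(configuration: config).first ?? nil
}
```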

🧙‍♂️ Pokora is named after Hans Pokora, author of many books on collectible psychedelic vinyl and the man who kickstarted the legend of my favorite psych band, Paternoster.

Features

  • Load video from disk ✅
  • Process frames using Stable Diffusion (prompt, seed, strength) ✅
  • Export video, including the audio from the original video ✅
  • App icon ✅
  • Easier installation of models ✅
  • Play back video in the app ✅
  • Interpolate strength during an effect ✅
  • Persist projects between launches ✅
  • Upscaling using RealESRGAN ✅
  • Show a preview while processing ✅
  • ControlNet support ✅
  • Audio reactivity ✅

Getting Started

  • Any video size will work, but the output video will be square (for now)
  • Any video length will work
  • Pokora is a document-based app, so choose 'New Document' when you first start
  • If you save the project before rendering, you can restart from where you left off if an error occurs during a long render
  • Once you have added some effects, hit 'Render'
  • Rendering first extracts all the frames from your source video (this can take a while; a sketch of this step follows the list)
  • Once the frames are extracted, rendering of your applied effects starts; this can take hours
  • Tapping 'Export' creates a new movie containing the rendered frames from your effects, the original frames wherever no effects were applied, and the audio track from the original movie
  • Save, share, and enjoy!
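
As a rough illustration of the frame-extraction step mentioned above, here is a hedged sketch using AVFoundation's AVAssetImageGenerator. Pokora's actual implementation may differ; the function name and the fixed frame rate are assumptions.

```swift
import AVFoundation
import CoreGraphics

// Hypothetical sketch: pull every frame out of a movie at a fixed rate.
func extractFrames(from url: URL, fps: Int32 = 30) throws -> [CGImage] {
    let asset = AVAsset(url: url)
    let generator = AVAssetImageGenerator(asset: asset)
    generator.appliesPreferredTrackTransform = true
    // Request exact times so each output maps 1:1 to a source frame.
    generator.requestedTimeToleranceBefore = .zero
    generator.requestedTimeToleranceAfter = .zero

    let frameCount = Int(asset.duration.seconds * Double(fps))
    var frames: [CGImage] = []
    frames.reserveCapacity(frameCount)
    for i in 0..<frameCount {
        let time = CMTime(value: CMTimeValue(i), timescale: fps)
        frames.append(try generator.copyCGImage(at: time, actualTime: nil))
    }
    return frames
}
```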

Effect Types

Direct

Applies image-to-image to the underlying frame, using either a constant strength or a ramp between two strengths.
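
A ramp is just a linear interpolation of strength across the effect's frame range, and a constant is the case where both endpoints are equal. A minimal sketch with hypothetical names, not Pokora's API:

```swift
// Hypothetical helper: strength for a given frame inside an effect's range.
// With startStrength == endStrength this degenerates to a constant.
func strength(forFrame frame: Int,
              effectStart: Int, effectEnd: Int,
              startStrength: Float, endStrength: Float) -> Float {
    guard effectEnd > effectStart else { return startStrength }
    let t = Float(frame - effectStart) / Float(effectEnd - effectStart)
    return startStrength + (endStrength - startStrength) * min(max(t, 0), 1)
}
```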

Generative

Applies image-to-image to the previously processed frame rather than the source frame, and can rotate/zoom as it goes (including in reverse). Useful for fading out of and back into the video.
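
The rotate/zoom feedback can be pictured as a Core Graphics transform applied to the previous output before it is fed back into image-to-image. A hedged sketch with illustrative parameter names:

```swift
import CoreGraphics

// Hypothetical sketch: rotate and zoom the previously processed frame
// about its center before the next image-to-image pass.
func transformed(_ image: CGImage, zoom: CGFloat, rotationDegrees: CGFloat) -> CGImage? {
    let w = image.width, h = image.height
    guard let ctx = CGContext(data: nil, width: w, height: h,
                              bitsPerComponent: 8, bytesPerRow: 0,
                              space: CGColorSpaceCreateDeviceRGB(),
                              bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
    else { return nil }
    ctx.translateBy(x: CGFloat(w) / 2, y: CGFloat(h) / 2)
    ctx.rotate(by: rotationDegrees * .pi / 180)
    ctx.scaleBy(x: zoom, y: zoom)
    ctx.translateBy(x: -CGFloat(w) / 2, y: -CGFloat(h) / 2)
    ctx.draw(image, in: CGRect(x: 0, y: 0, width: w, height: h))
    return ctx.makeImage()
}
```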

Audio Reactive

You set an amplitude threshold. For each frame the average amplitude is calculated; if the frame's amplitude is greater than the threshold, the strength of the image-to-image pass is set to 0.9. An envelope then starts: if all of the following frames stay below the threshold, the strength ramps linearly back down to 0 over 15 frames. If the threshold is crossed again, the strength jumps back to 0.9 and the envelope restarts. Each time the strength goes from 0 to 0.9, a new seed is used.
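
Put as code, the envelope is a peak-and-linear-decay state machine. The sketch below reconstructs the behavior described above and is not Pokora's actual implementation; all names are illustrative.

```swift
// Hypothetical reconstruction of the audio-reactive envelope.
// amplitudes holds the per-frame average amplitude.
func audioReactiveStrengths(amplitudes: [Float],
                            threshold: Float,
                            peak: Float = 0.9,
                            decayFrames: Int = 15) -> [(strength: Float, newSeed: Bool)] {
    var result: [(strength: Float, newSeed: Bool)] = []
    var framesSinceTrigger = Int.max   // "fully decayed" until the first trigger
    for amplitude in amplitudes {
        if amplitude > threshold {
            // A new seed only when the strength is coming up from 0.
            let newSeed = framesSinceTrigger >= decayFrames
            framesSinceTrigger = 0
            result.append((peak, newSeed))
        } else if framesSinceTrigger < decayFrames {
            // Linear ramp from the peak back down to 0 over decayFrames.
            framesSinceTrigger += 1
            let t = Float(framesSinceTrigger) / Float(decayFrames)
            result.append((peak * (1 - t), false))
        } else {
            result.append((0, false))
        }
    }
    return result
}
```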

Limitations

Requirements

Built using the versions below; not tested elsewhere yet.

  • macOS 13.3.1+
  • Xcode 14.3+

Models

You will need models in CoreML format, either converted yourself or downloaded; converted models are available from the HuggingFace org here.

NOTE: I had trouble with the v2.1 model; I think it doesn't like the 768x768 resolution. I verified that this model works here, though I have gotten better speeds with a model I converted myself.

ControlNet

ControlNet support is very basic for now. Inside your model directory, place a directory called 'controlnet', and in that directory put the single model you would like to use for ControlNet input (such as Depth). Pokora will detect it and, for each frame processed, will first send the frame through the ControlNet model. For now you can only use one ControlNet model, and it is either on or off, depending on whether a controlnet model was found. On my M2 Ultra Mac Studio I was getting about 7.1 iter/s without ControlNet and about 4.5 iter/s with it, so it is noticeably slower.
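
Assuming the apple/ml-stable-diffusion package, the on/off detection could look like the sketch below: scan the 'controlnet' directory for a compiled model and hand its name to the pipeline. The function name and compute-unit choice are assumptions; the directory layout follows this README.

```swift
import CoreML
import Foundation
import StableDiffusion

// Hypothetical sketch: build a pipeline, enabling ControlNet only if a
// compiled model is found in <model dir>/controlnet.
func makePipeline(modelDirectory: URL) throws -> StableDiffusionPipeline {
    let controlNetDir = modelDirectory.appendingPathComponent("controlnet")
    var controlNetNames: [String] = []
    if let contents = try? FileManager.default.contentsOfDirectory(
        at: controlNetDir, includingPropertiesForKeys: nil) {
        // The package expects model names without the .mlmodelc extension.
        controlNetNames = contents
            .filter { $0.pathExtension == "mlmodelc" }
            .map { $0.deletingPathExtension().lastPathComponent }
    }

    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndGPU
    return try StableDiffusionPipeline(resourcesAt: modelDirectory,
                                       controlNet: controlNetNames,
                                       configuration: config,
                                       reduceMemory: false)
}
```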

A good resource for models, both SD and ControlNet, is here.

License

This project is licensed under the MIT License - see the LICENSE file for details.