FR: TAE for SVD #14

Open
Ednaordinary opened this issue Apr 7, 2024 · 2 comments

Comments

@Ednaordinary

SVD can now run very fast step-wise, but it is bottlenecked by the slow VAE. Is there any way to distill the spatio-temporal autoencoder the same way as the regular autoencoder?

@madebyollin
Owner

madebyollin commented Apr 8, 2024

Definitely possible! I worry there's a fairly narrow band of usefulness for a TAESVD, though, since for cheap previews you can run TAESD per-frame and for max quality you should just run the SVD VAE.

(I started training a TAESVD with temporal layers a few months ago, but I haven't gotten around to finishing it. In the comparison, top is ground truth, middle is per-frame TAESD, and bottom is TAESD with temporal layers.)
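
For reference, the per-frame preview path mentioned above amounts to treating the time axis as a batch axis and decoding each latent frame independently. A minimal sketch, assuming a hypothetical `decode_batch` callable that stands in for an image decoder such as TAESD (the name, signature, and toy decoder below are illustrative only, not TAESD's actual API):

```python
from typing import Callable, List, Sequence, TypeVar

Latent = TypeVar("Latent")
Frame = TypeVar("Frame")


def decode_video_per_frame(
    latents: Sequence[Latent],
    decode_batch: Callable[[Sequence[Latent]], List[Frame]],
    chunk_size: int = 4,
) -> List[Frame]:
    """Decode each latent frame independently, chunk_size frames at a time.

    decode_batch is a hypothetical stand-in for an image decoder; smaller
    chunks trade speed for lower peak memory.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    frames: List[Frame] = []
    for start in range(0, len(latents), chunk_size):
        # Frames never see their neighbours here: no temporal layers are involved.
        frames.extend(decode_batch(latents[start:start + chunk_size]))
    return frames


# Toy stand-in decoder (assumption): "decodes" a latent by doubling it.
def toy_decode_batch(batch):
    return [2 * z for z in batch]


preview = decode_video_per_frame([1, 2, 3, 4, 5], toy_decode_batch, chunk_size=2)
```

Because nothing ties neighbouring frames together, this kind of decoding can flicker, which is the gap that temporal layers would be meant to close.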

@Ednaordinary
Author

Ednaordinary commented Apr 8, 2024

Each frame after the first should only differ slightly from the previous frame (it is not an entirely new frame). So would it be viable to decode the first frame with the original SVD VAE (which should take about the same memory and time as the undistilled regular SD VAE in this context), then decode the rest with a VAE trained specifically on the difference between the current and previous frame's latents, outputting a difference map that gets applied to the last decoded frame? Kind of like a video codec. It does move away from the simplicity of decoding normally, though.
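
A rough sketch of that decode loop, just to make the idea concrete. Everything here is an assumption for illustration: `decode_keyframe` stands in for the full SVD VAE, `decode_residual` stands in for the proposed (not yet existing) difference decoder, and frames are flat lists of floats rather than real tensors.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def decode_with_residuals(
    latents: Sequence[Vector],
    decode_keyframe: Callable[[Vector], Vector],
    decode_residual: Callable[[Vector], Vector],
) -> List[Vector]:
    """Decode frame 0 fully, then add decoded latent differences frame by frame."""
    if not latents:
        return []
    frames = [decode_keyframe(latents[0])]  # expensive decoder, used once
    for prev_z, z in zip(latents, latents[1:]):
        latent_delta = [b - a for a, b in zip(prev_z, z)]
        pixel_delta = decode_residual(latent_delta)  # cheap decoder (hypothetical)
        # Apply the difference map to the last decoded frame, codec-style.
        frames.append([p + d for p, d in zip(frames[-1], pixel_delta)])
    return frames


# Toy linear stand-ins (assumption): real decoders are nonlinear networks.
def toy_full_decoder(z: Vector) -> Vector:
    return [2.0 * v for v in z]


def toy_residual_decoder(dz: Vector) -> Vector:
    return [2.0 * v for v in dz]


latents = [[1.0, 2.0], [1.5, 2.0], [2.0, 3.0]]
frames = decode_with_residuals(latents, toy_full_decoder, toy_residual_decoder)
```

With these linear toy decoders the result matches full decoding of every frame exactly. A real decoder is nonlinear, so the pixel difference generally depends on more than the latent difference alone, and the residual decoder would probably need the previous latent or frame as extra input. Errors would also accumulate from frame to frame, which is why codecs insert periodic keyframes.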
