question about 'SD Features' #9
Comments
Hey, I'm trying to implement this myself, but I'm having trouble understanding the paper. Here's a sentence from the appendix: "To extract the intermediate U-Net features, we add a noise equivalent to the 100th timestep noise to the input image and evaluate the corresponding noisy latent using the forward diffusion process." Does this mean I should simply compute the noisy latent z_t and feed it into the U-Net? (In that case the first and second halves of the sentence would be describing the same thing.) Or should I first add noise to the original image, then encode it with the VAE and treat the result as z_0, and finally add noise again to get z_t?
I think it works like this: first, the latent z_0 is obtained by encoding the image with the VAE. Then the forward process adds noise up to timestep 100 (z_t can be computed directly from z_0 for any t). Finally, z_100 is passed through the U-Net for a single denoising step to obtain the desired features.
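For reference, the "z_t can be computed directly from z_0" part can be sketched as below. This is only an illustration under assumptions, not the paper's code: the scaled-linear beta schedule and its values (`beta_start=0.00085`, `beta_end=0.012`, 1000 steps) are what Stable Diffusion schedulers commonly use, the helper names are made up for this example, and a short list of floats stands in for the flattened VAE latent. It also assumes "the 100th timestep" means index `t = 100`, which the paper does not spell out.

```python
import math
import random


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Beta schedule, linear in sqrt(beta). Values are assumed, not from the paper."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def add_noise(z0, t, betas, rng):
    """Closed-form forward process: z_t = sqrt(a_bar) * z_0 + sqrt(1 - a_bar) * eps."""
    a_bar = alpha_bar(betas, t)
    signal, sigma = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * z + sigma * rng.gauss(0.0, 1.0) for z in z0]


betas = scaled_linear_betas()
rng = random.Random(0)          # fixed seed so the sketch is reproducible
z0 = [0.5, -1.0, 2.0, 0.0]      # stand-in for a flattened VAE latent
z_t = add_noise(z0, 100, betas, rng)
print(z_t)
```

Under this reading, the noise is added once, in latent space, and z_t would then go into the U-Net together with the timestep. The image itself is not noised before VAE encoding.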
Hey, I've implemented the code at https://github.com/Darkbblue/diffusion-feature. It runs on my server, but I haven't tested the extracted features on image classification tasks. Maybe you want to give it a try?
Thanks. I'll take a look.
Hi, did you do the tests?
No, I didn't do the tests, but I made some modifications to fit my project.
I'm looking forward to seeing you release the implementation code for the 'SD Features' model mentioned in the paper, because I think SD features are a worthwhile research area.