
Multiple images #9

Open
cpnovaes opened this issue Jan 1, 2024 · 4 comments

Comments

@cpnovaes commented Jan 1, 2024

Hi! Thank you so much for providing the code. The video course is amazing, really helpful.

I have a question: is it possible to modify the code so that we can use more than one image as input and condition (target) data? In other words, could we do, e.g., a 2-images-to-3-images setup, taking 2 images as input to predict 3 images?

Thanks again!

@mikonvergence (Owner)

Hi, @cpnovaes, thank you!

Indeed, that would require some changes to the code examples, but the current conditional model already allows you to specify the condition_channels at initialisation (see https://github.com/mikonvergence/DiffusionFastForward/blob/master/src/PixelDiffusion.py):

class PixelDiffusionConditional(PixelDiffusion):
    def __init__(self,
                 train_dataset,
                 valid_dataset=None,
                 condition_channels=3, # <- here
                 batch_size=1,
                 lr=1e-3):

This means that you can potentially set the condition_channels parameter to 6 for 2 RGB images and reuse the same framework (when passing the conditions through the network, you need to concatenate them along the channel dimension: torch.cat([condition_1, condition_2], dim=1)).
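To illustrate, here is a minimal sketch of that concatenation step (the tensor names and shapes are hypothetical, not taken from the repository):

```python
import torch

# Hypothetical example: two RGB condition images for one sample
condition_1 = torch.randn(1, 3, 64, 64)  # (batch, channels, H, W)
condition_2 = torch.randn(1, 3, 64, 64)

# Concatenate along the channel dimension -> shape (1, 6, 64, 64),
# matching a model initialised with condition_channels=6
condition = torch.cat([condition_1, condition_2], dim=1)
print(condition.shape)  # torch.Size([1, 6, 64, 64])
```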

I am not entirely sure if this is what you're looking for, so let me know (and ideally provide some data examples) if this needs further discussion! Thanks again

@cpnovaes (Author) commented Jan 3, 2024

Hi @mikonvergence, thanks a lot for answering my question!

I have been trying to modify the code following your suggestion, which is what I was looking for.
My case is the following: I give 2 images as input and 1 image as output (what I want to predict at the end; this one is somehow related to the 2 input images). These images are not RGB, but simple (128, 128) matrices (.npy files). This is an example:

[Attached image: Figure 2024-01-03 093713, showing the input and output matrices]

I figured that, in the case of input and output having a different number of channels, I need to do the following:

class PixelDiffusionConditional(PixelDiffusion):
    def __init__(self,
                 train_dataset,
                 valid_dataset=None,
                 condition_channels=3, # <- here
                 generated_channels=3, # <- also here!
                 batch_size=1,
                 lr=1e-3):

and use condition_channels=2 and generated_channels=1. Modifying the SimpleImageDataset(Dataset) class accordingly, my data will have the shapes train_ds[0][0].shape = torch.Size([2, 64, 64]) and train_ds[0][1].shape = torch.Size([1, 64, 64]).
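A dataset along those lines could be sketched as follows (the class name NpyPairDataset and the random stand-in arrays are assumptions for illustration; real data would be loaded from the .npy files):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class NpyPairDataset(Dataset):
    """Hypothetical dataset returning a 2-channel condition and a
    1-channel target, each channel a (H, W) matrix from a .npy file."""
    def __init__(self, inputs, targets):
        # inputs: array of shape (N, 2, H, W); targets: (N, 1, H, W)
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        x = torch.from_numpy(self.inputs[idx]).float()   # (2, H, W) condition
        y = torch.from_numpy(self.targets[idx]).float()  # (1, H, W) target
        return x, y

# Random stand-in data in place of real .npy matrices
train_ds = NpyPairDataset(np.random.rand(10, 2, 64, 64),
                          np.random.rand(10, 1, 64, 64))
print(train_ds[0][0].shape, train_ds[0][1].shape)
# torch.Size([2, 64, 64]) torch.Size([1, 64, 64])
```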

Please let me know if that makes sense or if I may be missing something.

In the case of Conditional Latent Diffusion, I am still trying to implement a similar idea, but I am having problems making the autoencoder accept a different number of channels. In principle, I could follow the same idea, right?

Thanks!

@mikonvergence (Owner)

Hi @cpnovaes! That's exactly the right approach with the PixelDiffusion type.

For the latent diffusion, that will be tricky, because:

  • The autoencoder has been trained on natural images with losses that promote the aesthetic quality of images, so it might not be ideal for compressing other types of signals
  • As you said, it is designed to work with 3 channels (RGB) - you could potentially encode each signal (2 conditions and 1 generated, all single-channel as I understand) by feeding each one as a 'greyscale' image to the encoder (assuming that your values are bounded and can be mapped to the [-1, +1] range)
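The second point could be sketched like this (the helper name to_rgb_latent_input and the bounds are assumptions; it simply rescales a bounded signal to [-1, +1] and repeats the single channel three times so a pretrained RGB autoencoder can consume it):

```python
import torch

def to_rgb_latent_input(x, lo=0.0, hi=1.0):
    """Hypothetical helper: map a single-channel signal with known
    bounds [lo, hi] to [-1, +1], then repeat it to 3 channels."""
    x = 2.0 * (x - lo) / (hi - lo) - 1.0   # rescale to [-1, +1]
    # Repeat the channel dimension: (B, 1, H, W) -> (B, 3, H, W)
    return x.repeat(1, 3, 1, 1) if x.dim() == 4 else x.repeat(3, 1, 1)

signal = torch.rand(1, 1, 64, 64)        # single-channel condition in [0, 1]
rgb_like = to_rgb_latent_input(signal)   # shape (1, 3, 64, 64), values in [-1, +1]
```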

However, if your signals are only 64 by 64, there may be less need for a latent diffusion approach. If you later plan to work with larger matrices, then I would suggest finetuning your own autoencoder, but that's outside the scope of this course. I am always happy to provide hints here though, so feel free to continue this thread.

@cpnovaes (Author)

Hi @mikonvergence !

Thank you so much, I have learned a lot from all your comments!

My signals are 128x128, but I am testing PixelDiffusion on them. In the meantime, I am also studying a possible implementation of the autoencoder.

Thanks again!
