
Train the model with custom dataset #27

Open
Ghaleb-alnakhlani opened this issue May 28, 2022 · 9 comments
Comments

@Ghaleb-alnakhlani

Ghaleb-alnakhlani commented May 28, 2022

Hi,

I really don't know what exactly I need to change in the model in order to run the training.
It would be very helpful if you could tell me how to prepare the dataset for the model.
I wonder what the semantic label should look like.
Let's say we have 3 classes (pedestrian, cow, sheep): what should the target and label folders look like?
This is how I prepared the dataset (example).
Input (note: the mask can be any color other than red, white for example)

[input mask image "0002"]
Target
[target image]
Your help is highly appreciated.

@SushkoVadim
Contributor

Hi,

I think the easiest way to understand the folder structure and the label types is to look at one of the commonly used datasets. Taking Ade20k as an example, I would recommend downloading its contents from this link.
You can see that the images are stored as normal .jpg RGB images.
The label maps are usually stored as grayscale maps of integer values, where the integer at each pixel location corresponds to the class ID.
For such a dataset structure, our dataloader can be found here: https://github.com/boschresearch/OASIS/blob/master/dataloaders/Ade20kDataset.py

The remaining part for you would be to convert your dataset to a similar structure.
As I imagine, this would imply the following:

  • Copy all the files into image/ and label/ folders.
  • Rename all the files so that each image-label pair is named identically.
  • Convert all the label maps to grayscale integer maps, where each value corresponds to a label ID (analogously to Ade20k).
  • Implement a new dataloader for your dataset, taking Ade20kDataset.py as an example. Also take a look at this discussion.

Hope it helps!

@Ghaleb-alnakhlani
Author

Hi,

@SushkoVadim thank you, I have downloaded the Ade20k dataset and looked at its structure; it is simple and straightforward.
However, I have a few questions if you do not mind. My dataset is a little different; here is its current structure:

/ dataset
   /sheep_512
   /sheep_mask
   /cow_512
   /cow_mask
   /pedestrian_512
   /pedestrian_mask

Do I have to create train and val splits for every directory, or can I leave it the way it is now?
Which structure is easier, the above or the following?

/dataset
  /target
    all the images from the 3 classes extracted here
  /label
    all the labels from the 3 classes extracted here

So in this case I will have two folders, similar to Ade20K.
When you mentioned grayscale, is this considered grayscale? (image below)
Also, all my images are transparent and already resized to 512x512.

@Ghaleb-alnakhlani
Author

I am familiar with the Pix2PixHD dataset structure.
I structured the folder with one dataroot containing two folders, train_A and train_B:
!python train.py --label_nc 0 --no_instance --name obj --resize_or_crop none --dataroot /path/to/datasets
In OASIS it is slightly different; --label_nc doesn't function the way it does in Pix2PixHD.

@SushkoVadim
Contributor

Hi,

Both are possible. I think it depends on whether you plan to train one model for the whole dataset with all classes, or a separate model for each of your classes. Since no image contains both the "sheep" and "cow" classes, it makes sense to have separate models for the different classes (but this is just an assumption; it totally depends on your experiment design).
In that case, the structure is either the first one you listed, or the following:

/ dataset_sheep
   /sheep_512
   /sheep_mask
/ dataset_cow
   /cow_512
   /cow_mask

For the masks, I would suggest a simple test:

# --- read label map ---#
import numpy as np
from PIL import Image

label = np.array(Image.open("your_label_map.png"))  # raw integer class IDs
print(np.unique(label))

In your test, you should see the values 0 and 1 (and not, e.g., 255). Then 0 corresponds to the background and 1 to the cow.


Yes, our structure is a bit different from Pix2PixHD's, because their repository is for image-to-image translation, while ours is for semantic image synthesis. The closest repository to ours is SPADE (https://github.com/NVlabs/SPADE).

@Ghaleb-alnakhlani
Author

Ghaleb-alnakhlani commented May 30, 2022

My bad, I forgot to mention that I want to train one model for the whole dataset. What should the structure look like?
Thank you for the tip on testing the mask, that is very helpful.
I am also a little confused: what is the main difference between image-to-image translation and semantic image synthesis? In your opinion, which type does my dataset fall into?

@SushkoVadim
Contributor

In your case, if you want a single model for all classes, you should use the first option:

/dataset
  /target
    all the images from the 3 classes extracted here
  /label
    all the labels from the 3 classes extracted here

Don't forget that in all the masks the background should be class 0, cow class 1, and sheep class 2; this has to be consistent across all the masks.

Image-to-image translation translates images to images, e.g. zebras to horses. Semantic image synthesis is a special class of image-to-image translation that uses masks as input, not images. Your dataset type is suitable for both tasks.

@Ghaleb-alnakhlani
Author

Ghaleb-alnakhlani commented May 30, 2022

Thank you, that was very helpful. I think in my case I can call it semantic image synthesis, since I am using a label mask as input.
Sorry, I did not fully understand the point about the background being class 0, cow 1, and sheep 2 consistently across all masks.
Using the script mentioned above, I can confirm that the mask values are 0 for background and 1 for foreground (no 255).
It would be very helpful if you could elaborate with an example.
Also, what should I choose for --label_nc?

@SnowdenLee
Collaborator

@Ghaleb-alnakhlani As you mentioned, you want to train one model for the whole dataset, so the semantic labels should be consistent across the whole dataset, not just within one image. For example, if you define background-0, cow-1, sheep-2: for a single image containing a cow, the labels are background-0 and cow-1, and for an image containing a sheep, the labels should be background-0 and sheep-2. Labelling only the foreground is not enough; the specific class label is needed here.

@Ghaleb-alnakhlani
Author

Hi @SnowdenLee, thank you. Can you please provide an example of how to prepare my data according to what you mentioned, if you know where this has been implemented before?
Another thing I noticed: is there a way to keep the images unchanged? My real input images are transparent, but the dataloader converts them to RGB, which adds a black background to the transparent areas. What do I need to change to avoid that?
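(For reference, one common way to avoid the black background, assuming Pillow is used for loading, is to composite the RGBA image over a chosen background color before converting to RGB; whether this slots directly into the OASIS dataloader is an assumption:)

```python
# Sketch: flatten an RGBA image over a solid background instead of
# letting .convert("RGB") turn transparent pixels black. The white
# background is an arbitrary choice.
from PIL import Image

def flatten_rgba(img, background=(255, 255, 255)):
    rgba = img.convert("RGBA")
    canvas = Image.new("RGBA", rgba.size, background + (255,))
    return Image.alpha_composite(canvas, rgba).convert("RGB")
```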
