
How to use a custom dataset? #13

Open
QLaHPD opened this issue Oct 3, 2020 · 11 comments

Comments

@QLaHPD commented Oct 3, 2020

I've changed the dataset path in default_config.py to point at a custom folder of images:
folder/path
|----/image001.jpg
|----/image002.jpg
...

But it returned:
ValueError: num_samples should be a positive integer value, but got num_samples=0

@Justin-Tan (Owner)

Posting the full stacktrace would help. If you rename the dataset in default_config.py under DatasetPaths you must also create a new dataset with corresponding name which inherits from the BaseDataset class in src/helpers/datasets.py. There are a few examples in that file.
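For illustration, a minimal custom dataset might look roughly like this (a sketch only: the class name and constructor arguments here are assumptions, and it inherits from torch.utils.data.Dataset just to stay self-contained; in the repo you would inherit from BaseDataset in src/helpers/datasets.py and follow the pattern of the existing examples):

import glob
import os

from PIL import Image
from torch.utils.data import Dataset


class MyCustomImages(Dataset):
    # Sketch: in this repo, inherit from BaseDataset in
    # src/helpers/datasets.py instead of Dataset.
    def __init__(self, root, mode='train', transform=None):
        data_dir = os.path.join(root, mode)
        self.imgs = sorted(glob.glob(os.path.join(data_dir, '*.jpg')) +
                           glob.glob(os.path.join(data_dir, '*.png')))
        self.transform = transform

    def __getitem__(self, idx):
        img = Image.open(self.imgs[idx]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img

    def __len__(self):
        return len(self.imgs)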

@QLaHPD (Author) commented Oct 7, 2020

Actually, I think the problem is that torch.utils.data is not finding the images in the folder, so it is returning num_samples=0. What is the expected directory structure of the OpenImages dataset?

@Justin-Tan (Owner)

If you post the stacktrace it would be easier to diagnose. If you look at the parent BaseDataset class you'll notice the dataset directory should contain train/ and test/ subfolders.

@QLaHPD (Author) commented Oct 7, 2020

In default_config.py:

class DatasetPaths(object):
    OPENIMAGES = '/mnt/ramdisk/root_folder'
    CITYSCAPES = ''
    JETS = ''

class args(object):
    dataset = Datasets.OPENIMAGES
    dataset_path = DatasetPaths.OPENIMAGES

The structure is:

/mnt/ramdisk/root_folder
|----/train
|--------/image001.png
|----/test
|--------/image001.png
|----/val
|--------/image001.png

The full traceback:

Traceback (most recent call last):
  File "train.py", line 322, in <module>
    normalize=args.normalize_input_image)
  File "/home/user/anaconda3/envs/HIFIC/high-fidelity-generative-compression-master/src/helpers/datasets.py", line 75, in get_dataloaders
    pin_memory=pin_memory)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 224, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 96, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

@Justin-Tan (Owner)

I think the problem was the following line:

self.imgs = glob.glob(os.path.join(data_dir, '*.jpg'))

which would only get JPGs. I pushed a fix to master to account for PNGs as well. Let me know if you still have issues.
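For reference, the fix would be along these lines (a sketch of the described change, not necessarily the exact committed code):

# Collect both JPG and PNG files instead of JPGs only.
self.imgs = []
for ext in ('*.jpg', '*.png'):
    self.imgs.extend(glob.glob(os.path.join(data_dir, ext)))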

@QLaHPD (Author) commented Oct 11, 2020

Unfortunately that didn't work; I get the same error. What absolute path is expected?
I'm not using the original OpenImages dataset; it's a custom dataset at a custom path, but I did not create a new class in datasets.py.

I'm using OPENIMAGES = '/mnt/ramdisk/openimages', but the files are custom, all inside the subfolders [train, test, val], and all files are PNG.

The code files are in another path.

@june1819 commented Dec 5, 2020

I got this error too, but after renaming the "val" folder to "validation" the error disappeared. Now I get a new error, "out of memory", even after setting batch_size = 2 and crop_size = 64. Could you post your default_config.py if you can run train.py?

@QingLicsaggie

@QLaHPD The code does not find the dataset. You can print the data path in BaseDataset and in the derived dataset class to make sure.
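For example, a quick standalone sanity check (using the path from this thread; adjust it to whatever your dataset class actually resolves):

import glob
import os

# If this prints 0, the dataset sees no samples and the
# DataLoader will raise the num_samples=0 error.
data_dir = '/mnt/ramdisk/root_folder/train'
print(len(glob.glob(os.path.join(data_dir, '*.png'))))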

@ahmedfgad

I encountered this error and solved it.

Note that this error may occur even if the model is able to find the dataset. Most people say there is a problem locating the dataset, but that is not always the case.

Like me, you may be using a small dataset that does not have enough samples for each iteration. Here are more details.

The default batch size is 8. Assume you set the --n_steps parameter to 1e6. This means there are 1 million (1,000,000) iterations, where each iteration requires 8 samples. Thus, you would need 8 million samples (8 * 1,000,000). If you have fewer than 8 million samples, the following error occurs:

ValueError: num_samples should be a positive integer value, but got num_samples=0

To solve it, set a smaller value for the --n_steps parameter. Try 1, for example: --n_steps 1:

python train.py --model_type compression --regime low --n_steps 1

I hope this helps.

@yifeipet commented Jan 27, 2022

I suggest you write your own dataloader and prepare a pre-cropped image dataset so you don't need to crop images every time.
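For example, a minimal standalone dataloader over a folder of pre-cropped images could look like this (a plain PyTorch/torchvision sketch, not the repo's own classes; the path is a placeholder):

import glob
import os

from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


class PreCroppedImages(Dataset):
    # Loads images that were already cropped offline, so no
    # per-iteration random cropping is needed.
    def __init__(self, root):
        self.paths = sorted(glob.glob(os.path.join(root, '*.png')))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert('RGB')
        return self.to_tensor(img)


loader = DataLoader(PreCroppedImages('/path/to/cropped/train'),
                    batch_size=8, shuffle=True, num_workers=4)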

@yifeipet

Yes, writing your own dataloader solves this issue.
