
How to use a custom dataset? #13

Open
QLaHPD opened this issue Oct 3, 2020 · 11 comments

Comments

@QLaHPD commented Oct 3, 2020

I've changed the dataset path in default_config.py to point at a custom folder of images:
folder/path
|----/image001.jpg
|----/image002.jpg
...

But it returned:
ValueError: num_samples should be a positive integer value, but got num_samples=0

@Justin-Tan (Owner)

Posting the full stacktrace would help. If you rename the dataset in default_config.py under DatasetPaths you must also create a new dataset with corresponding name which inherits from the BaseDataset class in src/helpers/datasets.py. There are a few examples in that file.
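For illustration, a minimal custom dataset might look roughly like this (a sketch only: the class name and constructor arguments here are assumptions, and it inherits from torch.utils.data.Dataset just to stay self-contained; in the repo you would inherit from BaseDataset in src/helpers/datasets.py and follow the pattern of the existing examples):

import glob
import os

from PIL import Image
from torch.utils.data import Dataset


class MyCustomImages(Dataset):
    # Sketch: in this repo, inherit from BaseDataset in
    # src/helpers/datasets.py instead of Dataset.
    def __init__(self, root, mode='train', transform=None):
        data_dir = os.path.join(root, mode)
        self.imgs = sorted(glob.glob(os.path.join(data_dir, '*.jpg')) +
                           glob.glob(os.path.join(data_dir, '*.png')))
        self.transform = transform

    def __getitem__(self, idx):
        img = Image.open(self.imgs[idx]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img

    def __len__(self):
        return len(self.imgs)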

@QLaHPD (Author) commented Oct 7, 2020

Actually, I think the problem is that torch.utils.data is not finding the images in the folder, so it is returning num_samples=0. What is the expected directory structure of the OpenImages dataset?

@Justin-Tan (Owner)

If you post the stacktrace it would be easier to diagnose. If you look at the parent BaseDataset class you'll notice the dataset directory should contain train/ and test/ subfolders.

@QLaHPD (Author) commented Oct 7, 2020

In default_config.py:

class DatasetPaths(object):
    OPENIMAGES = '/mnt/ramdisk/root_folder'
    CITYSCAPES = ''
    JETS = ''

class args(object):
    dataset = Datasets.OPENIMAGES
    dataset_path = DatasetPaths.OPENIMAGES

The structure is:

/mnt/ramdisk/root_folder
|----/train
|--------/image001.png
|----/test
|--------/image001.png
|----/val
|--------/image001.png

The full traceback:

Traceback (most recent call last):
  File "train.py", line 322, in <module>
    normalize=args.normalize_input_image)
  File "/home/user/anaconda3/envs/HIFIC/high-fidelity-generative-compression-master/src/helpers/datasets.py", line 75, in get_dataloaders
    pin_memory=pin_memory)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 224, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 96, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

@Justin-Tan (Owner)

I think the problem was the following line:

self.imgs = glob.glob(os.path.join(data_dir, '*.jpg'))

which would only get JPGs. I pushed a fix to master to account for PNGs as well. Let me know if you still have issues.
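For reference, the fix would be along these lines (a sketch of the described change, not necessarily the exact committed code):

# Collect both JPG and PNG files instead of JPGs only.
self.imgs = []
for ext in ('*.jpg', '*.png'):
    self.imgs.extend(glob.glob(os.path.join(data_dir, ext)))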

@QLaHPD (Author) commented Oct 11, 2020

Unfortunately that didn't work; I get the same error. What absolute path is expected?
I'm not using the original OpenImages dataset; it's a custom dataset at a custom path, but I did not create a new class in datasets.py.

I'm using OPENIMAGES = '/mnt/ramdisk/openimages', but the files are custom, all inside the subfolders [train, test, val], and all files are PNG.

The code files are in another path.

@june1819 commented Dec 5, 2020

I got this error too, but after renaming the "val" folder to "validation" the error disappeared. Now I get a new error, "out of memory", even after setting batch_size = 2 and crop_size = 64. Could you post your default_config.py if you can run train.py?

@QingLicsaggie

@QLaHPD The code does not find the dataset. You can print the data path in BaseDataset and in the derived dataset class to make sure.
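For example, a quick standalone sanity check (using the path from this thread; adjust it to whatever your dataset class actually resolves):

import glob
import os

# If this prints 0, the dataset sees no samples and the
# DataLoader will raise the num_samples=0 error.
data_dir = '/mnt/ramdisk/root_folder/train'
print(len(glob.glob(os.path.join(data_dir, '*.png'))))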

@ahmedfgad

I encountered this error and solved it.

Note that this error may occur even if the model is able to find the dataset. Most people say there is a problem locating the dataset, but that is not always the case.

Like me, you may be using a small dataset that does not have enough samples for each iteration. Here are more details.

The default batch size is 8. Assume you set the --n_steps parameter to 1e6. This means there are 1 million (1,000,000) iterations, where each iteration requires 8 samples. Thus, you would need 8 million samples (8 * 1,000,000). If you have fewer than 8 million samples, the following error occurs:

ValueError: num_samples should be a positive integer value, but got num_samples=0

To solve it, set a smaller value for the --n_steps parameter. Try 1, for example: --n_steps 1:

python train.py --model_type compression --regime low --n_steps 1

I hope this helps.

@yifeipet commented Jan 27, 2022

I suggest you write your own dataloader and prepare a pre-cropped image dataset so you don't need to crop images every time.
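For example, a minimal standalone dataloader over a folder of pre-cropped images could look like this (a plain PyTorch/torchvision sketch, not the repo's own classes; the path is a placeholder):

import glob
import os

from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


class PreCroppedImages(Dataset):
    # Loads images that were already cropped offline, so no
    # per-iteration random cropping is needed.
    def __init__(self, root):
        self.paths = sorted(glob.glob(os.path.join(root, '*.png')))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert('RGB')
        return self.to_tensor(img)


loader = DataLoader(PreCroppedImages('/path/to/cropped/train'),
                    batch_size=8, shuffle=True, num_workers=4)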

@yifeipet

Yes, writing your own dataloader solves this issue.
