
How to deal with huge data? #12

Open
aryopg opened this issue Jan 16, 2017 · 10 comments

aryopg commented Jan 16, 2017

I'm currently trying to generate a dataset based on your tutorial, but I was wondering how to deal with huge data (10 GB of images). My laptop can't handle that amount, because the tutorial has us store all the data in an array variable first. Is there any way to handle this? Thanks

@aditbiswas1

you can use input queues to read from the filesystem instead of loading everything into memory; this is discussed in this tutorial: https://ischlag.github.io/2016/06/19/tensorflow-input-pipeline-example/
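To make the idea concrete: the point of an input queue is that only file *paths* live in memory, and image bytes are read on demand. Here is a minimal stdlib sketch of that pattern (the extensions and layout are assumptions; the tutorial itself uses TensorFlow's queue runners rather than a plain generator):

```python
import os

def image_paths(root):
    """Yield image paths one at a time instead of loading all images into memory."""
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                yield os.path.join(dirpath, name)

# Consume lazily: only the file currently being processed needs to fit in memory.
# for path in image_paths("Dataset"):
#     data = open(path, "rb").read()
```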


aryopg commented Jan 18, 2017

Thanks for the response @aditbiswas1. If my current dataset directory structure is like this:
Dataset/
  Category A/
    Photos/
  Category B/
    Photos/
  Category C/
    Photos/
  Category D/
    Photos/

should I make a CSV file containing the paths to the dataset first, or is there another way to deal with that kind of problem?

@aditbiswas1

hmm, yeah, using a CSV sounds like a reasonable way to solve the problem 👍


aryopg commented Jan 19, 2017

is there any tutorial you know of for creating a CSV out of it? sorry, I'm new at this @aditbiswas1


aryopg commented Jan 22, 2017

hi @aditbiswas1, I've learned how to do things from the tutorial you gave, but I was wondering: if I want to resize my images first, at which step can I do that? this is the tutorial btw: https://ischlag.github.io/2016/06/19/tensorflow-input-pipeline-example/

@aditbiswas1

oh hi, sorry, I didn't see the previous notification. hmm, you can create a CSV in multiple ways. I would normally build a Python list of all the file names and then dump it into a CSV using the built-in csv module; you'll most likely just write some loops to do this. You can get a list of the files present in a directory with os.listdir("directory").
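A minimal sketch of that loop, assuming images sit one level under each category folder (Dataset/&lt;Category&gt;/image.jpg — add another os.listdir level if they actually live in a Photos/ subfolder); the function and file names here are made up for illustration:

```python
import csv
import os

def write_label_csv(dataset_dir, out_csv):
    """Walk dataset_dir/<Category>/ and write one (image_path, label) row per file."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for label in sorted(os.listdir(dataset_dir)):
            category_dir = os.path.join(dataset_dir, label)
            if not os.path.isdir(category_dir):
                continue  # skip stray files sitting next to the category folders
            for name in sorted(os.listdir(category_dir)):
                writer.writerow([os.path.join(category_dir, name), label])
```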

regarding resizing the images: this can be part of your preprocessing, right after you've decoded the image file. an example for this can be found in the docs here
https://www.tensorflow.org/api_docs/python/image/resizing#resize_images
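In the tutorial's pipeline that means calling tf.image.resize_images on the tensor returned by the decode step, before batching. To show what a resize actually does, here is a tiny pure-Python nearest-neighbor sketch on a 2-D list "image" (illustrative only — TensorFlow's op works on tensors and defaults to bilinear interpolation):

```python
def resize_nearest(image, new_h, new_w):
    """Nearest-neighbor resize of a 2-D image given as a list of rows."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[(y * old_h) // new_h][(x * old_w) // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

# Upscaling a 2x2 "image" to 4x4 simply repeats each source pixel in a 2x2 block.
```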

@aditbiswas1

since you also wanted to do some things like grouping by labels, you probably want to look at something like pandas, which is super useful for manipulating datasets. http://pandas.pydata.org/pandas-docs/version/0.13.1/index.html
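If you'd rather stay in the standard library, grouping the (path, label) rows from the CSV is only a few lines; pandas' groupby does the same thing at scale. The names below are illustrative:

```python
from collections import defaultdict

def group_by_label(rows):
    """rows: iterable of (path, label) pairs, e.g. as read back from the CSV."""
    groups = defaultdict(list)
    for path, label in rows:
        groups[label].append(path)
    return dict(groups)
```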


aryopg commented Jan 23, 2017

hi @aditbiswas1, I've managed to get to the training step, but at a certain epoch the training stops. It says there is an OutOfRangeError because of the FIFO batch queue. Here is the error:

OutOfRangeError (see above for traceback): FIFOQueue '_3_batch/fifo_queue' is closed and has insufficient elements (requested 8, current size 5)
[[Node: batch = QueueDequeueMany[_class=["loc:@batch/fifo_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]

I'm following the tutorial you gave, but I uncommented the num_threads line in tf.train.batch


aditbiswas1 commented Jan 23, 2017 via email


aryopg commented Jan 23, 2017

yes, I managed to reach 600-ish iterations before I got the error. I set the batch size to 8, and I also suspect that the dataset size is not divisible by 8, but I've used allow_smaller_final_batch=True and it still didn't work
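The arithmetic behind the error message: when the total example count isn't a multiple of the batch size, the queue closes while still holding the remainder, and dequeuing a full batch then raises OutOfRangeError. The count below is made up just so the remainder matches the "current size 5" in the traceback; note that allow_smaller_final_batch must be passed to the tf.train.batch call itself to take effect.

```python
# Hypothetical example count; the real number comes from the CSV of paths.
num_examples = 605
batch_size = 8

full_batches, leftover = divmod(num_examples, batch_size)
# full_batches == 75, leftover == 5: after 75 full batches the closed queue
# still holds 5 elements, so a dequeue of 8 fails unless the final batch
# is allowed to be smaller.
```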
