Load Dataset for Image Classification #180

Open
cicciolado opened this issue Feb 16, 2021 · 0 comments
In the examples, the MNIST dataset from Keras is used, but it is already loaded as a numpy.ndarray. I would like to load my own RGB image dataset into a Spark DataFrame. PySpark provides this method:

spark.read.format("image").option("dropInvalid", True).load(path)

which loads all the images found under path into a DataFrame. The DataFrame has one row per image, and each row holds the binary data of the corresponding image. That binary data can be converted to RGB matrices with NumPy, but how do you store a tensor in each row and then pass the DataFrame as input to a convolutional network in Keras?
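To make the conversion step concrete, here is a minimal sketch of how the bytes from one row might be turned into an RGB array. It assumes the fields of Spark's image schema (image.data, image.height, image.width, image.nChannels) and that 3-channel data is stored row-wise in BGR order, as Spark documents for most images. The function name spark_image_to_rgb and the hand-made 2x2 sample are hypothetical stand-ins, not part of any library.

```python
import numpy as np


def spark_image_to_rgb(data, height, width, n_channels):
    """Decode the raw bytes of one image row into a (height, width, n_channels) uint8 array.

    Hypothetical helper: the arguments are assumed to come from the
    image.data, image.height, image.width and image.nChannels fields.
    """
    arr = np.frombuffer(bytes(data), dtype=np.uint8)
    arr = arr.reshape(height, width, n_channels)
    if n_channels == 3:
        # Assumption: 3-channel data is in BGR order, so reverse the
        # channel axis to obtain RGB.
        arr = arr[:, :, ::-1]
    # Copy so the result is writable and independent of the input buffer.
    return arr.copy()


# Hand-made stand-in for one row: a 2x2 image with pixels given as B, G, R.
sample_bytes = bytes([255, 0, 0,   0, 255, 0,
                      0, 0, 255,   10, 20, 30])
rgb = spark_image_to_rgb(sample_bytes, height=2, width=2, n_channels=3)
```

With a real DataFrame, the same helper could be applied to each collected row, for example to row.image.data together with the matching height, width and nChannels fields. This sketch only handles the 3-channel case explicitly; other channel counts are reshaped but left in their stored order.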

Is there a way to avoid providing three matrices (RGB) for each image and instead provide a single large vector of pixels?
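One possible approach, sketched here under assumptions rather than taken from the project: flatten each image into a single vector so that a row holds one array of numbers, then restore the (height, width, channels) shape before the convolutional layers. The helper names flatten_image and unflatten_image are hypothetical, and the scaling to [0, 1] is an illustrative choice.

```python
import numpy as np


def flatten_image(image):
    """Turn a (height, width, channels) uint8 image into a 1-D float32 vector in [0, 1]."""
    return np.asarray(image, dtype=np.float32).reshape(-1) / 255.0


def unflatten_image(vector, height, width, n_channels):
    """Restore the (height, width, channels) layout from a flat pixel vector."""
    return np.asarray(vector, dtype=np.float32).reshape(height, width, n_channels)


# Tiny stand-in image: 2x2 pixels, 3 channels.
image = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
vector = flatten_image(image)               # one flat vector per image
restored = unflatten_image(vector, 2, 2, 3)  # shape expected by a conv layer
```

Because reshape preserves element order, no pixel information is lost as long as height, width and channel count are known. On the Keras side, a Reshape layer at the start of the model could perform the same restoring step, so the DataFrame would only need to carry the flat vector.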
