batchsize=Inf or something? #144

Open
mcabbott opened this issue Feb 11, 2023 · 4 comments · May be fixed by #145

Comments

@mcabbott
Contributor

It would be nice if you could construct a DataLoader that yields one maximal-size batch, without knowing the size of the inputs.

This would mean that a function which loads some data, pre-processes it, and then returns a DataLoader could easily be used to return the full dataset, in the identical format, as long as it passes the keyword batchsize along.

This could be spelled batchsize=0, since -1 already does something special, although unfortunately 0 is not an error right now.

@lorenzoh
Contributor

What is the use-case for this over doing something like DataLoader(data; batchsize=numobs(data))? Is it that you don't want to get a DataLoader returned but rather a BatchView(mapobs(f, data); batchsize=numobs(data))?

@mcabbott
Contributor Author

The use case is functions like this one, which load data and make two DataLoaders with the specified batch size:

https://github.com/FluxML/model-zoo/blob/52420da6fcadf30ae2e190fc77669fe1d255ff10/vision/conv_mnist/conv_mnist.jl#L71-L84
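For illustration, here is a minimal sketch of the pattern being discussed. It is not the linked model-zoo code: `make_loaders` and `resolve_batchsize` are hypothetical names, the data is random, and using `Inf` as the "everything in one batch" sentinel is only an assumption about how such a keyword might look. It relies only on `DataLoader` and `numobs` from MLUtils.

```julia
using MLUtils: DataLoader, numobs

# Hypothetical helper (not part of MLUtils): map the sentinel `Inf`
# to "all observations in one batch", otherwise pass the size through.
resolve_batchsize(data, batchsize::Real) =
    batchsize == Inf ? numobs(data) : Int(batchsize)

# Hypothetical stand-in for a function that loads data and returns
# two DataLoaders that share one `batchsize` keyword.
function make_loaders(; batchsize=Inf)
    # Random placeholder data: 10 training and 6 test observations.
    train_data = (rand(Float32, 4, 10), rand(1:3, 10))
    test_data  = (rand(Float32, 4, 6),  rand(1:3, 6))

    train = DataLoader(train_data;
                       batchsize=resolve_batchsize(train_data, batchsize),
                       shuffle=true)
    test  = DataLoader(test_data;
                       batchsize=resolve_batchsize(test_data, batchsize))
    return train, test
end

# Each loader yields its full dataset as a single batch,
# even though the two datasets differ in size.
train_loader, test_loader = make_loaders(batchsize=Inf)
x, y = first(test_loader)
```

Without such a keyword, the caller cannot write `batchsize=numobs(data)` directly, because the data only exists inside the function and the two datasets have different sizes.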

@lorenzoh
Contributor

Ah I see! That makes sense when creating multiple DataLoaders 👍

@mcabbott
Contributor Author

You could almost use typemax(Int) for this purpose, apart from this warning:

julia> DataLoader([1 2 3; 4 5 6]; batchsize=99, partial=false) |> collect
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
1-element Vector{Matrix{Int64}}:
 [1 2 3; 4 5 6]

@mcabbott linked a pull request on Feb 11, 2023 that will close this issue