
Optimal batch size for AVX2 and/or OpenCL? #122

Open
gwiesenekker opened this issue Sep 3, 2023 · 1 comment
Comments

@gwiesenekker

I have enabled AVX2 and OpenCL. Is there a recommended optimal batch size when using AVX2 and/or OpenCL?

@joaopauloschuler
Owner

Regarding AVX/AVX2/AVX512: if your hardware supports it, you should enable it. The API is smart enough to decide when to use it and when not to; the decision is based on the csMinAvxSize constant. This is an extract from the API showing how it decides whether to use AVX:

procedure TNNetVolume.Sub(Original: TNNetVolume);
var
  I: integer;
  vHigh: integer;
begin
  // Large volumes take the AVX-accelerated path; small volumes
  // use a plain Pascal loop, where AVX overhead isn't worth it.
  if FSize >= csMinAvxSize
    then AVXSub(FDataPtr, Original.FDataPtr, FSize)
    else
    begin
      vHigh := High(FData);
      for I := 0 to vHigh do
        FData[I] -= Original.FData[I];
    end;
end;

If you are renting an environment to run models, rent only hardware with AVX support.

Regarding batch size: this is an ultra interesting question. In this API, the batch size doesn't affect AVX efficiency because each sample is processed separately (unlike many other frameworks). The same is true for OpenCL.

For small, non-convolutional NN models, I would recommend a batch size of at least 32 or 64 samples. For non-convolutional models with 1024 inputs, I would recommend a batch size around 256 to 512, or even higher.

For convolutional NNs, I would also use batch sizes around 32 or 64. If you have a lot of cores, consider at least 4 samples in the batch per thread. For example, if you run 64 threads on a 64-core machine, I would consider a batch size of 256, although a large batch size will slow convergence during the first epochs.
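As a back-of-the-envelope check, the "at least 4 samples per thread" rule above can be sketched in plain Pascal. This is an illustrative helper, not part of the API; `SuggestedBatchSize` is a hypothetical name, and it simply rounds 4 × thread count up to the next power of two with a floor of 32:

```pascal
program BatchSizeSketch;
{$mode objfpc}

uses Classes;

// Suggest a batch size of at least 4 samples per thread,
// rounded up to the next power of two, never below 32.
function SuggestedBatchSize(ThreadCount: integer): integer;
begin
  Result := 32;
  while Result < 4 * ThreadCount do
    Result := Result * 2;
end;

begin
  // With 64 threads this yields 256, matching the example above.
  WriteLn('Suggested batch size: ', SuggestedBatchSize(TThread.ProcessorCount));
end.
```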

In short: starting with a batch size of 32 or 64 should work well for most problems. If your model is overfitting, or if you have plenty of cores, try a larger batch size, possibly with a smaller learning rate. Larger batch sizes reduce the threading overhead in all environments: plain CPU, AVX and OpenCL, on both Windows and Linux.
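For reference, batch size and learning rate are typically set when calling the fitter. A minimal sketch, assuming a TNeuralImageFit-style call as in the examples shipped with this API (check the exact signature in your version; the volume and variable names here are placeholders):

```pascal
// Hypothetical setup: NN, ImgTrainingVolumes, ImgValidationVolumes,
// ImgTestVolumes and NumClasses are assumed to be defined elsewhere.
NeuralFit := TNeuralImageFit.Create;
try
  NeuralFit.InitialLearningRate := 0.001; // smaller rate for larger batches
  NeuralFit.Fit(NN,
    ImgTrainingVolumes, ImgValidationVolumes, ImgTestVolumes,
    NumClasses,
    {batchsize=}64,
    {epochs=}50);
finally
  NeuralFit.Free;
end;
```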

@joaopauloschuler joaopauloschuler self-assigned this Sep 4, 2023
@joaopauloschuler joaopauloschuler added the question Further information is requested label Sep 4, 2023