BNN taking longer time than full precision network #336

Open
kumarmanas opened this issue Nov 23, 2019 · 6 comments
Labels
question Further information is requested

Comments

@kumarmanas

I was trying to compare a Larq BNN and a full-precision network (by making Integer and kernel_quantizer = None). I found that the time taken to run the program is greater for the BNN compared to full precision. Is that OK?
Time to train is an important parameter for an efficient network.

@lgeiger
Member

lgeiger commented Nov 24, 2019

by making Integer and kernel_quantizer= None

Could you elaborate a bit on what you are doing? If possible, it would be good to post a minimal code sample that reproduces the issue.

I found that the time taken to run the program is greater for the BNN compared to full precision.

Are you referring to time per epoch or step, or total training time? Could you elaborate on the time difference?

@lgeiger added the question (Further information is requested) label on Nov 24, 2019
@kumarmanas
Author

kumarmanas commented Nov 25, 2019

For the below segment of code, from the BNN example you provide, I made Integer and kernel_quantizer = None instead of ste_sign:

larq.layers.QuantDense(512,
                       kernel_quantizer="ste_sign",
                       kernel_constraint="weight_clip"),
larq.layers.QuantDense(10,
                       input_quantizer="ste_sign",
                       kernel_quantizer="ste_sign",
                       kernel_constraint="weight_clip",
                       activation="softmax")])

I used time.clock() and time.time() to measure the total training time of the code, and found the BNN time is greater than full precision. I just put time.clock() at the start and end of the program to get the total running time of the BNN and full-precision programs.

The code I used to test: https://github.com/larq/larq/blob/master/docs/examples/mnist.ipynb

@lgeiger
Member

lgeiger commented Nov 26, 2019

I made Integer and kernel_quantizer = None instead of ste_sign

What do you mean by "Integer" in this context?

I used time.clock() and time.time() to measure the total training time of the code, and found the BNN time is greater than full precision

What's the time difference?

Larq (and TensorFlow) use fake quantization during training and thus run the calculations in float32 or float16. When using a latent-weight-based training method, this means that during training we add additional calculations (i.e. ste_sign) for the kernel and inputs to compute the binarization, which may result in slightly slower training times. We are thinking about ways to make this significantly faster by implementing a truly binary forward pass, but we currently have no immediate plans for this.
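To give a sense of the extra computation: a fake quantizer in the spirit of ste_sign can be sketched with a custom gradient, roughly as below (a simplified illustration, not larq's exact implementation, and the function name is just for this example). It binarizes to ±1.0 in the forward pass but still operates on float tensors, on top of the usual float kernels, which is why training gets slightly slower rather than faster.

import tensorflow as tf

@tf.custom_gradient
def fake_ste_sign(x):
    # Forward pass: binarize to -1.0 / +1.0 (still float32 values).
    y = tf.where(x >= 0, tf.ones_like(x), -tf.ones_like(x))

    def grad(dy):
        # Backward pass: straight-through estimator, i.e. pass the gradient
        # through unchanged where |x| <= 1 and zero it elsewhere.
        return dy * tf.cast(tf.abs(x) <= 1, dy.dtype)

    return y, grad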

Time to train is an important parameter for an efficient network.

I agree training time is important, but the main goal is to train networks that can be run efficiently during inference, so an increase in training time is often unavoidable.

@kumarmanas
Author

kumarmanas commented Nov 26, 2019

What's the time difference?

For the BNN, the total time from the start of the program (starting at dataset load) through model.fit is 184.05 seconds, and evaluation (model.evaluate) took 2.41 seconds.
For full precision, the times are 169.52 seconds and 0.000094 seconds respectively. The number of epochs is 6.
Code structure:
start_time = time.clock()
tf.keras.datasets.mnist.load_data()
# ... code lines as shown in the larq example ...
model.compile(...)
model.fit(...)
print(time.clock() - start_time, "train seconds")  # 184 s for BNN, 169 s for full precision
eval_time = time.clock()
test_loss, test_acc = model.evaluate(...)
print(time.clock() - eval_time, "eval seconds")  # 2.41 s for BNN, 0.000094 s for full precision
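(Side note: time.clock() is deprecated and was removed in Python 3.8; the same measurement can be sketched with time.perf_counter(), assuming the variable names from the linked MNIST example:)

import time

start = time.perf_counter()
model.fit(train_images, train_labels, epochs=6)
print(time.perf_counter() - start, "train seconds")

eval_start = time.perf_counter()
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(time.perf_counter() - eval_start, "eval seconds")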

What do you mean by "Integer" in this context?

Sorry, that was a typo; it should be input_quantizer.
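So the full-precision comparison uses the same layers with both quantizers set to None, roughly as below (a sketch; only the quantizers are changed, the rest of the example stays the same):

larq.layers.QuantDense(512,
                       kernel_quantizer=None,
                       kernel_constraint="weight_clip"),
larq.layers.QuantDense(10,
                       input_quantizer=None,
                       kernel_quantizer=None,
                       kernel_constraint="weight_clip",
                       activation="softmax")])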

Note: times can vary slightly, but the pattern is always the same (the BNN takes longer than full precision), both for training and evaluation.

@susuhu

susuhu commented Sep 3, 2021

I'm facing the same issue. I tried the simple models below just to see the change in speed and file size, setting accuracy aside for the moment.

import larq as lq
from tensorflow.keras import layers, models

# full-precision model
simplemodel = models.Sequential()
simplemodel.add(layers.Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3)))
simplemodel.add(layers.Flatten())
simplemodel.add(layers.Dense(10, activation='sigmoid'))

# binarized model
kwargs = dict(input_quantizer="ste_sign",
              kernel_quantizer="ste_sign",
              kernel_constraint="weight_clip",
              use_bias=False)

simplemodelbnn = models.Sequential()
simplemodelbnn.add(lq.layers.QuantConv2D(32, 3,
                                         kernel_quantizer="ste_sign",
                                         kernel_constraint="weight_clip",
                                         use_bias=False,
                                         input_shape=(32, 32, 3)))
simplemodelbnn.add(layers.Flatten())
simplemodelbnn.add(lq.layers.QuantDense(10, **kwargs, activation='sigmoid'))

I ran both models on the CIFAR-10 dataset, normalized to (0, 1) and (-1, 1), with the same compile settings and 2 epochs as an example.
The full-precision model has Total params: 328,586, and the binarized model has Total params: 289k.
But for both training and inference, the full-precision model ran faster than the binarized model, and the full-precision model also has a smaller file size.
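(For reference, the parameter counts above come from the Keras model summaries; larq also provides lq.models.summary, which additionally lists per-layer bit widths and the memory the binarized weights would need. A minimal sketch:)

simplemodel.summary()
lq.models.summary(simplemodelbnn)  # also reports quantization and memory per layer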

From what @lgeiger said, I now understand the slower training of the binarized model, but why is inference also slower?

Larq (and TensorFlow) use fake quantization during training and thus run the calculations in float32 or float16. When using a latent-weight-based training method, this means that during training we add additional calculations (i.e. ste_sign) for the kernel and inputs to compute the binarization, which may result in slightly slower training times. We are thinking about ways to make this significantly faster by implementing a truly binary forward pass, but we currently have no immediate plans for this.

The difference is small in absolute value since it's a relatively small dataset, but I tried several times and the binarized model always ran slower. The running times are read from the model.fit and model.evaluate output, both per epoch and per step.

@jneeven
Contributor

jneeven commented Sep 3, 2021

@susuhu Larq BNN inference is slower than full-precision inference because TensorFlow does not actually support binarized operations. To make it possible to train and evaluate BNNs, larq therefore adds "fake" quantizers before the activations and weights that need to be binarized, mapping them from their original float values to -1.0 or 1.0. Note that even these binary values are floats: again, TensorFlow does not support non-float computations. This is also the reason the binary model may not be any smaller than the full-precision model in Keras: technically the weights are still floats.
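You can check this directly on the binarized model from above (a minimal sketch): the stored latent kernels are ordinary float32 arrays, and the ±1.0 binarization only happens inside the fake quantizer during the forward pass.

for w in simplemodelbnn.get_weights():
    # Every weight array is a plain float32 numpy array, even in the "binary" layers.
    print(w.dtype, w.shape)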

The speedup you're looking for can be obtained with the Larq Compute Engine, an inference engine based on TensorFlow Lite that does support binary operations and is therefore much faster than running a "fake" BNN in the Python TensorFlow library. Hope that clears up some confusion!
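For example, converting the trained Keras model with the LCE converter produces a TFLite flatbuffer in which the binary layers use genuinely bit-packed operations (a sketch; it assumes the larq-compute-engine package is installed, and the resulting file should be run with the LCE runtime rather than plain TensorFlow):

import larq_compute_engine as lce

# Convert the Keras model; binarized layers are mapped to LCE's binary ops.
tflite_model = lce.convert_keras_model(simplemodelbnn)
with open("simplemodelbnn.tflite", "wb") as f:
    f.write(tflite_model)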
