Replies: 2 comments 1 reply
-
When training a binarized neural network we still need to keep track of floating-point gradients, so there are no extra optimisations available compared to training a regular neural network. Training a BNN therefore does happen on the GPU through CUDA, and we can still benefit from FP16 for example, but we can't make it go faster than a regular training run. For the forward pass we could technically implement layers that bitpack the input and run more efficiently, but since the backward pass would still be slow, the gains are not as large as in inference. You can find more details about the floating-point gradients here: https://docs.larq.dev/larq/guides/bnn-optimization/
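To illustrate why the latent weights stay in floating point, here is a minimal sketch in plain TensorFlow (not Larq's actual implementation) of a sign quantizer with a straight-through estimator: the forward pass outputs ±1, but the backward pass lets a float gradient flow to the latent float weights so the optimizer can keep accumulating small updates.

```python
import tensorflow as tf

@tf.custom_gradient
def ste_sign(x):
    """Binarize to ±1 in the forward pass; pass the gradient straight
    through (clipped to |x| <= 1) in the backward pass."""
    def grad(dy):
        # Gradient w.r.t. the latent float weights: identity inside [-1, 1],
        # zero outside. This is why float gradients (and a float copy of
        # the weights) are still needed during training.
        return dy * tf.cast(tf.abs(x) <= 1.0, dy.dtype)

    return tf.sign(x), grad

# Latent weights are ordinary float variables; only their binarized
# version is used in the forward computation.
w_latent = tf.Variable(tf.random.normal([4, 2]))
x = tf.random.normal([8, 4])

with tf.GradientTape() as tape:
    y = tf.matmul(x, ste_sign(w_latent))  # forward pass uses ±1 weights
    loss = tf.reduce_mean(tf.square(y))

# Float gradients flow back to the latent weights, just like in a
# regular training run.
grads = tape.gradient(loss, [w_latent])
```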
-
From the original BNN paper: "We show that during the forward pass (both at run-time and train-time), BNNs drastically reduce memory consumption (size and number of accesses), and replace most arithmetic operations with bit-wise operations, which potentially lead to a substantial increase …"

Does larq implement binarization/quantization in the forward pass during training? The most straightforward implementation would simply binarize the weights during training while still representing them as floats.
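For reference, a short sketch of how a quantized layer is typically set up, roughly following the examples in the Larq docs: the layer stores a latent float kernel, and the `kernel_quantizer` / `input_quantizer` binarize it on the fly in every forward pass, both at train time and at inference time.

```python
import tensorflow as tf
import larq as lq

# Quantized layers keep latent float kernels; the quantizers binarize the
# weights (and optionally the inputs) on the fly in every forward pass.
kwargs = dict(
    input_quantizer="ste_sign",       # binarize incoming activations
    kernel_quantizer="ste_sign",      # binarize the latent float kernel
    kernel_constraint="weight_clip",  # keep latent weights in [-1, 1]
)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # The first layer usually keeps full-precision inputs.
    lq.layers.QuantDense(256, kernel_quantizer="ste_sign",
                         kernel_constraint="weight_clip"),
    lq.layers.QuantDense(10, **kwargs),
])

# The trainable variables are still float32 latent weights.
print([v.dtype for v in model.trainable_variables])
```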
-
Hello, I have been using larq to convert models and I love the inference speed-up on mobile devices. This makes me wonder whether training time can be reduced as well. A shorter training time would greatly benefit the research community, since ablation studies could be done much faster, potentially resulting in better architectures specific to binarization (most current BNNs are mirror images of their real-valued counterparts). I am a complete newbie when it comes to CUDA, so I would love to hear your thoughts on this matter. Do public GPU implementations already exist, and if not, what challenges have been stopping us from building them?
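To make the source of the inference speed-up concrete (and to show what a faster training-time kernel would also have to provide), here is a hypothetical NumPy sketch of the XNOR-popcount trick that replaces a float dot product once weights and activations are ±1; real inference kernels (e.g. in Larq Compute Engine) do this with bitpacked integers and hardware popcount instructions rather than NumPy.

```python
import numpy as np

def pack_bits(v):
    """Encode a ±1 vector as bits (+1 -> 1, -1 -> 0), packed into uint8 words."""
    return np.packbits(v > 0)

def binary_dot(a_bits, b_bits, n):
    """Dot product of two ±1 vectors of length n from their bitpacked form:
    dot(a, b) = n - 2 * popcount(a_bits XOR b_bits)."""
    xor = np.bitwise_xor(a_bits, b_bits)
    popcount = np.unpackbits(xor).sum()  # real kernels use a hardware popcount
    return n - 2 * int(popcount)

n = 64
a = np.random.choice([-1, 1], size=n)
b = np.random.choice([-1, 1], size=n)

# The bitwise version matches the ordinary float/int dot product.
assert binary_dot(pack_bits(a), pack_bits(b), n) == int(a @ b)
```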