Replies: 2 comments 1 reply
-
When training a binarized neural network we still need to keep track of floating-point gradients, so there are no extra optimisations available compared to training a regular neural network. Training a BNN therefore does happen on the GPU through CUDA, and we can still benefit from FP16 for example, but we can't make it go faster than a regular training run. For the forward pass we could technically implement layers that bitpack the input and run more efficiently, but since the backward pass would still be slow, the gains are not as large as in inference. You can find more details about the floating-point gradients here: https://docs.larq.dev/larq/guides/bnn-optimization/
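To illustrate why the latent weights stay in floating point, here is a minimal sketch in plain TensorFlow (not Larq's actual implementation) of a sign quantizer with a straight-through estimator: the forward pass outputs ±1, but the backward pass lets a float gradient flow to the latent float weights so the optimizer can keep accumulating small updates.

```python
import tensorflow as tf

@tf.custom_gradient
def ste_sign(x):
    """Binarize to ±1 in the forward pass; pass the gradient straight
    through (clipped to |x| <= 1) in the backward pass."""
    def grad(dy):
        # Gradient w.r.t. the latent float weights: identity inside [-1, 1],
        # zero outside. This is why float gradients (and a float copy of
        # the weights) are still needed during training.
        return dy * tf.cast(tf.abs(x) <= 1.0, dy.dtype)

    return tf.sign(x), grad

# Latent weights are ordinary float variables; only their binarized
# version is used in the forward computation.
w_latent = tf.Variable(tf.random.normal([4, 2]))
x = tf.random.normal([8, 4])

with tf.GradientTape() as tape:
    y = tf.matmul(x, ste_sign(w_latent))  # forward pass uses ±1 weights
    loss = tf.reduce_mean(tf.square(y))

# Float gradients flow back to the latent weights, just like in a
# regular training run.
grads = tape.gradient(loss, [w_latent])
```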
-
From the original BNN paper: "We show that during the forward pass (both at run-time and train-time), BNNs drastically reduce memory consumption (size and number of accesses), and replace most arithmetic operations with bit-wise operations, which potentially lead to a substantial increase …"

Does larq implement binarization/quantization in the forward pass during training? The most straightforward implementation would simply binarize the weights during training while still representing them as floats.
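For reference, a short sketch of how a quantized layer is typically set up, roughly following the examples in the Larq docs: the layer stores a latent float kernel, and the `kernel_quantizer` / `input_quantizer` binarize it on the fly in every forward pass, both at train time and at inference time.

```python
import tensorflow as tf
import larq as lq

# Quantized layers keep latent float kernels; the quantizers binarize the
# weights (and optionally the inputs) on the fly in every forward pass.
kwargs = dict(
    input_quantizer="ste_sign",       # binarize incoming activations
    kernel_quantizer="ste_sign",      # binarize the latent float kernel
    kernel_constraint="weight_clip",  # keep latent weights in [-1, 1]
)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # The first layer usually keeps full-precision inputs.
    lq.layers.QuantDense(256, kernel_quantizer="ste_sign",
                         kernel_constraint="weight_clip"),
    lq.layers.QuantDense(10, **kwargs),
])

# The trainable variables are still float32 latent weights.
print([v.dtype for v in model.trainable_variables])
```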
-
Hello, I have been using larq to convert models and I love the inference speed-up on mobile devices. This makes me wonder whether training time can be reduced as well. A shorter training time would greatly benefit the research community, since ablation studies could be done much faster, potentially resulting in better architectures specific to binarization (most current BNNs are mirror images of their real-valued counterparts). I am a complete newbie when it comes to CUDA, so I would love to hear your thoughts on this matter. Do public GPU implementations already exist, and if not, what challenges have been stopping us from building them?
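To make the source of the inference speed-up concrete (and to show what a faster training-time kernel would also have to provide), here is a hypothetical NumPy sketch of the XNOR-popcount trick that replaces a float dot product once weights and activations are ±1; real inference kernels (e.g. in Larq Compute Engine) do this with bitpacked integers and hardware popcount instructions rather than NumPy.

```python
import numpy as np

def pack_bits(v):
    """Encode a ±1 vector as bits (+1 -> 1, -1 -> 0), packed into uint8 words."""
    return np.packbits(v > 0)

def binary_dot(a_bits, b_bits, n):
    """Dot product of two ±1 vectors of length n from their bitpacked form:
    dot(a, b) = n - 2 * popcount(a_bits XOR b_bits)."""
    xor = np.bitwise_xor(a_bits, b_bits)
    popcount = np.unpackbits(xor).sum()  # real kernels use a hardware popcount
    return n - 2 * int(popcount)

n = 64
a = np.random.choice([-1, 1], size=n)
b = np.random.choice([-1, 1], size=n)

# The bitwise version matches the ordinary float/int dot product.
assert binary_dot(pack_bits(a), pack_bits(b), n) == int(a @ b)
```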