add binary fully connected operator #53

Open

arashb opened this issue Oct 7, 2019 · 4 comments
Labels: feature (New feature or request)

Comments

arashb (Contributor) commented Oct 7, 2019

A binary fully connected operator is in essence a binary matrix-matrix multiplication (BGemm). Assume the input is M × N and the weight is N × K, where M is the batch size, N is the number of neurons of the previous layer, and K is the number of neurons of the current layer.
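For concreteness, here is a minimal NumPy sketch (not LCE code) of what that BGemm computes, assuming the operands are binarized to ±1 and using the usual identity dot(a, b) = N − 2 · popcount(a XOR b) on the sign bits:

```python
# Toy illustration of binary matrix-matrix multiplication (BGemm).
# Shapes follow the comment above: input A is M x N, weight B is N x K.
import numpy as np

M, N, K = 4, 64, 8
rng = np.random.default_rng(0)

# Binarized operands: every entry is +1 or -1.
A = rng.choice([-1.0, 1.0], size=(M, N))
B = rng.choice([-1.0, 1.0], size=(N, K))

# Reference result: an ordinary GEMM on the +/-1 values.
reference = A @ B

# Binary trick: encode -1 as bit 1 and +1 as bit 0; then for any row a of A
# and column b of B, dot(a, b) = N - 2 * popcount(a XOR b).
A_bits = A < 0                                   # (M, N) sign bits
B_bits = B < 0                                   # (N, K) sign bits
mismatches = (A_bits[:, None, :] ^ B_bits.T[None, :, :]).sum(axis=-1)  # (M, K)
bgemm = N - 2 * mismatches

assert np.array_equal(reference, bgemm)
```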

arashb (Contributor, Author) commented Nov 11, 2019

This boils down to implementing a fast binary matrix-vector multiplication (the M = 1 case of the BGemm above).
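As a sketch of that special case (batch size M = 1), here is the same arithmetic with the sign bits packed into uint8 words, closer to how a fast kernel would lay the data out; this is purely illustrative, not the LCE kernel:

```python
# Binary matrix-vector multiplication with bit-packed operands (illustrative).
import numpy as np

N, K = 64, 8
rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=N)          # input vector, length N
W = rng.choice([-1.0, 1.0], size=(N, K))     # weight matrix, N x K

reference = x @ W                            # ordinary float result, length K

# Pack the sign bits (1 encodes -1) into uint8 words, 8 signs per byte.
x_packed = np.packbits(x < 0)                # shape (N // 8,)
W_packed = np.packbits((W < 0).T, axis=1)    # shape (K, N // 8)

# For each output neuron: popcount(x XOR w), then map back to a dot product.
popcount = np.array([bin(i).count("1") for i in range(256)], dtype=np.int64)
mismatches = popcount[np.bitwise_xor(x_packed[None, :], W_packed)].sum(axis=1)
result = N - 2 * mismatches                  # length K

assert np.array_equal(reference, result)
```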

rameshKrSah commented

@lgeiger @arashb @Tombana any update on the implementation for the binary dense layers?

AdamHillier (Contributor) commented

@rameshKrSah we haven't made any specific efforts towards implementing a binary dense layer.

I think the most obvious and easiest way we could support binary dense layers would not involve adding a new op at all, but would instead map Larq binary dense layers to an equivalent LCE 1x1 binary convolution. This would be an automated way of doing the penultimate bullet point from this page of our docs.
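A rough Keras/Larq sketch of that mapping (the layer and quantizer names below are the standard Larq ones; treat this as an illustration of the idea, not the actual converter pass):

```python
# Express a Larq binary dense layer as an "equivalent" 1x1 binary convolution
# by reshaping the input to a single 1x1 "pixel". Illustrative only.
import tensorflow as tf
import larq as lq

units, features = 128, 256
binary_kwargs = dict(
    input_quantizer="ste_sign",
    kernel_quantizer="ste_sign",
    kernel_constraint="weight_clip",
)

# Original binary dense layer: (batch, features) -> (batch, units).
din = tf.keras.Input(shape=(features,))
dense_model = tf.keras.Model(
    din, lq.layers.QuantDense(units, use_bias=False, **binary_kwargs)(din)
)

# Equivalent formulation as a 1x1 binary convolution on a 1x1 feature map.
cin = tf.keras.Input(shape=(features,))
x = tf.keras.layers.Reshape((1, 1, features))(cin)
x = lq.layers.QuantConv2D(units, kernel_size=1, use_bias=False, **binary_kwargs)(x)
conv_model = tf.keras.Model(cin, tf.keras.layers.Reshape((units,))(x))

# The dense kernel (features, units) reshapes directly into the 1x1 conv
# kernel shape (1, 1, features, units), so the two models compute the same thing.
dense_kernel = dense_model.layers[1].get_weights()[0]
conv_model.layers[2].set_weights([dense_kernel.reshape(1, 1, features, units)])
```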

This wouldn't be particularly fast (though it would hopefully be faster than float dense layers), because our optimised binary convolution kernels operate on 'chunks' of four input pixels at a time, whereas this 'equivalent' convolution here would have only one input pixel. This is, however, something that we could very easily solve once we switch over to using the (currently experimental) indirect bgemm kernels, by adding a micro-kernel that operates on one input pixel at a time.

bywmm commented Oct 25, 2023

Hi there @AdamHillier @Tombana @arashb @lgeiger,

> This wouldn't be particularly fast (though it would hopefully be faster than float dense layers), because our optimised binary convolution kernels operate on 'chunks' of four input pixels at a time, whereas this 'equivalent' convolution here would have only one input pixel. This is, however, something that we could very easily solve once we switch over to using the (currently experimental) indirect bgemm kernels, by adding a micro-kernel that operates on one input pixel at a time.

It seems that the actual 1x1 binary convolution is 4x slower than its fully optimized version. Are there any guidelines or instructions on how to bridge this gap?
