
[BUG] bitlinear fix #42

Open
jayUyang opened this issue Mar 10, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@jayUyang

Shouldn't the beta and gamma sizes be (1, weight.shape[0]), not (weight.shape[0], 1)?
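For context on why the orientation matters: the layer output has shape (batch, out_features), so a scale of shape (weight.shape[0], 1) lines out_features up against the batch dimension, while (1, weight.shape[0]) broadcasts across the batch as intended. A minimal numpy sketch (shapes are illustrative, not the repo's actual code):

```python
import numpy as np

batch, out_features = 4, 8
x = np.ones((batch, out_features))   # stand-in for the layer output

row = np.ones((1, out_features))     # (1, 8): broadcasts over the batch dim
col = np.ones((out_features, 1))     # (8, 1): collides with the batch dim

print((x * row).shape)               # works: (4, 8)
try:
    x * col                          # fails whenever batch != out_features
except ValueError as e:
    print("shape mismatch:", e)
```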

@jayUyang jayUyang added the bug Something isn't working label Mar 10, 2024
@kyegomez
Owner

Can you elaborate, please? Can you go into more detail?

@Vipiao

Vipiao commented Mar 14, 2024

I encountered the same problem. When passing an int tensor of shape (4, 2) to a BitLinear(2, 8), I get an error at the line

return x * self.gamma * self.beta / self.Q_b

saying:

Exception has occurred: RuntimeError
The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 112, in dequantize_activations_groupwise
  return x * self.gamma * self.beta / self.Q_b
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 137, in forward
  output = self.dequantize_activations_groupwise(output)
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 20, in forward
  x = self.layer1(x)
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 39, in
  outputs = model(inputs)  # Forward pass
RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0

I think the shapes of self.gamma and self.beta are wrong: gamma is initialized based on the output-neuron shape (out_features) but is assigned based on the batch size.
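One way to avoid the mismatch is to group the activations along the feature dimension rather than the batch dimension and keep the per-group scales broadcastable. A minimal numpy sketch under that assumption (the function signature and grouping choice are mine, not the repo's code):

```python
import numpy as np

def dequantize_groupwise(x, gamma, beta, Q_b, num_groups):
    """Sketch: dequantize x of shape (batch, features) with per-group scales.

    gamma: per-group activation scales, shape (num_groups,)
    beta:  per-group weight scales, shape (num_groups,)
    Q_b:   scalar quantization range
    """
    batch, features = x.shape
    group_size = features // num_groups              # group along features, not batch
    x = x.reshape(batch, num_groups, group_size)
    scale = (gamma * beta / Q_b).reshape(1, num_groups, 1)  # broadcasts over batch and group
    return (x * scale).reshape(batch, features)
```

With this layout the batch size never enters the scale shapes, so any (batch, features) input dequantizes without a broadcasting error.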

@zouyingcao

> I encountered the same problem. [...] I think the shapes of self.gamma and self.beta are wrong. Gamma is initialized based on the output-neuron shape but is set based on the batch size.

I think so too, but I am confused: since self.gamma relates to the activations while self.beta relates to the weights, should we explicitly broadcast these two tensors so that x * self.gamma * self.beta in the dequantization can be a Hadamard product? (Should the activation grouping, 'group_size = x.shape[0] // self.num_groups', be done along dim=1 (x.shape[1]) instead, since dim 0 is the batch size?) If I am wrong, please point it out. Thanks.
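To illustrate the broadcasting question: if gamma were a per-sample activation scale of shape (batch, 1) and beta a per-output-channel weight scale of shape (1, out_features), then x * gamma * beta already broadcasts into an elementwise (Hadamard) product with no explicit expansion. The shapes here are assumptions for illustration, not the repo's code:

```python
import numpy as np

x = np.ones((4, 8))             # layer output: (batch, out_features)
gamma = np.full((4, 1), 0.5)    # per-sample activation scale, broadcasts across features
beta = np.full((1, 8), 2.0)     # per-output-channel weight scale, broadcasts across batch
out = x * gamma * beta          # elementwise product, shape (4, 8)
```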

@zouyingcao

Hmm, I see the owner has updated the code (the group quantization was removed).

Development

No branches or pull requests

4 participants