Hello, I presume according to BitNet paper the weight should be -1 or 1. But
```python
import torch
from bitnet import BitLinearNew

# Create a random tensor of shape (2, 10, 10)
x = torch.randn(2, 10, 10)

# Create an instance of the BitLinearNew class with input size 10 and output size 20
layer = BitLinearNew(
    10,
    20,
)

# Perform a forward pass through the BitLinearNew layer with input x
output = layer(x)

print(layer.weight.dtype)
print(layer.weight)
```
I'm pretty sure the parameters show fp32 values because training needs the original floating-point weights for backprop (1.58-bit quantization destroys the gradient). During the forward pass, the weights are re-quantized at every training step. When you actually deploy the model, you would take the quantized weights and use those directly.
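To illustrate the idea, here is a minimal sketch of quantization-aware training with a straight-through estimator. This is not the repo's `BitLinearNew` implementation; `weight_quant` and `TernaryLinear` are hypothetical names, and the absmean ternarization follows the BitNet b1.58 paper's description:

```python
import torch

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean ternarization: scale by the mean absolute value,
    # then round and clamp to {-1, 0, +1} (times the scale).
    scale = w.abs().mean().clamp(min=1e-5)
    return (w / scale).round().clamp(-1, 1) * scale

class TernaryLinear(torch.nn.Linear):
    def forward(self, x):
        w = self.weight
        # Straight-through estimator: the forward pass sees quantized
        # weights, but the backward pass treats quantization as identity,
        # so gradients flow into the latent fp32 weights.
        w_q = w + (weight_quant(w) - w).detach()
        return torch.nn.functional.linear(x, w_q, self.bias)

layer = TernaryLinear(10, 20, bias=False)
x = torch.randn(2, 10)
out = layer(x)

print(layer.weight.dtype)  # torch.float32 — the stored parameter stays fp32
```

So the parameter you inspect is always the fp32 latent weight; the ternary values only exist transiently inside the forward pass (and would be what you export at deployment).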
Output
Am I missing something?