I noticed that you're attempting to implement 1.58-bit quantization, but it seems you only quantize the values during the forward pass and then use the original full-precision values for the backward pass. In 4-bit quantization, two quantized values are stored in one byte, and the computation and gradients for the new data type are implemented in CUDA. You should consider this approach as well. Keep it up, I'm rooting for you.
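The byte-packing mentioned above can be sketched in plain Python. This is an illustrative example (the function names are hypothetical, not from the BitNet repo): two 4-bit values share one byte, one in the high nibble and one in the low nibble.

```python
def pack_nibbles(a: int, b: int) -> int:
    """Pack two 4-bit values (0-15) into a single byte: `a` in the high nibble."""
    assert 0 <= a < 16 and 0 <= b < 16
    return (a << 4) | b

def unpack_nibbles(byte: int) -> tuple[int, int]:
    """Recover the two 4-bit values from one packed byte."""
    return (byte >> 4) & 0xF, byte & 0xF
```

In a real 4-bit scheme the packed bytes live in a tensor and dedicated CUDA kernels unpack them on the fly during the matmul, which is what makes the storage savings usable at compute time.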
I think what you're saying is that they should use the 1.58-bit quantization for the backward pass as well? The paper doesn't really discuss this, but 1.58-bit quantization destroys the gradient, making backprop through it impossible. So they keep the original full-precision weights for the backward pass while using the quantized weights for the forward pass.
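The forward-quantized / full-precision-backward pattern described above is the straight-through estimator (STE). A minimal PyTorch sketch, assuming the absmean ternary quantizer from the BitNet b1.58 paper (function names here are illustrative, not the repo's actual API):

```python
import torch

def weight_quant_158(w: torch.Tensor) -> torch.Tensor:
    # Absmean quantization: scale by the mean absolute value,
    # then round and clamp weights to the ternary set {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-5)
    return (w / scale).round().clamp(-1, 1) * scale

def ste_forward(w: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: the forward pass sees quantized values,
    # but .detach() makes the rounding invisible to autograd, so gradients
    # flow to the full-precision weights as if no quantization happened.
    return w + (weight_quant_158(w) - w).detach()

w = torch.tensor([0.3, -0.7, 0.05], requires_grad=True)
y = ste_forward(w).sum()
y.backward()
# w.grad is all ones: the non-differentiable round() is bypassed in backward.
```

Because `round()` has zero gradient almost everywhere, differentiating through the quantizer directly would zero out all weight updates; the STE identity trick is what makes training the full-precision "shadow" weights possible.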
BitNet/bitnet/bitbnet_b158.py, line 52 at commit 914bad9