Hi, thank you for providing the 1.58-bit implementation. Nice work! I looked through many BitNet b1.58 implementations and noticed that they all use the method suggested in "The Era of 1-bit LLMs: Training Tips, Code and FAQ". The weights of models trained according to this recipe are not values from the set {-1, 0, 1}, but rather continuous values in the interval (0, 1). Is this the way it should be?
The formula describing the quantization of weights ("The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits").
Implementation proposal ("The Era of 1-bit LLMs: Training Tips, Code and FAQ").
Weight quantization test.
Model during training.
rkinas changed the title from "1.58bitnet - is it {-1,0,1}" to "1.58bitnet - is it {-1,0,1}?" on Apr 9, 2024.
Can't speak for the author, but the general idea is that ternary is better, since what's being built is really a directed acyclic knowledge graph. Ternary can be represented in 1.58 bits because a three-valued symbol carries log2(3) ≈ 1.585 bits of information; it is an average over many weights, not a sign-bit trick for individual values.