Hi, thank you for providing the 1.58-bit implementation. Nice work! I looked through many BitNet b1.58 implementations and noticed that they all use the method suggested in "The Era of 1-bit LLMs: Training Tips, Code and FAQ". The weights of models trained according to this recipe are not values from the set {-1, 0, 1}, but rather continuous values in the interval (0, 1). Is this the way it should be?
The formula describing the quantization of weights ("The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits").
Implementation proposal ("The Era of 1-bit LLMs: Training Tips, Code and FAQ").
Weight quantization test.
Model during training.
rkinas changed the title from "1.58bitnet - is it {-1,0,1}" to "1.58bitnet - is it {-1,0,1}?" on Apr 9, 2024.
Can't speak for the author, but the general idea is that ternary is better, since what's being built is really a directed acyclic knowledge graph. Ternary can be represented in 1.58 bits because a three-valued symbol carries log2(3) ≈ 1.585 bits of information; it is an average over many weights, not a sign-bit trick for individual values.