1.58bitnet - is it {-1,0,1}? #1502

Open
rkinas opened this issue Apr 9, 2024 · 1 comment

rkinas commented Apr 9, 2024

Hi, thank you for providing a 1.58-bit implementation. Nice work! I looked through many BitNet b1.58 implementations and noticed that they all use the method suggested in "The Era of 1-bit LLMs: Training Tips, Code and FAQ". However, the weights of models currently trained according to this recipe are not values from the set {-1, 0, 1} but values in the interval (0, 1). Is this the way it should be? Here is what I compared (a sketch of the recipe in question follows the list):

  1. The formula describing the quantization of weights ("The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits").
  2. Implementation proposal ("The Era of 1-bit LLMs: Training Tips, Code and FAQ").
  3. Weights quantization test.
  4. Model during training.

[attached screenshot: 1 58bitnet]
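For context, the recipe from the FAQ paper quantizes with a per-tensor absmean scale: γ = mean(|W|), then RoundClip(W / (γ + ε), -1, 1), rescaled back by γ. A minimal PyTorch sketch of that recipe (paraphrased from the paper, not this repo's exact code):

```python
import torch

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Per-tensor absmean quantization: scale by 1/mean(|w|), round to the
    # nearest integer, clamp to {-1, 0, 1}, then rescale back by 1/scale.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale

def ste_forward(w: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: the forward pass sees quantized weights,
    # while gradients flow to the full-precision latent weights.
    return w + (weight_quant(w) - w).detach()
```

If I read this right, only the intermediate `round(...).clamp(-1, 1)` tensor is strictly ternary; the returned tensor is ternary only up to the scale factor, and the latent weights saved in checkpoints stay full precision, so inspecting a trained model's weights will not show {-1, 0, 1} directly.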

@rkinas rkinas changed the title 1.58bitnet - is it {-1,0,1} 1.58bitnet - is it {-1,0,1}? Apr 9, 2024
sdmorrey commented

Can't speak for the author, but the general idea is that ternary is better, since what's being built is really a directed acyclic knowledge graph. A ternary weight carries log2(3) ≈ 1.585 bits of information, which is where the 1.58 in the name comes from.
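As a concrete illustration of the log2(3) figure, five ternary values fit in one byte because 3^5 = 243 ≤ 256, i.e. 8/5 = 1.6 bits per weight, close to the ≈1.585-bit lower bound (a hypothetical packing sketch, not something this repo implements):

```python
import itertools

def pack5(trits):
    # Pack 5 values from {-1, 0, 1} into one byte as a base-3 number.
    assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
    byte = 0
    for t in trits:
        byte = byte * 3 + (t + 1)  # map {-1, 0, 1} -> {0, 1, 2}
    return byte  # at most 3**5 - 1 = 242, so it fits in one byte

def unpack5(byte):
    trits = []
    for _ in range(5):
        trits.append(byte % 3 - 1)  # map {0, 1, 2} -> {-1, 0, 1}
        byte //= 3
    return trits[::-1]  # digits come out least-significant first

# Round-trip check over all 243 possible 5-trit groups.
assert all(unpack5(pack5(list(t))) == list(t)
           for t in itertools.product((-1, 0, 1), repeat=5))
```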
