
Issue with model size after replacing BitLinear layers in a HF model (say Llama2-7b-chat) [BUG] #40

Closed
mriganktiwari opened this issue Mar 10, 2024 · 3 comments
Labels: bug (Something isn't working), no-issue-activity

Describe the bug
When I replace the Linear layers in a HF model (say Llama2-7b-chat) with BitLinear layers, the model size is the same for both. Shouldn't the size be reduced after replacing with BitLinear layers?
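
For reference, a generic way to check this in plain PyTorch (not specific to this repo; model and replace_linears_in_hf here are the ones from the repro in my next comment) is to sum the raw bytes of every parameter before and after the swap:

import torch

def model_size_bytes(model: torch.nn.Module) -> int:
    # numel() * element_size() is the raw storage of a parameter;
    # if every weight stays float32/float16, the total cannot change.
    return sum(p.numel() * p.element_size() for p in model.parameters())

print(f"before: {model_size_bytes(model) / 1e9:.2f} GB")
replace_linears_in_hf(model)
print(f"after:  {model_size_bytes(model) / 1e9:.2f} GB")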

mriganktiwari added the bug (Something isn't working) label on Mar 10, 2024
mriganktiwari (Author) commented Mar 11, 2024

Also, when I use the HF model with the replaced BitLinear layers, generation doesn't work:

  • .generate with the stock Llama2 model completes generation in ~68 seconds
  • After replacing the Linear layers with BitLinear, the same call runs indefinitely
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
from bitnet import replace_linears_in_hf  # this repo's helper that swaps nn.Linear for BitLinear

model_name = "meta-llama/Llama-2-7b-hf"  # "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, token='xxxx')
model = AutoModelForCausalLM.from_pretrained(model_name, token='xxxx')

text = "Tell me about Boxing day significance."
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Baseline: generation with the stock Llama2 model (~68 s)
start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

# Swap nn.Linear layers for BitLinear, then generate again (runs indefinitely)
replace_linears_in_hf(model)

start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

matkara commented Mar 15, 2024

I had a quick look at this repo. In the current state of the code, the "binarized" weights are still stored as floats, which would explain your observation: the parameter dtypes are unchanged, so the model size is too. It is also still doing weight multiplications instead of the adds/subtracts enabled by BitNet 1.58's replacement of the multiplication operator, so that speedup isn't realized either.
That said, performance (and potential bugs) aside, the results should be identical to BitNet 1.58.
Nice to see such attempts!
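
To make the first point concrete, here is a minimal PyTorch sketch (illustrative only, not this repo's code) showing that ternary weights kept in a float32 tensor occupy exactly as many bytes as the original weights; the footprint only shrinks once they are packed into a low-bit storage format:

import torch

w = torch.randn(4096, 4096)    # original float32 weights
w_ternary = torch.sign(w)      # values in {-1, 0, +1}, but still float32

print(w.numel() * w.element_size())                  # 67108864 bytes (4 per weight)
print(w_ternary.numel() * w_ternary.element_size())  # identical: 67108864 bytes

# Even a naive int8 packing cuts memory 4x; a true ~1.58-bit packing
# (5 ternary weights per byte, since 3**5 = 243 <= 256) would cut it ~20x.
w_int8 = w_ternary.to(torch.int8)
print(w_int8.numel() * w_int8.element_size())        # 16777216 bytes (1 per weight)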

Stale issue message

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 23, 2024