
Issue with model size after replacing BitLinear layers in a HF model (say Llama2-7b-chat) [BUG] #40

Closed
mriganktiwari opened this issue Mar 10, 2024 · 3 comments
Labels: bug (Something isn't working), no-issue-activity

Describe the bug
When I replace the Linear layers in a HF model (say Llama2-7b-chat) with BitLinear layers, the model size is the same for both. Shouldn't the size be reduced after replacing with BitLinear layers?
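
For reference, a generic way to check this in plain PyTorch (not specific to this repo; model and replace_linears_in_hf here are the ones from the repro in my next comment) is to sum the raw bytes of every parameter before and after the swap:

import torch

def model_size_bytes(model: torch.nn.Module) -> int:
    # numel() * element_size() is the raw storage of a parameter;
    # if every weight stays float32/float16, the total cannot change.
    return sum(p.numel() * p.element_size() for p in model.parameters())

print(f"before: {model_size_bytes(model) / 1e9:.2f} GB")
replace_linears_in_hf(model)
print(f"after:  {model_size_bytes(model) / 1e9:.2f} GB")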

mriganktiwari added the bug (Something isn't working) label on Mar 10, 2024
mriganktiwari (Author) commented Mar 11, 2024

Also, when I use the HF model with the replaced BitLinear layers, generation doesn't work:

  • .generate with the stock Llama2 model completes generation in ~68 seconds
  • After replacing the Linear layers with BitLinear, the same call runs indefinitely
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
from bitnet import replace_linears_in_hf  # this repo's helper that swaps nn.Linear for BitLinear

model_name = "meta-llama/Llama-2-7b-hf"  # "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, token='xxxx')
model = AutoModelForCausalLM.from_pretrained(model_name, token='xxxx')

text = "Tell me about Boxing day significance."
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Baseline: generation with the stock Llama2 model (~68 s)
start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

# Swap nn.Linear layers for BitLinear, then generate again (runs indefinitely)
replace_linears_in_hf(model)

start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

matkara commented Mar 15, 2024

I had a quick look at this repo. In the current state of the code, the "binarized" weights are still stored as floats, which would explain your observation: the parameter dtypes are unchanged, so the model size is too. It is also still doing weight multiplications instead of the adds/subtracts enabled by BitNet 1.58's replacement of the multiplication operator, so that speedup isn't realized either.
That said, performance (and potential bugs) aside, the results should be identical to BitNet 1.58.
Nice to see such attempts!
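
To make the first point concrete, here is a minimal PyTorch sketch (illustrative only, not this repo's code) showing that ternary weights kept in a float32 tensor occupy exactly as many bytes as the original weights; the footprint only shrinks once they are packed into a low-bit storage format:

import torch

w = torch.randn(4096, 4096)    # original float32 weights
w_ternary = torch.sign(w)      # values in {-1, 0, +1}, but still float32

print(w.numel() * w.element_size())                  # 67108864 bytes (4 per weight)
print(w_ternary.numel() * w_ternary.element_size())  # identical: 67108864 bytes

# Even a naive int8 packing cuts memory 4x; a true ~1.58-bit packing
# (5 ternary weights per byte, since 3**5 = 243 <= 256) would cut it ~20x.
w_int8 = w_ternary.to(torch.int8)
print(w_int8.numel() * w_int8.element_size())        # 16777216 bytes (1 per weight)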

Stale issue message

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 23, 2024