
Quantizing and running inference on bloom-176B required some changes #21

Open · wants to merge 1 commit into main

Conversation

@barsuna commented Apr 2, 2023

- Most of the issues stem from the fact that the 250880x14336 embedding layer holds more elements than a signed 32-bit integer can represent (see the overflow sketch after this list).
- This affects main, quantize, and the ggml code itself.
- A second issue is that main estimates the amount of required memory on the low side.
- That is not properly fixed here; I have simply added 5 GB for the weights and doubled the size of the context used for model evaluation (see the second sketch below).

Being far from proficient in C++, I expect these changes will need to be cleaned up by someone experienced with ggml and C++.
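As a minimal sketch of the overflow described above (not the actual patch; the variable names are illustrative), the 32-bit product of the BLOOM vocabulary size and embedding width wraps around, so element counts and byte sizes have to be computed in 64-bit types:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    const int32_t n_vocab = 250880;  /* BLOOM vocabulary size      */
    const int32_t n_embd  = 14336;   /* BLOOM-176B embedding width */

    /* Broken: 250880 * 14336 = 3,596,615,680 exceeds INT32_MAX (2,147,483,647).
     * The 32-bit multiplication overflows (undefined behaviour in C; it wraps
     * to a negative value on typical targets). */
    int32_t n_elements_bad = n_vocab * n_embd;

    /* Fixed: promote to 64-bit before multiplying, and keep element counts
     * and byte sizes in 64-bit types from here on. */
    int64_t n_elements = (int64_t) n_vocab * n_embd;

    printf("32-bit element count: %d\n", n_elements_bad);
    printf("64-bit element count: %lld\n", (long long) n_elements);
    return 0;
}
```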
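For the memory estimate, a rough sketch of the workaround (the function and variable names here are hypothetical, not the ones in the repository): add a fixed margin on top of the estimated weight size and double the evaluation context buffer.

```c
#include <stddef.h>

/* Returns an adjusted context size: the estimated weight size plus a fixed
 * 5 GB safety margin, plus twice the original evaluation buffer.
 * Assumes a 64-bit build so the sizes fit in size_t. */
size_t adjust_ctx_size(size_t estimated_weights, size_t eval_buffer) {
    const size_t weight_margin = 5ull * 1024 * 1024 * 1024; /* extra 5 GB for weights */
    return estimated_weights + weight_margin + 2 * eval_buffer;
}
```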
