
Bloomz 176B inference doesn't work #15

Open

agemagician opened this issue Mar 18, 2023 · 9 comments

@agemagician

Hello,

I have converted the bloomz model successfully, but inference doesn't work.

 ./main -m ./models/ggml-model-bloomz-f16.bin -t 8 -n 128
main: seed = 1679167152
bloom_model_load: loading model from './models/ggml-model-bloomz-f16.bin' - please wait ...
bloom_model_load: n_vocab = 250880
bloom_model_load: n_ctx   = 512
bloom_model_load: n_embd  = 14336
bloom_model_load: n_mult  = 1
bloom_model_load: n_head  = 112
bloom_model_load: n_layer = 70
bloom_model_load: f16     = 1
bloom_model_load: n_ff    = 57344
bloom_model_load: n_parts = 1
bloom_model_load: ggml ctx size = 333257.61 MB
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 349847586752, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 349847931776, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351081229760, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351081459328, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 350670590144, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 349848678784, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351081976768, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351082206336, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351493305664, available 349445931264)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 351493305664, available 349445931264)
Segmentation fault (core dumped)

I have enough CPU memory (420 GB). Any idea what the issue is?
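
For context on the failure mode: ggml reserves one fixed memory pool up front, sized by the loader's estimate, and every tensor is carved out of that pool, so ggml_new_tensor_impl fails when the estimate comes up short even with plenty of free system RAM. In the log above the pool is 349445931264 bytes (the reported 333257.61 MB), yet allocations ask for up to 351493305664 bytes, roughly a 2 GB shortfall. A minimal sketch of that arithmetic, with a hypothetical headroom workaround, using the numbers from the log:

/* Sketch: why ggml_new_tensor_impl fails despite free system RAM.
   ggml_init() reserves a single fixed pool of mem_size bytes and every
   tensor is carved from it; if the loader's estimate is short, allocation
   fails mid-load. Numbers are taken from the log above; the 4 GB pad is a
   hypothetical workaround, not the upstream fix. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    const uint64_t pool_size  = 349445931264ULL; /* "available" in the log (= 333257.61 MB) */
    const uint64_t max_needed = 351493305664ULL; /* largest "needed" in the log */

    printf("shortfall: %.2f MB\n",
           (max_needed - pool_size) / (1024.0 * 1024.0));

    /* Workaround: pad the ctx-size estimate in bloom_model_load before
       it is handed to ggml_init() as mem_size. */
    const uint64_t pad = 4ULL * 1024 * 1024 * 1024; /* 4 GB headroom (hypothetical) */
    printf("padded pool: %.2f MB\n",
           (pool_size + pad) / (1024.0 * 1024.0));
    return 0;
}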

@laurentperez

Out of curiosity, and adding a question on top of yours: how much of your 420 GB of RAM did you use to convert to ggml? I barely managed to convert bloomz-7b1 with 32 GB of RAM, so I wonder how much the 176B model needs.

@agemagician
Author

> Out of curiosity, and adding a question on top of yours: how much of your 420 GB of RAM did you use to convert to ggml? I barely managed to convert bloomz-7b1 with 32 GB of RAM, so I wonder how much the 176B model needs.

All of it, plus approx. 30 GB of virtual memory.

@bil-ash

bil-ash commented Mar 20, 2023

It seems you are running out of memory. Most probably I can help reduce the memory usage to about 1/6th (this was successful with the 7b1 model). What is the model size (disk usage) of the 176B model?
Please share a link to download the quantized model, because my server does not have the RAM (>400 GB) to quantize the 176B model. I will then see if I am able to run it.
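
As a hedged aside, the 1/6th figure presumably comes from 4-bit quantization. Assuming this repo ships a quantize tool in the llama.cpp style, where the trailing 2 selects the q4_0 format (matching the q4_0 filename in the run further down), the invocation would look roughly like:

./quantize ./models/ggml-model-bloomz-f16.bin ./models/ggml-model-bloomz-f16-q4_0.bin 2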

@agemagician
Author

The disk size of the model is approx. 360 GB.
Unfortunately, quantization doesn't work; please see:
huggingface/optimum#901

I don't think it is an out-of-memory problem, as there is 420 GB of main memory plus 50 GB of swap.

@ZhangYunchenY

Same question here, even though I have 1000 GB of RAM.

@barsuna

barsuna commented Apr 2, 2023

./main -m models/bloom/ggml-model-bloom-f16-q4_0.bin -t 96 -p "The most beautiful question is" -n 20
main: seed = 1680447842
bloom_model_load: loading model from 'models/bloom/ggml-model-bloom-f16-q4_0.bin' - please wait ...
bloom_model_load: n_vocab = 250880
bloom_model_load: n_ctx = 512
bloom_model_load: n_embd = 14336
bloom_model_load: n_mult = 1
bloom_model_load: n_head = 112
bloom_model_load: n_layer = 70
bloom_model_load: f16 = 2
bloom_model_load: n_ff = 57344
bloom_model_load: n_parts = 1
bloom_model_load: ggml ctx size = 106877.59 MB
bloom_model_load: memory_size = 3920.00 MB, n_mem = 35840
bloom_model_load: loading model part 1/1 from 'models/bloom/ggml-model-bloom-f16-q4_0.bin'
bloom_model_load: ......................................................................................................... done
bloom_model_load: model size = 107237.48 MB / num tensors = 846

main: prompt: 'The most beautiful question is'
main: number of tokens in prompt = 5
2175 -> 'The'
6084 -> ' most'
40704 -> ' beautiful'
5893 -> ' question'
632 -> ' is'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

The most beautiful question is the one you ask yourself.
What are we doing here?
I don't understand this at all!
L

main: mem per token = 192093564 bytes
main: load time = 65292.77 ms
main: sample time = 498.68 ms
main: predict time = 407606.25 ms / 16983.59 ms per token
main: total time = 537545.81 ms
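
For what it's worth, the numbers line up with q4_0's storage cost: each block of 32 weights is stored as 32 × 4 bits plus one fp32 scale, i.e. about 5 bits per weight versus fp16's 16, and indeed 333257.61 MB / 106877.59 MB ≈ 3.1.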

@barsuna

barsuna commented Apr 2, 2023

The above was produced with this commit:
barsuna@2d0e478

@bozo32

bozo32 commented May 28, 2023

I have a cluster running Scientific Linux with basically unlimited RAM but 4×15 GB of VRAM that I can test things on. If anybody gets a GGML model that is worth testing, tell me.

@linuxmagic-mp

Getting lost in this thread. I just converted the 176B model to GGML (fp16) and am now looking at using bloom.cpp, but I noticed that @barsuna's README appears to reflect that there are still problems. Could we get a status update? It doesn't look like his code was submitted as a pull request, or that this code has been updated to solve the issue, but I am not sure.
