
Allocation of 'float inputs_embeds_buf[]' in Int4llamaDecoder::forward() causes Segmentation Fault for inputs longer than 511 tokens #88

Open
paulleo13 opened this issue Jan 22, 2024 · 0 comments

paulleo13 commented Jan 22, 2024

I have been playing around with your awesome implementation and found the following bug:

When I call LLaMaGenerate() with prompts longer than 511 tokens (the limit might also be a character count; I only counted tokens for simplicity), the subsequent call to Int4llamaDecoder::forward() causes a segmentation fault when inputs_embeds_buf is allocated on line 71 of llama/TinyChatEngine/llm/src/nn_modules/non_cuda/Int4llamaDecoder.cc.
I believe the issue is the stack allocation, which can become too large for some prompts. Most systems limit stack growth, and a stack allocation of a few MB, as happens here, can exceed that limit and trigger the segmentation fault.
I have the following fix: instead, define a vector of the required size (whose storage is heap-allocated), std::vector<float> inputs_embeds_buf_vec(sqlen * this->embed_dim);, and pass its data pointer to the Matrix3D<float> object on the next line: Matrix3D<float> inputs_embeds(inputs_embeds_buf_vec.data(), 1, sqlen, this->embed_dim);.

It has worked in my test cases so far. Should I open a pull request?

Edit: I'm not sure how reproducible this is, since the stack growth limit is system-dependent according to this Stack Overflow comment: https://stackoverflow.com/a/1826072
