
Allocation of 'float inputs_embeds_buf[]' in Int4llamaDecoder::forward() causes Segmentation Fault for inputs longer than 511 tokens #88

Open
paulleo13 opened this issue Jan 22, 2024 · 0 comments

paulleo13 commented Jan 22, 2024

I have been playing around with your awesome implementation and found the following bug:

When I call LLaMaGenerate() with prompts longer than 511 tokens (the limit might also be a character count; I only counted tokens for simplicity), the subsequent call to Int4llamaDecoder::forward() causes a segmentation fault when inputs_embeds_buf is allocated on line 71 of llama/TinyChatEngine/llm/src/nn_modules/non_cuda/Int4llamaDecoder.cc.
I believe the issue is the stack allocation, which can become too large for some prompts. Most systems limit stack growth, and a stack allocation of a few MB, as happens here, can exceed that limit and trigger the segmentation fault.
I have the following fix: instead, define a vector of the required size (whose storage is heap-allocated), std::vector<float> inputs_embeds_buf_vec(sqlen * this->embed_dim);, and pass its data pointer to the Matrix3D<float> object on the next line: Matrix3D<float> inputs_embeds(inputs_embeds_buf_vec.data(), 1, sqlen, this->embed_dim);.

It has worked in my test cases so far. Should I open a pull request?

Edit: I'm not sure how reproducible this is, since the stack growth limit is system-dependent according to this Stack Overflow comment: https://stackoverflow.com/a/1826072
