context size of the model keeps falling to default of 512 #149

Closed
RachelShalom opened this issue Mar 3, 2024 · 4 comments
RachelShalom commented Mar 3, 2024

Hi, I am running this:

export NEURAL_SPEED_VERBOSE=1
PROMPT1=$(cat <<'END_HEREDOC'
## some large text here about 1000 tokens
END_HEREDOC
)
python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin -c 1500  -n 400  --color -p "$PROMPT1"


model_print_timings:        load time =  1696.97 ms
model_print_timings:      sample time =   172.18 ms /   323 runs   (    0.53 ms per token)
model_print_timings: prompt eval time =  1696.79 ms /   512 tokens (    3.31 ms per token)
model_print_timings:        eval time = 21978.58 ms /   322 runs   (   68.26 ms per token)
model_print_timings:       total time = 23929.26 ms
========== eval time log of each prediction ==========

The output keeps reporting that the prompt size is 512, but the actual prompt is more than 1000 tokens. Any advice on how to handle this? I am passing -c 1500 to request a larger context size.

Also, when I use --keep 0 (which is the default), I still see the initial prompt printed to stdout. How do I keep it from appearing as part of the output?

intellinjun (Contributor) commented:
@RachelShalom please add -b 2048 to the command: python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin -c 1500 -n 400 --color -p "$PROMPT1"
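
For reference, the corrected invocation would look like the sketch below (assuming -b controls the batch/token capacity used during prompt evaluation, which otherwise defaults to 512, while -c requests the context size):

export NEURAL_SPEED_VERBOSE=1
PROMPT1=$(cat <<'END_HEREDOC'
## some large text here about 1000 tokens
END_HEREDOC
)
# -c 1500 requests the context size; -b 2048 raises the prompt/batch capacity above the 512 default
python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin -c 1500 -n 400 -b 2048 --color -p "$PROMPT1"

With -b 2048 the "prompt eval time" line should report the full prompt length instead of being capped at 512 tokens.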

RachelShalom (Author) commented:
Thanks @intellinjun, I added it and it works as expected. Any solution in mind for --keep, or for how to not show the prompt in stdout?


intellinjun commented Mar 7, 2024

--keep is the number of tokens at the start of the prompt that are retained when the streaming LLM cuts off its context. As for the second question (hiding the prompt in stdout), we haven't implemented that feature yet.
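
For illustration only, a hypothetical run that keeps the first 64 prompt tokens when the streaming LLM truncates its context might look like this (the exact cutoff behavior depends on the scripts/inference.py implementation, so treat this as a sketch):

# keep the first 64 prompt tokens whenever the context is cut off during long generations
python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin -c 1500 -b 2048 -n 400 --keep 64 --color -p "$PROMPT1"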
