Hi, I am running this:
export NEURAL_SPEED_VERBOSE=1
PROMPT1=$(cat <<'END_HEREDOC'
## some large text here about 1000 tokens
END_HEREDOC
)
python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin -c 1500 -n 400 --color -p "$PROMPT1"
model_print_timings: load time = 1696.97 ms
model_print_timings: sample time = 172.18 ms / 323 runs ( 0.53 ms per token)
model_print_timings: prompt eval time = 1696.79 ms / 512 tokens ( 3.31 ms per token)
model_print_timings: eval time = 21978.58 ms / 322 runs ( 68.26 ms per token)
model_print_timings: total time = 23929.26 ms
========== eval time log of each prediction ==========
I keep seeing in the output that the prompt size is 512 tokens, but the actual prompt is more than 1000 tokens. Any advice on how to handle this? I pass -c 1500 to request a large context size.
Also, when I use --keep 0 (which is the default), I still see the initial prompt printed to stdout. How do I keep it from appearing as part of the output?
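For reference, here is a minimal sketch to double-check how many tokens the prompt really produces, assuming a Hugging Face tokenizer for the same LLaMA checkpoint is available locally (the checkpoint path and prompt file below are placeholders, not names from this issue):

```python
# Sketch: count the tokens of the prompt text passed via -p "$PROMPT1".
# "path/to/llama_hf_checkpoint" and "prompt.txt" are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama_hf_checkpoint")
with open("prompt.txt") as f:
    prompt = f.read()

n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"prompt length: {n_tokens} tokens")  # compare with the 512 reported above
```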
--keep should be the number of the first few tokens that are reserved when the streaming LLM cuts off (shifts) the context. As for the second problem, we haven't implemented that feature yet.
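To illustrate what --keep controls conceptually, here is a small sketch of the streaming-LLM idea (this is only an illustration, not the actual Neural Speed code): when the token history exceeds the context window, the first n_keep tokens are preserved and older tokens in the middle are discarded.

```python
# Conceptual sketch only: keep the first n_keep tokens plus the most recent
# ones when the history no longer fits in the context window.
def shift_context(tokens, n_ctx, n_keep):
    if len(tokens) <= n_ctx:
        return tokens
    n_recent = n_ctx - n_keep            # room left for the most recent tokens
    return tokens[:n_keep] + tokens[-n_recent:]

history = list(range(2000))              # pretend these are 2000 token ids
trimmed = shift_context(history, n_ctx=1500, n_keep=4)
assert trimmed[:4] == [0, 1, 2, 3] and len(trimmed) == 1500
```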
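Until the prompt echo can be suppressed by the script itself, one possible workaround is to post-process the output. This is a hypothetical sketch, not a feature of inference.py; it assumes the prompt is echoed verbatim at the start of stdout, and drops --color so ANSI escape codes don't get in the way ("prompt.txt" is a placeholder):

```python
# Workaround sketch: run the same command, capture stdout, and strip the
# echoed prompt prefix before printing the completion.
import subprocess

prompt = open("prompt.txt").read()
cmd = [
    "python", "scripts/inference.py", "--model_name", "llama",
    "-m", "llama_files/ne_llama_int4.bin", "-c", "1500", "-n", "400",
    "-p", prompt,
]
out = subprocess.run(cmd, capture_output=True, text=True).stdout
completion = out[len(prompt):] if out.startswith(prompt) else out
print(completion)
```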