Hi, I am running this:
export NEURAL_SPEED_VERBOSE=1
PROMPT1=$(cat <<'END_HEREDOC'
## some large text here about 1000 tokens
END_HEREDOC
)
python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin -c 1500 -n 400 --color -p "$PROMPT1"
model_print_timings: load time = 1696.97 ms
model_print_timings: sample time = 172.18 ms / 323 runs ( 0.53 ms per token)
model_print_timings: prompt eval time = 1696.79 ms / 512 tokens ( 3.31 ms per token)
model_print_timings: eval time = 21978.58 ms / 322 runs ( 68.26 ms per token)
model_print_timings: total time = 23929.26 ms
========== eval time log of each prediction ==========
I keep seeing in the output that the prompt size is 512 tokens, but the actual prompt is more than 1000 tokens. Any advice on how to handle this? I pass -c 1500 to request a large context size.
Also, when I use --keep 0 (which is the default), I still see the initial prompt printed to stdout. How do I keep it from appearing as part of the output?
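For reference, here is a minimal sketch to double-check how many tokens the prompt really produces, assuming a Hugging Face tokenizer for the same LLaMA checkpoint is available locally (the checkpoint path and prompt file below are placeholders, not names from this issue):

```python
# Sketch: count the tokens of the prompt text passed via -p "$PROMPT1".
# "path/to/llama_hf_checkpoint" and "prompt.txt" are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama_hf_checkpoint")
with open("prompt.txt") as f:
    prompt = f.read()

n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"prompt length: {n_tokens} tokens")  # compare with the 512 reported above
```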
--keep should be the number of the first few tokens that are reserved when the streaming LLM cuts off (shifts) the context. As for the second problem, we haven't implemented that feature yet.
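To illustrate what --keep controls conceptually, here is a small sketch of the streaming-LLM idea (this is only an illustration, not the actual Neural Speed code): when the token history exceeds the context window, the first n_keep tokens are preserved and older tokens in the middle are discarded.

```python
# Conceptual sketch only: keep the first n_keep tokens plus the most recent
# ones when the history no longer fits in the context window.
def shift_context(tokens, n_ctx, n_keep):
    if len(tokens) <= n_ctx:
        return tokens
    n_recent = n_ctx - n_keep            # room left for the most recent tokens
    return tokens[:n_keep] + tokens[-n_recent:]

history = list(range(2000))              # pretend these are 2000 token ids
trimmed = shift_context(history, n_ctx=1500, n_keep=4)
assert trimmed[:4] == [0, 1, 2, 3] and len(trimmed) == 1500
```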
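Until the prompt echo can be suppressed by the script itself, one possible workaround is to post-process the output. This is a hypothetical sketch, not a feature of inference.py; it assumes the prompt is echoed verbatim at the start of stdout, and drops --color so ANSI escape codes don't get in the way ("prompt.txt" is a placeholder):

```python
# Workaround sketch: run the same command, capture stdout, and strip the
# echoed prompt prefix before printing the completion.
import subprocess

prompt = open("prompt.txt").read()
cmd = [
    "python", "scripts/inference.py", "--model_name", "llama",
    "-m", "llama_files/ne_llama_int4.bin", "-c", "1500", "-n", "400",
    "-p", prompt,
]
out = subprocess.run(cmd, capture_output=True, text=True).stdout
completion = out[len(prompt):] if out.startswith(prompt) else out
print(completion)
```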