
Assistant spitting out non-readable characters on RTX 4060 #71

Open
zhefciad opened this issue Oct 26, 2023 · 1 comment

zhefciad commented Oct 26, 2023

(TinyChatEngine) zhef@zhef:~/TinyChatEngine/llm$ make chat -j
CUDA is available!
src/Generate.cc src/LLaMATokenizer.cc src/OPTGenerate.cc src/OPTTokenizer.cc src/utils.cc src/nn_modules/Fp32OPTAttention.cc src/nn_modules/Fp32OPTDecoder.cc src/nn_modules/Fp32OPTDecoderLayer.cc src/nn_modules/Fp32OPTForCausalLM.cc src/nn_modules/Fp32llamaAttention.cc src/nn_modules/Fp32llamaDecoder.cc src/nn_modules/Fp32llamaDecoderLayer.cc src/nn_modules/Fp32llamaForCausalLM.cc src/nn_modules/Int4OPTAttention.cc src/nn_modules/Int4OPTDecoder.cc src/nn_modules/Int4OPTDecoderLayer.cc src/nn_modules/Int4OPTForCausalLM.cc src/nn_modules/Int8OPTAttention.cc src/nn_modules/Int8OPTDecoder.cc src/nn_modules/Int8OPTDecoderLayer.cc src/nn_modules/OPTForCausalLM.cc src/ops/BMM_F32T.cc src/ops/BMM_S8T_S8N_F32T.cc src/ops/BMM_S8T_S8N_S8T.cc src/ops/LayerNorm.cc src/ops/LayerNormQ.cc src/ops/LlamaRMSNorm.cc src/ops/RotaryPosEmb.cc src/ops/W8A8B8O8Linear.cc src/ops/W8A8B8O8LinearReLU.cc src/ops/W8A8BFP32OFP32Linear.cc src/ops/arg_max.cc src/ops/batch_add.cc src/ops/embedding.cc src/ops/linear.cc src/ops/softmax.cc ../kernels/matmul_imp.cc ../kernels/matmul_int4.cc ../kernels/matmul_int8.cc
../kernels/cuda/matmul_ref_fp32.cc ../kernels/cuda/matmul_ref_int8.cc
../kernels/cuda/gemv_cuda.cu ../kernels/cuda/matmul_int4.cu  src/nn_modules/cuda/Int4llamaAttention.cu src/nn_modules/cuda/Int4llamaDecoder.cu src/nn_modules/cuda/Int4llamaDecoderLayer.cu src/nn_modules/cuda/Int4llamaForCausalLM.cu src/nn_modules/cuda/LLaMAGenerate.cu src/nn_modules/cuda/utils.cu src/ops/cuda/BMM_F16T.cu src/ops/cuda/LlamaRMSNorm.cu src/ops/cuda/RotaryPosEmb.cu src/ops/cuda/batch_add.cu src/ops/cuda/embedding.cu src/ops/cuda/linear.cu src/ops/cuda/softmax.cu
make: 'chat' is up to date.
(TinyChatEngine) zhef@zhef:~/TinyChatEngine/llm$ ./chat
TinyChatEngine by MIT HAN Lab: https://github.com/mit-han-lab/TinyChatEngine
Using model: LLaMA2_7B_chat
Using AWQ for 4bit quantization: https://github.com/mit-han-lab/llm-awq
Loading model... Finished!
USER: Hi, I'm Jeff!
ASSISTANT:

 #
$  ⸮#

#" ⁇ $
   $!!$
        ⁇ "

"!!" #         !
$
         ! !    #


!⸮
$       !$$
"##!
 ⁇ ⸮ ⁇  $ ⁇

        $"!" ⁇  #

        ⸮#
"


⸮
        $ ⁇

#        $
 "# ⁇  ⁇ ##
⸮#!"!"
$!"!" !"

Inference latency, Total time: 40.5 s, 73.9 ms/token, 13.5 token/s, 548 tokens
USER:

I have an RTX 4060 Windows laptop and ran this under WSL Ubuntu. I modified the Makefile to match my compute capability (89). Did I do something wrong, or is this still not supported?
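
For reference, targeting an RTX 4060 (Ada Lovelace, compute capability 8.9) would normally mean passing nvcc a flag like the one below. This is a hypothetical sketch; the actual variable name in TinyChatEngine's Makefile may differ:

    # Hypothetical Makefile fragment -- match the project's real variable names.
    # Compute capability 8.9 corresponds to arch=compute_89,code=sm_89.
    CUDA_ARCH := -gencode arch=compute_89,code=sm_89

One thing worth noting from the log above: make reported 'chat' is up to date, so if the Makefile was edited after a previous build, running make clean first ensures the new architecture flags are actually compiled in.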

@zhefciad changed the title from "Assistant spitting out non-readable characters" to "Assistant spitting out non-readable characters on RTX 4060" on Oct 26, 2023

dt1729 commented Nov 11, 2023

Same issue here on a GTX 1070.

[Screenshot attached: Screenshot from 2023-11-10 17-43-26]
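
For what it's worth, a GTX 1070 is Pascal (compute capability 6.1) while the RTX 4060 is Ada (8.9), so one thing to rule out is a mismatch between the -gencode flags the binary was built with and what the runtime actually sees. A minimal check using only the standard CUDA runtime API (nothing project-specific assumed):

    // check_cc.cu -- print the compute capability of each visible GPU.
    // Build and run: nvcc check_cc.cu -o check_cc && ./check_cc
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            std::printf("No CUDA device visible\n");
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            std::printf("Device %d: %s, compute capability %d.%d\n",
                        i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }

If the reported capability does not match the architectures the binary was compiled for, kernels can fail to launch; when such launch errors go unchecked, the output buffers are never written and the decoded text can come out as garbage like the transcript above.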
