Hi @tuobulatuo, thanks for your interest in our work. Unfortunately, I can't reproduce your issue firsthand because I don't have an A100 GPU on hand. That said, we've just released a new version of our CUDA implementation. As mentioned in #58, the new version has been tested on various GPUs with compute capabilities from 6.1 to 8.9. We therefore expect it to work on the A100 now, but we can't guarantee it since we haven't been able to test it directly. Please feel free to try it out and give us feedback.
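For anyone checking whether their card falls inside that tested range, here is a minimal sketch in Python. The helper names `arch_flag` and `in_tested_range` are hypothetical and not part of TinyChatEngine. The only inputs are the 6.1 to 8.9 range quoted above and the fact that both the A100 and the A30 report compute capability 8.0.

```python
# Hypothetical helpers for illustration only; not part of TinyChatEngine.

# Range of compute capabilities the maintainers report having tested (per #58).
TESTED_MIN = (6, 1)
TESTED_MAX = (8, 9)

# Compute capabilities of the two GPUs mentioned in this thread.
KNOWN_GPUS = {"A100": (8, 0), "A30": (8, 0)}


def arch_flag(major: int, minor: int) -> str:
    """Build the sm_XY architecture name from a compute capability."""
    return f"sm_{major}{minor}"


def in_tested_range(major: int, minor: int) -> bool:
    """True if (major, minor) lies within the reported tested range."""
    return TESTED_MIN <= (major, minor) <= TESTED_MAX


if __name__ == "__main__":
    for name, (major, minor) in KNOWN_GPUS.items():
        print(name, arch_flag(major, minor), in_tested_range(major, minor))
```

Falling inside the range only means the capability sits between two tested endpoints. It does not mean that specific GPU model was tested.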
I am experiencing the same issue when running the application on an A30. I tried to resolve it by following the instructions in issue #58, but I am still hitting the same bug. @tuobulatuo, have you found a solution yet? Thank you.
System Config:
LLM: 13b int4 version
A100 GPU, set "sm_80" for compute capability
Ubuntu 20.04, CUDA version 12.2, driver version 535.104.12
Need help on this one, thanks!
Alex
TinyChatEngine by MIT HAN Lab: https://github.com/mit-han-lab/TinyChatEngine
Using model: 13b
Using AWQ for 4bit quantization: https://github.com/mit-han-lab/llm-awq
Loading model... Finished!
USER: mit
ASSISTANT:
$ #
#" ⁇ $
Xshel$!!$
Xshell ⁇ Xshell"
"!!" Xshell !
$
Xshell !Xshell XshellXshell! #Xshel#
!
$ !$$
"##!Xshell⁇ ⁇ $ ⁇
$
"# ⁇ ⁇ ## #!"!"
$!"!""
Inference latency, Total time: 10.2 s, 18.6 ms/token, 53.7 token/s, 548 tokens