API calls feel somewhat slow. Is this related to streaming output? The request log shows "stream": false. Where can this parameter be set? #329
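The model I'm using is RWKV-4-World-CHNtuned-3B-v1-20230625-ctx4096.pth, and I also converted it to RWKV-4-World-CHNtuned-3B-v1-20230625-ctx4096-fp16.bin. I tested both. Even after changing the settings in backend-python\routes\completion.py and restarting the program, the change does not take effect. Does this model not support streaming?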
class CompletionBody(ModelConfigBody):
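It looks like the program resets the modified parameters every time. Is it impossible to change the stream parameter?
Use the model https://huggingface.co/BlinkDL/rwkv-6-world/blob/main/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth. Also, how much VRAM do you have? If you have enough VRAM, set the decoding parameters to cuda fp16.
12 GB of VRAM.
@zhuifengzl The parameters are passed when you call the API, so you can change them in the request; you don't need to modify the source code. When loading the model, set "number of layers loaded into VRAM" (载入显存层数) to the maximum.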
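As a rough illustration of passing the parameter per request, here is a minimal Python sketch that sends `"stream": true` in the request body and reads the reply piece by piece. The URL and the `"model": "rwkv"` value are copied from the log in this thread. The helper names (`build_body`, `parse_stream_line`, `stream_chat`) are hypothetical. The chunk layout (`data: {...}` lines ending with `data: [DONE]`, text under `choices[0].delta.content`) is an assumption based on the OpenAI-style streaming format, so check it against what your server actually returns.

```python
import json
import urllib.request

# Endpoint copied from the request log in this thread; adjust to your server.
API_URL = "http://192.168.31.39:8000/chat/completions"


def build_body(content, stream=True, max_tokens=1000):
    """Build a request body; `stream` is set per request, not in the server source."""
    return {
        "messages": [{"role": "user", "content": content}],
        "model": "rwkv",
        "stream": stream,
        "max_tokens": max_tokens,
    }


def parse_stream_line(line):
    """Return the text piece carried by one 'data: ...' line, or None.

    Assumes OpenAI-style chunks: {"choices": [{"delta": {"content": "..."}}]}.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(chunk, dict):
        return None
    choices = chunk.get("choices") or [{}]
    delta = choices[0].get("delta") or {}
    return delta.get("content")


def stream_chat(content, url=API_URL):
    """Yield text pieces as the server produces them."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_body(content)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:  # the response object can be read line by line
            piece = parse_stream_line(raw.decode("utf-8"))
            if piece:
                yield piece
```

Usage would look like `for piece in stream_chat("喂"): print(piece, end="", flush=True)`. Streaming does not reduce the total generation time; it only lets the first tokens appear sooner, so the model and VRAM settings above still determine the overall speed.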
OK, I'll try that. Thanks!
public struct LocalSendData
This is the code I wrote. I don't really understand it. Could you take a look and tell me what to change to make the API respond faster?
2024-04-19 16:22:02,475 - INFO
Client: Address(host='192.168.31.39', port=63902)
Url: http://192.168.31.39:8000/chat/completions
Body: {"max_tokens": 1000, "temperature": 1.2, "top_p": 0.5, "presence_penalty": 0.4, "frequency_penalty": 0.4, "penalty_decay": null, "top_k": null, "global_penalty": null, "messages": [{"role": "user", "content": "喂", "raw": false}], "model": "rwkv", "stream": false, "stop": ["\n\nUser", "\n\nQuestion", "\n\nQ", "\n\nHuman", "\n\nBob", "\n\nAssistant", "\n\nAnswer", "\n\nA", "\n\nBot", "\n\nAlice", "\n\nUser", "\n\nAss"], "user_name": null, "assistant_name": null, "system_name": null, "presystem": true}
Data:
Hello! How can I assist you today?
Finished. RequestsNum: 0