📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Updated Jun 12, 2024
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
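🚀 Reverse-engineered API for the DeepSeek-V2 large model, intended for free-use testing [strength: a GPT-4 alternative]. Supports high-speed streaming output, multi-turn conversation, zero-configuration deployment, and multiple tokens.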
RAG-GPT, leveraging LLM and RAG technology, learns from user-customized knowledge bases to provide contextually relevant answers for a wide range of queries, ensuring rapid and accurate information retrieval.
Chatbot-GPT, powered by OpenIM’s webhooks, seamlessly integrates with various messaging platforms. This tool enables private and group chats with bots, enhancing interactive communication. It delivers quick, automated responses, ideal for optimizing customer service and dynamic discussions, meeting diverse communication needs.