TensorRT-LLM Requests #632
Comments
Please add CohereAI!! CohereForAI/c4ai-command-r-plus
Llama 3 would be great (both 8B and 70B): #1470. Maybe quantized to 8 or even 4 bit.
Currently, Llama 3 throws a bunch of errors when converting to TensorRT-LLM. Any idea about support for Llama 3?
Phi-3-mini should be amazing! Such a small 3.8B model could run quantized on many GPUs, with as little as 4 GB VRAM.
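The VRAM estimate in the comment above can be sanity-checked with back-of-the-envelope arithmetic (weights only; KV cache and activations are extra, and the byte-per-parameter figures are the usual ones for these formats, not TensorRT-LLM-specific numbers):

```python
# Rough weight-memory footprint of a 3.8B-parameter model at
# different precisions. Ignores KV cache, activations, and runtime
# overhead, so treat the results as lower bounds.
PARAMS = 3.8e9
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1e9

for fmt, b in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_gb(PARAMS, b):.1f} GB for weights")
```

At 4-bit that is roughly 1.9 GB of weights, which is why a 4 GB card is plausible for a 3.8B model.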
+1 for Phi-3 |
+1 for Command R Plus! CohereForAI/c4ai-command-r-plus |
Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on.
Last update:
Jan 14th, 2024
🚀 = in development
Models
Decoder Only
Encoder / Encoder-Decoder
Multi-Modal
Other
Features & Optimizations
implementation done - documentation in progress
KV Cache
Quantization
Sampling
- `frequency_penalty` - Support for `frequency_penalty` #275
- `repetition` & `presence` penalties - Support for combining `repetition_penalty`, `presence_penalty` #274
Workflow
Front-ends
Integrations
Usage / Installation
Platform Support
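The sampling penalties tracked above (`frequency_penalty`, `presence_penalty`, `repetition_penalty`) can be sketched in plain Python. This is a minimal illustration of the standard penalty scheme these parameters usually refer to, not TensorRT-LLM's actual kernel:

```python
def apply_penalties(logits, generated_ids, presence_penalty=0.0,
                    frequency_penalty=0.0, repetition_penalty=1.0):
    """Adjust next-token logits based on previously generated tokens.

    Sketch only: additive presence/frequency penalties in the
    OpenAI style, and the multiplicative repetition penalty from
    CTRL. Real implementations run this on-GPU per request.
    """
    counts = {}
    for t in generated_ids:
        counts[t] = counts.get(t, 0) + 1
    out = list(logits)
    for t, c in counts.items():
        # presence: flat penalty if the token appeared at all
        out[t] -= presence_penalty
        # frequency: penalty scales with occurrence count
        out[t] -= frequency_penalty * c
        # repetition: divide positive logits, multiply negative ones
        if out[t] > 0:
            out[t] /= repetition_penalty
        else:
            out[t] *= repetition_penalty
    return out
```

With all penalties at their neutral values (0.0, 0.0, 1.0) the logits pass through unchanged, which is the property #274 cares about when combining them.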