I can run LLaMA with LoRA weights for text-generation tasks such as completion and summarization. However, I have trained a model with a classification LoRA. When I run it with `run.py`, I am able to do generative tasks, but I want to do classification.
Can you post an example where an LLM such as LLaMA or Mistral is fine-tuned with a classification head using LoRA, the checkpoints are converted, and classification is done by running `run.py`?
The requirement is not very clear to me. Do you want to get the context logits of the model instead of generating tokens? If so, you can get the context logits by adding `--gather_context_logits` when building the engine.
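If context logits are what is needed, one common way to do classification with a generative LLM is to compare the logits of candidate "verbalizer" tokens at the last context position. The sketch below is a hedged illustration of that idea in plain NumPy, not TensorRT-LLM API; the token ids and helper name are made up for the example.

```python
import numpy as np

# Hypothetical sketch: classify from the context logits an engine built with
# --gather_context_logits would return. Token ids below are toy values.
def classify_from_context_logits(context_logits, label_token_ids):
    """Pick the label whose verbalizer token has the highest logit
    at the final context position."""
    last_position_logits = context_logits[-1]  # shape: [vocab_size]
    scores = {label: last_position_logits[tok_id]
              for label, tok_id in label_token_ids.items()}
    return max(scores, key=scores.get)

# Toy example: 2 context positions, vocabulary of 6 tokens.
logits = np.array([[0.1, 0.2, 0.0, 0.3, 0.1, 0.0],
                   [0.0, 1.5, 0.2, 0.1, 2.0, 0.3]])
labels = {"negative": 1, "positive": 4}
print(classify_from_context_logits(logits, labels))  # -> positive
```

This avoids any change to the decoding path: the engine only has to expose the context logits, and the label decision is a cheap post-processing step.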
### System Info

- TensorRT-LLM 0.8
- NVIDIA A100
- CUDA 12.2

### Who can help?
I can run LLaMA with LoRA weights for text-generation tasks such as completion and summarization. However, I have trained a model with a classification LoRA. When I run it with `run.py`, I am able to do generative tasks, but I want to do classification. Can you post an example where an LLM such as LLaMA or Mistral is fine-tuned with a classification head using LoRA, the checkpoints are converted, and classification is done by running `run.py`? It appears that `model_runner` only does decoding?

TensorRT-LLM/tensorrt_llm/runtime/model_runner.py, line 788 in 89ba1b1
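For reference, what a LoRA-adapted classification head computes is small enough to sketch directly: the effective head weight is the frozen base weight plus a scaled low-rank update, `W0 + (alpha/r) * B @ A`, applied to the final hidden state. The NumPy sketch below uses toy shapes and random values purely to illustrate the math; it is not TensorRT-LLM or PEFT code.

```python
import numpy as np

# Toy dimensions: hidden size, number of classes, LoRA rank, LoRA alpha.
hidden, num_classes, r, alpha = 8, 3, 2, 4
rng = np.random.default_rng(0)

h  = rng.standard_normal(hidden)                 # last hidden state of the sequence
W0 = rng.standard_normal((num_classes, hidden))  # frozen base classification head
A  = rng.standard_normal((r, hidden))            # LoRA down-projection (trainable)
B  = np.zeros((num_classes, r))                  # LoRA up-projection (zero-initialized)

def lora_head_logits(h, W0, A, B, scale):
    """Class logits with the LoRA delta folded into the head weight."""
    return (W0 + scale * B @ A) @ h

logits = lora_head_logits(h, W0, A, B, alpha / r)
pred = int(np.argmax(logits))  # predicted class index
```

Because `B` starts at zero, the adapted head initially matches the frozen head exactly, which is the standard LoRA initialization; training moves only `A` and `B`.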
### Information

### Tasks

- examples folder (such as GLUE/SQuAD, ...)

### Reproduction

.
### Expected behavior

A working example for classification with LoRA and an LLM

### Actual behavior

Unavailable

### Additional notes

.