[Misc] Logits processor plugins #4769
Conversation
I added some documentation about this feature :)

This looks cool - a distribution mechanism for logits processors. When #4775 gets merged, this PR will need to be updated to support the more generic interface.

I am very much in favor of this approach. A few months ago I tried to get a similar concept into huggingface-tgi:

I like this idea. And I agree with @mmoskal that it would be important to support the more involved API being worked on in #4775. I wonder, though, how one would implement support for the OpenAI API's tool use if guided decoding were provided by such a plugin. The code in the OpenAI server depends on the guided decoding backend and will need to know how to transform the OpenAI-API-conformant parameters into valid guided decoding parameters (c.f. #4656). Supporting the OpenAI API as thoroughly as possible is very valuable and should not be sacrificed for software-architectural reasons. So we can either define guided decoding as a core vLLM feature that is out of scope for logits-processor plugins, or we can think about also making the frontend part necessary to "correctly" use the plugins pluggable. The latter would be a challenging endeavor.

Thank you for the feedback, everyone. Regarding @br3no's response: it's a good point. I believe that as a first step it makes sense to keep the guided decoding code as core vLLM logic, all the more so since it's already implemented this way. I will think about how it could be implemented as plugins while still allowing tool calling, but I believe this pull request is valuable either way :)
This pull request adds support for Logits processor plugins.
This makes implementing custom logits processors very easy, and eliminates the need to modify vLLM directly to do so.
For example, with this pull request all of the guided decoding features could be implemented just by writing a Python package and installing it in the same virtualenv as vLLM, without changing vLLM's source code.
Example code for a logits processor plugin that, given a token id, multiplies its logit by 100:
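A minimal sketch of such a plugin module follows. The class and field names (`BoostTokenParams`, `BoostTokenLogitsProcessor`, `token_id`) and the `"logits_processor_class"` key are illustrative assumptions, not the exact API of this PR; logits are shown as a plain list of floats instead of a torch tensor to keep the sketch self-contained.

```python
# boost_token_plugin.py -- hypothetical plugin module (names are assumptions).
from dataclasses import dataclass
from typing import List


@dataclass
class BoostTokenParams:
    """Stands in for the plugin's parameters model, which validates and
    parses the matching part of the request body."""
    token_id: int


class BoostTokenLogitsProcessor:
    """Logits processor that multiplies the logit of one token id by 100.

    vLLM logits processors are callables taking the previously generated
    token ids and the logits for the next token.
    """

    def __init__(self, params: BoostTokenParams) -> None:
        self.token_id = params.token_id

    def __call__(self, token_ids: List[int],
                 logits: List[float]) -> List[float]:
        # Scale only the chosen token's logit; leave the rest unchanged.
        boosted = list(logits)
        boosted[self.token_id] *= 100.0
        return boosted


# The plugin dictionary that vLLM would discover at startup; the
# "parameters_model" key is mentioned in this PR, the other key is assumed.
plugin = {
    "parameters_model": BoostTokenParams,
    "logits_processor_class": BoostTokenLogitsProcessor,
}
```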
And the `setup.py` file for the package should look something like this:

With this pull request, vLLM will load all the plugins at startup, and each inference request can specify usage of custom logits processors using the `logits_processors` field in the request body. The `parameters_model` in the plugin dictionary is used to validate and parse the request body.

I will soon add to this pull request a page in the documentation explaining how to implement custom logits processors.
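A `setup.py` for such a plugin package might look like the sketch below, registering the plugin through a setuptools entry point; the entry-point group name `"vllm.logits_processors"`, the package name, and the module name are assumptions for illustration.

```python
# setup.py for a hypothetical logits processor plugin package.
from setuptools import setup

setup(
    name="vllm-boost-token-plugin",  # assumed package name
    version="0.1.0",
    py_modules=["boost_token_plugin"],
    entry_points={
        # Entry-point group name is an assumption based on this PR,
        # not a published vLLM API. vLLM would scan this group at
        # startup to discover installed plugins.
        "vllm.logits_processors": [
            "boost_token = boost_token_plugin:plugin",
        ],
    },
)
```

A request could then select the processor via the `logits_processors` field in the request body, e.g. `"logits_processors": {"boost_token": {"token_id": 5}}` (the exact field shape is hypothetical), with the inner dictionary validated by the plugin's `parameters_model`.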