
[Misc] Logits processor plugins #4769

Open
wants to merge 4 commits into main
Conversation

NadavShmayo (Contributor) commented May 11, 2024

This pull request adds support for logits processor plugins, which makes implementing custom logits processors easy and removes the need to modify vLLM directly.

For example, with this pull request all of the guided decoding features could be implemented in a standalone Python package installed in the same virtualenv as vLLM, without changing vLLM's source code.

Example code for a logits processor plugin that, given a token ID, multiplies its logit by 100:

from pydantic import BaseModel


class MyParameters(BaseModel):
    # Per-request parameters, validated and parsed from the request body.
    token_id: int


class MyLogitsProcessor:
    def __init__(self, tokenizer, parameters: MyParameters):
        self.tokenizer = tokenizer
        self.parameters = parameters

    def __call__(self, token_ids, logits):
        # Clone so the original logits tensor is left untouched.
        new_logits = logits.clone()
        new_logits[self.parameters.token_id] *= 100
        return new_logits


# The entry-point object that vLLM looks up when loading the plugin.
LOGITS_PROCESSOR_PLUGIN = {
    'logits_processor_class': MyLogitsProcessor,
    'parameters_model': MyParameters,
}
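
For illustration, here is a quick smoke test of the processor above (a minimal sketch, not part of this pull request; it assumes logits is a 1-D torch tensor over the vocabulary, as in vLLM's per-sequence logits processor interface):

import torch

# Hypothetical sanity check for the example plugin; the class names come
# from the plugin code above, the tensor values are made up.
params = MyParameters(token_id=2)
processor = MyLogitsProcessor(tokenizer=None, parameters=params)

logits = torch.tensor([0.1, 0.2, 0.3, 0.4])
boosted = processor(token_ids=[], logits=logits)
assert boosted[2] == logits[2] * 100  # only token 2's logit is scaled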

And the setup.py file for the package should look something like this:

from setuptools import setup

setup(
    name='example_logits_processor',
    version='0.1',
    install_requires=[
        'pydantic>=1.8.2',
    ],
    entry_points={
        'vllm.logits_processors': [
            'example_plugin=example_plugin.main:LOGITS_PROCESSOR_PLUGIN',
        ],
    },
)
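
For context, plugins registered this way can be found through the standard entry-point machinery. A sketch of roughly how that lookup works (using importlib.metadata from Python 3.10+; this is illustrative, not necessarily the exact loading code in this pull request):

from importlib.metadata import entry_points

# Collect every installed plugin registered under the
# vllm.logits_processors group; ep.load() imports example_plugin.main
# and returns the LOGITS_PROCESSOR_PLUGIN dictionary defined there.
plugins = {}
for ep in entry_points(group='vllm.logits_processors'):
    plugins[ep.name] = ep.load()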

With this pull request, vLLM loads all installed plugins at startup, and each inference request can opt into custom logits processors via the logits_processors field in the request body.
The parameters_model in the plugin dictionary is used to validate and parse the matching parameters in the request body.
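
As an illustration, a request opting into the example plugin might look like this (a sketch only: the endpoint and the exact shape of the logits_processors payload are assumptions, not confirmed by this pull request beyond the field name):

import requests

# Hypothetical request against a locally running vLLM OpenAI-compatible
# server; the structure of "logits_processors" is assumed for illustration.
response = requests.post(
    'http://localhost:8000/v1/completions',
    json={
        'model': 'my-model',
        'prompt': 'Hello',
        'logits_processors': {
            'example_plugin': {'token_id': 42},
        },
    },
)
print(response.json())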

I will soon add a documentation page to this pull request explaining how to implement custom logits processors.

@rkooo567 self-assigned this May 13, 2024
NadavShmayo (Contributor, Author) commented:

I added some documentation about this feature :)

simon-mo (Collaborator) commented:

@mmoskal @noamgat @br3no curious about your feedback on this!

mmoskal (Contributor) commented May 15, 2024

This looks cool: a distribution mechanism for logits processors. When #4775 gets merged, this PR will need to be updated to support the more generic interface.

noamgat (Contributor) commented May 15, 2024

I am very much in favor of this approach. A few months ago I tried to get a similar concept into huggingface-tgi (huggingface/text-generation-inference#1274), but I have since switched to vLLM :)

br3no (Contributor) commented May 15, 2024

I like this idea. And I agree with @mmoskal that it would be important to support the more involved API being worked on in #4775.

I wonder, though, how one would implement OpenAI API tool use if guided decoding were provided by such a plugin. The code in the OpenAI server depends on the guided decoding backend and needs to know how to transform the OpenAI-API-conformant parameters into valid guided decoding parameters (c.f. #4656).

Supporting the OpenAI API as thoroughly as possible is very valuable and should not be sacrificed for software-architectural reasons.

So we can either define guided decoding as a core vLLM feature that is out of scope for logits processor plugins, or we can think about also making the frontend part necessary to "correctly" use the plugins pluggable. The latter would be a challenging endeavor.

NadavShmayo (Contributor, Author) commented:

Thank you for the feedback, everyone.

Regarding @br3no's response: it's a good point. As a first step, I believe it makes sense to keep the guided decoding code as core vLLM logic, especially since it is already implemented that way.

I will think about how it could be implemented as plugins while still allowing tool calling, but I believe this pull request is valuable either way :)
