
Torch compile fusion backend prototype #209

Draft
wants to merge 28 commits into upstream-main
Conversation

bnellnm (Member) commented on Apr 25, 2024

Pulls in parts of vllm-project#3014.

For now, only the forward method of LlamaMLP is tagged with the new backend. Sample code to run (derived from examples/offline_inference.py) is below; a sketch of how the tagging mechanism works follows it.

from vllm import LLM, SamplingParams

# Turn up the logging level.
import logging
import vllm
vllm.logger._default_handler.setLevel(logging.DEBUG)

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
# Generate text from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

@bnellnm requested a review from mgoin on April 30, 2024.