
Torch compile fusion backend prototype #209

Draft
wants to merge 28 commits into upstream-main
Conversation

bnellnm (Member) commented on Apr 25, 2024

Pulls in parts of vllm-project#3014.

For now, only the forward method of LlamaMLP is tagged with the new backend. Sample code to run (derived from examples/offline_inference.py) is below; a sketch of how the tagging mechanism works follows it.

from vllm import LLM, SamplingParams

# Turn up the logging level.
import logging
import vllm
vllm.logger._default_handler.setLevel(logging.DEBUG)

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
# Generate text from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

@bnellnm requested a review from mgoin on April 30, 2024.