
[WIP] Include GPT Fast in torch.compile nightly benchmark workflow #2857

Open: wants to merge 10 commits into base: master


@sachanub (Collaborator) commented Dec 20, 2023

Description

Please read our CONTRIBUTING.md prior to creating your first pull request.

This PR adds the GPT Fast model, using Llama 7B weights with int4 quantization, to the torch.compile nightly benchmark workflow.

Steps to download the Llama 7B weights on the benchmark host:

Ran a temporary workflow in commit 1e6088e that downloads the weights using the HUGGING_FACE_HUB_TOKEN.

Results of the successful run: https://github.com/pytorch/serve/actions/runs/7271384883/job/19811851851?pr=2857
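The following is a minimal Python sketch of what such a download step could look like. It is not the workflow from commit 1e6088e. It assumes the weights come from the Hugging Face repo meta-llama/Llama-2-7b-hf, which is the name a reviewer suggests later in this thread; the PR itself only says "Llama 7B". The checkpoints/ destination directory and the helper names read_hf_token and download_weights are hypothetical. The download itself uses huggingface_hub.snapshot_download.

```python
import os
from pathlib import Path

# Assumption: the Hugging Face repo holding the Llama 7B weights.
REPO_ID = "meta-llama/Llama-2-7b-hf"


def read_hf_token(env=None):
    """Return the token stored in HUGGING_FACE_HUB_TOKEN, or fail loudly."""
    env = os.environ if env is None else env
    token = env.get("HUGGING_FACE_HUB_TOKEN")
    if not token:
        raise RuntimeError("HUGGING_FACE_HUB_TOKEN is not set on the benchmark host")
    return token


def download_weights(repo_id=REPO_ID, dest="checkpoints"):
    """Download a full model snapshot into dest/<repo_id> (hypothetical layout)."""
    # Imported lazily so read_hf_token works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = Path(dest) / repo_id
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir), token=read_hf_token())
    return local_dir
```

Calling download_weights() on the host requires huggingface_hub to be installed and the token's account to have access to the gated Llama 2 repo. The later conversion and int4 quantization of the checkpoint are not shown.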

Testing:

Ran the benchmark workflow in commit 31936c9.

Results of the successful run: https://github.com/pytorch/serve/actions/runs/7272224847/job/19813999840?pr=2857
Benchmark report file: report.md

Updates to the benchmark-ab.py script:

Also updated the benchmark-ab.py script to pass -l to the ab commands, so that responses of varying length are not counted as errors (https://httpd.apache.org/docs/2.4/programs/ab.html).
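Here is why -l matters. By default, ab treats the length of the first response as the reference and reports every response with a different length as a failed request (a "Length" failure). For generated LLM output this misreports successful requests as errors. The sketch below assembles an ab command with -l. It assumes a hypothetical build_ab_command helper (the real benchmark-ab.py builds its command its own way), a hypothetical model name gpt_fast_7b_int4, and TorchServe's default inference endpoint http://127.0.0.1:8080/predictions/<model>. Per the Apache docs, -l requires ab 2.4.7 or later.

```python
import shlex


def build_ab_command(url, requests, concurrency, input_file,
                     content_type="application/json"):
    """Assemble an Apache Bench (ab) command line as an argument list.

    Hypothetical helper for illustration only.
    """
    return [
        "ab",
        "-c", str(concurrency),  # number of concurrent requests
        "-n", str(requests),     # total number of requests
        "-l",                    # don't count varying response lengths as errors
        "-p", input_file,        # file containing the POST body
        "-T", content_type,      # Content-Type header for the POST body
        url,
    ]


cmd = build_ab_command(
    "http://127.0.0.1:8080/predictions/gpt_fast_7b_int4",
    requests=100,
    concurrency=10,
    input_file="input.json",
)
print(shlex.join(cmd))
# prints: ab -c 10 -n 100 -l -p input.json -T application/json http://127.0.0.1:8080/predictions/gpt_fast_7b_int4
```

Returning a list rather than one string lets the command go straight to subprocess.run(cmd) without shell-quoting concerns. shlex.join is only used here to display it.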

@sachanub changed the title from "Include GPT Fast in torch.compile nightly benchmark workflow" to "[WIP] Include GPT Fast in torch.compile nightly benchmark workflow" on Dec 20, 2023.
gpt_fast:
  7b_int4:
    benchmark_engine: "ab"
    url: https://torchserve.pytorch.org/mar_files/gpt_fast_7b_int4.mar
Collaborator (review comment):

Please specify the model clearly in the name, e.g. Llama-2-7b-hf.

    backend_profiling: False
    exec_env: "local"
    processors:
      - "cpu"
Collaborator (review comment):

cpu should be removed.

@chauhang (Contributor) commented:

@namannandan @lxning What is the work remaining for this PR?


3 participants