Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make graph capture apis easy to understand #472

Closed
wants to merge 9 commits into from
Closed

Conversation

wangyems
Copy link
Member

@wangyems wangyems commented May 17, 2024

Split the current API (try_graph_with_max_batch_size) into use_graph_capture() and set_max_batch_size() so user can ramp in a comfortable pace.

@yufenglee
Copy link
Member

I don’t see the change in c API

@yufenglee
Copy link
Member

And there is a conflict

@yufenglee
Copy link
Member

And we may not call the API and use default in the model_generate and phi3_qa examples

@wangyems
Copy link
Member Author

I don’t see the change in c API

do we expose c API to users?

@wangyems
Copy link
Member Author

And we may not call the API and use default in the model_generate and phi3_qa examples

I think we can make user aware that we have the API to manipulate max_bs?

@yufenglee
Copy link
Member

I don’t see the change in c API

do we expose c API to users?

yes, we have C API:

OGA_EXPORT OgaResult* OGA_API_CALL OgaGeneratorParamsTryGraphCaptureWithMaxBatchSize(OgaGeneratorParams* generator_params, int32_t max_batch_size);

@yufenglee
Copy link
Member

yufenglee commented May 18, 2024

I think it is better to just rename the python API to try_graph_capture_with_max_batch_size. C and C# language already use this name and they are right.

Splitting the function into 2 doesn't look natural. The SetMaxBatchSize is just for graph capture, but the name doesn't reflect that. And we need to call 2 functions for one atomic operation.

yufenglee added a commit that referenced this pull request May 20, 2024
1. fix the API name in python API.
2. enable cuda graph by default and set the max batch size to 1 if it is
enabled in config.

Same functionality as
#472
yufenglee added a commit that referenced this pull request May 20, 2024
1. fix the API name in python API.
2. enable cuda graph by default and set the max batch size to 1 if it is
enabled in config.

Same functionality as
#472
@yufenglee yufenglee closed this May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants