make graph capture apis easy to understand #472
I don't see the change in the C API.
And there is a conflict.
And in the model_generate and phi3_qa examples we may not need to call the API; they can use the default.
Do we expose the C API to users?
I think we can make users aware that we have an API to manipulate max_bs?
Yes, we have a C API: onnxruntime-genai/src/ort_genai_c.h, line 154 in 3296782
…enai into wangye/easy_apis
I think it is better to just rename the Python API to try_graph_capture_with_max_batch_size. The C and C# APIs already use this name, and it is the right one. Splitting the function in two doesn't look natural: SetMaxBatchSize applies only to graph capture, but its name doesn't reflect that, and we would need to call two functions for one atomic operation.
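To make the trade-off concrete, here is a minimal sketch of the two API shapes being discussed. The class `GraphCaptureParamsSketch` and its `is_consistent` helper are hypothetical stand-ins written for illustration only; this is not the onnxruntime-genai implementation, and only the three method names come from this thread.

```python
class GraphCaptureParamsSketch:
    """Hypothetical stand-in for a params object; NOT the onnxruntime-genai class."""

    def __init__(self):
        self.graph_capture_enabled = False
        self.max_batch_size = 0  # 0 means "not set yet" in this sketch

    # Combined form: one call enables capture and fixes the batch size.
    def try_graph_capture_with_max_batch_size(self, max_batch_size):
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self.graph_capture_enabled = True
        self.max_batch_size = max_batch_size

    # Split form: two calls that only make sense together.
    def use_graph_capture(self):
        self.graph_capture_enabled = True

    def set_max_batch_size(self, max_batch_size):
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self.max_batch_size = max_batch_size

    def is_consistent(self):
        # Capture without a batch size is the half-configured state
        # that the split form allows and the combined form does not.
        return (not self.graph_capture_enabled) or self.max_batch_size >= 1


combined = GraphCaptureParamsSketch()
combined.try_graph_capture_with_max_batch_size(4)

split = GraphCaptureParamsSketch()
split.use_graph_capture()           # half-configured at this point
half_configured = not split.is_consistent()
split.set_max_batch_size(4)         # now equivalent to the combined call
```

In this sketch the split form passes through a state where capture is on but no batch size is set, which is the "two functions for one atomic operation" concern raised above.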
1. Fix the API name in the Python API. 2. Enable CUDA graph by default and set the max batch size to 1 if it is enabled in the config. Same functionality as #472.
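One possible reading of the default described in that commit message, sketched under assumptions: if the config enables graph capture and the caller never sets a batch size, capture is used with a max batch size of 1. The helper `resolve_graph_capture` and the config key `enable_cuda_graph` are hypothetical names chosen for this illustration, not confirmed by the thread.

```python
def resolve_graph_capture(config, requested_max_batch_size=None):
    """Return (enabled, max_batch_size) for a config dict.

    Hypothetical helper; the key name "enable_cuda_graph" is an assumption.
    """
    enabled = bool(config.get("enable_cuda_graph", False))
    if not enabled:
        # Capture is off in the config, so no batch size applies.
        return False, 0
    if requested_max_batch_size is None:
        # Enabled in the config and nothing requested: default to 1.
        return True, 1
    if requested_max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return True, requested_max_batch_size


default_case = resolve_graph_capture({"enable_cuda_graph": True})
explicit_case = resolve_graph_capture({"enable_cuda_graph": True}, 8)
disabled_case = resolve_graph_capture({})
```

With a default like this, simple examples would not need to call the graph capture API at all; only callers that want a batch size above 1 would.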
Split the current API (try_graph_with_max_batch_size) into use_graph_capture() and set_max_batch_size() so that users can ramp up at a comfortable pace.