-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RHOAIENG-2244: Add support for fetching models from S3 based on the modelRelativePath & Adds a universal OpenVino ContainerFile #254
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? It does not seem to be used in any of the PipelineRuns and hence not tested by our CI. Does it replace pipelines/containerfiles/Containerfile.openvino.mlserver.mlflow
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not replace the Containerfile you referenced due to that file being hardcoded to the tensorflow-housing
model we use as an example. I would need to refactor the tensorflow-housing model away from MLFlow to align it with the Containerfile.openvino
.
I included Containerfile.openvino
in the PR since this is what I needed to make this work with the Fraud Detection tutorial but it is generic for any OpenVino model that is being packaged into an inference container. A key feature that the new Containerfile supports is model versioning so that if the model contains multiple versions /model/<model name>/{1,2,3,...,999}
OpenVino will automatically serve the highest version found for inferencing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why can't this be used for the tensorflow-housing instead of the original one, so that it gets exposed and tested, rather than just being a "random" file in the repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why can't this be used for the tensorflow-housing instead of the original one,....
I added the comment in the Containerfile.openvino.mlserver.mlflow
file to address that issue. The tensorflow-housing models were developed for an MLFlow workflow so I would need to update the directory structure for tensorflow-housing
in Git|S3 to include the versioning structure required by OpenVino. Since the original model training code was not written for that workflow, my strong preference is to update the training code to support that structure for reproducibility.
so that it gets exposed and tested, rather than just being a "random" file in the repo?
This would be a follow-up PR to bring the Fraud Detection model workflow into our supported workflow. I included it in this PR because it is required working building our OpenVino inference containers with any number of model versions AND it allows anyone reviewing & testing the PR to use this Containerfile to verify OpenVino functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since documentation to do that is not there, I doubt about the utility of that at this point.
@@ -29,6 +29,9 @@ This repo currently contains two examples: | |||
| **Model (Pipelines)** | `tensorflow-housing` | `bike-rentals-auto-ml` | | |||
| **App (ACM)** | `tensorflow-housing-app` | `bike-rental-app` | | |||
|
|||
|
|||
*Additionally*, you can follow the [OpenShift AI tutorial - Fraud detection example](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8/html/openshift_ai_tutorial_-_fraud_detection_example/index) to execute the end to end workflow of training a model from scratch & building the [inferencing container](pipelines/containerfiles/Containerfile.openvino) to deliver to Edge environments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to promote the fraud detection tutorial (which I like), shouldn't we just expand all the relevant places and PipelineRuns to actually lead the reader through the steps, and also have things tested by the CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but that is out of scope for this PR. A lot of the current pipeline workflows are specific to the initial demo use case for the bike-rental
and tensorflow-housing
models to show potential integration w/ AzureML and MLFlow. We need to refactor some of the tasks and pipelines to work with any model supported by OpenVino in ODH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a maintenance liability.
Is it critical that this goes in in this PR? If not, I'd just leave this change out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it critical that this goes in in this PR? If not, I'd just leave this change out.
Which change, this text in the DEVELOPERS doc? I don't have any particular need to included this the document other than to point people to documented workflow that is supported and fits our E2E scenario training and delivering models to the edge.
This is a maintenance liability.
I am curious about this. Is it a maintenance liability because it is based on code outside of this repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that us writing here that a user can follow that tutorial and use our pipeline for build might mean that it stops working in the future when we don't test it permanently, however, we can at least say it, that it is not permanently tested by us and it is best effort from our side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all good points. I am going to separate all of the "Fraud Detection Model" specific changes, this DEVELOPERS.md and Containerfile, into a separate story/work since some of the longterm fixes should be aligned with a complete rewrite of the pipeline, tasks and manifests.
I get that us writing here that a user can follow that tutorial and use our pipeline for build might mean that it stops working in the future when we don't test it permanently,
IMO, this scenario does have a benefit of us dogfooding our own tutorials since this tutorial and official documentation are maintained by another group working on OpenShift AI. If we find a bug, it would be good to file it early in the upstream releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pointing people somewhere with some promise of experiencing the end-to-end example across two repos, without including the relevant steps in our documentation, will lead to people (including myself) to get stuck with "OK, what next?" questions, either because the steps are not trivial or because the other repo has changed significantly so our assumptions in the Containerfile no longer work.
Folks will either abandon the attempt, or they will file issues here (better) or in the https://github.com/rh-aiservices-bu/fraud-detection (worse, maintainers of that repo don't know anything about this work). even issues filed against this repo will be hard to respond to in a timely and effective manner if we don't have the experience with that other repo among the team and in the CI.
@LaVLaS: This pull request references RHOAIENG-2244 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@@ -319,7 +322,7 @@ spec: | |||
if [ "$(params.upon-end)" == "stop" ]; then | |||
oc scale deployment.apps/$(params.model-name)-$(params.model-version) --replicas=0 | |||
elif [ "$(params.upon-end)" == "delete" ]; then | |||
oc delete all --selector=app=$(params.model-name)-$(params.model-version) | |||
oc delete deployment,service --selector=app=$(params.model-name)-$(params.model-version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change needed?
@@ -255,6 +257,7 @@ spec: | |||
containers: | |||
- image: $(params.candidate-image-tag-reference) | |||
name: $(params.model-name)-$(params.model-version) | |||
imagePullPolicy: Always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this changed to work in case we push the same label twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I was stuck in a debuging loop testing updates to the Containerfile that were not propagating to the actual test-deployment pod due to a stale directory being present in the running pod although the build step had renamed the folder. Ultimately it appeared that the hash was a much older build so the quickest fix was to set it to Always
.
Another option would be to use the digest hash from the previous build for these test pods to ensure we are testing the exact build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the digest hash might be a bulletproof option, we can fix it in a separate PR maybe.
@@ -1,3 +1,7 @@ | |||
# This ContainerFile is used to manually convert the MLFlow tensorflow-housing example to a format that works with the OpenVino Model Server | |||
# The tensorflow-housing example we provide in the ai-edge repo copies the tensorflow-housing/tf2model/{saved_model.pb,variables/} directory to a | |||
# format that is compatible with OpenVino Model Server: /model/<MODEL VERSION>/<TENSORFLOW SAVED MODEL FORMAT> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# format that is compatible with OpenVino Model Server: /model/<MODEL VERSION>/<TENSORFLOW SAVED MODEL FORMAT> | |
# format that is compatible with OpenVino Model Server: /model/<MODEL VERSION>/<TENSORFLOW SAVED MODEL FORMAT> |
@@ -29,6 +29,9 @@ This repo currently contains two examples: | |||
| **Model (Pipelines)** | `tensorflow-housing` | `bike-rentals-auto-ml` | | |||
| **App (ACM)** | `tensorflow-housing-app` | `bike-rental-app` | | |||
|
|||
|
|||
*Additionally*, you can follow the [OpenShift AI tutorial - Fraud detection example](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8/html/openshift_ai_tutorial_-_fraud_detection_example/index) to execute the end to end workflow of training a model from scratch & building the [inferencing container](pipelines/containerfiles/Containerfile.openvino) to deliver to Edge environments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that us writing here that a user can follow that tutorial and use our pipeline for build might mean that it stops working in the future when we don't test it permanently, however, we can at least say it, that it is not permanently tested by us and it is best effort from our side.
Description
fetch-model-s3
task to provide basic support for fetching a model from a sub directory under the S3 bucket removing the hardcoded requirement that all model directories be stored at the root of the S3 bucket.This provides the ability for users to utilize the OpenShift AI Fraud Detection tutorial in the workflow to train and deploy a model using our pipelines
Jira: RHOAIENG-2244
How Has This Been Tested?
OR
fraud-detection
model froms3://rhoai-edge/example-models/fraud-detection
The
test-model-rest-svc
must be updated to match output based on thedata:
notprediction:
The
Continerfile.openvino
to build the inferencing container must be used with this modelMerge criteria:
Existing pipeline workflows for
bike-rental
andtensorflow-housing
are not affected