RHOAIENG-4146 - Add support to import a model from the ODH Model Registry via Model Build Pipeline #246
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
So I was just looking at this and wanted to get a better understanding of what is happening. Fetching from the model registry is now the default, and the results from the model registry (info about the model in S3 or git) are passed to the task that will then get the model? Is that correct? Is it expected that we only use the model registry now?
@@ -117,6 +117,12 @@ Create a copy of the file(s) below to include the required credentials for acces
# Linking secret is needed only if the secret from the Robot account hasn't been already applied and linked in the previous step
$ oc secret link pipeline credentials-image-registry
```
### Setup ODH Model Registry for Storing Models
Note that the CLI docs have different instructions for setting up the model registry: https://github.com/opendatahub-io/ai-edge/blob/main/docs/cli.md#installing-model-registry
The one in this PR is the link to the MR operator repo which is changing fast.
@devguyio Maybe we need to update the CLI with the link for the MR installation, because anyone using the latest changes of the MR operator would see errors?
Unless there is some custom configuration for model registry, I agree that we should defer to the instructions in the model-registry-operator repo. If we notice any issues with failed deployments, it would be best for the project to file bugs / PRs to improve their instructions rather than us owning a duplicate set of instructions.
So, leaving this as is. We can open a PR to the CLI readme if needed to reflect the same change.
@@ -9,11 +9,15 @@ spec:
- name: model-name
I don't think this pipelinerun example should be changed, should it? Maybe there should be a separate pipelinerun example to demonstrate the model registry integration. WDYT?
Yep adding a third example I think would be best also 👍
I agree, it's a good idea. I'll add a separate PipelineRun example.
export MODEL_ID=$(curl --silent -X 'GET' \
"$(params.model-registry-hostname)/api/model_registry/v1alpha3/registered_models?" -H 'accept: application/json' | \
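As a side note, the lookup above can be exercised offline against a canned response; the JSON shape and names below are illustrative assumptions modeled on the v1alpha3 endpoint, not a live registry:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Canned stand-in for the registered_models response (illustrative only).
REGISTERED_MODELS='{"items":[{"id":"15","name":"tensorflow-housing"},{"id":"20","name":"bike-rentals"}]}'

# Same jq pattern as the task: select the registered model by name, keep its id.
MODEL_ID=$(echo "$REGISTERED_MODELS" | jq -r --arg NAME "tensorflow-housing" '.items[] | select(.name == $NAME).id')
echo "MODEL ID is $MODEL_ID"   # MODEL ID is 15
```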
What happens if there's authentication required on the model registry endpoint?
@grdryn I don't think MR has implemented anything for authentication. Do you see anything that I am missing? Swagger for MR
My thinking was that since the MR would be deployed on the same 'local' cluster as the pipeline, it should/would have access to the endpoint.
> My thinking was that since the MR would be deployed on the same 'local' cluster as the pipeline, it should/would have access to the endpoint.
That will probably typically be the case, although I do wonder if there will be cases where the pipelines are run on a different cluster than the model registry. I notice that you're using a Route address to access the model registry (rather than a Service address internal to the cluster), which indicates that the model registry will at least be accessible to the outside world.
> I don't think MR have implemented anything with authentication. Do you see anything that I am missing?
Yeah, I'm not sure really. Authentication could be external to the API, using something like the OpenShift OAuth proxy or something else sitting in front of the API.
I'm not sure if it's 100% necessary to be able to handle authentication for this at this point, but it's something that we should at least think about.
@LaVLaS do you have any thoughts about this?
@devguyio have you thought about authentication for the model registry in the CLI at all?
@grdryn, I reached out to the MR channel about this question in Slack.
I feel the Istio part of it is optional, so we might be okay temporarily, or I might be wrong :)
I think it's fine for this PR to leave it as-is, like you say. I think we should have a follow-up story to support authentication on it though. WDYT? If you agree, could you create that in Jira?
I agree, otherwise this PR gets too complex with a lot of changes in one. Created the Jira 6703.
echo -n "$S3-BUCKET-NAME" | tee $(results.s3-bucket-name.path)
fi
else
echo -n "Fetching params from PipelineRun"
Suggested change:
```diff
-echo -n "Fetching params from PipelineRun"
+echo -n "Using params from PipelineRun"
```
- description: The workspace for the downloaded model uris.
name: workspace
- description: The S3 secret.
name: s3-secret
Are either of these workspaces actually used here?
Sorry I missed this comment earlier. Good catch, actually this is not being used, I'm gonna test with this removed.
@grdryn It turns out we need these workspaces, and the pipeline fails without them with the below error: `workspace binding "s3-secret" does not match any declared workspace`
I'm not suggesting that they be removed from the Pipeline altogether, just from these tasks, if they're not being used within this task. See the `check-model-and-containerfile-exists.yaml` task for example: the `s3-secret` workspace is neither used nor declared.
- name: s3-bucket-name
value: rhoai-edge-models
- name: model-registry-hostname
value: "modelregistry-sample-http-default.apps.srbiswas-cluster2.dev.datahub.redhat.com"
If you are doing another pipeline run, make sure to just change the hostname here.
@jackdelahunt Yeah I was planning to remove it, but since we don't have any shared cluster, should I just leave it as `<insert model-registry-hostname>` in the example PipelineRun?
Yeah, just some example value that shows what it should be without actually working. `modelregistry-sample-http-default.apps.username-cluster.dev.datahub.redhat.com` or something.
The PR check failure appears to be a result of the following error in the PipelineRun:
Initial review.
Also, is this somehow aligned with the CLI work? Currently we have 2 approaches to Model registry:
- CLI - where we store all pipeline params in the model registry
- This - where we get model location from the model registry during the run of the pipeline
@LaVLaS maybe?
- name: model-registry-hostname
type: string
default: ""
description: The servicename route for model registry
Suggested change:
```diff
-description: The servicename route for model registry
+description: The domain name of the model registry
```
This might not be a correct suggestion. Is `servicename` a particular route among many routes that the model registry would have?
The intention of this suggestion is to make it less specific to the idea of a "route" in OpenShift: in future it might be exposed using the K8s Gateway API or some other method, for example.
@grdryn My idea of calling it a `servicename` route was that the user would know where, or which endpoint, to pick up from the OpenShift UI console. In my understanding the model registry probably has one route to the db, so I would agree that either is fine. WDYT?
@@ -117,6 +117,12 @@ Create a copy of the file(s) below to include the required credentials for acces
# Linking secret is needed only if the secret from the Robot account hasn't been already applied and linked in the previous step
$ oc secret link pipeline credentials-image-registry
```
### Setup ODH Model Registry for Storing Models
Setting up the Model Registry using [Model Registry Operator](https://github.com/opendatahub-io/model-registry-operator?tab=readme-ov-file#running-on-the-cluster) or the ODH Operator. We are currently using the MySQL Db of MR and needs the `serviceRoute` enabled [here](https://github.com/opendatahub-io/model-registry-operator/blob/main/config/samples/mysql/modelregistry_v1alpha1_modelregistry.yaml#L17). Once this option is set to enabled we add the MR Route under `model-registry-hostname` parameter to the pipeline run file. Following this we simply run the existing pipeline create commands to start the pipeline.
Suggested change:
```diff
-Setting up the Model Registry using [Model Registry Operator](https://github.com/opendatahub-io/model-registry-operator?tab=readme-ov-file#running-on-the-cluster) or the ODH Operator. We are currently using the MySQL Db of MR and needs the `serviceRoute` enabled [here](https://github.com/opendatahub-io/model-registry-operator/blob/main/config/samples/mysql/modelregistry_v1alpha1_modelregistry.yaml#L17). Once this option is set to enabled we add the MR Route under `model-registry-hostname` parameter to the pipeline run file. Following this we simply run the existing pipeline create commands to start the pipeline.
+Set up the Model Registry using [Model Registry Operator](https://github.com/opendatahub-io/model-registry-operator?tab=readme-ov-file#running-on-the-cluster) or the ODH Operator.
+We are currently using the MySQL Db of MR and needs the `serviceRoute` enabled [here](https://github.com/opendatahub-io/model-registry-operator/blob/main/config/samples/mysql/modelregistry_v1alpha1_modelregistry.yaml#L17).
+Once this option is set to enabled we add the MR host name under `model-registry-hostname` parameter to the pipeline run file.
+Following this we simply run the existing pipeline create commands to start the pipeline.
```
Some minor wording suggestions here, but also I've put each sentence on a separate line, to avoid the lines being so long (it can be annoying to read really long lines in a text editor/terminal, for example). When rendered by markdown, they will all be part of the same paragraph anyway, so this is just a source change.
![MR Schema](docs/images/edge-MR.png)

The current workflow supports the above schema for storing the model parameters in the MR where the parameters from the pipeline run model is stored in the Model Version as a `customProperties` with `string_value` and `metadataType`.
Suggested change:
```diff
-The current workflow supports the above schema for storing the model parameters in the MR where the parameters from the pipeline run model is stored in the Model Version as a `customProperties` with `string_value` and `metadataType`.
+The current workflow supports the above schema for storing the model parameters in the MR, where the parameters for the PipelineRun model are stored in the Model Version as `customProperties`.
```
jq -r --arg MODEL_PARAM "s3-bucket-name" '.customProperties[$MODEL_PARAM].string_value')
echo -n "$S3-BUCKET-NAME" | tee $(results.s3-bucket-name.path)
fi
else
Suggested change (outdent `else` to match the outer `if`):
```diff
-  else
+else
```
^I think this should be at the same level of indentation as the outer if condition, rather than the inner one just above it. In other words, it's odd to see the following two words at the same level of indentation:
```
fi
else
```
- name: s3-bucket-name
value: rhoai-edge-models
- name: model-registry-hostname
value: "modelregistry-sample-http-default.apps.srbiswas-cluster2.dev.datahub.redhat.com"
Since this is specific to the cluster where the model registry is deployed, we should change this to something that is easily identifiable to a user as what needs to be changed.
Unless we can use the internal service route for model registry based on the default instructions, the default example should just break with an easily identifiable message in the value.
If we stick to the instructions from the model-registry repo for deploying `modelregistry-sample`, you can use this as a default:
```diff
-value: "modelregistry-sample-http-default.apps.srbiswas-cluster2.dev.datahub.redhat.com"
+value: "modelregistry-sample.odh-model-registry.svc:8080"
```
Yes, this is an oversight. Thanks for catching this :)
value: $(params.model-dir)
- name: fetch-model
value: $(params.fetch-model)
- name: modelRelativePath
Just repeating, so it doesn't get lost. This is still not fixed. Missing:

Suggested change:
```yaml
- name: modelRelativePath
  value: $(params.modelRelativePath)
- name: s3-bucket-name
  value: $(params.s3-bucket-name)
```
Also, it would be good to have the parameters in the same order both in the Pipeline here and in the Task.
Another improvement can be to have the `s3-bucket-name` parameter without a default value in the task. This way, the pipeline would most likely fail because it is not passing the parameter to the task. But once properly linked, it won't fail, because the pipeline parameter `s3-bucket-name` itself has a default value of `""`.
Therefore, I would:
- Make the order of params here and in the task definition the same.
- Remove any default values for the task params, so if any of the params is not set by the pipeline, it will fail.
- Add the `s3-bucket-name` parameter here as it is still missing.
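A sketch of what the second point might look like in the task definition (param names are from this PR; removing the defaults is the reviewer's suggestion, not the current state of the file):

```yaml
params:
  # No `default:` keys, so a Pipeline that forgets to pass one of these
  # parameters fails loudly instead of silently falling back to "".
  - name: modelRelativePath
    type: string
  - name: s3-bucket-name
    type: string
```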
type: string
- name: model-version
type: string
#Values to be passed by pipeline run when model-register is disabled
Suggested change:
```diff
-#Values to be passed by pipeline run when model-register is disabled
+#Values to be passed by pipeline run when model registry is disabled
```
kind: PipelineRun
metadata:
labels:
tekton.dev/pipeline: aiedge-e2e
Suggested change:
```diff
 tekton.dev/pipeline: aiedge-e2e
+model-name: tensorflow-housing
```
- name: model-registry-enabled
value: "true"
Just an idea for better readability: try to move this model-registry-enabled param next to the model-registry-hostname param, so they are grouped.
tekton.dev/pipeline: aiedge-e2e
generateName: aiedge-e2e-
spec:
params:
The list of provided param values is not complete; therefore, the pipeline run is not runnable. Parameters without default values must be provided, and I have identified that these are missing:
- modelRelativePath
- fetch-model
"$(params.model-registry-hostname)/api/model_registry/v1alpha3/registered_models/$MODEL_ID/versions?" -H 'accept: application/json' | \
jq -r --arg MODEL_VERSION "$(params.model-version)" '.items[] | select(.name == $MODEL_VERSION).id')

echo -n "MODEL ID is $MODEL_ID"
These `echo -n` statements are causing the output statements to be combined on a single line:
```
MODEL ID is 15MODEL VERSION ID is 16fetching parameters from model-registry/tekton/scripts/script-0-rq6wc: line 49: export: `S3-BUCKET-NAME=': not a valid identifier
```
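The behaviour is easy to reproduce locally; an explicit trailing newline keeps the log lines separate:

```shell
#!/usr/bin/env bash
# With echo -n, consecutive log statements run together on one line.
out_joined=$( echo -n "MODEL ID is 15"; echo -n "MODEL VERSION ID is 16" )
echo "$out_joined"   # MODEL ID is 15MODEL VERSION ID is 16

# printf with an explicit newline (or plain echo) keeps each message on its own line.
out_split=$( printf '%s\n' "MODEL ID is 15"; printf '%s\n' "MODEL VERSION ID is 16" )
echo "$out_split"
```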
||
set -Eeuo pipefail | ||
|
||
if [ "$(params.model-registry-enabled)" == "true" ]; then |
If the intent of the `model-registry-download` Task is to be run only when fetching from the ODH model registry, then this should be a "known" requirement for successful execution. We can move this logic check to the Pipeline using the Tekton `when` expression.
Inside of the task, we should only be focused on checking that the required input parameters are valid, and show a clear error when the task's execution fails.
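A sketch of that guard moved into the Pipeline with a Tekton `when` expression (task and param names are taken from this PR; the exact wiring is an assumption):

```yaml
tasks:
  - name: fetch-data
    taskRef:
      name: model-registry-download
    # Run this task only when the model registry integration is enabled;
    # when skipped, the task body no longer needs its own if/else branch.
    when:
      - input: "$(params.model-registry-enabled)"
        operator: in
        values: ["true"]
```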
echo -n "MODEL ID is $MODEL_ID"
echo -n "MODEL VERSION ID is $MODEL_VERSION_ID"

if [[ ! -z "$MODEL_ID" ]]; then
Assuming that an empty `MODEL_ID` is an unrecoverable failure, we should exit with a non-zero error code to prevent any further execution of the pipeline steps, since the model information is missing.
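A minimal fail-fast sketch of that check (the function name and message are illustrative; a real task step would `exit 1` rather than `return 1`):

```shell
#!/usr/bin/env bash
# Fail-fast guard: refuse to continue when the registry lookup returned nothing.
require_model_id() {
  if [[ -z "${1:-}" ]]; then
    echo "ERROR: empty MODEL_ID, model not found in registry" >&2
    return 1
  fi
  echo "MODEL ID is $1"
}

require_model_id "15"            # prints: MODEL ID is 15
rc=0
require_model_id "" || rc=$?     # rc becomes 1; a real task would exit here
```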
if [[ ! -z "$MODEL_ID" ]]; then
echo -n "fetching parameters from model-registry"

export FETCH_MODEL=$(curl --silent -X 'GET' \
Since we are running a GET against the same endpoint to save values from the response, can we just cache the JSON response after the first call and use that for all of the variable assignments?
If we expect an update to the values between each query, then there should be a comment to show why, so that we don't optimize the calls in a future update.
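A sketch of the caching idea: call the endpoint once, keep the body, and run every jq extraction against the saved copy. The fetch function and JSON below are stand-ins for the real curl call and API response:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the real `curl --silent` call against the model_versions endpoint;
# the JSON body is a made-up example matching the customProperties schema above.
fetch_model_version() {
  echo '{"customProperties":{"fetch-model":{"string_value":"s3"},"s3-bucket-name":{"string_value":"rhoai-edge-models"},"modelRelativePath":{"string_value":"tensorflow-housing"}}}'
}

# One network call, cached in a variable...
MODEL_VERSION_JSON=$(fetch_model_version)

# ...then every per-parameter extraction reuses the cached body instead of re-fetching.
prop() { echo "$MODEL_VERSION_JSON" | jq -r --arg P "$1" '.customProperties[$P].string_value'; }

FETCH_MODEL=$(prop fetch-model)
S3_BUCKET_NAME=$(prop s3-bucket-name)
echo "$FETCH_MODEL $S3_BUCKET_NAME"   # s3 rhoai-edge-models
```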
jq -r --arg MODEL_PARAM "modelRelativePath" '.customProperties[$MODEL_PARAM].string_value')
echo -n "$MODEL_RELATIVE_PATH" | tee $(results.modelRelativePath.path)

export S3-BUCKET-NAME=$(curl --silent -X 'GET' \
`-` is not allowed in bash shell variable names. You'll need to change this to `_`:
```diff
-export S3-BUCKET-NAME=$(curl --silent -X 'GET' \
+export S3_BUCKET_NAME=$(curl --silent -X 'GET' \
```
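Quick illustration of the constraint: bash identifiers may contain only letters, digits, and underscores, so the underscored name works while the dashed one does not:

```shell
#!/usr/bin/env bash
# Valid bash identifiers use letters, digits, and underscores only.
S3_BUCKET_NAME="rhoai-edge-models"   # works
echo "$S3_BUCKET_NAME"

# By contrast, `export S3-BUCKET-NAME=...` fails at runtime with
# "export: `S3-BUCKET-NAME=': not a valid identifier", the error seen in the PR check.
```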
export S3-BUCKET-NAME=$(curl --silent -X 'GET' \
"$(params.model-registry-hostname)/api/model_registry/v1alpha3/model_versions/$MODEL_VERSION_ID" -H 'accept: application/json' | \
jq -r --arg MODEL_PARAM "s3-bucket-name" '.customProperties[$MODEL_PARAM].string_value')
echo -n "$S3-BUCKET-NAME" | tee $(results.s3-bucket-name.path)
See above for the variable name change.
```diff
-echo -n "$S3-BUCKET-NAME" | tee $(results.s3-bucket-name.path)
+echo -n "$S3_BUCKET_NAME" | tee $(results.s3-bucket-name.path)
```
Adding a reply for historical reference.
There was a discussion about adding a CLI to support this entire workflow, but the scope of the work requires collaboration with the model registry in order to differentiate it from the functionality included in the Tekton CLI.
From an offline conversation with @biswassri about #246 (comment), the intent was to use the model registry as an abstraction layer for retrieving the model storage information. Long term, I think this approach is the better solution, as we want to have tasks that can be used to fetch from git or S3 directly, with the option of putting some type of registry abstraction task in front that can retrieve the storage information.
/hold

I'm putting this PR on hold while @biswassri adds changes based on recent comments.
Adding support to use the model registry and fetch model properties from it when enabled, while continuing to support the existing pipeline without MR deployed.
JIRA: RHOAIENG-4146
Description
Added a Task that fetches properties stored in the model registry when the parameter `model-registry-enabled` is true. So far it only fetches the model source details (git repo, revision, etc.) and not the containerfile details. The added task `fetch-data` always runs; when `model-registry-enabled` is set to `true`, the Task connects to the MR and fetches all the model-related properties from the MR specified in `model-registry-hostname`. The fetch-git and fetch-s3 tasks run based on the results of fetch-model from this task. When set to `false`, it passes on the properties from the PipelineRun file. Below is a sample curl command used to save the model properties in the MR.
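The sample command itself did not survive here; purely as an illustration, a payload carrying the schema's `customProperties` might be built like this (the endpoint path, property names, and values are hypothetical, not the actual command from the PR):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical payload matching the schema in this PR: each pipeline parameter is
# stored on the Model Version as a customProperty with string_value and metadataType.
PAYLOAD='{
  "name": "v0.1",
  "customProperties": {
    "s3-bucket-name":    {"string_value": "rhoai-edge-models", "metadataType": "MetadataStringValue"},
    "modelRelativePath": {"string_value": "tensorflow-housing", "metadataType": "MetadataStringValue"}
  }
}'

# The real call would be roughly (hostname and path are placeholders):
#   curl -s -X POST "$MR_HOST/api/model_registry/v1alpha3/model_versions" \
#        -H 'Content-Type: application/json' -d "$PAYLOAD"
echo "$PAYLOAD" | jq -r '.customProperties["s3-bucket-name"].string_value'
```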
How Has This Been Tested?
- Deployed the model registry with `serviceRoute` set to enabled state.
- Added the `model-registry-hostname` parameter to the pipeline run file.
- Ran:
```
$ oc apply -k tekton/aiedge-e2e/
$ oc create -f tekton/aiedge-e2e/aiedge-e2e.tensorflow-housing.pipelinerun.yaml
```
You'd expect the below green PipelineRun:
Merge criteria: