KServe Blue Green Deployment issues with Virtual Service #1654

Open
satyajitghana opened this issue Nov 22, 2023 · 9 comments
Labels
kind/bug Something isn't working

Comments

@satyajitghana

Here's the VirtualService:

❯ k describe vs vit-classifier
Name:         vit-classifier
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networking.istio.io/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2023-11-22T17:09:06Z
  Generation:          3
  Owner References:
    API Version:     v1
    Kind:            configmap
    Name:            vit-classifier-routemap
    UID:             18a2f121-b763-499c-b2f2-8168de4de465
  Resource Version:  72552
  UID:               c77942e2-016d-4fa4-b3a6-5ea00dc957a4
Spec:
  Gateways:
    knative-serving/knative-ingress-gateway
    knative-serving/knative-local-gateway
    mesh
  Hosts:
    vit-classifier.default
    vit-classifier.default.svc
    vit-classifier.default.svc.cluster.local
  Http:
    Match:
      Headers:
        Branch:
          Exact:  vit-classifier-0
    Name:         vit-classifier-0
    Rewrite:
      Uri:  /v2/models/vit-classifier-0/infer
    Route:
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Remove:
            branch
          Set:
            Host:  vit-classifier-0-predictor.default.svc.cluster.local
        Response:
          Add:
            App - Version:  vit-classifier-0
    Match:
      Headers:
        Branch:
          Exact:  vit-classifier-1
    Name:         vit-classifier-1
    Rewrite:
      Uri:  /v2/models/vit-classifier-1/infer
    Route:
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Remove:
            branch
          Set:
            Host:  vit-classifier-1-predictor.default.svc.cluster.local
        Response:
          Add:
            App - Version:  vit-classifier-1
    Name:                   split
    Route:
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Set:
            Branch:  vit-classifier-0
            Host:    vit-classifier.default
      Weight:        70
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Set:
            Branch:  vit-classifier-1
            Host:    vit-classifier.default
      Weight:        30
Events:              <none>

This doesn't work; the request returns a 404:

/demo $ curl -H 'Content-Type: application/json'  http://vit-classifier.default/v1/models -s -D -
HTTP/1.1 404 Not Found
content-length: 22
content-type: application/json
date: Wed, 22 Nov 2023 17:33:49 GMT
server: envoy
x-envoy-upstream-service-time: 6
app-version: vit-classifier-1

But if I change

Headers:
        Request:
          Set:
            Branch:  vit-classifier-1
            Host:    vit-classifier.default

to

Headers:
        Request:
          Set:
            Branch:  vit-classifier-1
            Host:    vit-classifier-0-predictor.default.svc.cluster.local

It works!

❯ k describe vs/vit-classifier
Name:         vit-classifier
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networking.istio.io/v1beta1
Kind:         VirtualService
Metadata:
  Creation Timestamp:  2023-11-22T17:09:06Z
  Generation:          10
  Owner References:
    API Version:     v1
    Kind:            configmap
    Name:            vit-classifier-routemap
    UID:             18a2f121-b763-499c-b2f2-8168de4de465
  Resource Version:  86112
  UID:               c77942e2-016d-4fa4-b3a6-5ea00dc957a4
Spec:
  Gateways:
    knative-serving/knative-ingress-gateway
    knative-serving/knative-local-gateway
    mesh
  Hosts:
    vit-classifier.default
    vit-classifier.default.svc
    vit-classifier.default.svc.cluster.local
  Http:
    Match:
      Headers:
        Branch:
          Exact:  vit-classifier-0
    Name:         vit-classifier-0
    Rewrite:
      Uri:  /v2/models/vit-classifier-0/infer
    Route:
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Remove:
            branch
          Set:
            Host:  vit-classifier-0-predictor.default.svc.cluster.local
        Response:
          Add:
            App - Version:  vit-classifier-0
    Match:
      Headers:
        Branch:
          Exact:  vit-classifier-1
    Name:         vit-classifier-1
    Rewrite:
      Uri:  /v2/models/vit-classifier-1/infer
    Route:
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Remove:
            branch
          Set:
            Host:  vit-classifier-1-predictor.default.svc.cluster.local
        Response:
          Add:
            App - Version:  vit-classifier-1
    Name:                   split
    Route:
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Set:
            Branch:  vit-classifier-0
            Host:    vit-classifier-1-predictor.default.svc.cluster.local
        Response:
          Add:
            App - Version:  vit-classifier-0
      Weight:               70
      Destination:
        Host:  knative-local-gateway.istio-system.svc.cluster.local
      Headers:
        Request:
          Set:
            Branch:  vit-classifier-1
            Host:    vit-classifier-0-predictor.default.svc.cluster.local
        Response:
          Add:
            App - Version:  vit-classifier-1
      Weight:               30

/demo $ curl -H 'Content-Type: application/json'  http://vit-classifier.default/v1/models -s -D -
HTTP/1.1 200 OK
content-length: 27
content-type: application/json
date: Wed, 22 Nov 2023 17:40:48 GMT
server: envoy
x-envoy-upstream-service-time: 4
app-version: vit-classifier-0

/demo $ curl -H 'Content-Type: application/json'  http://vit-classifier.default/v1/models -s -D -
HTTP/1.1 200 OK
content-length: 27
content-type: application/json
date: Wed, 22 Nov 2023 17:40:49 GMT
server: envoy
x-envoy-upstream-service-time: 3
app-version: vit-classifier-1

{"models":["imagenet-vit"]}
@satyajitghana satyajitghana added the kind/bug Something isn't working label Nov 22, 2023
@kalantar
Member

@satyajitghana could you share the commands you used to start the test?

@satyajitghana
Author

satyajitghana commented Nov 23, 2023

Here's the deployment: https://github.com/satyajitghana/canary-argocd-iter8-kserve/blob/master/vit-classifier/values.yaml

It was deployed with the Iter8 release Helm chart.

After that, I used the sample sleep container from the Iter8 blue-green deployment tutorial:

curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.18.4/samples/kserve-serving/sleep.sh | sh -

kubectl exec --stdin --tty "$(kubectl get pod --sort-by={metadata.creationTimestamp} -l app=sleep -o jsonpath={.items..metadata.name} | rev | cut -d' ' -f 1 | rev)" -c sleep -- /bin/sh

and tried to run

curl -H 'Content-Type: application/json'  http://vit-classifier.default/v1/models -s -D -

This simply tries to list the models.

What I've observed is that this endpoint works fine:

vit-classifier-0-predictor.default.svc.cluster.local

but this one doesn't:

vit-classifier.default

If I set the Host header in curl, it works:

curl -H 'Content-Type: application/json' -H "Host: vit-classifier-0-predictor.default.svc.cluster.local"  http://vit-classifier.default/v1/models -s -D -

Also, one more thing: how do I add an additional domain name to my deployment?

NAME                                 GATEWAYS                                                                                     HOSTS                                                                                                                                                                                   AGE
vit-classifier                       ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway","mesh"]   ["vit-classifier.default","vit-classifier.default.svc","vit-classifier.default.svc.cluster.local"]                                                                                      41m
vit-classifier-0                     ["knative-serving/knative-local-gateway","knative-serving/knative-ingress-gateway"]          ["vit-classifier-0.default.svc.cluster.local","vit-classifier-0.default.emlo.tsai"]                                                                                                     41m
vit-classifier-0-predictor-ingress   ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"]          ["vit-classifier-0-predictor.default","vit-classifier-0-predictor.default.emlo.tsai","vit-classifier-0-predictor.default.svc","vit-classifier-0-predictor.default.svc.cluster.local"]   41m
vit-classifier-0-predictor-mesh      ["mesh"]                                                                                     ["vit-classifier-0-predictor.default","vit-classifier-0-predictor.default.svc","vit-classifier-0-predictor.default.svc.cluster.local"]                                                  41m
vit-classifier-1                     ["knative-serving/knative-local-gateway","knative-serving/knative-ingress-gateway"]          ["vit-classifier-1.default.svc.cluster.local","vit-classifier-1.default.emlo.tsai"]                                                                                                     41m
vit-classifier-1-predictor-ingress   ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"]          ["vit-classifier-1-predictor.default","vit-classifier-1-predictor.default.emlo.tsai","vit-classifier-1-predictor.default.svc","vit-classifier-1-predictor.default.svc.cluster.local"]   41m
vit-classifier-1-predictor-mesh      ["mesh"]                                                                                     ["vit-classifier-1-predictor.default","vit-classifier-1-predictor.default.svc","vit-classifier-1-predictor.default.svc.cluster.local"]                                                  41m

Here you can see that every VirtualService bound to the Knative gateways has a default.emlo.tsai host, except vit-classifier, which was created by the Iter8 controller. I set the domain in Knative Serving using:

kubectl patch configmap/config-domain \
      --namespace knative-serving \
      --type merge \
      --patch '{"data":{"emlo.tsai":""}}'
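The observation above can be checked without reading the wide table. The following is a minimal sketch, assuming kubectl access to the cluster and the default namespace; the helper names list_vs_hosts and missing_domain are hypothetical and are not part of kubectl or Iter8.

```shell
# Hypothetical helper: print "name<TAB>hosts" for each VirtualService
# in a namespace (defaults to "default"). Requires cluster access.
list_vs_hosts() {
  kubectl get virtualservice -n "${1:-default}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.hosts}{"\n"}{end}'
}

# Hypothetical helper: read "name<TAB>hosts" lines on stdin and print the
# names whose host list does not contain the given domain ($1).
missing_domain() {
  awk -F'\t' -v d="$1" '$1 != "" && index($2, d) == 0 { print $1 }'
}
```

Usage would look like `list_vs_hosts default | missing_domain emlo.tsai`. For the host lists in the table above, this should print vit-classifier along with the two mesh-only VirtualServices, since none of them carries an emlo.tsai host.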

@kalantar
Member

and tried to run

curl -H 'Content-Type: application/json'  http://vit-classifier.default/v1/models -s -D -

Please try:

curl -H 'Content-Type: application/json'  http://vit-classifier.default -s -D -

That is, remove the /v1/models path. The VirtualService that Iter8 creates matches this hostname (vit-classifier.default) and rewrites the headers; in particular, it adds the branch header. Another rule in the same VirtualService then matches on that header, rewrites the Host header and URL, and finally sends the request to the blue or green endpoint.

When you use the full host (-H "Host: vit-classifier-0-predictor.default.svc.cluster.local"), you are sending the request to just this endpoint (the blue version), without the possibility of sending it to the green version.
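One way to see whether the split rule is actually in effect is to send a batch of requests and count the app-version response header that the VirtualService adds. This is a minimal sketch, not an Iter8 tool: the function names tally_versions and sample_split are hypothetical, the default of 20 requests is arbitrary, and it assumes it is run from a pod inside the mesh (such as the sleep pod).

```shell
# Hypothetical helper: read one or more HTTP header dumps on stdin and
# count the values of the app-version header (header names are
# case-insensitive, and HTTP lines end in CRLF, so strip the CR first).
tally_versions() {
  tr -d '\r' |
    awk -F': *' 'tolower($1) == "app-version" { n[$2]++ }
                 END { for (v in n) print v, n[v] }' |
    sort
}

# Hypothetical helper: send COUNT requests (default 20) to URL, discard
# the bodies, and tally which version answered each request.
sample_split() {
  url="$1"
  count="${2:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -D - -H 'Content-Type: application/json' "$url"
    i=$((i + 1))
  done | tally_versions
}
```

For example, `sample_split http://vit-classifier.default 50` prints one line per version with its request count. With the 70/30 weights shown in the VirtualService, the counts should land roughly in that ratio over enough requests; seeing only one version, or no app-version header at all, suggests the split rule is not being matched.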

@kalantar
Member

Also one more thing, how do i add additional domain name to my deployment?

This is not a feature we considered. I am not familiar with the config change you made. Is the requirement just to add the domain name to the list already provided, or to replace it? Perhaps you could create a separate issue for this feature request?

@sriumcp
Member

sriumcp commented Dec 5, 2023

@satyajitghana Is the issue resolved? Can this issue be closed?

Also, if you're starting to use Iter8, could we request you to kindly add yourself to this list of adopters?

We are looking to approach CNCF soon, and hoping to drive up adoption in the lead up to that.

@satyajitghana
Author

satyajitghana commented Dec 6, 2023

and tried to run

curl -H 'Content-Type: application/json'  http://vit-classifier.default/v1/models -s -D -

Please try:

curl -H 'Content-Type: application/json'  http://vit-classifier.default -s -D -

That is, remove the /v1/models path. The VirtualService that Iter8 creates matches this hostname (vit-classifier.default) and rewrites the headers; in particular, it adds the branch header. Another rule in the same VirtualService then matches on that header, rewrites the Host header and URL, and finally sends the request to the blue or green endpoint.

When you use the full host (-H "Host: vit-classifier-0-predictor.default.svc.cluster.local"), you are sending the request to just this endpoint (the blue version), without the possibility of sending it to the green version.

I've tried this as well and got the same issue, the same error. Were you able to reproduce it?

@satyajitghana
Author

Also one more thing, how do i add additional domain name to my deployment?

This is not a feature we considered. I am not familiar with the config change you made. Is the requirement just to add the domain name to the list already provided, or to replace it? Perhaps you could create a separate issue for this feature request?

Okay, agreed, that should be a separate issue.

The thing is, I wanted to use Iter8 because it works with KServe Raw Deployment, rather than using Knative with KServe.

@satyajitghana
Author

@satyajitghana Is the issue resolved? Can this issue be closed?

Also, if you're starting to use Iter8, could we request you to kindly add yourself to this list of adopters?

We are looking to approach CNCF soon, and hoping to drive up adoption in the lead up to that.

We're not really using K8s at my company right now, only at a very small scale. I teach an MLOps course at The School of AI, where I cover the various ways of deploying a model on K8s, and I landed on Iter8 because it gives a way to deploy a version and test it.

@kalantar
Member

kalantar commented Dec 6, 2023

the thing is i wanted to use iter8 as it works with KServe Raw Deployment

I don't think I have ever tried this. Did you encounter these problems in such a deployment? I tried to replicate your problem and could not, but that was in a Knative-based deployment.
