Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probe envoy proxy before marking Ingress ready #3500

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

sivanantha321
Copy link
Member

@sivanantha321 sivanantha321 commented Mar 6, 2024

What this PR does / why we need it:
We noticed random 404 erros in CI environment. This is due to the fact that we are marking ingress ready as soon as we created the top level virtual service but the the virtual service configuration is not propagated to the envoy proxies.

This PR probes the queue-proxy to check whether the virtual service is ready to receive traffic or not and marks the ingress condition accordingly.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #604

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

Probe virtualservice before marking Ingress ready

Copy link

oss-prow-bot bot commented Mar 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sivanantha321
Once this PR has been reviewed and has the lgtm label, please assign yuzisun for approval by writing /assign @yuzisun in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sivanantha321 sivanantha321 changed the title Probe virtualservice before marking Ingress ready Probe envoy proxy before marking Ingress ready Mar 6, 2024
@sivanantha321 sivanantha321 force-pushed the add-ingress-prober branch 3 times, most recently from b7a6410 to 7b8cfc3 Compare March 7, 2024 09:04
@sivanantha321 sivanantha321 marked this pull request as ready for review March 7, 2024 09:53
@oss-prow-bot oss-prow-bot bot requested review from andyi2it and ckadner March 7, 2024 09:53
Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try run CI tests multiple times to see if it actually fixes the flakiness?

@sivanantha321 sivanantha321 force-pushed the add-ingress-prober branch 4 times, most recently from ec7d3e7 to 7471983 Compare April 6, 2024 15:56
if isIngressReady(isvc) {
// When the ingress has already been marked Ready for this generation,
// then it must have been successfully probed. This exception necessary for the case
// of global resyncs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe improve the comment here?
if the ingress is ready, it will not be probed, the opposite you mentioned.

maybe just do if ! isIngressReady(isvc) and probe as needed. wouldn't it be better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the watched resources are periodically reconciled (by default for every 10 hours) by the controller irrespective of any change in the resource. In those scenarios it is unnecessary to probe the ingress, if it was already marked ready for the current generation (If it is marked true then it must be already successfully probed). I agree we can just use if ! isIngressReady(isvc) condition

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
)

var (
log = logf.Log.WithName("IngressReconciler")
// probeTimeout defines the maximum amount of time a request will wait
probeTimeout = 1 * time.Second
Transport http.RoundTripper = &http.Transport{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this configuration work well with plain text and ssl?

} else {
log.V(1).Error(fmt.Errorf("failed to probe ingress"), "Failed to probe ingress", "url", target)
}
return isReady, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move it do line 505?
can save creation of one var.

}
}

func TestMain(m *testing.M) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a test for ssl?

@@ -117,8 +116,6 @@ def predict_str(
logging.info("Sending Header = %s", headers)
logging.info("Sending url = %s", url)
logging.info("Sending request data: %s", input_json)
# temporary sleep until this is fixed https://github.com/kserve/kserve/issues/604
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR fixes the mentioned issue?

@@ -149,6 +150,10 @@ def test_lightgbm_v2_runtime_mlserver():
)
kserve_client.create(isvc)
kserve_client.wait_isvc_ready(service_name, namespace=KSERVE_TEST_NAMESPACE)
# wait for model ready. Currently, wait_model_ready does not support path based routing
time.sleep(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intentional for was used for test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky KFServing e2e tests with 404s
3 participants