HTTPProxy: add cluster outlierDetection #5575

Open: wants to merge 10 commits into base: main

yangyy93
Member

Adds cluster outlier detection (`outlierDetection`) to HTTPProxy.
ref: #5317
design: #5460

@yangyy93 yangyy93 requested a review from a team as a code owner July 24, 2023 08:41
@yangyy93 yangyy93 requested review from tsaarni and stevesloka and removed request for a team July 24, 2023 08:41
@yangyy93 yangyy93 added the release-note/minor A minor change that needs about a paragraph of explanation in the release notes. label Jul 24, 2023
@codecov

codecov bot commented Jul 24, 2023

Codecov Report

Attention: Patch coverage is 75.29412%, with 21 lines in your changes missing coverage. Please review.

Project coverage is 81.53%. Comparing base (5f1b981) to head (3a2f324).
Report is 82 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5575      +/-   ##
==========================================
- Coverage   81.56%   81.53%   -0.04%     
==========================================
  Files         133      133              
  Lines       15801    15886      +85     
==========================================
+ Hits        12888    12952      +64     
- Misses       2617     2629      +12     
- Partials      296      305       +9     
| File | Coverage | Patch coverage | Δ |
|---|---|---|---|
| cmd/contour/servecontext.go | 85.98% | 100.00% | +0.03% ⬆️ |
| internal/dag/dag.go | 98.40% | ø | ø |
| internal/envoy/v3/cluster.go | 96.62% | 100.00% | +0.29% ⬆️ |
| internal/protobuf/helpers.go | 92.30% | 100.00% | +0.64% ⬆️ |
| pkg/config/parameters.go | 88.03% | ø | ø |
| cmd/contour/serve.go | 22.74% | 50.00% | +0.08% ⬆️ |
| internal/dag/httpproxy_processor.go | 91.11% | 33.33% | -0.30% ⬇️ |
| internal/envoy/cluster.go | 94.66% | 77.77% | -5.34% ⬇️ |
| internal/dag/policy.go | 93.57% | 67.56% | -1.96% ⬇️ |

@izturn izturn requested review from skriss and sunjayBhatia and removed request for tsaarni and stevesloka August 1, 2023 03:02
@github-actions

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 16, 2023
@yangyy93 yangyy93 marked this pull request as draft August 22, 2023 09:20
@yangyy93 yangyy93 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2023
@github-actions

github-actions bot commented Sep 6, 2023


@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 6, 2023
@github-actions

github-actions bot commented Oct 6, 2023


@github-actions github-actions bot closed this Oct 6, 2023
@yangyy93
Member Author

yangyy93 commented Oct 8, 2023

not stale

@yangyy93 yangyy93 reopened this Oct 8, 2023
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 9, 2023
@yangyy93 yangyy93 marked this pull request as ready for review October 20, 2023 03:06
@davinci26
Contributor

Drive-by comment:

Can we also add https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#config-bootstrap-v3-clustermanager-outlierdetection?

From the Envoy docs:

A log of outlier ejection events can optionally be produced by Envoy. This is extremely useful during daily operations since global stats do not provide enough information on which hosts are being ejected and for what reasons. The log is structured as protobuf-based dumps of OutlierDetectionEvent messages. Ejection event logging is configured in the Cluster manager outlier detection configuration.

That observability (o11y) makes the setting above seem like a good idea.
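For reference, the setting linked above lives in Envoy's bootstrap under `cluster_manager.outlier_detection`, where `event_log_path` names the file Envoy writes OutlierDetectionEvent records to. The Go sketch below only shows the shape of that JSON fragment; the structs are hand-written stand-ins covering just these fields, not the go-control-plane types Contour actually uses, and `/dev/stdout` is an example path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hand-written stand-ins mirroring only the part of Envoy's v3 Bootstrap
// needed here: cluster_manager.outlier_detection.event_log_path.
type outlierDetectionLog struct {
	EventLogPath string `json:"event_log_path"`
}

type clusterManager struct {
	OutlierDetection *outlierDetectionLog `json:"outlier_detection,omitempty"`
}

type bootstrap struct {
	ClusterManager clusterManager `json:"cluster_manager"`
}

// eventLogBootstrap returns the bootstrap fragment that asks Envoy to log
// outlier ejection events to path.
func eventLogBootstrap(path string) ([]byte, error) {
	b := bootstrap{ClusterManager: clusterManager{
		OutlierDetection: &outlierDetectionLog{EventLogPath: path},
	}}
	return json.Marshal(b)
}

func main() {
	out, err := eventLogBootstrap("/dev/stdout")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```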

@davinci26
Contributor

It would also be nice to apply this to extension services, such as ext_auth.

Contributor

@davinci26 davinci26 left a comment

LGTM. I would love to have the logs mentioned above, plus a few small changes.

}

// OutlierDetection defines the configuration for outlier detection on a service.
type OutlierDetection struct {
Contributor

A link to the Envoy docs would be nice here.

Member Author

Yes, I will add a link to Envoy's documentation above the comment.

var interval, baseEjectionTime, maxEjectionTime, maxEjectionTimeJitter time.Duration

if outlierDetection.Interval != nil {
interval, err = time.ParseDuration(ref.Val(outlierDetection.Interval, "10s"))
Contributor

Could we extract the constant here into a variable?

Member Author

I don’t quite understand what needs to be modified. Could you please explain it in detail?

Contributor

I am suggesting we move "10s" and all the other default values into named constants (`const`) so we can easily inspect them.

}

if outlierDetection.BaseEjectionTime != nil {
baseEjectionTime, err = time.ParseDuration(ref.Val(outlierDetection.BaseEjectionTime, "30s"))
Contributor

ditto

out.BaseEjectionTime = baseEjectionTime
}

if outlierDetection.MaxEjectionTime != nil {
Contributor

ditto

@yangyy93
Member Author

> Drive by comment:
>
> Can we also add https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#config-bootstrap-v3-clustermanager-outlierdetection
>
> From the Envoy docs
>
> A log of outlier ejection events can optionally be produced by Envoy. This is extremely useful during daily operations since global stats do not provide enough information on which hosts are being ejected and for what reasons. The log is structured as protobuf-based dumps of OutlierDetectionEvent messages. Ejection event logging is configured in the Cluster manager outlier detection configuration.
>
> Which makes the above o11y seem like a good idea

Thanks for your review, this is a good idea. I will add a switch to control whether logging of outlier ejection events is turned on.
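One possible shape for that switch, as a Go sketch. The `OutlierDetectionEventLog` type, its fields and the `/dev/stdout` default are hypothetical, not existing Contour configuration. When the switch is off, the sketch reports that `cluster_manager.outlier_detection` should be left unset, so Envoy writes no event log.

```go
package main

import "fmt"

// OutlierDetectionEventLog is a hypothetical configuration block for the
// logging switch; field names and the default path are assumptions.
type OutlierDetectionEventLog struct {
	Enabled bool
	Path    string // where Envoy writes events; empty means the default
}

// defaultEventLogPath is an assumed default, not a documented Contour value.
const defaultEventLogPath = "/dev/stdout"

// eventLogPath resolves the switch. ok is false when logging is disabled, in
// which case cluster_manager.outlier_detection should be left unset.
func eventLogPath(cfg OutlierDetectionEventLog) (path string, ok bool) {
	if !cfg.Enabled {
		return "", false
	}
	if cfg.Path == "" {
		return defaultEventLogPath, true
	}
	return cfg.Path, true
}

func main() {
	for _, cfg := range []OutlierDetectionEventLog{
		{},
		{Enabled: true},
		{Enabled: true, Path: "/var/log/envoy/outlier.log"},
	} {
		p, ok := eventLogPath(cfg)
		fmt.Printf("%+v -> %q %v\n", cfg, p, ok)
	}
}
```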

@yangyy93
Member Author

> Also it would be nice if we could have this also applied to extension services as well such ext_auth

For now, I have only added outlier detection configuration to the services being routed to. I think we can discuss separately whether to add the corresponding configuration to extension services.

@davinci26
Contributor

> Now I just add outlier detection configuration on the services to be routed. I think we can discuss whether we should add relevant configuration on extension services.

Works for me!

@github-actions github-actions bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 19, 2023
Signed-off-by: yangyang <yang.yang@daocloud.io>

github-actions bot commented Jan 4, 2024

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2024
@yangyy93 yangyy93 self-assigned this Jan 9, 2024
@yangyy93 yangyy93 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2024
# Conflicts:
#	cmd/contour/serve.go
#	internal/dag/dag.go
#	internal/dag/httpproxy_processor.go
#	internal/dag/policy.go
#	internal/dag/policy_test.go

@github-actions github-actions bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 27, 2024
# Conflicts:
#	site/content/docs/main/config/api-reference.html
Signed-off-by: yangyang <yang.yang@daocloud.io>
Signed-off-by: yangyang <yang.yang@daocloud.io>
Signed-off-by: yangyang <yang.yang@daocloud.io>

@github-actions github-actions bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 20, 2024

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2024
@izturn izturn removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2024
# Conflicts:
#	cmd/contour/serve.go
#	internal/dag/policy_test.go
#	internal/protobuf/helpers.go
Signed-off-by: yangyang <yang.yang@daocloud.io>

@github-actions github-actions bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 12, 2024
@izturn
Member

izturn commented Apr 28, 2024

Any progress?


@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 13, 2024
@izturn izturn removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 16, 2024
Labels
release-note/minor A minor change that needs about a paragraph of explanation in the release notes.

3 participants