Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design: External Processing support #5866

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

izturn
Copy link
Member

@izturn izturn commented Oct 19, 2023

Signed-off-by: Gang Liu gang.liu@daocloud.io

Design proposal to address #5123

Signed-off-by: gang.liu <gang.liu@daocloud.io>
Signed-off-by: gang.liu <gang.liu@daocloud.io>
@izturn izturn requested a review from a team as a code owner October 19, 2023 03:39
@izturn izturn requested review from stevesloka and sunjayBhatia and removed request for a team October 19, 2023 03:39
@izturn izturn self-assigned this Oct 19, 2023
@izturn izturn added kind/design Categorizes issue or PR as related to design. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/infra size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 19, 2023
Copy link

github-actions bot commented Nov 3, 2023

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2023
@izturn izturn removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2023
Copy link

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 30, 2023
design/external-processing-design.md Outdated Show resolved Hide resolved
design/external-processing-design.md Outdated Show resolved Hide resolved
design/external-processing-design.md Outdated Show resolved Hide resolved
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2023
@izturn
Copy link
Member Author

izturn commented Dec 13, 2023

Sorry, I am very busy this month, But this task will be my main focus next month

Copy link

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2023
@izturn izturn removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2024
Copy link

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.56%. Comparing base (c3d6cb4) to head (3f09188).
Report is 31 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #5866   +/-   ##
=======================================
  Coverage   81.56%   81.56%           
=======================================
  Files         133      133           
  Lines       15801    15801           
=======================================
  Hits        12888    12888           
  Misses       2617     2617           
  Partials      296      296           

Signed-off-by: gang.liu <gang.liu@daocloud.io>
Signed-off-by: gang.liu <gang.liu@daocloud.io>
Signed-off-by: gang.liu <gang.liu@daocloud.io>
Signed-off-by: gang.liu <gang.liu@daocloud.io>
Signed-off-by: gang.liu <gang.liu@daocloud.io>
design/external-processing-design.md Outdated Show resolved Hide resolved
design/external-processing-design.md Outdated Show resolved Hide resolved
design/external-processing-design.md Outdated Show resolved Hide resolved
design/external-processing-design.md Outdated Show resolved Hide resolved
Signed-off-by: gang.liu <gang.liu@daocloud.io>
@clayton-gonsalves clayton-gonsalves added the release-note/none-required Marks a PR as not requiring a release note. Should only be used for very small changes. label Mar 22, 2024
design/external-processing-design.md Outdated Show resolved Hide resolved
design/external-processing-design.md Outdated Show resolved Hide resolved
@wilsonwu
Copy link
Member

wilsonwu commented Mar 27, 2024

For some real use cases I meet, I think API/Route level customize logic is useful, because in some tranditional company, some of thier applications are not follow the standard of microservices and may have strange logics, so they need the Route level customize process to help only one API to do something, and rest of APIs are all work fine without this process, so I think if we can support Route level ext process, it will better.

Comment on lines 365 to 422
type ExtProc struct {
// GRPCService configure the gRPC service that the filter will communicate with.
//
// +optional
GRPCService *GRPCService `json:"grpcService,omitempty"`

// ProcessingMode describes which parts of an HTTP request and response are sent to a remote server
// and how they are delivered.
//
// +optional
ProcessingMode *ProcessingMode `json:"processingMode,omitempty"`

// MutationRules specifies what headers may be manipulated by a processing filter.
// This set of rules makes it possible to control which modifications a filter may make.
//
// for Overrides is must be nil
//
// +optional
MutationRules *HeaderMutationRules `json:"mutationRules,omitempty"`

// If true, the filter config processingMode can be overridden by the response message from the external processing server `mode_override``.
// If false, `mode_override` API in the response message will be ignored.
//
// +optional
AllowModeOverride bool `json:"allowModeOverride,omitempty"`
}

// ExternalProcessor defines a external processing filter and the policy for fine-grained at VirutalHost and/or Route level.
type ExternalProcessor struct {
// Processor defines a external processing filter which allows an external service to act on HTTP traffic in a flexible way.
//
// +optional
Processor *ExtProc `json:"processor,omitempty"`

// When true, this field disables the external processor: (neither global nor virtualHost)
// for the scope of the policy.
//
// if both Disabled and Processor are set. use disabled.
//
// +optional
Disabled bool `json:"disabled,omitempty"`
}

// ExtProcPolicy modifies how requests/responses are operated.
type ExtProcPolicy struct {
// When true, this field disables the specific client request external processor
// for the scope of the policy.
//
// if both disabled and overrides are set. use disabled.
//
// +optional
Disabled bool `json:"disabled,omitempty"`

// Overrides aspects of the configuration for this route.
//
// +optional
Overrides *ExtProc `json:"overrides,omitempty"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation in PR I feel the HTTPProxy struct extproc options feel a bit fragmented

is there a reason we cant bundle them up into one struct member on the type VirtualHost struct {}

This is what i was thinking

type ExternalProcessingServer struct {
	// ExtensionServiceRef specifies the extension resource that will handle the client requests.
	//
	// +optional
	ExtensionServiceRef ExtensionServiceReference `json:"extensionRef,omitempty"`

	// ResponseTimeout sets how long the proxy should wait for responses.
	// Timeout durations are expressed in the Go [Duration format](https://godoc.org/time#ParseDuration).
	// Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
	// The string "infinity" is also a valid input and specifies no timeout.
	//
	// +optional
	// +kubebuilder:validation:Pattern=`^(((\d*(\.\d*)?h)|(\d*(\.\d*)?m)|(\d*(\.\d*)?s)|(\d*(\.\d*)?ms)|(\d*(\.\d*)?us)|(\d*(\.\d*)?µs)|(\d*(\.\d*)?ns))+|infinity|infinite)$`
	ResponseTimeout string `json:"responseTimeout,omitempty"`

	// If FailOpen is true, the client request is forwarded to the upstream service
	// even if the server fails to respond. This field should not be
	// set in most cases.
	//
	// +optional
	FailOpen bool `json:"failOpen,omitempty"`

	// ProcessingMode describes which parts of an HTTP request and response are sent to a remote server
	// and how they are delivered.
	//
	// +optional
	ProcessingMode *ProcessingMode `json:"processingMode,omitempty"`

	// MutationRules specifies what headers may be manipulated by a processing filter.
	// This set of rules makes it possible to control which modifications a filter may make.
	//
	// for Overrides is must be nil
	//
	// +optional
	MutationRules *HeaderMutationRules `json:"mutationRules,omitempty"`

	// If true, the filter config processingMode can be overridden by the response message from the external processing server `mode_override``.
	// If false, `mode_override` API in the response message will be ignored.
	//
	// +optional
	AllowModeOverride bool `json:"allowModeOverride,omitempty"`
	
	ExtProcPolicy *ExtProcPolicy  `json:"extProcPolicy,omitempty"`
}


...


type VirtualHost struct {
...
ExtProcServer ExternalProcessingServer `json:"extProcServer,omitempty"`
...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also keeps the UX similar to what we have for ratelimiting and extauth

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clayton-gonsalves PTAL, BTW let our review the design first, then the implemention ;-)

Copy link
Contributor

@clayton-gonsalves clayton-gonsalves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments on the configuration approach

Signed-off-by: gang.liu <gang.liu@daocloud.io>
Copy link

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 20, 2024
@izturn
Copy link
Member Author

izturn commented Apr 28, 2024

ping @clayton-gonsalves

Comment on lines +163 to +178
externalProcessing:
disabled: false
processor:
extensionRef:
apiVersion: projectcontour.io/v1alpha1
name: extproc-extsvc3
namespace: extproc-test
failOpen: true
responseTimeout: 31s
processingMode:
requestBodyMode: NONE
requestHeaderMode: SKIP
requestTrailerMode: SKIP
responseBodyMode: NONE
responseHeaderMode: SEND
responseTrailerMode: SKIP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, repeating the whole ExternalProcessing struct at route level is not optimal API - it gets too verbose and I think for other similar cases we have used different approach earlier.

What would you think about (re)using the approach that JWT verification implements, where we define one or more named processors at virtualhost level and then at route level refer to one processor by its name? In that case, one or more routes can refer to a processor without repeating the full set of parameters at each route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this approach, but did not adopt it for the following reasons:

  1. The current design is based on the auth implementation, not the jwt.

  2. From the Envoy configuration definition, the structure of jwt and ext_proc is very different. In jwt the external server(s) defined at the virtualhost level are only used as a temporary storage place for server(s) (and then referenced at the route level) and do not have a real effect. However, this is not the case with ext_proc. Therefore, if multiple server(s) are defined at the virtualhost level, then at each route level, the corresponding server(s) must be explicitly enabled or disabled, which increases the difficulty of configuration.

  3. The jwt scheme is similar to the scheme we designed in the first version. In our actual use, we have found that while this is flexible, it is more difficult to understand and use.

Based on the above reasons and our internal practices, I personally believe that the current design is more reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In jwt the external server(s) defined at the virtualhost level are only used as a temporary storage place for server(s) (and then referenced at the route level) and do not have a real effect. However, this is not the case with ext_proc. Therefore, if multiple server(s) are defined at the virtualhost level, then at each route level, the corresponding server(s) must be explicitly enabled or disabled, which increases the difficulty of configuration.

I might be missing something, since I do not see the difference:

The JWT API defines providers as temporary storage for server info as you say, but it also allows setting one as the default provider which is then automatically applied to all routes. Routes can still opt-out by disabling JWT verification explicitly, or require a specific (non-default) provider.

On the other hand, if I understood the current proposal correctly, for (non-default) route level external processor, the current API requires user to re-define the external processor server parameters again on each route, even in the case when two separate routes would share single processor. Wouldn't "distributing" the configuration like that lead into duplication of configuration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current design prioritizes flexibility but results in excessive verbosity. It might be beneficial to introduce a vhost-level extproc setting to alleviate redundancy, especially within the same vhost/proxy.

The configuration on the config seems a bit in between to me where we provide flexibilty but in a sense limited. Historically, we've maintained global defaults in the configuration, managed by admins or cluster admins. IMO, service-specific settings within it feels somewhat out of place

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/infra kind/design Categorizes issue or PR as related to design. release-note/none-required Marks a PR as not requiring a release note. Should only be used for very small changes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants