Client Settings Policy Attachment #1878

Closed

@kate-osborn kate-osborn commented Apr 24, 2024

Please review the approach only. Unit tests and lint fixes will be added after the approach is approved.

Proposed changes

Implement Client Settings Policy

Problems:

  • Users cannot attach Client Settings Policies to HTTPRoutes and Gateways (unimplemented).
  • Adding new policies to the code base is challenging because it requires changes to all code components.

Solution:

  • Implement Client Settings Policy attachment to HTTPRoutes and Gateways following the inheritance guidelines.
  • Introduce a new Policy interface that the Client Settings Policy and all future NGF Policies will implement. This interface extends the client.Object interface by giving us access to the TargetRef and the Status of the policies. This allows us to store and act on policies generically (to an extent).
  • Introduce PolicyValidator interface. The Graph uses the PolicyValidator to validate policies.
  • Introduce PolicyConfigGenerator interfaces. The dataplane package uses the PolicyConfigGenerator to generate the config for a policy. The generated config is a byte array.
  • Introduce the policy Manager object. This manager implements the PolicyValidator and PolicyConfigGenerator. Each policy must implement its own validator and generator and register it with the policy Manager.
  • Policy configuration is added to the nginx config using the include directive. Valid policies are written as nginx config to a file. The nginx config references the file at the appropriate attachment place using the include directive.
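The pieces in the Solution list can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this PR: the `Policy` interface here drops the `client.Object` embedding, and `ValidatorFunc`, `GeneratorFunc`, `Register`, `clientSettings`, `includeDirective`, and the file path are all hypothetical names chosen for the example. Only `client_max_body_size` and `include` are real NGINX directives.

```go
package main

import "fmt"

// Policy is a simplified stand-in for the interface described above. The real
// one is said to extend client.Object; that part is omitted to avoid dependencies.
type Policy interface {
	Kind() string
	Name() string
	TargetKind() string
}

// ValidatorFunc and GeneratorFunc are hypothetical per-policy hooks.
type (
	ValidatorFunc func(Policy) error
	GeneratorFunc func(Policy) []byte
)

// Manager dispatches validation and config generation by policy kind.
type Manager struct {
	validators map[string]ValidatorFunc
	generators map[string]GeneratorFunc
}

func NewManager() *Manager {
	return &Manager{
		validators: map[string]ValidatorFunc{},
		generators: map[string]GeneratorFunc{},
	}
}

// Register associates a validator and a generator with a policy kind.
func (m *Manager) Register(kind string, v ValidatorFunc, g GeneratorFunc) {
	m.validators[kind] = v
	m.generators[kind] = g
}

func (m *Manager) Validate(p Policy) error {
	v, ok := m.validators[p.Kind()]
	if !ok {
		return fmt.Errorf("no validator registered for kind %q", p.Kind())
	}
	return v(p)
}

// Generate returns nil when no generator is registered for the policy's kind.
func (m *Manager) Generate(p Policy) []byte {
	g, ok := m.generators[p.Kind()]
	if !ok {
		return nil
	}
	return g(p)
}

// clientSettings is a toy policy with a single setting.
type clientSettings struct {
	name, targetKind, bodyMaxSize string
}

func (c clientSettings) Kind() string       { return "ClientSettingsPolicy" }
func (c clientSettings) Name() string       { return c.name }
func (c clientSettings) TargetKind() string { return c.targetKind }

func validateClientSettings(p Policy) error {
	cs, ok := p.(clientSettings)
	if !ok {
		return fmt.Errorf("unexpected policy type %T", p)
	}
	if cs.bodyMaxSize == "" {
		return fmt.Errorf("bodyMaxSize must be set")
	}
	return nil
}

func generateClientSettings(p Policy) []byte {
	cs, ok := p.(clientSettings)
	if !ok {
		return nil
	}
	return []byte("client_max_body_size " + cs.bodyMaxSize + ";\n")
}

// includeDirective is the line the main config would use to pull in a policy file.
func includeDirective(path string) string { return "include " + path + ";" }

func main() {
	m := NewManager()
	m.Register("ClientSettingsPolicy", validateClientSettings, generateClientSettings)

	p := clientSettings{name: "csp", targetKind: "HTTPRoute", bodyMaxSize: "10m"}
	if err := m.Validate(p); err != nil {
		fmt.Println("invalid policy:", err)
		return
	}
	fmt.Print(string(m.Generate(p)))
	fmt.Println(includeDirective("/etc/nginx/includes/csp.conf")) // hypothetical path
}
```

The point of the shape is that the graph and dataplane code only see `Policy` and the manager, so a new policy kind needs a new validator/generator pair and a registration call rather than changes in every component.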

Steps to add a new NGF policy:

  • Update RBAC
  • Implement Policy Interface (eventually we should be able to generate these methods)
  • Register policy with the controller runtime manager
  • Implement validator and generator and register with the policy Manager
  • Add entry to changeTrackingUpserter using common policy store and policy stateChanged predicate.
  • If policy supports attaching to resources other than Gateways and HTTPRoutes, update stateChanged predicate and graph logic.
  • If policy attaches to contexts other than locations and servers, update nginx config generation to support new location.

Testing: Manual testing covering the following:

  • new CRD validation
  • Inherited policy behavior
  • Merge policy behavior
  • Conflict policy behavior
  • Policy Status

Closes #1792 #1760

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • [] I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request helm-chart Relates to helm chart labels Apr 24, 2024

codecov bot commented Apr 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (b37071d) to head (4c44dc1).

Additional details and impacted files
@@             Coverage Diff              @@
##             main     #1878       +/-   ##
============================================
+ Coverage   86.20%   100.00%   +13.79%     
============================================
  Files          83         1       -82     
  Lines        5540       209     -5331     
  Branches       52        52               
============================================
- Hits         4776       209     -4567     
+ Misses        715         0      -715     
+ Partials       49         0       -49     


Contributor

@sjberman sjberman left a comment

Overall I like the approach.

}
}

var clientSettingsTemplate = `
Contributor

Suggested change
var clientSettingsTemplate = `
const clientSettingsTemplate = `

for _, pr := range route.ParentRefs {
ancestor := PolicyAncestor{
Ancestor: PolicyAncestorRef{
Kind: "Gateway",
Contributor Author

The more I think about this, the more I believe the ancestor should be the HTTPRoute instead of the Gateway. The policy affects the location blocks in the NGINX config which traces back to the HTTPRoute, not the Gateway.

I think we want an entry for each HTTPRoute in the ancestor status.

Contributor

The ancestor language always confused me a little bit. Before reading it, I assumed ancestor just meant whatever the policy is attached to, since that's what it affects. But the language says to go all the way up the chain. So wouldn't everything just have the Gateway as its ancestor? What's the point?

Contributor Author

It's extremely confusing, and I've read it at least 10 times.

Here are my notes:

  • Describes the status of a policy with respect to an ancestor
  • Ancestors are objects that are either the Target of a policy or above it in the hierarchy.
  • Almost always, the Gateway will be the most useful object to place Policy status on. Implementations SHOULD use Gateway as the status object unless there’s a very good reason otherwise.
  • Ancestor is used to distinguish which resource results in a distinct application of this policy. For example, if a policy targets a Service, it may have a distinct result per attached Gateway.
  • Max number of ancestors is 16. If the ancestors list is full, the policy cannot be implemented by any additional ancestor.

I keep coming back to this bullet point:

Ancestor is used to distinguish which resource results in a distinct application of this policy. For example, if a policy targets a Service, it may have a distinct result per attached Gateway.

For this policy, when attached to a route, there is a distinct result per Route, not per Gateway. This is especially true if we want to support multiple target refs. Let's say a policy attaches to route foo and route bar. Both routes attach to the same Gateway listener, but foo is invalid. What would the ancestor status be if the ancestor object is the Gateway?

Contributor

The more I think about this, the more I believe the ancestor should be the HTTPRoute instead of the Gateway. The policy affects the location blocks in the NGINX config which traces back to the HTTPRoute, not the Gateway.

What if Gateway has 1000+ HTTPRoutes? Will it blow up the status?

Contributor Author

However, we also have the case where a policy targets a route that references multiple listeners on a gateway.

Route bar:

  • parentRefs:
    • name: gateway
      sectionName: listener-1
    • name: gateway
      sectionName: listener-2

Policy csp

  • targetRef:
    • name: bar
      kind: HTTPRoute

Let's say bar attaches to the listener-1 on the gateway, but not listener-2.

That means the policy will be implemented in the listener-1 server block, but not the listener-2 server block.

In this case, the ancestors should be the Gateway listeners:

- kind: Gateway
  name: gateway
  sectionName: listener-1
  conditions: 
  - Accepted/True/Accepted
- kind: Gateway
  name: gateway
  sectionName: listener-2
  conditions: 
  - Accepted/False/TargetRef failed to attach
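The per-listener status above could be built along these lines. This is only a sketch under assumptions: `parentRef`, `ancestorStatus`, and `ancestorsForRoute` are hypothetical types and names, and the `Attached` flag stands in for whatever the graph records about whether the route attached to a given listener.

```go
package main

import "fmt"

// parentRef is a simplified view of one entry in a route's parentRefs.
type parentRef struct {
	Gateway     string
	SectionName string
	Attached    bool // whether the route attached to this listener
}

// ancestorStatus is a simplified policy ancestor status entry.
type ancestorStatus struct {
	Kind        string
	Name        string
	SectionName string
	Accepted    bool
	Message     string
}

// ancestorsForRoute emits one ancestor entry per Gateway listener referenced
// by the targeted route, so each listener gets its own Accepted condition.
func ancestorsForRoute(parentRefs []parentRef) []ancestorStatus {
	out := make([]ancestorStatus, 0, len(parentRefs))
	for _, pr := range parentRefs {
		st := ancestorStatus{
			Kind:        "Gateway",
			Name:        pr.Gateway,
			SectionName: pr.SectionName,
			Accepted:    pr.Attached,
			Message:     "Accepted",
		}
		if !pr.Attached {
			st.Message = "TargetRef failed to attach"
		}
		out = append(out, st)
	}
	return out
}

func main() {
	refs := []parentRef{
		{Gateway: "gateway", SectionName: "listener-1", Attached: true},
		{Gateway: "gateway", SectionName: "listener-2", Attached: false},
	}
	for _, a := range ancestorsForRoute(refs) {
		fmt.Printf("%s/%s/%s accepted=%t (%s)\n", a.Kind, a.Name, a.SectionName, a.Accepted, a.Message)
	}
}
```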

Contributor Author

This makes me think that Client Settings shouldn't support multiple target refs. Since it can be attached at a Gateway, I don't think there's a big need for multiple target refs.

Contributor Author

@pleshakov

What if Gateway has 1000+ HTTPRoutes? Will it blow up the status?

When attaching to a Gateway, the ancestor object would be the Gateway so I don't see this as an issue.

However, I do see an issue with using HTTPRoutes as the ancestor. See my last two comments.

staticConds.NewPolicyTargetNotFound("TargetRef is invalid"),
)
} else if policy.Valid {
ancestor.Conditions = append(ancestor.Conditions, staticConds.NewPolicyAccepted())
Contributor Author

I think I'll remove this and add the policy accepted when building the status.

return ancestor, false
}

ancestor.Conditions = append(ancestor.Conditions, staticConds.NewPolicyAccepted())
Contributor Author

Same comment here about moving the accepted status.

)

// attachPolicies attaches the graph's processed policies to the resources they target. It modifies the graph in place.
func (g *Graph) attachPolicies() {
Contributor Author

I will add logic to verify that there's enough room in the ancestors slice to attach the policy. Similar to this function:

func validateAncestorMaxCount(backendTLSPolicy *v1alpha2.BackendTLSPolicy, ctlrName string, gateway *Gateway) error {
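A check of that kind might look roughly like this. It is a sketch, not the body of `validateAncestorMaxCount`: the `ancestor` type, `checkAncestorRoom`, and `fullAncestors` are hypothetical, and the assumption that an ancestor which already has an entry does not need a new slot is mine. The limit of 16 comes from the notes earlier in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// maxAncestors is the cap on ancestor status entries mentioned in the thread.
const maxAncestors = 16

// ancestor is a simplified, comparable ancestor status key.
type ancestor struct {
	Namespace      string
	Name           string
	ControllerName string
}

// checkAncestorRoom reports an error when candidate would need a new slot in
// an already full ancestors slice. An ancestor that already has an entry is
// assumed to reuse that entry, so it never fails the check.
func checkAncestorRoom(existing []ancestor, candidate ancestor) error {
	for _, a := range existing {
		if a == candidate {
			return nil
		}
	}
	if len(existing) >= maxAncestors {
		return errors.New("ancestors slice is full; policy cannot attach to another ancestor")
	}
	return nil
}

// fullAncestors builds a slice that is exactly at the limit, for demonstration.
func fullAncestors() []ancestor {
	out := make([]ancestor, 0, maxAncestors)
	for i := 0; i < maxAncestors; i++ {
		out = append(out, ancestor{Namespace: "default", Name: fmt.Sprintf("gw-%d", i)})
	}
	return out
}

func main() {
	full := fullAncestors()
	fmt.Println(checkAncestorRoom(full, ancestor{Namespace: "default", Name: "gw-new"}))
	fmt.Println(checkAncestorRoom(full, full[0]))
}
```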

@ciarams87
Member

Great work @kate-osborn! I like the approach, looks good!

Contributor

@pleshakov pleshakov left a comment

@kate-osborn great approach!

I see the following potential problems in the future:

  • if a Policy needs to modify existing directives (like adding slow start to upstream server)
  • if a Policy can't merge config based on NGINX config - needs NGF-level based logic. Example - auth. Auth in location resets any authentication in server context.
  • If multiple Policies produce the same directives (ex. proxy_set_header for a particular feature... )

@@ -87,6 +89,7 @@ type ChangeProcessorImpl struct {
updater Updater
// getAndResetClusterStateChanged tells if and how the cluster state has changed.
getAndResetClusterStateChanged func() ChangeType
extractGVK extractGVKFunc
Contributor

doesn't look like it is used anywhere except NewChangeProcessorImpl


// NewClientSettingsGeneratorFunc returns a function that generates configuration as []byte for a ClientSettingsPolicy.
func NewClientSettingsGeneratorFunc() func(policy policies.Policy) []byte {
return func(policy policies.Policy) []byte {
Contributor

Do we really need to return a function here?
The returned function has the same signature as the generator function, so NewClientSettingsGeneratorFunc() always creates the same function.
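To illustrate the point, here is a small sketch with stand-in types (`Policy`, `GenerateFunc`, `generate`, and `named` are hypothetical, not the PR's definitions). A plain function can be used directly wherever the function type is expected, so a factory that captures no state adds nothing.

```go
package main

import "fmt"

// Policy is a minimal stand-in for the policies.Policy interface.
type Policy interface{ Name() string }

// GenerateFunc is the function type a caller would register.
type GenerateFunc func(Policy) []byte

// generate is a plain function whose signature already matches GenerateFunc.
func generate(p Policy) []byte {
	return []byte("# config for policy " + p.Name() + "\n")
}

// NewGeneratorFunc mirrors the factory shape under review. It captures no
// state, so every call returns a function equivalent to generate.
func NewGeneratorFunc() GenerateFunc {
	return func(p Policy) []byte { return generate(p) }
}

// named is a toy Policy implementation.
type named string

func (n named) Name() string { return string(n) }

func main() {
	var direct GenerateFunc = generate // no factory needed
	viaFactory := NewGeneratorFunc()

	fmt.Print(string(direct(named("csp"))))
	fmt.Print(string(viaFactory(named("csp"))))
}
```

A factory would start to pay off only if the generator needed captured state, such as a template or configuration passed in at construction time.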

Source: policy,
Valid: len(conds) == 0,
Conditions: conds,
TargetRef: PolicyTargetRef{
Contributor

this looks like we're attaching already here (relevant to my previous comment about similarity of functions for processing and attaching Policies)

},
)

for i := range policyList {
Contributor

this part is a bit complicated (imho)
it looks like every policy could conflict with every other policy.
However, as we iterate over the inner loop (for j := i + 1 ...), we mark some conflicting policies as invalid, which I think, combined with

				if !policyList[i].Valid {
					continue
				}

might free some other policies later in the sorted order from conflicts...

I wonder if it is possible to add a comment that clarifies the logic
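One way to picture the behaviour being described is the sketch below. It is an assumption-laden reconstruction rather than the PR's code: the `policy` type, the `conflicts` rule (two policies conflict when they set the same field), and the integer `Created` stand-in for a creation timestamp are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// policy is a toy policy: Created stands in for the creation timestamp and
// Fields for the set of settings the policy configures.
type policy struct {
	Name    string
	Created int
	Fields  map[string]bool
	Valid   bool
}

// conflicts reports whether two policies set at least one common field.
func conflicts(a, b policy) bool {
	for f := range a.Fields {
		if b.Fields[f] {
			return true
		}
	}
	return false
}

// markConflicts sorts oldest first, then lets each still-valid policy
// invalidate every later policy it conflicts with. A policy that has already
// been invalidated is skipped as i, so it cannot knock out later policies.
func markConflicts(list []policy) {
	sort.Slice(list, func(i, j int) bool {
		if list[i].Created != list[j].Created {
			return list[i].Created < list[j].Created
		}
		return list[i].Name < list[j].Name
	})
	for i := range list {
		if !list[i].Valid {
			continue
		}
		for j := i + 1; j < len(list); j++ {
			if list[j].Valid && conflicts(list[i], list[j]) {
				list[j].Valid = false
			}
		}
	}
}

// examplePolicies: b conflicts with both a and c, but a and c do not conflict.
func examplePolicies() []policy {
	return []policy{
		{Name: "c", Created: 3, Fields: map[string]bool{"timeout": true}, Valid: true},
		{Name: "a", Created: 1, Fields: map[string]bool{"body": true}, Valid: true},
		{Name: "b", Created: 2, Fields: map[string]bool{"body": true, "timeout": true}, Valid: true},
	}
}

func main() {
	list := examplePolicies()
	markConflicts(list)
	for _, p := range list {
		fmt.Printf("%s valid=%t\n", p.Name, p.Valid)
	}
}
```

In this example `b` loses to the older `a`, and because `b` is then skipped, `c` stays valid even though it conflicts with `b`. That is the "freed from conflicts" effect the comment asks to have documented.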


for _, policyList := range possibles {
if len(policyList) > 1 {
sort.SliceStable(
Contributor

I wonder why stable is needed.
ClientObject sorts based on timestamp, namespace, and name, which should give a unique position in the order. So not sure why stable is needed.
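A quick sketch of why stability should not matter here, assuming the comparator really is a strict total order: when no two elements compare as equal, `sort.Slice` and `sort.SliceStable` have only one valid result. The `obj` type and `less` function below are hypothetical stand-ins for the real comparison, using the timestamp, namespace, name ordering described above.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// obj carries just the fields the comparison is said to use.
type obj struct {
	Created   time.Time
	Namespace string
	Name      string
}

// less orders by creation time, then namespace, then name. Namespace plus
// name is unique per kind, so no two distinct objects compare as equal.
func less(a, b obj) bool {
	if !a.Created.Equal(b.Created) {
		return a.Created.Before(b.Created)
	}
	if a.Namespace != b.Namespace {
		return a.Namespace < b.Namespace
	}
	return a.Name < b.Name
}

// sorted returns a sorted copy using either the stable or unstable sort.
func sorted(objs []obj, stable bool) []obj {
	out := append([]obj(nil), objs...)
	cmp := func(i, j int) bool { return less(out[i], out[j]) }
	if stable {
		sort.SliceStable(out, cmp)
	} else {
		sort.Slice(out, cmp)
	}
	return out
}

// sample returns objects that share a timestamp, forcing the tie-breakers.
func sample() []obj {
	t := time.Date(2024, 4, 24, 0, 0, 0, 0, time.UTC)
	return []obj{
		{Created: t, Namespace: "prod", Name: "b"},
		{Created: t.Add(-time.Hour), Namespace: "prod", Name: "z"},
		{Created: t, Namespace: "dev", Name: "b"},
		{Created: t, Namespace: "prod", Name: "a"},
	}
}

func main() {
	for _, o := range sorted(sample(), false) {
		fmt.Println(o.Namespace + "/" + o.Name)
	}
}
```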

Contributor Author

Just a copy-paste. I'll update

@kate-osborn
Contributor Author

@pleshakov thanks for the review and the great callouts! See below for my perspective on the future risks

if a Policy needs to modify existing directives (like adding slow start to upstream server)

Yes, this solution only works for pure additions to the NGINX config, but I think it can be extended in the future to handle modifications. What if I changed some of the names to make it clear that these are additions and we plan on extending this solution to support modification when we implement a policy that involves modifications?

if a Policy can't merge config based on NGINX config - needs NGF-level based logic. Example - auth. Auth in location resets any authentication in server context.

That is an issue I didn't consider with this solution. Your example of auth, however, doesn't strictly apply since we are planning on implementing auth as a filter. I think almost all nginx directives follow the standard nginx inheritance behavior. Can we punt on dealing with this until we have a policy that needs special merge rules?

If multiple Policies produce the same directives (ex. proxy_set_header for a particular feature... )

We should avoid adding policies that produce the same directives. Policies should not conflict with one another. I think this should influence how we create policies and not something we should enforce in the code.

@pleshakov
Contributor

@kate-osborn

What if I changed some of the names to make it clear that these are additions and we plan on extending this solution to support modification when we implement a policy that involves modifications?

👍

Can we punt on dealing with this until we have a policy that needs special merge rules?

👍

We should avoid adding policies that produce the same directives. Policies should not conflict with one another.

Things I had in mind are auth-related again - propagating auth results to backends via headers.

Since we're gonna start with filters for auth, as you mentioned, we don't need to consider it now

@kate-osborn
Contributor Author

I will re-open once the PR is ready.

@kate-osborn kate-osborn closed this May 4, 2024
Labels
documentation Improvements or additions to documentation enhancement New feature or request helm-chart Relates to helm chart
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

ClientSettingsPolicy for Routes
4 participants