OCPBUGS-29975: Allow multiple machine networks #6071
base: master
Conversation
It's confusing that GetMachineNetworksFromBootstrapHost() returns the existing MachineNetworks in the cluster (and does *not* get them from the bootstrap host) if they already exist there. Refactor to make the logic clearer.
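The refactored flow described above can be sketched roughly as follows (hypothetical names and simplified types for illustration, not the actual assisted-service code):

```go
package main

import "fmt"

type MachineNetwork struct{ Cidr string }

// getMachineNetworks prefers the machine networks already present on the
// cluster, and only falls back to deriving them from the bootstrap host
// when none exist. The early return makes the two cases explicit.
func getMachineNetworks(existing []MachineNetwork, fromBootstrapHost func() []MachineNetwork) []MachineNetwork {
	if len(existing) > 0 {
		return existing // already configured on the cluster; nothing to derive
	}
	return fromBootstrapHost()
}

func main() {
	existing := []MachineNetwork{{Cidr: "192.168.1.0/24"}}
	nets := getMachineNetworks(existing, func() []MachineNetwork {
		return []MachineNetwork{{Cidr: "10.0.0.0/16"}}
	})
	fmt.Println(nets[0].Cidr) // prints 192.168.1.0/24: the existing networks win
}
```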
The network type is set for all platforms in getBasicInstallConfig(). There is no need to set it again in the none platform provider.
@zaneb: This pull request references Jira Issue OCPBUGS-29975, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: zaneb The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/jira refresh
@zaneb: This pull request references Jira Issue OCPBUGS-29975, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Infra issues with the CentOS repos
Codecov Report: Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #6071      +/-   ##
==========================================
- Coverage   68.40%   68.36%   -0.04%
==========================================
  Files         239      239
  Lines       35453    35474      +21
==========================================
+ Hits        24251    24252       +1
- Misses       9098     9119      +21
+ Partials     2104     2103       -1
(force-pushed from 68f0945 to b996310)
It is not clear what the restrictions are here:
Do we allow multiple API VIPs per address family?
Is there a restriction that a given host-role must belong to a specific machine-network? If yes, this needs to be added as a validation.
What if we have a cluster with only 3 hosts (only masters)? Can such a cluster have multiple IPv4 machine-networks?
Can a machine-network be stale (without hosts)?
@@ -21,7 +22,7 @@ const MinSNOMachineMaskDelta = 1
 func parseCIDR(cidr string) (ip net.IP, ipnet *net.IPNet, err error) {
 	ip, ipnet, err = net.ParseCIDR(cidr)
 	if err != nil {
-		err = errors.Wrapf(err, "Failed to parse CIDR '%s'", cidr)
+		err = fmt.Errorf("Failed to parse CIDR '%s': %w", cidr, err)
Why was this changed?
Because it's a pain to import both the standard library "errors" package and the deprecated old-timey hack "github.com/pkg/errors" package in the same file. This is a trivial refactor, the result is the same.
 	if err != nil {
 		return err
 	}

 	if overlap {
-		return errors.Errorf("CIDRS %s and %s overlap", aCidrStr, bCidrStr)
+		return fmt.Errorf("CIDRS %s and %s overlap", aCidrStr, bCidrStr)
Please do not change the errors unless necessary
 		log.WithError(err).Warnf("Verify VIPs")
 		return common.NewApiError(http.StatusBadRequest, err)
 	}
 }

 } else {
-	primaryMachineNetworkCidr, err = network.CalculateMachineNetworkCIDR(network.GetApiVipById(&targetConfiguration, 0), network.GetIngressVipById(&targetConfiguration, 0), cluster.Hosts, matchRequired)
+	primaryMachineNetworkCidr, err := network.CalculateMachineNetworkCIDR(network.GetApiVipById(&targetConfiguration, 0), network.GetIngressVipById(&targetConfiguration, 0), cluster.Hosts, matchRequired)
This might be wrong according to the concept presented by this PR, because we may have two machine-networks (one per VIP) or one machine-network for both VIPs.
-	if err := checkCidrsOverlapping(c.cluster); err != nil {
-		return ValidationFailure, fmt.Sprintf("CIDRS Overlapping: %s.", err.Error())
+	if err := network.VerifyNoNetworkCidrOverlaps(c.cluster.ClusterNetworks, c.cluster.MachineNetworks, c.cluster.ServiceNetworks); err != nil {
+		return ValidationFailure, err.Error()
The validation error messages need to be agreed upon, since they are presented in the UI. It is not clear here what text is expected for this error message.
Ack. I did change them, you can see the change in the tests here: 51e76b1#diff-854f83029709261bb4e532a5a6839b51ddeb2beb2f6e921056016a74340d06a3
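The all-pairs overlap validation being discussed can be sketched as follows (a hypothetical stand-in for `VerifyNoNetworkCidrOverlaps`; the real assisted-service signatures differ):

```go
package main

import (
	"fmt"
	"net"
)

// cidrsOverlap reports whether two parsed networks share any addresses:
// two CIDRs overlap exactly when either contains the other's base address.
func cidrsOverlap(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// verifyNoOverlaps checks every pair of subnets across all supplied lists,
// mirroring what openshift-installer does, instead of assuming that indices
// are aligned to address families.
func verifyNoOverlaps(groups ...[]string) error {
	var all []*net.IPNet
	var names []string
	for _, group := range groups {
		for _, cidr := range group {
			_, ipnet, err := net.ParseCIDR(cidr)
			if err != nil {
				return fmt.Errorf("Failed to parse CIDR '%s': %w", cidr, err)
			}
			all = append(all, ipnet)
			names = append(names, cidr)
		}
	}
	for i := range all {
		for j := i + 1; j < len(all); j++ {
			if cidrsOverlap(all[i], all[j]) {
				return fmt.Errorf("CIDRS %s and %s overlap", names[i], names[j])
			}
		}
	}
	return nil
}

func main() {
	err := verifyNoOverlaps(
		[]string{"10.128.0.0/14"},                      // cluster networks
		[]string{"192.168.1.0/24", "192.168.1.128/25"}, // machine networks
		[]string{"172.30.0.0/16"},                      // service networks
	)
	fmt.Println(err) // the two machine networks overlap
}
```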
@@ -600,7 +600,7 @@ func (v *validator) belongsToMachineCidr(c *validationContext) (ValidationStatus
 	if c.inventory == nil || !network.IsMachineCidrAvailable(c.cluster) {
 		return ValidationPending, "Missing inventory or machine network CIDR"
 	}
-	if !network.IsHostInPrimaryMachineNetCidr(v.log, c.cluster, c.host) {
+	if !network.IsHostInMachineNetCidrs(v.log, c.cluster, c.host) {
 		return ValidationFailure, "Host does not belong to machine network CIDRs. Verify that the host belongs to every CIDR listed under machine networks"
This error message is no longer accurate now, right?
Ah yep, the last part is not.
Wording is awkward here, because we still have to belong to both IPv4 and IPv6 CIDRs if both appear in the list. What do you think of "Verify that the host belongs to a CIDR for every IP version listed under machine networks"?
That might be OK. @avishayt what do you think?
"Verify that the host belongs to a machine network CIDR for every used IP version"?
 	return models.VipVerificationUnverified, errors.Errorf("%s <%s> cannot be set if Machine Network CIDR is empty", vipName, vip)
 }
 if !ipInCidr(vip, machineNetworkCidr) {
 	return models.VipVerificationFailed, errors.Errorf("%s <%s> does not belong to machine-network-cidr <%s>", vipName, vip, machineNetworkCidr)
 if machineNetworkCidr == "" {
This test is irrelevant. The machine network cannot be valid and empty.
Yes it can - validMachineNetwork will be true if any machine networks are defined, but machineNetworkCidr will only be set if the VIP is in one of them.
So you don't need the test for valid-network
This is maintaining the previous behaviour. If there are no machineNetworks specified, we return VipVerificationUnverified. If there are machineNetworks specified and the VIP isn't in any of them, we return VipVerificationFailed.
I think what's confusing in the diff is that machineNetworkCidr was previously a parameter, and now it's a local variable.
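That behaviour (Unverified when no machine networks exist, Failed when the VIP falls in none of them) can be sketched in simplified form (hypothetical types and names, not the actual assisted-service code):

```go
package main

import (
	"fmt"
	"net"
)

type VipVerification string

const (
	VerificationSucceeded  VipVerification = "succeeded"
	VerificationFailed     VipVerification = "failed"
	VerificationUnverified VipVerification = "unverified"
)

// verifyVip: with no machine networks defined the VIP stays unverified;
// with machine networks present, the VIP must fall inside at least one.
func verifyVip(vip string, machineNetworks []string) (VipVerification, error) {
	if len(machineNetworks) == 0 {
		return VerificationUnverified, fmt.Errorf("vip <%s> cannot be set if Machine Network CIDR is empty", vip)
	}
	ip := net.ParseIP(vip)
	for _, cidr := range machineNetworks {
		_, ipnet, err := net.ParseCIDR(cidr)
		if err != nil {
			continue // skip malformed entries in this sketch
		}
		if ipnet.Contains(ip) {
			return VerificationSucceeded, nil
		}
	}
	return VerificationFailed, fmt.Errorf("vip <%s> does not belong to any machine network", vip)
}

func main() {
	v, _ := verifyVip("192.168.1.5", []string{"10.0.0.0/16", "192.168.1.0/24"})
	fmt.Println(v) // prints succeeded: the VIP is inside one of the machine networks
}
```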
(force-pushed from b996310 to c6e01dc)
None of the subnets specified in any of the machineNetworks, clusterNetworks, or serviceNetworks should overlap. Validate all of these combinations, as openshift-installer does, instead of making assumptions about indices being aligned to address families.
Don't make assumptions about a 1:1 mapping between MachineNetworks and VIPs.
In the belongs-to-machine-cidr validation, allow the host to be a member of any MachineNetwork. In a dual-stack cluster, require it to be a member of both an IPv4 and an IPv6 network. Previously it was assumed that the only reason for multiple MachineNetworks to appear was that a dual stack cluster could contain exactly one IPv4 and one IPv6 MachineNetwork.
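The rule described in that commit message could be sketched as follows (an illustrative stand-in for `IsHostInMachineNetCidrs`; hypothetical signature, not the actual implementation): the host must have an address in at least one machine network of every IP family that appears in the list.

```go
package main

import (
	"fmt"
	"net"
)

// hostBelongsToMachineNetworks returns true when, for every IP family present
// in machineCidrs (IPv4 and/or IPv6), at least one host address falls inside
// at least one machine network of that family. Membership in any single
// network of a family is enough; the host need not be in all of them.
func hostBelongsToMachineNetworks(hostIPs []string, machineCidrs []string) bool {
	present := map[bool]bool{} // keyed by "is IPv4"; families seen in machineCidrs
	covered := map[bool]bool{} // families for which the host has a matching address
	for _, cidr := range machineCidrs {
		_, ipnet, err := net.ParseCIDR(cidr)
		if err != nil {
			continue // skip malformed entries in this sketch
		}
		isV4 := ipnet.IP.To4() != nil
		present[isV4] = true
		for _, h := range hostIPs {
			if ip := net.ParseIP(h); ip != nil && ipnet.Contains(ip) {
				covered[isV4] = true
			}
		}
	}
	for family := range present {
		if !covered[family] {
			return false
		}
	}
	return true
}

func main() {
	// Dual-stack with two IPv4 machine networks: membership in any one IPv4
	// network satisfies the IPv4 requirement, and likewise for IPv6.
	ok := hostBelongsToMachineNetworks(
		[]string{"192.168.1.10", "fd00::5"},
		[]string{"192.168.1.0/24", "10.0.0.0/16", "fd00::/64"},
	)
	fmt.Println(ok) // prints true
}
```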
Multiple MachineNetworks in the same address family and IPv6-primary dual-stack clusters are a thing, so relax the dual-stack validation requirements for machine networks to allow them.
Allow users to specify multiple machine networks of the same address family. This is a documented and supported feature of OpenShift. This reverts commit 873dd81.
Don't restrict ourselves to the first machine network when looking for an interface on a machine network to set the BootMACAddress.
OpenShift has ~always supported having machines in multiple machineNetworks, so update the TODO comment to reflect that accounting for this is already something we need to do to fully support non-UserManagedNetworking clusters. (UserManagedNetworking clusters use only the L3 connectivity check.) See https://issues.redhat.com/browse/OCPBUGS-30730 for more details.
(force-pushed from c6e01dc to ff9be84)
No.
No.
Yes (and in fact this is likely to be a common case for
I don't see why not.
@zaneb: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
So workers do not have to belong to the machine-networks of the ingress VIPs? Maybe we have to verify that at least 2 workers belong to a machine-network of the ingress VIPs? If not, it will cause a mess. The logic is already complicated, and if we don't set rules, users will be confused.
The None platform does not have machine-networks at all (at least as we implemented it).
Why is it needed?
Until now, assisted-service has assumed that there would be either exactly one MachineNetwork specified for single-stack clusters, or for dual-stack clusters that there would be exactly one IPv4 and one IPv6 MachineNetwork specified (in that order).
None of these restrictions exist in OpenShift itself, which allows multiple MachineNetworks of each address family. This is necessary to support remote worker nodes on day 1, as well as distributed installations of a kind that are common in UPI deployments (where an external load balancer can easily balance across hosts in separate networks). (OCPBUGS-29975)
This change removes the assumptions about a single MachineNetwork per address family for clusters with UserManagedNetworking enabled, and reverts the change in #4867 that prevents users from specifying more networks at the API level.
(Clusters without UserManagedNetworking will still use Layer 2 reachability checks in the belongs-to-majority-group host validation, which effectively prevents using remote worker nodes when creating these clusters. See OCPBUGS-30730.)