This is "expected" as the precondition needed (aka the pod running which it tries to access) is not met. However it feels even then weird that this is a segfault rather than an error. Though it shouldnt have went into this in the first place.
General Information
Cilium CLI version (run cilium version)
cilium-cli: v0.15.20 compiled with go1.21.6 on linux/amd64
cilium image (default): v1.14.5
cilium image (stable): v1.14.6
cilium image (running): 1.14.6
Orchestration system version in use (e.g. kubectl version, ...)
Client Version: v1.28.5
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.2
Bare-metal kubeadm cluster. The control-plane node runs Gentoo with a 6.1.60 kernel, and the two workers run NixOS with kernel 6.7.1. The control plane is x86 and the two workers are arm64. All three nodes are allowed to run pods.
A lot of information is in the Slack thread: https://cilium.slack.com/archives/C1MATJ5U5/p1706192594540579
Cilium sysdump, hosted via Matrix since it is 2 MB larger than what GitHub allows here :(
https://matrix.org/_matrix/media/v3/download/midnightthoughts.space/64ef2c6b31d3c8edab052443335f220439e64fb51750678141078077440

How to reproduce the issue
This is rather unclear. However, here are some known hints.

The Helm chart is deployed with the following values:
---
bpf:
hostLegacyRouting: false
masquerade: true
cluster:
# -- Name of the cluster. Only required for Cluster Mesh and mutual authentication with SPIRE.
name: <redacted>
# -- (int) Unique ID of the cluster. Must be unique across all connected
# clusters and in the range of 1 to 255. Only required for Cluster Mesh,
# may be 0 if Cluster Mesh is not used.
id: 0
cni:
customConf: false
uninstall: false
ipam:
operator:
clusterPoolIPv4PodCIDRList:
- 10.245.0.0/16
clusterPoolIPv6PodCIDRList:
- fd00::/104
operator:
unmanagedPodWatcher:
restart: true
prometheus:
enabled: true
serviceMonitor:
enabled: true
dashboards:
enabled: true
policyEnforcementMode: default
kubeProxyReplacement: "true"
routingMode: tunnel
tunnelProtocol: vxlan
#tunnelProtocol: geneve
tunnel: vxlan
tunnelPort: 8473
sessionAffinity: true
prometheus:
enabled: true
serviceMonitor:
enabled: true
dashboards:
enabled: true
hubble:
relay:
enabled: true
prometheus:
enabled: true
ui:
enabled: true
metrics:
enabled:
- dns
- tcp
- httpV2
metrics:
enableOpenMetrics: true
enabled:
- dns:query;ignoreAAAA
- drop
- flow
- flows-to-world
- httpV2:exemplars=true;labelsContext=source_ip
# - source_namespace
# - source_workload
# - destination_ip
# - destination_namespace
# - destination_workload
# - traffic_direction
- icmp
- port-distribution
- tcp
endpointStatus:
enabled: true
status: "policy"
nodePort:
enabled: false
# Turn on after migration
l2announcements:
enabled: true
k8sClientRateLimit:
qps: 50
burst: 100
k8sServiceHost: <redacted>
k8sServicePort: 6443
ipv6:
enabled: true
rollOutCiliumPods: true
# Possibly broken
#enableIPv6Masquerade: false
#nat46x64Gateway:
# enabled: true
The cluster at one point had WireGuard encryption between nodes enabled via Cilium. That did not work and was rolled back on the control plane. Since the nodes were locked out, I removed them the normal kubeadm way and then re-added them under the same node names.

The Slack thread led me to look at https://github.com/cilium/cilium-cli/blob/v0.15.20/connectivity/check/features.go#L185, which is presumably the precondition that has to be met for the tests to run. All 3 nodes, however, return:

for cilium status -o json being run in the respective pods. This is the state I am at.
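For context, this is the kind of gate I would expect around that precondition: exec into each agent pod, parse the status JSON, and turn a bad or missing answer into an ordinary error. This is a minimal sketch only, with hypothetical struct fields and the default kube-system/ds/cilium target; it is not the actual features.go code.

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// agentStatus models a tiny, hypothetical slice of `cilium status -o json`;
// the real schema is much larger and these field names are illustrative.
type agentStatus struct {
	Kvstore struct {
		State string `json:"state"`
	} `json:"kvstore"`
}

func main() {
	// Exec into the agent DaemonSet and fetch its status as JSON.
	out, err := exec.Command("kubectl", "exec", "-n", "kube-system",
		"ds/cilium", "--", "cilium", "status", "-o", "json").Output()
	if err != nil {
		fmt.Println("precondition check failed:", err) // report, don't proceed
		return
	}
	var st agentStatus
	if err := json.Unmarshal(out, &st); err != nil {
		fmt.Println("could not parse status JSON:", err)
		return
	}
	fmt.Println("kvstore state:", st.Kvstore.State)
}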
I believe I got what's going on. The worker nodes have an arm taint, which the DaemonSet does not tolerate. Hence only one of the 3 pods is started. This leads to "serverHost" being an empty variable, which probably then causes the segfault.
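To illustrate the suspected failure mode, here is a minimal sketch, not the actual cilium-cli code (Pod, findServerPod, and the pod name are made up): when the DaemonSet pod never starts, the lookup comes back nil, and an unchecked dereference turns the unmet precondition into a crash.

package main

import "fmt"

// Pod and findServerPod are hypothetical stand-ins for the real lookup.
type Pod struct{ Name, Host string }

// findServerPod returns nil when no matching pod is running, e.g. because
// the tainted worker nodes rejected the DaemonSet pods.
func findServerPod(pods []Pod) *Pod {
	for i := range pods {
		if pods[i].Name == "server" {
			return &pods[i]
		}
	}
	return nil
}

func main() {
	var pods []Pod // empty: nothing was scheduled on the tainted workers
	server := findServerPod(pods)
	// Missing nil check: this dereference panics with a nil-pointer error
	// (a SIGSEGV caught by the Go runtime), matching the reported crash.
	fmt.Println(server.Host)
}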
OK, I confirmed it: the segfault is caused by the DaemonSet not tolerating the taints. I will leave this open, though, as I believe this should surface as a test failure rather than a segfault :)
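For comparison, the shape of the behavior this ought to have, again as a sketch with made-up names: the same lookup returns an explicit error, so the harness can report a failed precondition instead of crashing.

package main

import (
	"errors"
	"fmt"
)

type Pod struct{ Name, Host string }

var errNoServerPod = errors.New("no running server pod (does the DaemonSet tolerate the node taints?)")

// serverHost is the guarded variant: a missing pod becomes an error value
// instead of a nil dereference further down.
func serverHost(pods []Pod) (string, error) {
	for _, p := range pods {
		if p.Name == "server" && p.Host != "" {
			return p.Host, nil
		}
	}
	return "", errNoServerPod
}

func main() {
	if _, err := serverHost(nil); err != nil {
		fmt.Println("test failed:", err) // clean failure, no segfault
		return
	}
}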