
After disabling ClusterMesh, Cilium seems to think it still exists despite being disabled #32300

Closed
samip5 opened this issue May 2, 2024 · 3 comments · Fixed by cilium/cilium-cli#2544
Labels
  • area/clustermesh: Relates to multi-cluster routing functionality in Cilium.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

samip5 commented May 2, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I had used ClusterMesh for a short while, and after disabling it, Cilium's daemon health still suggests that it is enabled. I have also uninstalled it via Helm, but that made no difference.
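
A minimal sketch of the disable steps described above; the exact Helm value name is an assumption based on the upstream cilium/cilium chart, not copied from the affected cluster:

$ cilium clustermesh disable
# Roll back the Helm-managed side as well (assumed value name; adjust to
# match the actual release and values in use):
$ helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
    --set clustermesh.useAPIServer=false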

Cilium Version

cilium-cli: v0.16.4 compiled with go1.22.1 on darwin/arm64
cilium image (default): v1.15.3
cilium image (stable): v1.15.4
cilium image (running): 1.15.4

Kernel Version

5.15.0-105-generic

Kubernetes Version

v1.28.2+k3s1

Regression

No response

Sysdump

cilium-sysdump-20240502-113512.zip

Relevant log output

$ ./k8s-cilium-exec.sh cilium-dbg status
==== detail from pod cilium-wqk57 , on node plex-server
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.28 (v1.28.2+k3s1) [linux/amd64]
Kubernetes APIs:        ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumLocalRedirectPolicy", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Strict   [vlan.10     192.168.2.129 2001:14ba:7475:4900::500 2001:14ba:7475:4900:a236:9fff:fe18:55fb fe80::a236:9fff:fe18:55fb (Direct Routing)]
Host firewall:          Disabled
SRv6:                   Disabled
CNI Chaining:           none
Cilium:                 Ok   1.15.4 (v1.15.4-9b3f9a8c)
NodeMonitor:            Listening for events on 8 CPUs with 64x4096 of shared memory
IPAM:                   IPv4: 38/254 allocated from 10.40.0.0/24, IPv6: 38/18446744073709551614 allocated from fd94:9bde:1ebb::/64
ClusterMesh:            0/1 clusters ready, 0 global-services
   nebula: not-ready, 0 nodes, 0 endpoints, 0 identities, 0 services, 0 failures (last: never)
   └  Waiting for initial connection to be established
   └  remote configuration: expected=unknown, retrieved=unknown
   └  synchronization status: nodes=false, endpoints=false, identities=false, services=false
IPv4 BIG TCP:        Disabled
IPv6 BIG TCP:        Disabled
BandwidthManager:    Disabled
Host Routing:        BPF
Masquerading:        BPF   [vlan.10]   10.40.0.0/16 [IPv4: Enabled, IPv6: Enabled]
Controller Status:   234/236 healthy
  Name                                  Last success   Last error   Count   Message
  endpoint-2485-regeneration-recovery   never          54s ago      230     regeneration recovery failed
  remote-etcd-nebula                    never          3m23s ago    57      timed out while waiting for etcd session. Ensure that etcd is running on [https://nebula.mesh.cilium.io:2379]
Proxy Status:            OK, ip 10.40.0.92, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 196608, max 262143
Hubble:                  Ok         Current/Max Flows: 4095/4095 (100.00%), Flows/s: 129.40   Metrics: Ok
Encryption:              Disabled
Cluster health:                        Warning   cilium-health daemon unreachable
Modules Health:          Stopped(0) Degraded(1) OK(10) Unknown(3)

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
samip5 added the kind/bug, kind/community-report, and needs/triage labels on May 2, 2024
@youngnick (Contributor) commented:

Thanks for this issue @samip5, I can see there don't seem to be any directions for removing clustermesh on docs.cilium.io, so I'll mark this for the team's attention.

youngnick added the sig/datapath and area/clustermesh labels on May 3, 2024
samip5 (Author) commented May 3, 2024

> Thanks for this issue @samip5, I can see there don't seem to be any directions for removing clustermesh on docs.cilium.io, so I'll mark this for the team's attention.

I did try the normal cilium clustermesh disable, which I found in the CLI command help.
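
One quick way to check whether leftover remote-cluster configuration is still present after running the disable command; the secret name, container name, and mount path below are assumptions based on a default Helm install:

# Inspect the clustermesh secret that holds the remote cluster configs:
$ kubectl -n kube-system get secret cilium-clustermesh -o jsonpath='{.data}'
# Or check what the agents actually have mounted:
$ kubectl -n kube-system exec ds/cilium -c cilium-agent -- ls /var/lib/cilium/clustermesh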

giorio94 added a commit to giorio94/cilium-cli that referenced this issue May 10, 2024
cilium/cilium#28763 decoupled the helm settings to enable the
clustermesh-apiserver and provide the list of clusters to connect to.
Let's reflect this change to the 'cilium clustermesh disable' command
as well, explicitly disabling and resetting the remote clusters configs
when invoked, to correctly disconnect from possible leftover clusters.

Fixes: cilium/cilium#32300
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 (Member) commented:

Thanks @samip5 for the report. I've raised cilium/cilium-cli#2544 to fix the issue by explicitly disabling the clustermesh configuration and resetting the list of connected clusters when running cilium clustermesh disable.
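
A rough Helm-level equivalent of what the fixed cilium clustermesh disable is described as doing; the value names are assumptions based on the cilium/cilium chart after cilium/cilium#28763, not quoted from the PR:

$ helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
    --set clustermesh.useAPIServer=false \
    --set clustermesh.config.enabled=false
# The list of connected clusters (clustermesh.config.clusters) is also reset,
# so no leftover remote-cluster entries such as 'nebula' remain.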

michi-covalent pushed a commit to cilium/cilium-cli that referenced this issue May 15, 2024