kubelet fails to restart with CPUManager static policy when one of the node's CPUs is offline #124066
Comments
/sig node
/cc
/kind feature
kubelet does not respond well to dynamic changes to resources on the node. There is an open KEP, KEP-3953, to improve handling of dynamic resource changes.
/cc @haircommander
A bit more context on @AnishShah's comment: in the SIG Node CI meeting today we talked about this and came to the conclusion that this is a feature request. In part because of the structured error that is clearly catching this problem, and in part because this is ultimately a hotplugged-resource problem, even though the kubelet is restarting in between. How the kubelet responds to CPUs being taken offline underneath it would be solved the same way whether the kubelet was continuously running or was restarted before and after. I thought there was an open KEP about hotplugging memory/CPU and having the kubelet respond, but I can't find it.

We also talked a bit about what the kubelet should do in this case. We decided the kubelet should probably evict, but that means making the eviction manager aware of the needs of the CPU manager. All of this together informed the decision to mark this as a feature.
Thanks for the comment. PS: one of the node's CPUs is always offline because of an HPC issue; it is necessary for tuning on some AMD architectures.
I think "very beginning" is questionable, since you have running containers and an existing state file. I get the point that it's the beginning of the kubelet's process execution, but clearly this node has not just been rebooted or started from scratch. That's why it feels like a hotplug case to me. |
/triage accepted |
I think this is what @haircommander mentioned before: kubernetes/enhancements#3953 |
Understood the case where resources change dynamically and kubelet is then restarted. For this case, we could reproduce it with no running containers and after clearing any existing state file before starting kubelet, so it is technically a brand-new node.
After another careful review I think @mtawaken has a point here. The internal logic in cpumanager (e.g. topology.go) has implicit assumptions about how CPUs (hyperthreads) are distributed across cores (physical entities), and this seems to break in corner cases like this one. IOW, the internal logic makes too many assumptions, exemplified in the code snippet shared in the issue description: it trivially divides the CPU count (virtual) by the core count (physical) instead of checking the actual topology and alignment. The easiest way to break this logic is indeed disabling a single CPU, like in this case. A workaround could be setting the reserved CPU set explicitly (the `reservedSystemCPUs` kubelet option). I'm inclined to re-triage this as a bug, but with lower priority because it seems a very narrow corner case.
@ffromani sorry for the late reply. I evaluated the suggested workaround. I submitted a PR as a workaround inside our environment by modifying the IsCoreFree/IsSocketFree methods. It is not a full solution for the offline-CPU scenario (maybe it should be covered by the dynamic node resize feature?), but it solves the reuse problem of reserved CPUs and lets kubelet restart normally. Would you mind taking a review, @ffromani?
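For illustration, here is a minimal self-contained sketch of the difference between the current average-based check and a per-core check like the one described above. The names and the `coreSiblings` model are hypothetical, the `k8s.io/utils/cpuset` package is real; this is not the actual PR, just the shape of the idea:

```go
package main

import (
	"fmt"

	"k8s.io/utils/cpuset"
)

// coreSiblings models the detected topology: core ID -> the logical CPUs that
// are actually online on that core. On the reporter's node, core 47 has only
// CPU 47 because CPU 95 is offline.
type coreSiblings map[int]cpuset.CPUSet

// isCoreFreeAvg mimics the shape of the current check in cpu_assignment.go:
// a core counts as "free" when its available CPUs match the global average
// numCPUs / numCores, which is 95/48 == 1 on this node.
func isCoreFreeAvg(available cpuset.CPUSet, topo coreSiblings, coreID, numCPUs, numCores int) bool {
	return available.Intersection(topo[coreID]).Size() == numCPUs/numCores
}

// isCoreFreePerCore sketches the per-core variant: compare against the CPUs
// that actually exist on this core rather than against the global average.
func isCoreFreePerCore(available cpuset.CPUSet, topo coreSiblings, coreID int) bool {
	return available.Intersection(topo[coreID]).Size() == topo[coreID].Size()
}

func main() {
	topo := coreSiblings{
		25: cpuset.New(25, 73), // healthy core: both hyperthreads online
		47: cpuset.New(47),     // CPU 95 offline: a single thread left
	}
	// Suppose CPU 25 has been reserved, so only CPU 73 remains available on core 25.
	available := cpuset.New(47, 73)

	fmt.Println(isCoreFreeAvg(available, topo, 25, 95, 48)) // true: half-used core misreported as free
	fmt.Println(isCoreFreePerCore(available, topo, 25))     // false: CPU 25 is already taken
}
```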
Right. The critical flaw is that we do trivial divisions in the CPU topology code (and in the surrounding code) rather than enumerating the CPUs and taking their relationships (and online/offline status) into account.
This seems promising, I'll have a look |
/kind bug
I believe this is a legit bug, albeit one I expect to be pretty rare in practice. Retriaging as such.
What happened?
I have a node with the following configuration:
a. 96 CPUs, with one CPU (ID=95) disabled.
b. kubelet CPUManager static policy enabled.
c. kubeReserved: cpu: 2, systemReserved: cpu: 2
After starting kubelet, the detected topology shows:
"Detected CPU topology" topology=&{NumCPUs:95 NumCores:48 NumSockets:2 NumNUMANodes:2 CPUDetails:......
Note: core 25 contains CPUs 25,73. Core 47 contains CPUs 47,95, but 95 is disabled and not shown in the log.
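To make the sibling layout concrete, a tiny sketch; the N/N+48 numbering is an inference from the topology dump above, not something kubelet states:

```go
package main

import "fmt"

func main() {
	// Sibling numbering inferred from the log: on this 96-CPU / 48-core node,
	// thread siblings appear to be numbered N and N+48.
	for _, core := range []int{25, 47} {
		fmt.Printf("core %d -> CPUs %d,%d\n", core, core, core+48)
	}
	// Taking CPU 95 offline therefore leaves core 47 with a single online thread.
}
```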
The static policy's CPU reservation log is below:
"takeFullCores: clainming core" core=47
"takeFullCores: clainming core" core=24
"takeRemainingCPUs: claiming CPU" cpu=25
"Reserved CPUs not available for exclusive assignment" reservedSize=4 reserved="24-25,47,72"
"Updated default CPUSet" cpuSet="0-94"
The weird thing is that when the first Pod is admitted, the reserved CPU (ID=25) is reused:
"Topology Affinity" pod="xxx" containerName="xxx" affinity={NUMANodeAffinity:<nil> Prefered:false}
"AllocateCPUs" numCPUs=20 socket=<nil>
"takeFullCores: clainming core" core=25
"takeRemainingCPUs: claiming CPU" cpu=26
"takeRemainingCPUs: claiming CPU" cpu=74
....
"Updated default CPUSet" cpuSet="0-24,35-72,83-94"
"AllocateCPUs" result="25-34,73-82"
This conflict further causes kubelet restart to fail, because of the static policy's state validation:
"Static policy invalid state, please drain node and remove policy state file" err="not all reserved cpus: \"24-25,47,72\" are present in defaultCpuSet: \"0-24,35-72,83-94\""
What did you expect to happen?
Pod cpuset allocations should not overlap with the reserved CPUs.
How can we reproduce it (as minimally and precisely as possible)?
Run `chcpu -d <cpu-id>` to disable one CPU on the node. Check the /sys/devices/system/cpu/online file to make sure it worked. Then configure kubelet with:
a. cpuManagerPolicy: static
b. kubeReserved: cpu: 2, systemReserved: cpu: 2
Anything else we need to know?
It seems that the kubelet CPUManager static policy has not taken the offline-CPU scenario into account:
kubernetes/pkg/kubelet/cm/cpumanager/topology/topology.go
Lines 49 to 54 in 227c2e7
kubernetes/pkg/kubelet/cm/cpumanager/cpu_assignment.go
Lines 259 to 262 in 227c2e7
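To make the arithmetic concrete, a small sketch mirroring the shape of the referenced topology.go helper (a simplified stand-in, not the actual kubelet code) shows what the integer division yields once a single CPU goes offline:

```go
package main

import "fmt"

// cpusPerCore mirrors the shape of the referenced topology.go helper: it
// assumes logical CPUs are evenly distributed across physical cores.
func cpusPerCore(numCPUs, numCores int) int {
	if numCores == 0 {
		return 0
	}
	return numCPUs / numCores // integer division
}

func main() {
	fmt.Println(cpusPerCore(96, 48)) // 2: the real SMT width with all CPUs online
	fmt.Println(cpusPerCore(95, 48)) // 1: with CPU 95 offline, every per-core
	// "is this core fully free?" comparison is made against the wrong width
}
```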
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)