On an OpenShift 4.13 cluster, the cifsd process causes CPU hangs #772

Open
berendiwema opened this issue Apr 26, 2024 · 1 comment

berendiwema commented Apr 26, 2024

I'm not sure whether cifsd is part of this driver or is supplied by the host OS. I wasn't able to locate cifsd on the file system of the affected hosts either.
Furthermore, reading the source code did not make it clear to me whether cifsd is part of this driver.

I hope someone is familiar with issues like this and might know a way to mitigate it.

What happened:
On several nodes in an OpenShift 4.13 cluster we see hanging CPUs due to cifsd driver issues.

What you expected to happen:
The cifsd driver does not cause hanging CPUs.

How to reproduce it:
Difficult: it looks like network issues cause the share to hang or a lock to time out, but we haven't been able to pinpoint the cause.

Anything else we need to know?:
System logs show:

[920285.608500] watchdog: BUG: soft lockup - CPU#1 stuck for 3703s! [.NET ThreadPool:2647689]
[920289.653493] watchdog: BUG: soft lockup - CPU#15 stuck for 3707s! [cifsd:17906]
[920301.643461] watchdog: BUG: soft lockup - CPU#12 stuck for 3595s! [cifsd:18471]
[920305.624468] watchdog: BUG: soft lockup - CPU#6 stuck for 2295s! [cifsd:18190]
[920313.608432] watchdog: BUG: soft lockup - CPU#1 stuck for 3729s! [.NET ThreadPool:2647689]
[920317.653421] watchdog: BUG: soft lockup - CPU#15 stuck for 3733s! [cifsd:17906]
[920322.866740] systemd[1]: Failed to start Journal Service.
[920329.643386] watchdog: BUG: soft lockup - CPU#12 stuck for 3621s! [cifsd:18471]
[920331.397393] rcu: INFO: rcu_preempt self-detected stall on CPU
[920331.402183] rcu:     15-....: (4019573 ticks this GP) idle=93d/1/0x4000000000000000 softirq=100792278/100814899 fqs=928269 
[920332.922398] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... 15-... } 4019285 jiffies s: 4880525 root: 0x8002/.
[920332.924796] rcu: blocking rcu_node structures (internal RCU debug):
[920333.624382] watchdog: BUG: soft lockup - CPU#6 stuck for 2321s! [cifsd:18190]
[920341.608365] watchdog: BUG: soft lockup - CPU#1 stuck for 3755s! [.NET ThreadPool:2647689]
[920357.643318] watchdog: BUG: soft lockup - CPU#12 stuck for 3647s! [cifsd:18471]
[920357.653318] watchdog: BUG: soft lockup - CPU#15 stuck for 3770s! [cifsd:17906]
[920361.624311] watchdog: BUG: soft lockup - CPU#6 stuck for 2347s! [cifsd:18190]
[920369.608292] watchdog: BUG: soft lockup - CPU#1 stuck for 3781s! [.NET ThreadPool:2647689]
[920385.643249] watchdog: BUG: soft lockup - CPU#12 stuck for 3673s! [cifsd:18471]
[920385.653254] watchdog: BUG: soft lockup - CPU#15 stuck for 3796s! [cifsd:17906]
[920389.624241] watchdog: BUG: soft lockup - CPU#6 stuck for 2373s! [cifsd:18190]
[920397.608226] watchdog: BUG: soft lockup - CPU#1 stuck for 3808s! [.NET ThreadPool:2647689]
[920398.458234] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... 15-... } 4084821 jiffies s: 4880525 root: 0x8002/.
[920398.460628] rcu: blocking rcu_node structures (internal RCU debug):
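
To make a log like the one above easier to read, the soft lockup reports can be grouped by CPU and task. The following is only a rough sketch: the function name `summarize_soft_lockups` is hypothetical, and the regular expression assumes the exact `watchdog: BUG: soft lockup` line format shown in this log, which may differ on other kernel versions.

```python
import re

# Matches lines such as:
# [920289.653493] watchdog: BUG: soft lockup - CPU#15 stuck for 3707s! [cifsd:17906]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>.+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(lines):
    """Map (cpu, task, pid) -> (number of reports, longest 'stuck for' in seconds)."""
    summary = {}
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match is None:
            continue  # not a soft lockup report (e.g. rcu or systemd lines)
        key = (int(match["cpu"]), match["task"], int(match["pid"]))
        reports, longest = summary.get(key, (0, 0))
        summary[key] = (reports + 1, max(longest, int(match["secs"])))
    return summary
```

Passing the lines of a saved `dmesg` dump to `summarize_soft_lockups` yields one entry per stuck CPU/task pair, which makes it easier to see that the same few cifsd threads and one .NET thread are reported repeatedly.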

Environment:

  • CSI Driver version: registry.k8s.io/sig-storage/smbplugin:v1.14.0
  • Kubernetes version (use kubectl version): Kubernetes Version: v1.26.13+8f85140
  • OS (e.g. from /etc/os-release): Red Hat Enterprise Linux CoreOS 413.92.202402131523-0 (Plow)
  • Kernel (e.g. uname -a): 5.14.0-284.52.1.el9_2.x86_64
@andyzhangx
Member

cifsd is NOT part of this driver; it's supplied by the host.
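
For context, cifsd is most likely a kernel thread started by the host kernel's in-kernel CIFS/SMB client for each mounted server connection, which would also explain why no cifsd binary can be found on disk. One way to check this on an affected node is to look at the `flags` field in `/proc/<pid>/stat`. The sketch below assumes the field layout documented in proc(5) and the kernel's `PF_KTHREAD` flag value; the helper names are hypothetical.

```python
PF_KTHREAD = 0x00200000  # per-task flag the Linux kernel sets on kernel threads


def parse_stat(stat_line):
    """Return (comm, ppid, flags) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the LAST closing parenthesis.
    left, _, right = stat_line.rpartition(")")
    comm = left.split("(", 1)[1]
    fields = right.split()
    # Fields after comm: state, ppid, pgrp, session, tty_nr, tpgid, flags, ...
    return comm, int(fields[1]), int(fields[6])


def is_kernel_thread(stat_line):
    """True if the flags field has PF_KTHREAD set."""
    _, _, flags = parse_stat(stat_line)
    return bool(flags & PF_KTHREAD)


def read_stat(pid):
    """Read the raw stat line for a pid on the local host."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return stat_file.read()
```

On a node, `is_kernel_thread(read_stat(17906))` (using a cifsd pid from the log) would be expected to return `True` if cifsd is indeed a kernel thread; kernel threads also normally show a parent pid of 2 (kthreadd).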
