Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

启动内存映射问题 #2

Open
timeboy opened this issue Jun 11, 2018 · 2 comments
Open

启动内存映射问题 #2

timeboy opened this issue Jun 11, 2018 · 2 comments

Comments

@timeboy
Copy link

timeboy commented Jun 11, 2018

你好,guo ren:
你邮件中提到的直检区是否就是SSEG0区域,cpu强行映射的(0x80000000~0x9FFFFFFF)区域?

@guoren83
Copy link
Member

为了方便 软件开发者, csky cpu 在 传统 mmu 基础上,在虚拟地址空间,提供了 2-3个 512MB 大页直接映射空间,分别是 2G-2.5G 2.5G-3G 3G-3.5G,它们可以按需求任意开关,映射 512MB 对齐的物理地址。
好处是,

  • 让 uboot RTOS 开发变的简单,不需要关心页表。
  • 减少pte_dir 的数量。
  • 映射效率高
    有点 mips 直检区的味道,但比它更灵活。

@timeboy
Copy link
Author

timeboy commented Jun 14, 2018

好的,谢谢

guoren83 pushed a commit that referenced this issue Oct 25, 2018
…equests

Currently, nouveau uses the generic drm_fb_helper_output_poll_changed()
function provided by DRM as it's output_poll_changed callback.
Unfortunately however, this function doesn't grab runtime PM references
early enough and even if it did-we can't block waiting for the device to
resume in output_poll_changed() since it's very likely that we'll need
to grab the fb_helper lock at some point during the runtime resume
process. This currently results in deadlocking like so:

[  246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[  246.673398]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.676527] kworker/4:0     D    0    37      2 0x80000000
[  246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[  246.678704] Call Trace:
[  246.679753]  __schedule+0x322/0xaf0
[  246.680916]  schedule+0x33/0x90
[  246.681924]  schedule_preempt_disabled+0x15/0x20
[  246.683023]  __mutex_lock+0x569/0x9a0
[  246.684035]  ? kobject_uevent_env+0x117/0x7b0
[  246.685132]  ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.686179]  mutex_lock_nested+0x1b/0x20
[  246.687278]  ? mutex_lock_nested+0x1b/0x20
[  246.688307]  drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.689420]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.690462]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.691570]  output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[  246.692611]  process_one_work+0x231/0x620
[  246.693725]  worker_thread+0x214/0x3a0
[  246.694756]  kthread+0x12b/0x150
[  246.695856]  ? wq_pool_ids_show+0x140/0x140
[  246.696888]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.697998]  ret_from_fork+0x3a/0x50
[  246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[  246.700153]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.702278] kworker/0:1     D    0    60      2 0x80000000
[  246.703293] Workqueue: pm pm_runtime_work
[  246.704393] Call Trace:
[  246.705403]  __schedule+0x322/0xaf0
[  246.706439]  ? wait_for_completion+0x104/0x190
[  246.707393]  schedule+0x33/0x90
[  246.708375]  schedule_timeout+0x3a5/0x590
[  246.709289]  ? mark_held_locks+0x58/0x80
[  246.710208]  ? _raw_spin_unlock_irq+0x2c/0x40
[  246.711222]  ? wait_for_completion+0x104/0x190
[  246.712134]  ? trace_hardirqs_on_caller+0xf4/0x190
[  246.713094]  ? wait_for_completion+0x104/0x190
[  246.713964]  wait_for_completion+0x12c/0x190
[  246.714895]  ? wake_up_q+0x80/0x80
[  246.715727]  ? get_work_pool+0x90/0x90
[  246.716649]  flush_work+0x1c9/0x280
[  246.717483]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  246.718442]  __cancel_work_timer+0x146/0x1d0
[  246.719247]  cancel_delayed_work_sync+0x13/0x20
[  246.720043]  drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[  246.721123]  nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[  246.721897]  pci_pm_runtime_suspend+0x6b/0x190
[  246.722825]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.723737]  __rpm_callback+0x7a/0x1d0
[  246.724721]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.725607]  rpm_callback+0x24/0x80
[  246.726553]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.727376]  rpm_suspend+0x142/0x6b0
[  246.728185]  pm_runtime_work+0x97/0xc0
[  246.728938]  process_one_work+0x231/0x620
[  246.729796]  worker_thread+0x44/0x3a0
[  246.730614]  kthread+0x12b/0x150
[  246.731395]  ? wq_pool_ids_show+0x140/0x140
[  246.732202]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.732878]  ret_from_fork+0x3a/0x50
[  246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[  246.734587]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.736113] kworker/4:2     D    0   422      2 0x80000080
[  246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  246.737665] Call Trace:
[  246.738490]  __schedule+0x322/0xaf0
[  246.739250]  schedule+0x33/0x90
[  246.739908]  rpm_resume+0x19c/0x850
[  246.740750]  ? finish_wait+0x90/0x90
[  246.741541]  __pm_runtime_resume+0x4e/0x90
[  246.742370]  nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[  246.743124]  drm_atomic_commit+0x4a/0x50 [drm]
[  246.743775]  restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[  246.744603]  restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[  246.745373]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[  246.746220]  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[  246.746884]  drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[  246.747675]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.748544]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.749439]  nv50_mstm_hotplug+0x15/0x20 [nouveau]
[  246.750111]  drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[  246.750764]  drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[  246.751602]  drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[  246.752314]  process_one_work+0x231/0x620
[  246.752979]  worker_thread+0x44/0x3a0
[  246.753838]  kthread+0x12b/0x150
[  246.754619]  ? wq_pool_ids_show+0x140/0x140
[  246.755386]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.756162]  ret_from_fork+0x3a/0x50
[  246.756847]
           Showing all locks held in the system:
[  246.758261] 3 locks held by kworker/4:0/37:
[  246.759016]  #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.759856]  #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.760670]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.761516] 2 locks held by kworker/0:1/60:
[  246.762274]  #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.762982]  #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.763890] 1 lock held by khungtaskd/64:
[  246.764664]  #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  246.765588] 5 locks held by kworker/4:2/422:
[  246.766440]  #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.767390]  #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.768154]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[  246.768966]  #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[  246.769921]  #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[  246.770839] 1 lock held by dmesg/1038:
[  246.771739] 2 locks held by zsh/1172:
[  246.772650]  #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  246.773680]  #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870

[  246.775522] =============================================

After trying dozens of different solutions, I found one very simple one
that should also have the benefit of preventing us from having to fight
locking for the rest of our lives. So, we work around these deadlocks by
deferring all fbcon hotplug events that happen after the runtime suspend
process starts until after the device is resumed again.

Changes since v7:
 - Fixup commit message - Daniel Vetter

Changes since v6:
 - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia

Changes since v5:
 - Come up with the (hopefully final) solution for solving this dumb
   problem, one that is a lot less likely to cause issues with locking in
   the future. This should work around all deadlock conditions with fbcon
   brought up thus far.

Changes since v4:
 - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock
   condition that Lukas described
 - Just move all of this out of drm_fb_helper. It seems that other DRM
   drivers have already figured out other workarounds for this. If other
   drivers do end up needing this in the future, we can just move this
   back into drm_fb_helper again.

Changes since v3:
- Actually check if fb_helper is NULL in both new helpers
- Actually check drm_fbdev_emulation in both new helpers
- Don't fire off a fb_helper hotplug unconditionally; only do it if
  the following conditions are true (as otherwise, calling this in the
  wrong spot will cause Bad Things to happen):
  - fb_helper hotplug handling was actually inhibited previously
  - fb_helper actually has a delayed hotplug pending
  - fb_helper is actually bound
  - fb_helper is actually initialized
- Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
  situation where a driver would actually want to use this without
  checking the return value, so enforce that
- Rewrite and clarify the documentation for both helpers.
- Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
  that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
  isn't enabled
- Actually grab the toplevel fb_helper lock in
  drm_fb_helper_resume_hotplug(), since it's possible other activity
  (such as a hotplug) could be going on at the same time the driver
  calls drm_fb_helper_resume_hotplug(). We need this to check whether or
  not drm_fb_helper_hotplug_event() needs to be called anyway

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
When we disable hotplugging on the GPU, we need to be able to
synchronize with each connector's hotplug interrupt handler before the
interrupt is finally disabled. This can be a problem however, since
nouveau_connector_detect() currently grabs a runtime power reference
when handling connector probing. This will deadlock the runtime suspend
handler like so:

[  861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds.
[  861.483290]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.486332] kworker/0:2     D    0    61      2 0x80000000
[  861.487044] Workqueue: events nouveau_display_hpd_work [nouveau]
[  861.487737] Call Trace:
[  861.488394]  __schedule+0x322/0xaf0
[  861.489070]  schedule+0x33/0x90
[  861.489744]  rpm_resume+0x19c/0x850
[  861.490392]  ? finish_wait+0x90/0x90
[  861.491068]  __pm_runtime_resume+0x4e/0x90
[  861.491753]  nouveau_display_hpd_work+0x22/0x60 [nouveau]
[  861.492416]  process_one_work+0x231/0x620
[  861.493068]  worker_thread+0x44/0x3a0
[  861.493722]  kthread+0x12b/0x150
[  861.494342]  ? wq_pool_ids_show+0x140/0x140
[  861.494991]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.495648]  ret_from_fork+0x3a/0x50
[  861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds.
[  861.496968]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.498341] kworker/6:2     D    0   320      2 0x80000080
[  861.499045] Workqueue: pm pm_runtime_work
[  861.499739] Call Trace:
[  861.500428]  __schedule+0x322/0xaf0
[  861.501134]  ? wait_for_completion+0x104/0x190
[  861.501851]  schedule+0x33/0x90
[  861.502564]  schedule_timeout+0x3a5/0x590
[  861.503284]  ? mark_held_locks+0x58/0x80
[  861.503988]  ? _raw_spin_unlock_irq+0x2c/0x40
[  861.504710]  ? wait_for_completion+0x104/0x190
[  861.505417]  ? trace_hardirqs_on_caller+0xf4/0x190
[  861.506136]  ? wait_for_completion+0x104/0x190
[  861.506845]  wait_for_completion+0x12c/0x190
[  861.507555]  ? wake_up_q+0x80/0x80
[  861.508268]  flush_work+0x1c9/0x280
[  861.508990]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  861.509735]  nvif_notify_put+0xb1/0xc0 [nouveau]
[  861.510482]  nouveau_display_fini+0xbd/0x170 [nouveau]
[  861.511241]  nouveau_display_suspend+0x67/0x120 [nouveau]
[  861.511969]  nouveau_do_suspend+0x5e/0x2d0 [nouveau]
[  861.512715]  nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau]
[  861.513435]  pci_pm_runtime_suspend+0x6b/0x180
[  861.514165]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.514897]  __rpm_callback+0x7a/0x1d0
[  861.515618]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.516313]  rpm_callback+0x24/0x80
[  861.517027]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.517741]  rpm_suspend+0x142/0x6b0
[  861.518449]  pm_runtime_work+0x97/0xc0
[  861.519144]  process_one_work+0x231/0x620
[  861.519831]  worker_thread+0x44/0x3a0
[  861.520522]  kthread+0x12b/0x150
[  861.521220]  ? wq_pool_ids_show+0x140/0x140
[  861.521925]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.522622]  ret_from_fork+0x3a/0x50
[  861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds.
[  861.523977]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.525349] kworker/6:0     D    0  1329      2 0x80000000
[  861.526073] Workqueue: events nvif_notify_work [nouveau]
[  861.526751] Call Trace:
[  861.527411]  __schedule+0x322/0xaf0
[  861.528089]  schedule+0x33/0x90
[  861.528758]  rpm_resume+0x19c/0x850
[  861.529399]  ? finish_wait+0x90/0x90
[  861.530073]  __pm_runtime_resume+0x4e/0x90
[  861.530798]  nouveau_connector_detect+0x7e/0x510 [nouveau]
[  861.531459]  ? ww_mutex_lock+0x47/0x80
[  861.532097]  ? ww_mutex_lock+0x47/0x80
[  861.532819]  ? drm_modeset_lock+0x88/0x130 [drm]
[  861.533481]  drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper]
[  861.534127]  drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper]
[  861.534940]  nouveau_connector_hotplug+0x98/0x120 [nouveau]
[  861.535556]  nvif_notify_work+0x2d/0xb0 [nouveau]
[  861.536221]  process_one_work+0x231/0x620
[  861.536994]  worker_thread+0x44/0x3a0
[  861.537757]  kthread+0x12b/0x150
[  861.538463]  ? wq_pool_ids_show+0x140/0x140
[  861.539102]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.539815]  ret_from_fork+0x3a/0x50
[  861.540521]
               Showing all locks held in the system:
[  861.541696] 2 locks held by kworker/0:2/61:
[  861.542406]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543071]  #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543814] 1 lock held by khungtaskd/64:
[  861.544535]  #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  861.545160] 3 locks held by kworker/6:2/320:
[  861.545896]  #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.546702]  #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.547443]  #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau]
[  861.548146] 1 lock held by dmesg/983:
[  861.548889] 2 locks held by zsh/1250:
[  861.549605]  #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  861.550393]  #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[  861.551122] 6 locks held by kworker/6:0/1329:
[  861.551957]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.552765]  #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620
[  861.553582]  #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper]
[  861.554357]  #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper]
[  861.555227]  #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper]
[  861.556133]  #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm]

[  861.557864] =============================================

[  861.559507] NMI backtrace for cpu 2
[  861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[  861.561948] Call Trace:
[  861.562757]  dump_stack+0x8e/0xd3
[  861.563516]  nmi_cpu_backtrace.cold.3+0x14/0x5a
[  861.564269]  ? lapic_can_unplug_cpu.cold.27+0x42/0x42
[  861.565029]  nmi_trigger_cpumask_backtrace+0xa1/0xae
[  861.565789]  arch_trigger_cpumask_backtrace+0x19/0x20
[  861.566558]  watchdog+0x316/0x580
[  861.567355]  kthread+0x12b/0x150
[  861.568114]  ? reset_hung_task_detector+0x20/0x20
[  861.568863]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.569598]  ret_from_fork+0x3a/0x50
[  861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7:
[  861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120
[  861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120
[  861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120
[  861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120
[  861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120
[  861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120
[  861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120
[  861.572428] Kernel panic - not syncing: hung_task: blocked tasks

So: fix this by making it so that normal hotplug handling /only/ happens
so long as the GPU is currently awake without any pending runtime PM
requests. In the event that a hotplug occurs while the device is
suspending or resuming, we can simply defer our response until the GPU
is fully runtime resumed again.

Changes since v4:
- Use a new trick I came up with using pm_runtime_get() instead of the
  hackish junk we had before

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
The following lockdep report can be triggered by writing to /sys/kernel/debug/sched_features:

  ======================================================
  WARNING: possible circular locking dependency detected
  4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Not tainted
  ------------------------------------------------------
  sh/3358 is trying to acquire lock:
  000000004ad3989d (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0x14/0x30
  but task is already holding lock:
  00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
  which lock already depends on the new lock.
  the existing dependency chain (in reverse order) is:
  -> #3 (&sb->s_type->i_mutex_key#3){+.+.}:
         lock_acquire+0xb8/0x148
         down_write+0xac/0x140
         start_creating+0x5c/0x168
         debugfs_create_dir+0x18/0x220
         opp_debug_register+0x8c/0x120
         _add_opp_dev+0x104/0x1f8
         dev_pm_opp_get_opp_table+0x174/0x340
         _of_add_opp_table_v2+0x110/0x760
         dev_pm_opp_of_add_table+0x5c/0x240
         dev_pm_opp_of_cpumask_add_table+0x5c/0x100
         cpufreq_init+0x160/0x430
         cpufreq_online+0x1cc/0xe30
         cpufreq_add_dev+0x78/0x198
         subsys_interface_register+0x168/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -> #2 (opp_table_lock){+.+.}:
         lock_acquire+0xb8/0x148
         __mutex_lock+0x104/0xf50
         mutex_lock_nested+0x1c/0x28
         _of_add_opp_table_v2+0xb4/0x760
         dev_pm_opp_of_add_table+0x5c/0x240
         dev_pm_opp_of_cpumask_add_table+0x5c/0x100
         cpufreq_init+0x160/0x430
         cpufreq_online+0x1cc/0xe30
         cpufreq_add_dev+0x78/0x198
         subsys_interface_register+0x168/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -> #1 (subsys mutex#6){+.+.}:
         lock_acquire+0xb8/0x148
         __mutex_lock+0x104/0xf50
         mutex_lock_nested+0x1c/0x28
         subsys_interface_register+0xd8/0x270
         cpufreq_register_driver+0x1c8/0x278
         dt_cpufreq_probe+0xdc/0x1b8
         platform_drv_probe+0xb4/0x168
         driver_probe_device+0x318/0x4b0
         __device_attach_driver+0xfc/0x1f0
         bus_for_each_drv+0xf8/0x180
         __device_attach+0x164/0x200
         device_initial_probe+0x10/0x18
         bus_probe_device+0x110/0x178
         device_add+0x6d8/0x908
         platform_device_add+0x138/0x3d8
         platform_device_register_full+0x1cc/0x1f8
         cpufreq_dt_platdev_init+0x174/0x1bc
         do_one_initcall+0xb8/0x310
         kernel_init_freeable+0x4b8/0x56c
         kernel_init+0x10/0x138
         ret_from_fork+0x10/0x18
  -> #0 (cpu_hotplug_lock.rw_sem){++++}:
         __lock_acquire+0x203c/0x21d0
         lock_acquire+0xb8/0x148
         cpus_read_lock+0x58/0x1c8
         static_key_enable+0x14/0x30
         sched_feat_write+0x314/0x428
         full_proxy_write+0xa0/0x138
         __vfs_write+0xd8/0x388
         vfs_write+0xdc/0x318
         ksys_write+0xb4/0x138
         sys_write+0xc/0x18
         __sys_trace_return+0x0/0x4
  other info that might help us debug this:
  Chain exists of:
    cpu_hotplug_lock.rw_sem --> opp_table_lock --> &sb->s_type->i_mutex_key#3
   Possible unsafe locking scenario:
         CPU0                    CPU1
         ----                    ----
    lock(&sb->s_type->i_mutex_key#3);
                                 lock(opp_table_lock);
                                 lock(&sb->s_type->i_mutex_key#3);
    lock(cpu_hotplug_lock.rw_sem);
   *** DEADLOCK ***
  2 locks held by sh/3358:
   #0: 00000000a8c4b363 (sb_writers#10){.+.+}, at: vfs_write+0x238/0x318
   #1: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
  stack backtrace:
  CPU: 5 PID: 3358 Comm: sh Not tainted 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18
  Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
  Call trace:
   dump_backtrace+0x0/0x288
   show_stack+0x14/0x20
   dump_stack+0x13c/0x1ac
   print_circular_bug.isra.10+0x270/0x438
   check_prev_add.constprop.16+0x4dc/0xb98
   __lock_acquire+0x203c/0x21d0
   lock_acquire+0xb8/0x148
   cpus_read_lock+0x58/0x1c8
   static_key_enable+0x14/0x30
   sched_feat_write+0x314/0x428
   full_proxy_write+0xa0/0x138
   __vfs_write+0xd8/0x388
   vfs_write+0xdc/0x318
   ksys_write+0xb4/0x138
   sys_write+0xc/0x18
   __sys_trace_return+0x0/0x4

This is because when loading the cpufreq_dt module we first acquire
cpu_hotplug_lock.rw_sem lock, then in cpufreq_init(), we are taking
the &sb->s_type->i_mutex_key lock.

But when writing to /sys/kernel/debug/sched_features, the
cpu_hotplug_lock.rw_sem lock depends on the &sb->s_type->i_mutex_key lock.

To fix this bug, reverse the lock acquisition order when writing to
sched_features, this way cpu_hotplug_lock.rw_sem no longer depends on
&sb->s_type->i_mutex_key.

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: George G. Davis <george_davis@mentor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180731121222.26195-1-jiada_wang@mentor.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
When netvsc device is removed it can call reschedule in RCU context.
This happens because canceling the subchannel setup work could (in theory)
cause a reschedule when manipulating the timer.

To reproduce, run with lockdep enabled kernel and unbind
a network device from hv_netvsc (via sysfs).

[  160.682011] WARNING: suspicious RCU usage
[  160.707466] 4.19.0-rc3-uio+ #2 Not tainted
[  160.709937] -----------------------------
[  160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[  160.723691]
[  160.723691] other info that might help us debug this:
[  160.723691]
[  160.730955]
[  160.730955] rcu_scheduler_active = 2, debug_locks = 1
[  160.762813] 5 locks held by rebind-eth.sh/1812:
[  160.766851]  #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0
[  160.773416]  #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
[  160.783766]  #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0
[  160.787465]  #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250
[  160.816987]  #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc]
[  160.828629]
[  160.828629] stack backtrace:
[  160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2
[  160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[  160.832952] Call Trace:
[  160.832952]  dump_stack+0x85/0xcb
[  160.832952]  ___might_sleep+0x1a3/0x240
[  160.832952]  __flush_work+0x57/0x2e0
[  160.832952]  ? __mutex_lock+0x83/0x990
[  160.832952]  ? __kernfs_remove+0x24f/0x2e0
[  160.832952]  ? __kernfs_remove+0x1b2/0x2e0
[  160.832952]  ? mark_held_locks+0x50/0x80
[  160.832952]  ? get_work_pool+0x90/0x90
[  160.832952]  __cancel_work_timer+0x13c/0x1e0
[  160.832952]  ? netvsc_remove+0x1e/0x250 [hv_netvsc]
[  160.832952]  ? __lock_is_held+0x55/0x90
[  160.832952]  netvsc_remove+0x9a/0x250 [hv_netvsc]
[  160.832952]  vmbus_remove+0x26/0x30
[  160.832952]  device_release_driver_internal+0x18a/0x250
[  160.832952]  unbind_store+0xb4/0x180
[  160.832952]  kernfs_fop_write+0x113/0x1a0
[  160.832952]  __vfs_write+0x36/0x1a0
[  160.832952]  ? rcu_read_lock_sched_held+0x6b/0x80
[  160.832952]  ? rcu_sync_lockdep_assert+0x2e/0x60
[  160.832952]  ? __sb_start_write+0x141/0x1a0
[  160.832952]  ? vfs_write+0x184/0x1b0
[  160.832952]  vfs_write+0xbe/0x1b0
[  160.832952]  ksys_write+0x55/0xc0
[  160.832952]  do_syscall_64+0x60/0x1b0
[  160.832952]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  160.832952] RIP: 0033:0x7fe48f4c8154

Resolve this by getting RTNL earlier. This is safe because the subchannel
work queue does trylock on RTNL and will detect the race.

Fixes: 7b2ee50 ("hv_netvsc: common detach logic")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
Directories and inodes don't necessarily need to be in the same lockdep
class.  For ex, hugetlbfs splits them out too to prevent false positives
in lockdep.  Annotate correctly after new inode creation.  If its a
directory inode, it will be put into a different class.

This should fix a lockdep splat reported by syzbot:

> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-rc8-next-20180810+ #36 Not tainted
> ------------------------------------------------------
> syz-executor900/4483 is trying to acquire lock:
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
> include/linux/fs.h:765 [inline]
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
> shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>
> but task is already holding lock:
> 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
> drivers/staging/android/ashmem.c:448
>
> which lock already depends on the new lock.
>
> -> #2 (ashmem_mutex){+.+.}:
>        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
>        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
>        call_mmap include/linux/fs.h:1844 [inline]
>        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
>        do_mmap+0xa10/0x1220 mm/mmap.c:1535
>        do_mmap_pgoff include/linux/mm.h:2298 [inline]
>        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
>        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
>        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
>        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (&mm->mmap_sem){++++}:
>        __might_fault+0x155/0x1e0 mm/memory.c:4568
>        _copy_to_user+0x30/0x110 lib/usercopy.c:25
>        copy_to_user include/linux/uaccess.h:155 [inline]
>        filldir+0x1ea/0x3a0 fs/readdir.c:196
>        dir_emit_dot include/linux/fs.h:3464 [inline]
>        dir_emit_dots include/linux/fs.h:3475 [inline]
>        dcache_readdir+0x13a/0x620 fs/libfs.c:193
>        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
>        __do_sys_getdents fs/readdir.c:231 [inline]
>        __se_sys_getdents fs/readdir.c:212 [inline]
>        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (&sb->s_type->i_mutex_key#9){++++}:
>        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
>        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
>        inode_lock include/linux/fs.h:765 [inline]
>        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
>        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
>        vfs_ioctl fs/ioctl.c:46 [inline]
>        file_ioctl fs/ioctl.c:501 [inline]
>        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
>        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
>        __do_sys_ioctl fs/ioctl.c:709 [inline]
>        __se_sys_ioctl fs/ioctl.c:707 [inline]
>        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(ashmem_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(ashmem_mutex);
>   lock(&sb->s_type->i_mutex_key#9);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor900/4483:
>  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
> ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448

Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Suggested-by: NeilBrown <neilb@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
While reading block, it is possible that io error return due to underlying
storage issue, in this case, BH_NeedsValidate was left in the buffer head.
Then when reading the very block next time, if it was already linked into
journal, that will trigger the following panic.

[203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
[203748.702533] invalid opcode: 0000 [#1] SMP
[203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
[203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
[203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
[203748.703088] RIP: 0010:[<ffffffffa05e9f09>]  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.703130] RSP: 0018:ffff88006ff4b818  EFLAGS: 00010206
[203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
[203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
[203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
[203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
[203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
[203748.705871] FS:  00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
[203748.706370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
[203748.707124] Stack:
[203748.707371]  ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
[203748.707885]  0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
[203748.708399]  00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
[203748.708915] Call Trace:
[203748.709175]  [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
[203748.709680]  [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
[203748.710185]  [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
[203748.710691]  [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
[203748.711204]  [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
[203748.711716]  [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
[203748.712227]  [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
[203748.712737]  [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
[203748.713003]  [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2]
[203748.713263]  [<ffffffff8121714b>] vfs_create+0xdb/0x150
[203748.713518]  [<ffffffff8121b225>] do_last+0x815/0x1210
[203748.713772]  [<ffffffff812192e9>] ? path_init+0xb9/0x450
[203748.714123]  [<ffffffff8121bca0>] path_openat+0x80/0x600
[203748.714378]  [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620
[203748.714634]  [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0
[203748.714888]  [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130
[203748.715143]  [<ffffffff81209ffc>] do_sys_open+0x12c/0x220
[203748.715403]  [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180
[203748.715668]  [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190
[203748.715928]  [<ffffffff8120a10e>] SyS_open+0x1e/0x20
[203748.716184]  [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7
[203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
[203748.717505] RIP  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.717775]  RSP <ffff88006ff4b818>

Joesph ever reported a similar panic.
Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html

Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
This change has the following effects, in order of descreasing importance:

1) Prevent a stack buffer overflow

2) Do not append an unnecessary NULL to an anyway binary buffer, which
   is writing one byte past client_digest when caller is:
   chap_string_to_hex(client_digest, chap_r, strlen(chap_r));

The latter was found by KASAN (see below) when input value hes expected size
(32 hex chars), and further analysis revealed a stack buffer overflow can
happen when network-received value is longer, allowing an unauthenticated
remote attacker to smash up to 17 bytes after destination buffer (16 bytes
attacker-controlled and one null).  As switching to hex2bin requires
specifying destination buffer length, and does not internally append any null,
it solves both issues.

This addresses CVE-2018-14633.

Beyond this:

- Validate received value length and check hex2bin accepted the input, to log
  this rejection reason instead of just failing authentication.

- Only log received CHAP_R and CHAP_C values once they passed sanity checks.

==================================================================
BUG: KASAN: stack-out-of-bounds in chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
Write of size 1 at addr ffff8801090ef7c8 by task kworker/0:0/1021

CPU: 0 PID: 1021 Comm: kworker/0:0 Tainted: G           O      4.17.8kasan.sess.connops+ #2
Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 05/19/2014
Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod]
Call Trace:
 dump_stack+0x71/0xac
 print_address_description+0x65/0x22e
 ? chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
 kasan_report.cold.6+0x241/0x2fd
 chap_string_to_hex+0x32/0x60 [iscsi_target_mod]
 chap_server_compute_md5.isra.2+0x2cb/0x860 [iscsi_target_mod]
 ? chap_binaryhex_to_asciihex.constprop.5+0x50/0x50 [iscsi_target_mod]
 ? ftrace_caller_op_ptr+0xe/0xe
 ? __orc_find+0x6f/0xc0
 ? unwind_next_frame+0x231/0x850
 ? kthread+0x1a0/0x1c0
 ? ret_from_fork+0x35/0x40
 ? ret_from_fork+0x35/0x40
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? deref_stack_reg+0xd0/0xd0
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? is_module_text_address+0xa/0x11
 ? kernel_text_address+0x4c/0x110
 ? __save_stack_trace+0x82/0x100
 ? ret_from_fork+0x35/0x40
 ? save_stack+0x8c/0xb0
 ? 0xffffffffc1660000
 ? iscsi_target_do_login+0x155/0x8d0 [iscsi_target_mod]
 ? iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? process_one_work+0x35c/0x640
 ? worker_thread+0x66/0x5d0
 ? kthread+0x1a0/0x1c0
 ? ret_from_fork+0x35/0x40
 ? iscsi_update_param_value+0x80/0x80 [iscsi_target_mod]
 ? iscsit_release_cmd+0x170/0x170 [iscsi_target_mod]
 chap_main_loop+0x172/0x570 [iscsi_target_mod]
 ? chap_server_compute_md5.isra.2+0x860/0x860 [iscsi_target_mod]
 ? rx_data+0xd6/0x120 [iscsi_target_mod]
 ? iscsit_print_session_params+0xd0/0xd0 [iscsi_target_mod]
 ? cyc2ns_read_begin.part.2+0x90/0x90
 ? _raw_spin_lock_irqsave+0x25/0x50
 ? memcmp+0x45/0x70
 iscsi_target_do_login+0x875/0x8d0 [iscsi_target_mod]
 ? iscsi_target_check_first_request.isra.5+0x1a0/0x1a0 [iscsi_target_mod]
 ? del_timer+0xe0/0xe0
 ? memset+0x1f/0x40
 ? flush_sigqueue+0x29/0xd0
 iscsi_target_do_login_rx+0x3bc/0x4c0 [iscsi_target_mod]
 ? iscsi_target_nego_release+0x80/0x80 [iscsi_target_mod]
 ? iscsi_target_restore_sock_callbacks+0x130/0x130 [iscsi_target_mod]
 process_one_work+0x35c/0x640
 worker_thread+0x66/0x5d0
 ? flush_rcu_work+0x40/0x40
 kthread+0x1a0/0x1c0
 ? kthread_bind+0x30/0x30
 ret_from_fork+0x35/0x40

The buggy address belongs to the page:
page:ffffea0004243bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x17fffc000000000()
raw: 017fffc000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea0004243c20 ffffea0004243ba0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801090ef680: f2 f2 f2 f2 f2 f2 f2 01 f2 f2 f2 f2 f2 f2 f2 00
 ffff8801090ef700: f2 f2 f2 f2 f2 f2 f2 00 02 f2 f2 f2 f2 f2 f2 00
>ffff8801090ef780: 00 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2 f2 00
                                              ^
 ffff8801090ef800: 00 f2 f2 f2 f2 f2 f2 00 00 00 00 02 f2 f2 f2 f2
 ffff8801090ef880: f2 f2 f2 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00
==================================================================

Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
Syzkaller reported this on a slightly older kernel but it's still
applicable to the current kernel -

======================================================
WARNING: possible circular locking dependency detected
4.18.0-next-20180823+ #46 Not tainted
------------------------------------------------------
syz-executor4/26841 is trying to acquire lock:
00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652

but task is already holding lock:
00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (rtnl_mutex){+.+.}:
       __mutex_lock_common kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
       rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
       bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
       bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
       process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
       worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
       kthread+0x35a/0x420 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

-> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
       process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
       worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
       kthread+0x35a/0x420 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

-> #0 ((wq_completion)bond_dev->name){+.+.}:
       lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
       flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
       drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
       destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
       __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
       bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
       register_netdevice+0x337/0x1100 net/core/dev.c:8410
       bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
       rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
       __sys_sendmsg+0x11d/0x290 net/socket.c:2153
       __do_sys_sendmsg net/socket.c:2162 [inline]
       __se_sys_sendmsg net/socket.c:2160 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
  (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock((work_completion)(&(&nnw->work)->work));
                               lock(rtnl_mutex);
  lock((wq_completion)bond_dev->name);

 *** DEADLOCK ***

1 lock held by syz-executor4/26841:

stack backtrace:
CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
 check_prev_add kernel/locking/lockdep.c:1862 [inline]
 check_prevs_add kernel/locking/lockdep.c:1975 [inline]
 validate_chain kernel/locking/lockdep.c:2416 [inline]
 __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
 lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
 register_netdevice+0x337/0x1100 net/core/dev.c:8410
 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:622 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:632
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
 __sys_sendmsg+0x11d/0x290 net/socket.c:2153
 __do_sys_sendmsg net/socket.c:2162 [inline]
 __se_sys_sendmsg net/socket.c:2160 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457089
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
If I attach a vfio-ccw device to my guest, I get the following warning
on the host when the host kernel is CONFIG_HARDENED_USERCOPY=y

[250757.595325] Bad or missing usercopy whitelist? Kernel memory overwrite attempt detected to SLUB object 'dma-kmalloc-512' (offset 64, size 124)!
[250757.595365] WARNING: CPU: 2 PID: 10958 at mm/usercopy.c:81 usercopy_warn+0xac/0xd8
[250757.595369] Modules linked in: kvm vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c devlink tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables sunrpc dm_multipath s390_trng crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha1_s390 eadm_sch tape_3590 tape tape_class qeth_l2 qeth ccwgroup vfio_ccw vfio_mdev zcrypt_cex4 mdev vfio_iommu_type1 zcrypt vfio sha256_s390 sha_common zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod
[250757.595424] CPU: 2 PID: 10958 Comm: CPU 2/KVM Not tainted 4.18.0-derp #2
[250757.595426] Hardware name: IBM 3906 M05 780 (LPAR)
...snip regs...
[250757.595523] Call Trace:
[250757.595529] ([<0000000000349210>] usercopy_warn+0xa8/0xd8)
[250757.595535]  [<000000000032daaa>] __check_heap_object+0xfa/0x160
[250757.595540]  [<0000000000349396>] __check_object_size+0x156/0x1d0
[250757.595547]  [<000003ff80332d04>] vfio_ccw_mdev_write+0x74/0x148 [vfio_ccw]
[250757.595552]  [<000000000034ed12>] __vfs_write+0x3a/0x188
[250757.595556]  [<000000000034f040>] vfs_write+0xa8/0x1b8
[250757.595559]  [<000000000034f4e6>] ksys_pwrite64+0x86/0xc0
[250757.595568]  [<00000000008959a0>] system_call+0xdc/0x2b0
[250757.595570] Last Breaking-Event-Address:
[250757.595573]  [<0000000000349210>] usercopy_warn+0xa8/0xd8

While vfio_ccw_mdev_{write|read} validates that the input position/count
does not run over the ccw_io_region struct, the usercopy code that does
copy_{to|from}_user doesn't necessarily know this. It sees the variable
length and gets worried that it's affecting a normal kmalloc'd struct,
and generates the above warning.

Adjust how the ccw_io_region is alloc'd with a whitelist to remove this
warning. The boundary checking will continue to do its thing.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20180921204013.95804-3-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
Fixes a crash when the report encounters an address that could not be
associated with an mmaped region:

  #0  0x00005555557bdc4a in callchain_srcline (ip=<error reading variable: Cannot access memory at address 0x38>, sym=0x0, map=0x0) at util/machine.c:2329
  #1  unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329
  #2  0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 <unwind_entry>, thread=<optimized out>, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
  #3  get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 <unwind_entry>, arg=0x7ffff5642498, max_stack=<optimized out>) at util/unwind-libunwind-local.c:703
  #4  0x0000555555837192 in _unwind__get_entries (cb=<optimized out>, arg=<optimized out>, thread=<optimized out>, data=<optimized out>, max_stack=<optimized out>) at util/unwind-libunwind-local.c:725
  #5  0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351
  #6  thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378
  #7  0x00005555557ba4ee in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fffffff97b8, evsel=<optimized out>, al=al@entry=0x7fffffff9750,
      max_stack=<optimized out>) at util/callchain.c:1085

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 2a9d505 ("perf script: Show correct offsets for DWARF-based unwinding")
Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
Julian Wiedmann says:

====================
s390/qeth: fixes 2019-09-26

please apply two qeth patches for -net. The first is a trivial cleanup
required for patch #2 by Jean, which fixes a potential endless loop.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
This reverts commit d76c743.

While commit d76c743 ("serial: 8250_dw: Fix runtime PM handling")
fixes runtime PM handling when using kgdb, it introduces a traceback for
everyone else.

BUG: sleeping function called from invalid context at
	/mnt/host/source/src/third_party/kernel/next/drivers/base/power/runtime.c:1034
in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0
7 locks held by swapper/0/1:
 #0: 000000005ec5bc72 (&dev->mutex){....}, at: __driver_attach+0xb5/0x12b
 #1: 000000005d5fa9e5 (&dev->mutex){....}, at: __device_attach+0x3e/0x15b
 #2: 0000000047e93286 (serial_mutex){+.+.}, at: serial8250_register_8250_port+0x51/0x8bb
 #3: 000000003b328f07 (port_mutex){+.+.}, at: uart_add_one_port+0xab/0x8b0
 #4: 00000000fa313d4d (&port->mutex){+.+.}, at: uart_add_one_port+0xcc/0x8b0
 #5: 00000000090983ca (console_lock){+.+.}, at: vprintk_emit+0xdb/0x217
 #6: 00000000c743e583 (console_owner){-...}, at: console_unlock+0x211/0x60f
irq event stamp: 735222
__down_trylock_console_sem+0x4a/0x84
console_unlock+0x338/0x60f
__do_softirq+0x4a4/0x50d
irq_exit+0x64/0xe2
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc5 #6
Hardware name: Google Caroline/Caroline, BIOS Google_Caroline.7820.286.0 03/15/2017
Call Trace:
 dump_stack+0x7d/0xbd
 ___might_sleep+0x238/0x259
 __pm_runtime_resume+0x4e/0xa4
 ? serial8250_rpm_get+0x2e/0x44
 serial8250_console_write+0x44/0x301
 ? lock_acquire+0x1b8/0x1fa
 console_unlock+0x577/0x60f
 vprintk_emit+0x1f0/0x217
 printk+0x52/0x6e
 register_console+0x43b/0x524
 uart_add_one_port+0x672/0x8b0
 ? set_io_from_upio+0x150/0x162
 serial8250_register_8250_port+0x825/0x8bb
 dw8250_probe+0x80c/0x8b0
 ? dw8250_serial_inq+0x8e/0x8e
 ? dw8250_check_lcr+0x108/0x108
 ? dw8250_runtime_resume+0x5b/0x5b
 ? dw8250_serial_outq+0xa1/0xa1
 ? dw8250_remove+0x115/0x115
 platform_drv_probe+0x76/0xc5
 really_probe+0x1f1/0x3ee
 ? driver_allows_async_probing+0x5d/0x5d
 driver_probe_device+0xd6/0x112
 ? driver_allows_async_probing+0x5d/0x5d
 bus_for_each_drv+0xbe/0xe5
 __device_attach+0xdd/0x15b
 bus_probe_device+0x5a/0x10b
 device_add+0x501/0x894
 ? _raw_write_unlock+0x27/0x3a
 platform_device_add+0x224/0x2b7
 mfd_add_device+0x718/0x75b
 ? __kmalloc+0x144/0x16a
 ? mfd_add_devices+0x38/0xdb
 mfd_add_devices+0x9b/0xdb
 intel_lpss_probe+0x7d4/0x8ee
 intel_lpss_pci_probe+0xac/0xd4
 pci_device_probe+0x101/0x18e
...

Revert the offending patch until a more comprehensive solution
is available.

Cc: Tony Lindgren <tony@atomide.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Fixes: d76c743 ("serial: 8250_dw: Fix runtime PM handling")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
guoren83 pushed a commit that referenced this issue Oct 25, 2018
…inux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm fixes for 4.19, take #2

- Correctly order GICv3 SGI registers in the cp15 array
guoren83 pushed a commit that referenced this issue Oct 25, 2018
When the function name for an inline frame is invalid, we must not try
to demangle this symbol, otherwise we crash with:

  #0  0x0000555555895c01 in bfd_demangle ()
  #1  0x0000555555823262 in demangle_sym (dso=0x555555d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
  #2  dso__demangle_sym (dso=dso@entry=0x555555d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
  #3  0x00005555557fef4b in new_inline_sym (funcname=0x0, base_sym=0x555555d92b90, dso=0x555555d92b90) at util/srcline.c:89
  #4  inline_list__append_dso_a2l (dso=dso@entry=0x555555c7bb00, node=node@entry=0x555555e31810, sym=sym@entry=0x555555d92b90) at util/srcline.c:264
  #5  0x00005555557ff27f in addr2line (dso_name=dso_name@entry=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0,
      line=line@entry=0x0, dso=dso@entry=0x555555c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x555555e31810, sym=0x555555d92b90) at util/srcline.c:313
  #6  0x00005555557ffe7c in addr2inlines (sym=0x555555d92b90, dso=0x555555c7bb00, addr=2888, dso_name=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf")
      at util/srcline.c:358

So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.

While this crash was originally reported by Hadrien for rust code, I can
now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:

  $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
  main
  /usr/include/c++/8.2.1/complex:610
  ??
  /usr/include/c++/8.2.1/complex:618
  ??
  /usr/include/c++/8.2.1/complex:675
  ??
  /usr/include/c++/8.2.1/complex:685
  main
  /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

I've reported this bug upstream and also attached a patch there which
should fix this issue:

https://sourceware.org/bugzilla/show_bug.cgi?id=23715

Reported-by: Hadrien Grasland <grasland@lal.in2p3.fr>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: a64489c ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was
  introduced, i.e.  using a2l->funcname without checking if it is NULL,
  but this current patch fixes the current codebase, i.e. multiple csets
  were applied after a64489c before the problem was reported by Hadrien ]
Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
Increase kasan instrumented kernel stack size from 32k to 64k. Other
architectures seems to get away with just doubling kernel stack size under
kasan, but on s390 this appears to be not enough due to bigger frame size.
The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE
vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting
stack overflow is fs sync on xfs filesystem:

 #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
 #1 [9a0684a8]  432 bytes  check_usage at 34c710
 #2 [9a068658]  1048 bytes  validate_chain at 35044a
 #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
 #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
 #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
 #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
 #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
 #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
 #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
 #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
 #11 [9a069e78]  200 bytes  insert_work at 23fc2e
 #12 [9a069f40]  648 bytes  __queue_work at 2487c0
 #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
 #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
 #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
 #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
 #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
 #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
 #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
 #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
 #21 [9a06acf8]  320 bytes  schedule at 219e476
 #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
 #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
 #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
 #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
 #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
 #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
 #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
 #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
 #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
 #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
 #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
 #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
 #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
 #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
 #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
 #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
 #38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
 #39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
 #40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
 #41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
 #42 [9a06e518]  88 bytes  do_writepages at 73fe6a
 #43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
 #44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
 #45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
 #46 [9a06ec98]  928 bytes  wb_writeback at a2500e
 #47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
 #48 [9a06f388]  536 bytes  wb_workfn at a28228
 #49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
 #50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
 #51 [9a06fe40]  104 bytes  kthread at 26545a
 #52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k reuse LLILL instruction
in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE
(65192) value as unsigned.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
[why]
Removing connector reusage from DM to match the rest of the tree ended
up revealing an issue that was surprisingly subtle. The original amdgpu
code for DC that was submitted appears to have left a chunk in
dm_dp_create_fake_mst_encoder() that tries to find a "master encoder",
the likes of which isn't actually used or stored anywhere. It does so at
the wrong time as well by trying to access parts of the drm_connector
from the encoder init before it's actually been initialized. This
results in a NULL pointer deref on MST hotplugs:

[  160.696613] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  160.697234] PGD 0 P4D 0
[  160.697814] Oops: 0010 [#1] SMP PTI
[  160.698430] CPU: 2 PID: 64 Comm: kworker/2:1 Kdump: loaded Tainted: G           O      4.19.0Lyude-Test+ #2
[  160.699020] Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.22 05/17/2018
[  160.699672] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  160.700322] RIP: 0010:          (null)
[  160.700920] Code: Bad RIP value.
[  160.701541] RSP: 0018:ffffc9000029fc78 EFLAGS: 00010206
[  160.702183] RAX: 0000000000000000 RBX: ffff8804440ed468 RCX: ffff8804440e9158
[  160.702778] RDX: 0000000000000000 RSI: ffff8804556c5700 RDI: ffff8804440ed000
[  160.703408] RBP: ffff880458e21800 R08: 0000000000000002 R09: 000000005fca0a25
[  160.704002] R10: ffff88045a077a3d R11: ffff88045a077a3c R12: ffff8804440ed000
[  160.704614] R13: ffff880458e21800 R14: ffff8804440e9000 R15: ffff8804440e9000
[  160.705260] FS:  0000000000000000(0000) GS:ffff88045f280000(0000) knlGS:0000000000000000
[  160.705854] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  160.706478] CR2: ffffffffffffffd6 CR3: 000000000200a001 CR4: 00000000003606e0
[  160.707124] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  160.707724] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  160.708372] Call Trace:
[  160.708998]  ? dm_dp_add_mst_connector+0xed/0x1d0 [amdgpu]
[  160.709625]  ? drm_dp_add_port+0x2fa/0x470 [drm_kms_helper]
[  160.710284]  ? wake_up_q+0x54/0x70
[  160.710877]  ? __mutex_unlock_slowpath.isra.18+0xb3/0x110
[  160.711512]  ? drm_dp_dpcd_access+0xe7/0x110 [drm_kms_helper]
[  160.712161]  ? drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper]
[  160.712762]  ? drm_dp_check_and_send_link_address+0xa3/0xd0 [drm_kms_helper]
[  160.713408]  ? drm_dp_mst_link_probe_work+0x4b/0x80 [drm_kms_helper]
[  160.714013]  ? process_one_work+0x1a1/0x3a0
[  160.714667]  ? worker_thread+0x30/0x380
[  160.715326]  ? wq_update_unbound_numa+0x10/0x10
[  160.715939]  ? kthread+0x112/0x130
[  160.716591]  ? kthread_create_worker_on_cpu+0x70/0x70
[  160.717262]  ? ret_from_fork+0x35/0x40
[  160.717886] Modules linked in: amdgpu(O) vfat fat snd_hda_codec_generic joydev i915 chash gpu_sched ttm i2c_algo_bit drm_kms_helper snd_hda_codec_hdmi hp_wmi syscopyarea iTCO_wdt sysfillrect sparse_keymap sysimgblt fb_sys_fops snd_hda_intel usbhid wmi_bmof drm snd_hda_codec btusb snd_hda_core intel_rapl btrtl x86_pkg_temp_thermal btbcm btintel coretemp snd_pcm crc32_pclmul bluetooth psmouse snd_timer snd pcspkr i2c_i801 mei_me i2c_core soundcore mei tpm_tis wmi tpm_tis_core hp_accel ecdh_generic lis3lv02d tpm video rfkill acpi_pad input_polldev hp_wireless pcc_cpufreq crc32c_intel serio_raw tg3 xhci_pci xhci_hcd [last unloaded: amdgpu]
[  160.720141] CR2: 0000000000000000

Somehow the connector reusage DM was using for MST connectors managed to
paper over this issue entirely; hence why this was never caught until
now.

[how]
Since this code isn't used anywhere and seems useless anyway, we can
just drop it entirely. This appears to fix the issue on my HP ZBook with
an AMD WX4150.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
Turns out that if you trigger an HPD storm on a system that has an MST
topology connected to it, you'll end up causing the kernel to eventually
hit a NULL deref:

[  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
[  332.340906] PGD 0 P4D 0
[  332.342750] Oops: 0000 [#1] SMP PTI
[  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
[  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
[  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
[  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
[  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
[  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
[  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
[  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
[  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
[  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
[  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
[  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
[  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
[  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  332.376798] Call Trace:
[  332.378809]  process_one_work+0x1a1/0x350
[  332.380806]  worker_thread+0x30/0x380
[  332.382777]  ? wq_update_unbound_numa+0x10/0x10
[  332.384772]  kthread+0x112/0x130
[  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
[  332.388706]  ret_from_fork+0x35/0x40
[  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
[  332.394963] CR2: 00000000000000ec

This appears to be due to the fact that with an MST topology, not all
intel_connector structs will have ->encoder set. So, fix this by
skipping connectors without encoders in
intel_hpd_irq_storm_reenable_work().

For those wondering, this bug was found on accident while simulating HPD
storms using a Chamelium connected to a ThinkPad T450s (Broadwell).

Changes since v1:
- Check intel_connector->mst_port instead of intel_connector->encoder

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-3-lyude@redhat.com
(cherry picked from commit fee61de)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
Adam reported a record command crash for simple session like:

  $ perf record -e cpu-clock ls

with following backtrace:

  Program received signal SIGSEGV, Segmentation fault.
  3543            ev = event_update_event__new(size + 1, PERF_EVENT_UPDATE__UNIT, evsel->id[0]);
  (gdb) bt
  #0  perf_event__synthesize_event_update_unit
  #1  0x000000000051e469 in perf_event__synthesize_extra_attr
  #2  0x00000000004445cb in record__synthesize
  #3  0x0000000000444bc5 in __cmd_record
  ...

We synthesize an update event that needs to touch the evsel id array,
which is not defined at that time. Fix this by forcing the id allocation
for events with their unit defined.

Reflecting possible read_format ID bit in the attr tests.

Reported-by: Yongxin Liu <yongxin.liu@outlook.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adam Lee <leeadamrobert@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201477
Fixes: bfd8f72 ("perf record: Synthesize unit/scale/... in event update")
Link: http://lkml.kernel.org/r/20181112130012.5424-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
On systems with IMA-appraisal enabled with a policy requiring file
signatures, the "good" signature values are stored on the filesystem as
extended attributes (security.ima).  Signature verification failure
would normally be limited to just a particular file (eg. executable),
but during boot signature verification failure could result in a system
hang.

Defining and requiring a new public_key_signature field requires all
callers of asymmetric signature verification to be updated to reflect
the change.  This patch updates the integrity asymmetric_verify()
caller.

Fixes: 82f94f2 ("KEYS: Provide software public key query function [ver #2]")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
…kernel/git/jmorris/linux-security

Pull integrity fix from James Morris:
 "Fix a bug introduced with in this merge window in 82f94f2 ("KEYS:
  Provide software public key query function [ver #2]")"

* 'fixes-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  integrity: support new struct public_key_signature encoding field
guoren83 pushed a commit that referenced this issue Dec 29, 2018
We see the following lockdep warning:

[ 2284.078521] ======================================================
[ 2284.078604] WARNING: possible circular locking dependency detected
[ 2284.078604] 4.19.0+ #42 Tainted: G            E
[ 2284.078604] ------------------------------------------------------
[ 2284.078604] rmmod/254 is trying to acquire lock:
[ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
[ 2284.078604]
[ 2284.078604] but task is already holding lock:
[ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
[ 2284.078604]
[ 2284.078604] which lock already depends on the new lock.
[ 2284.078604]
[ 2284.078604]
[ 2284.078604] the existing dependency chain (in reverse order) is:
[ 2284.078604]
[ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
[ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
[ 2284.078604]        call_timer_fn+0xa1/0x280
[ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
[ 2284.078604]        __do_softirq+0xfc/0x413
[ 2284.078604]        irq_exit+0xb5/0xc0
[ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
[ 2284.078604]        apic_timer_interrupt+0xf/0x20
[ 2284.078604]        default_idle+0x1c/0x140
[ 2284.078604]        do_idle+0x1bc/0x280
[ 2284.078604]        cpu_startup_entry+0x19/0x20
[ 2284.078604]        start_secondary+0x187/0x1c0
[ 2284.078604]        secondary_startup_64+0xa4/0xb0
[ 2284.078604]
[ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
[ 2284.078604]        del_timer_sync+0x34/0xa0
[ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
[ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
[ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
[ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
[ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
[ 2284.078604]        unregister_pernet_operations+0x87/0xd0
[ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
[ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
[ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
[ 2284.078604]        do_syscall_64+0x66/0x460
[ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2284.078604]
[ 2284.078604] other info that might help us debug this:
[ 2284.078604]
[ 2284.078604]  Possible unsafe locking scenario:
[ 2284.078604]
[ 2284.078604]        CPU0                    CPU1
[ 2284.078604]        ----                    ----
[ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604]                                lock((&n->timer)#2);
[ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604]   lock((&n->timer)#2);
[ 2284.078604]
[ 2284.078604]  *** DEADLOCK ***
[ 2284.078604]
[ 2284.078604] 3 locks held by rmmod/254:
[ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
[ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
[ 2284.078604]  #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
[...}

The reason is that the node timer handler sometimes needs to delete a
node which has been disconnected for too long. To do this, it grabs
the lock 'node_list_lock', which may at the same time be held by the
generic node cleanup function, tipc_node_stop(), during module removal.
Since the latter is calling del_timer_sync() inside the same lock, we
have a potential deadlock.

We fix this letting the timer cleanup function use spin_trylock()
instead of just spin_lock(), and when it fails to grab the lock it
just returns so that the timer handler can terminate its execution.
This is safe to do, since tipc_node_stop() anyway is about to
delete both the timer and the node instance.

Fixes: 6a939f3 ("tipc: Auto removal of peer down node instance")
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
Yonghong Song says:

====================
This patch set added name checking for PTR, ARRAY, VOLATILE, TYPEDEF,
CONST, RESTRICT, STRUCT, UNION, ENUM and FWD types. Such a strict
name checking makes BTF more sound in the kernel and future
BTF-to-header-file converesion ([1]) less fragile.

Patch #1 implemented btf_name_valid_identifier() for name checking
which will be used in Patch #2.
Patch #2 checked name validity for the above mentioned types.
Patch #3 fixed two existing test_btf unit tests exposed by the strict
name checking.
Patch #4 added additional test cases.

This patch set is against bpf tree.

Patch #1 has been implemented in bpf-next commit
Commit 2667a26 ("bpf: btf: Add BTF_KIND_FUNC
and BTF_KIND_FUNC_PROTO"), so there is no need to apply this
patch to bpf-next. In case this patch is applied to bpf-next,
there will be a minor conflict like
  diff --cc kernel/bpf/btf.c
  index a09b2f94ab25,93c233ab2db6..000000000000
  --- a/kernel/bpf/btf.c
  +++ b/kernel/bpf/btf.c
  @@@ -474,7 -451,7 +474,11 @@@ static bool btf_name_valid_identifier(c
          return !*src;
    }

  ++<<<<<<< HEAD
   +const char *btf_name_by_offset(const struct btf *btf, u32 offset)
  ++=======
  + static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
  ++>>>>>>> fa9566b0847d... bpf: btf: implement btf_name_valid_identifier()
    {
          if (!offset)
                  return "(anon)";
Just resolve the conflict by taking the "const char ..." line.

Patches #2, #3 and #4 can be applied to bpf-next without conflict.

[1]: http://vger.kernel.org/lpc-bpf2018.html#session-2
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
We can concurrently try to open the same sub-channel from 2 paths:

path #1: vmbus_onoffer() -> vmbus_process_offer() -> handle_sc_creation().
path #2: storvsc_probe() -> storvsc_connect_to_vsp() ->
	 -> storvsc_channel_init() -> handle_multichannel_storage() ->
	 -> vmbus_are_subchannels_present() -> handle_sc_creation().

They conflict with each other, but it was not an issue before the recent
commit ae6935e ("vmbus: split ring buffer allocation from open"),
because at the beginning of vmbus_open() we checked newchannel->state so
only one path could succeed, and the other would return with -EINVAL.

After ae6935e, the failing path frees the channel's ringbuffer by
vmbus_free_ring(), and this causes a panic later.

Commit ae6935e itself is good, and it just reveals the longstanding
race. We can resolve the issue by removing path #2, i.e. removing the
second vmbus_are_subchannels_present() in handle_multichannel_storage().

BTW, the comment "Check to see if sub-channels have already been created"
in handle_multichannel_storage() is incorrect: when we unload the driver,
we first close the sub-channel(s) and then close the primary channel, next
the host sends rescind-offer message(s) so primary->sc_list will become
empty. This means the first vmbus_are_subchannels_present() in
handle_multichannel_storage() is never useful.

Fixes: ae6935e ("vmbus: split ring buffer allocation from open")
Cc: stable@vger.kernel.org
Cc: Long Li <longli@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
Function graph tracing recurses into itself when stackleak is enabled,
causing the ftrace graph selftest to run for up to 90 seconds and
trigger the softlockup watchdog.

Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
200             mcount_get_lr_addr        x0    //     pointer to function's saved lr
(gdb) bt
\#0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
\#1  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#2  0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106
\#3  0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507
\#4  0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286
\#5  ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
\#6  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#7  0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876
\#8  0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650
\#9  0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205

Rework so we mark stackleak_track_stack as notrace

Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
guoren83 pushed a commit that referenced this issue Dec 29, 2018
Ido Schimmel says:

====================
mlxsw: Various fixes

Patches #1 and #2 fix two VxLAN related issues. The first patch removes
warnings that can currently be triggered from user space. Second patch
avoids leaking a FID in an error path.

Patch #3 fixes a too strict check that causes certain host routes not to
be promoted to perform GRE decapsulation in hardware.

Last patch avoids a use-after-free when deleting a VLAN device via an
ioctl when it is enslaved to a bridge. I have a patchset for net-next
that reworks this code and makes the driver more robust.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
guoren83 pushed a commit that referenced this issue Jan 24, 2019
This patch modified test_btf pretty print test to cover
the bitfield with struct member equal to or greater 256.

Without the previous kernel patch fix, the modified test will fail:

  $ test_btf -p
  ......
  BTF pretty print array(#1)......unexpected pprint output
  expected: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x1}
      read: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x0}

  BTF pretty print array(#2)......unexpected pprint output
  expected: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x1}
      read: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x0}

  PASS:6 SKIP:0 FAIL:2

With the kernel fix, the modified test will succeed:
  $ test_btf -p
  ......
  BTF pretty print array(#1)......OK
  BTF pretty print array(#2)......OK
  PASS:8 SKIP:0 FAIL:0

Fixes: 9d5f9f7 ("bpf: btf: fix struct/union/fwd types with kind_flag")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
guoren83 pushed a commit that referenced this issue Jan 24, 2019
Yonghong Song says:

====================
The previous BTF kind_flag support patch set introduced a bug
for kernel bpffs pretty printing and another bug for bpftool
map pretty printing. If a bitfield struct member offset is
greater than 256 bits, printed value for that struct
member will be incorrect.

- Patch #1 fixed the bug in kernel bpffs pretty printing.
- Patch #2 enhanced the test_btf test case to cover the
           issue exposed by patch #1.
- Patch #3 fixed the bug in bpftool map pretty printing.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
guoren83 pushed a commit that referenced this issue Feb 1, 2022
Fix the following false positive warning:
 =============================
 WARNING: suspicious RCU usage
 5.16.0-rc4+ #57 Not tainted
 -----------------------------
 arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by fc_vcpu 0/330:
  #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm]
  #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm]
  #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm]

 stack backtrace:
 CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x44/0x57
  kvm_notify_acked_gsi+0x6b/0x70 [kvm]
  kvm_notify_acked_irq+0x8d/0x180 [kvm]
  kvm_ioapic_update_eoi+0x92/0x240 [kvm]
  kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm]
  handle_apic_eoi_induced+0x3d/0x60 [kvm_intel]
  vmx_handle_exit+0x19c/0x6a0 [kvm_intel]
  vcpu_enter_guest+0x66e/0x1860 [kvm]
  kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm]
  kvm_vcpu_ioctl+0x38a/0x6f0 [kvm]
  __x64_sys_ioctl+0x89/0xc0
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu),
kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact,
kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
a false positive warning. So use hlist_for_each_entry_srcu() instead of
hlist_for_each_entry_rcu().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
guoren83 pushed a commit that referenced this issue Feb 10, 2022
Add a test that sends large udp packet (which is fragmented)
via a stateless nft nat rule, i.e. 'ip saddr set 10.2.3.4'
and check that the datagram is received by peer.

On kernels without
commit 4e1860a ("netfilter: nft_payload: do not update layer 4 checksum when mangling fragments")',
this will fail with:

cmp: EOF on /tmp/tmp.V1q0iXJyQF which is empty
-rw------- 1 root root 4096 Jan 24 22:03 /tmp/tmp.Aaqnq4rBKS
-rw------- 1 root root    0 Jan 24 22:03 /tmp/tmp.V1q0iXJyQF
ERROR: in and output file mismatch when checking udp with stateless nat
FAIL: nftables v1.0.0 (Fearless Fosdick #2)

On patched kernels, this will show:
PASS: IP statless for ns2-PFp89amx

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
guoren83 pushed a commit that referenced this issue Feb 10, 2022
Quota disable ioctl starts a transaction before waiting for the qgroup
rescan worker completes. However, this wait can be infinite and results
in deadlock because of circular dependency among the quota disable
ioctl, the qgroup rescan worker and the other task with transaction such
as block group relocation task.

The deadlock happens with the steps following:

1) Task A calls ioctl to disable quota. It starts a transaction and
   waits for qgroup rescan worker completes.
2) Task B such as block group relocation task starts a transaction and
   joins to the transaction that task A started. Then task B commits to
   the transaction. In this commit, task B waits for a commit by task A.
3) Task C as the qgroup rescan worker starts its job and starts a
   transaction. In this transaction start, task C waits for completion
   of the transaction that task A started and task B committed.

This deadlock was found with fstests test case btrfs/115 and a zoned
null_blk device. The test case enables and disables quota, and the
block group reclaim was triggered during the quota disable by chance.
The deadlock was also observed by running quota enable and disable in
parallel with 'btrfs balance' command on regular null_blk devices.

An example report of the deadlock:

  [372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds.
  [372.479944]       Not tainted 5.16.0-rc8 #7
  [372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [372.493898] task:kworker/u16:6   state:D stack:    0 pid:  103 ppid:     2 flags:0x00004000
  [372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs]
  [372.510782] Call Trace:
  [372.514092]  <TASK>
  [372.521684]  __schedule+0xb56/0x4850
  [372.530104]  ? io_schedule_timeout+0x190/0x190
  [372.538842]  ? lockdep_hardirqs_on+0x7e/0x100
  [372.547092]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [372.555591]  schedule+0xe0/0x270
  [372.561894]  btrfs_commit_transaction+0x18bb/0x2610 [btrfs]
  [372.570506]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
  [372.578875]  ? free_unref_page+0x3f2/0x650
  [372.585484]  ? finish_wait+0x270/0x270
  [372.591594]  ? release_extent_buffer+0x224/0x420 [btrfs]
  [372.599264]  btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs]
  [372.607157]  ? lock_release+0x3a9/0x6d0
  [372.613054]  ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs]
  [372.620960]  ? do_raw_spin_lock+0x11e/0x250
  [372.627137]  ? rwlock_bug.part.0+0x90/0x90
  [372.633215]  ? lock_is_held_type+0xe4/0x140
  [372.639404]  btrfs_work_helper+0x1ae/0xa90 [btrfs]
  [372.646268]  process_one_work+0x7e9/0x1320
  [372.652321]  ? lock_release+0x6d0/0x6d0
  [372.658081]  ? pwq_dec_nr_in_flight+0x230/0x230
  [372.664513]  ? rwlock_bug.part.0+0x90/0x90
  [372.670529]  worker_thread+0x59e/0xf90
  [372.676172]  ? process_one_work+0x1320/0x1320
  [372.682440]  kthread+0x3b9/0x490
  [372.687550]  ? _raw_spin_unlock_irq+0x24/0x50
  [372.693811]  ? set_kthread_struct+0x100/0x100
  [372.700052]  ret_from_fork+0x22/0x30
  [372.705517]  </TASK>
  [372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds.
  [372.729827]       Not tainted 5.16.0-rc8 #7
  [372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [372.767106] task:btrfs-transacti state:D stack:    0 pid: 2347 ppid:     2 flags:0x00004000
  [372.787776] Call Trace:
  [372.801652]  <TASK>
  [372.812961]  __schedule+0xb56/0x4850
  [372.830011]  ? io_schedule_timeout+0x190/0x190
  [372.852547]  ? lockdep_hardirqs_on+0x7e/0x100
  [372.871761]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [372.886792]  schedule+0xe0/0x270
  [372.901685]  wait_current_trans+0x22c/0x310 [btrfs]
  [372.919743]  ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs]
  [372.938923]  ? finish_wait+0x270/0x270
  [372.959085]  ? join_transaction+0xc75/0xe30 [btrfs]
  [372.977706]  start_transaction+0x938/0x10a0 [btrfs]
  [372.997168]  transaction_kthread+0x19d/0x3c0 [btrfs]
  [373.013021]  ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs]
  [373.031678]  kthread+0x3b9/0x490
  [373.047420]  ? _raw_spin_unlock_irq+0x24/0x50
  [373.064645]  ? set_kthread_struct+0x100/0x100
  [373.078571]  ret_from_fork+0x22/0x30
  [373.091197]  </TASK>
  [373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds.
  [373.114147]       Not tainted 5.16.0-rc8 #7
  [373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [373.130393] task:btrfs           state:D stack:    0 pid: 3145 ppid:  3141 flags:0x00004000
  [373.140998] Call Trace:
  [373.145501]  <TASK>
  [373.149654]  __schedule+0xb56/0x4850
  [373.155306]  ? io_schedule_timeout+0x190/0x190
  [373.161965]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.168469]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [373.175468]  schedule+0xe0/0x270
  [373.180814]  wait_for_commit+0x104/0x150 [btrfs]
  [373.187643]  ? test_and_set_bit+0x20/0x20 [btrfs]
  [373.194772]  ? kmem_cache_free+0x124/0x550
  [373.201191]  ? btrfs_put_transaction+0x69/0x3d0 [btrfs]
  [373.208738]  ? finish_wait+0x270/0x270
  [373.214704]  ? __btrfs_end_transaction+0x347/0x7b0 [btrfs]
  [373.222342]  btrfs_commit_transaction+0x44d/0x2610 [btrfs]
  [373.230233]  ? join_transaction+0x255/0xe30 [btrfs]
  [373.237334]  ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs]
  [373.245251]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
  [373.253296]  relocate_block_group+0x105/0xc20 [btrfs]
  [373.260533]  ? mutex_lock_io_nested+0x1270/0x1270
  [373.267516]  ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs]
  [373.275155]  ? merge_reloc_roots+0x710/0x710 [btrfs]
  [373.283602]  ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs]
  [373.291934]  ? kmem_cache_free+0x124/0x550
  [373.298180]  btrfs_relocate_block_group+0x35c/0x930 [btrfs]
  [373.306047]  btrfs_relocate_chunk+0x85/0x210 [btrfs]
  [373.313229]  btrfs_balance+0x12f4/0x2d20 [btrfs]
  [373.320227]  ? lock_release+0x3a9/0x6d0
  [373.326206]  ? btrfs_relocate_chunk+0x210/0x210 [btrfs]
  [373.333591]  ? lock_is_held_type+0xe4/0x140
  [373.340031]  ? rcu_read_lock_sched_held+0x3f/0x70
  [373.346910]  btrfs_ioctl_balance+0x548/0x700 [btrfs]
  [373.354207]  btrfs_ioctl+0x7f2/0x71b0 [btrfs]
  [373.360774]  ? lockdep_hardirqs_on_prepare+0x410/0x410
  [373.367957]  ? lockdep_hardirqs_on_prepare+0x410/0x410
  [373.375327]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
  [373.383841]  ? find_held_lock+0x2c/0x110
  [373.389993]  ? lock_release+0x3a9/0x6d0
  [373.395828]  ? mntput_no_expire+0xf7/0xad0
  [373.402083]  ? lock_is_held_type+0xe4/0x140
  [373.408249]  ? vfs_fileattr_set+0x9f0/0x9f0
  [373.414486]  ? selinux_file_ioctl+0x349/0x4e0
  [373.420938]  ? trace_raw_output_lock+0xb4/0xe0
  [373.427442]  ? selinux_inode_getsecctx+0x80/0x80
  [373.434224]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.440660]  ? force_qs_rnp+0x2a0/0x6b0
  [373.446534]  ? lock_is_held_type+0x9b/0x140
  [373.452763]  ? __blkcg_punt_bio_submit+0x1b0/0x1b0
  [373.459732]  ? security_file_ioctl+0x50/0x90
  [373.466089]  __x64_sys_ioctl+0x127/0x190
  [373.472022]  do_syscall_64+0x3b/0x90
  [373.477513]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [373.484823] RIP: 0033:0x7f8f4af7e2bb
  [373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb
  [373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003
  [373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0
  [373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001
  [373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002
  [373.546506]  </TASK>
  [373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds.
  [373.559383]       Not tainted 5.16.0-rc8 #7
  [373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [373.575748] task:btrfs           state:D stack:    0 pid: 3146 ppid:  2168 flags:0x00000000
  [373.586314] Call Trace:
  [373.590846]  <TASK>
  [373.595121]  __schedule+0xb56/0x4850
  [373.600901]  ? __lock_acquire+0x23db/0x5030
  [373.607176]  ? io_schedule_timeout+0x190/0x190
  [373.613954]  schedule+0xe0/0x270
  [373.619157]  schedule_timeout+0x168/0x220
  [373.625170]  ? usleep_range_state+0x150/0x150
  [373.631653]  ? mark_held_locks+0x9e/0xe0
  [373.637767]  ? do_raw_spin_lock+0x11e/0x250
  [373.643993]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
  [373.651267]  ? _raw_spin_unlock_irq+0x24/0x50
  [373.657677]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.664103]  wait_for_completion+0x163/0x250
  [373.670437]  ? bit_wait_timeout+0x160/0x160
  [373.676585]  btrfs_quota_disable+0x176/0x9a0 [btrfs]
  [373.683979]  ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs]
  [373.691340]  ? down_write+0xd0/0x130
  [373.696880]  ? down_write_killable+0x150/0x150
  [373.703352]  btrfs_ioctl+0x3945/0x71b0 [btrfs]
  [373.710061]  ? find_held_lock+0x2c/0x110
  [373.716192]  ? lock_release+0x3a9/0x6d0
  [373.722047]  ? __handle_mm_fault+0x23cd/0x3050
  [373.728486]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
  [373.737032]  ? set_pte+0x6a/0x90
  [373.742271]  ? do_raw_spin_unlock+0x55/0x1f0
  [373.748506]  ? lock_is_held_type+0xe4/0x140
  [373.754792]  ? vfs_fileattr_set+0x9f0/0x9f0
  [373.761083]  ? selinux_file_ioctl+0x349/0x4e0
  [373.767521]  ? selinux_inode_getsecctx+0x80/0x80
  [373.774247]  ? __up_read+0x182/0x6e0
  [373.780026]  ? count_memcg_events.constprop.0+0x46/0x60
  [373.787281]  ? up_write+0x460/0x460
  [373.792932]  ? security_file_ioctl+0x50/0x90
  [373.799232]  __x64_sys_ioctl+0x127/0x190
  [373.805237]  do_syscall_64+0x3b/0x90
  [373.810947]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [373.818102] RIP: 0033:0x7f1383ea02bb
  [373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb
  [373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003
  [373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
  [373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a
  [373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000
  [373.879838]  </TASK>
  [373.884018]
               Showing all locks held in the system:
  [373.894250] 3 locks held by kworker/4:1/58:
  [373.900356] 1 lock held by khungtaskd/63:
  [373.906333]  #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260
  [373.917307] 3 locks held by kworker/u16:6/103:
  [373.923938]  #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320
  [373.936555]  #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320
  [373.951109]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs]
  [373.964027] 2 locks held by less/1803:
  [373.969982]  #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80
  [373.981295]  #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060
  [373.992969] 1 lock held by btrfs-transacti/2347:
  [373.999893]  #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs]
  [374.015872] 3 locks held by btrfs/3145:
  [374.022298]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs]
  [374.034456]  #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs]
  [374.047646]  #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs]
  [374.063295] 4 locks held by btrfs/3146:
  [374.069647]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs]
  [374.081601]  #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs]
  [374.094283]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs]
  [374.106885]  #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs]

  [374.126780] =============================================

To avoid the deadlock, wait for the qgroup rescan worker to complete
before starting the transaction for the quota disable ioctl. Clear
BTRFS_FS_QUOTA_ENABLE flag before the wait and the transaction to
request the worker to complete. On transaction start failure, set the
BTRFS_FS_QUOTA_ENABLE flag again. These BTRFS_FS_QUOTA_ENABLE flag
changes can be done safely since the function btrfs_quota_disable is not
called concurrently because of fs_info->subvol_sem.

Also check the BTRFS_FS_QUOTA_ENABLE flag in qgroup_rescan_init to avoid
another qgroup rescan worker to start after the previous qgroup worker
completed.

CC: stable@vger.kernel.org # 5.4+
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
guoren83 pushed a commit that referenced this issue Feb 10, 2022
…/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.17, take #2

- A couple of fixes when handling an exception while a SError has been
  delivered

- Workaround for Cortex-A510's single-step[ erratum
guoren83 pushed a commit that referenced this issue Feb 20, 2022
…egulator

The interrupt pin of the external ethernet phy is used, instead of the
enable-gpio pin of the tf-io regulator. The GPIOE_2 pin is located in
the gpio_ao bank.

This causes phy interrupt problems at system startup.
[   76.645190] irq 36: nobody cared (try booting with the "irqpoll" option)
[   76.649617] CPU: 0 PID: 1416 Comm: irq/36-0.0:00 Not tainted 5.16.0 #2
[   76.649629] Hardware name: Hardkernel ODROID-HC4 (DT)
[   76.649635] Call trace:
[   76.649638]  dump_backtrace+0x0/0x1c8
[   76.649658]  show_stack+0x14/0x60
[   76.649667]  dump_stack_lvl+0x64/0x7c
[   76.649676]  dump_stack+0x14/0x2c
[   76.649683]  __report_bad_irq+0x38/0xe8
[   76.649695]  note_interrupt+0x220/0x3a0
[   76.649704]  handle_irq_event_percpu+0x58/0x88
[   76.649713]  handle_irq_event+0x44/0xd8
[   76.649721]  handle_fasteoi_irq+0xa8/0x130
[   76.649730]  generic_handle_domain_irq+0x38/0x58
[   76.649738]  gic_handle_irq+0x9c/0xb8
[   76.649747]  call_on_irq_stack+0x28/0x38
[   76.649755]  do_interrupt_handler+0x7c/0x80
[   76.649763]  el1_interrupt+0x34/0x80
[   76.649772]  el1h_64_irq_handler+0x14/0x20
[   76.649781]  el1h_64_irq+0x74/0x78
[   76.649788]  irq_finalize_oneshot.part.56+0x68/0xf8
[   76.649796]  irq_thread_fn+0x5c/0x98
[   76.649804]  irq_thread+0x13c/0x260
[   76.649812]  kthread+0x144/0x178
[   76.649822]  ret_from_fork+0x10/0x20
[   76.649830] handlers:
[   76.653170] [<0000000025a6cd31>] irq_default_primary_handler threaded [<0000000093580eb7>] phy_interrupt
[   76.661256] Disabling IRQ #36

Fixes: 1f80a5c ("arm64: dts: meson-sm1-odroid: add missing enable gpio and supply for tf_io regulator")
Signed-off-by: Lutz Koschorreck <theleks@ko-hh.de>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
[narmstrong: removed spurious invalid & blank lines from commit message]
Link: https://lore.kernel.org/r/20220127130537.GA187347@odroid-VirtualBox
guoren83 pushed a commit that referenced this issue Feb 20, 2022
When using the flushoncommit mount option, during almost every transaction
commit we trigger a warning from __writeback_inodes_sb_nr():

  $ cat fs/fs-writeback.c:
  (...)
  static void __writeback_inodes_sb_nr(struct super_block *sb, ...
  {
        (...)
        WARN_ON(!rwsem_is_locked(&sb->s_umount));
        (...)
  }
  (...)

The trace produced in dmesg looks like the following:

  [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3
  [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif
  [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186
  [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3
  [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...)
  [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246
  [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000
  [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50
  [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000
  [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488
  [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460
  [947.553621] FS:  0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000
  [947.560537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0
  [947.571072] Call Trace:
  [947.572354]  <TASK>
  [947.573266]  btrfs_commit_transaction+0x1f1/0x998
  [947.576785]  ? start_transaction+0x3ab/0x44e
  [947.579867]  ? schedule_timeout+0x8a/0xdd
  [947.582716]  transaction_kthread+0xe9/0x156
  [947.585721]  ? btrfs_cleanup_transaction.isra.0+0x407/0x407
  [947.590104]  kthread+0x131/0x139
  [947.592168]  ? set_kthread_struct+0x32/0x32
  [947.595174]  ret_from_fork+0x22/0x30
  [947.597561]  </TASK>
  [947.598553] ---[ end trace 644721052755541c ]---

This is because we started using writeback_inodes_sb() to flush delalloc
when committing a transaction (when using -o flushoncommit), in order to
avoid deadlocks with filesystem freeze operations. This change was made
by commit ce8ea7c ("btrfs: don't call btrfs_start_delalloc_roots
in flushoncommit"). After that change we started producing that warning,
and every now and then a user reports this since the warning happens too
often, it spams dmesg/syslog, and a user is unsure if this reflects any
problem that might compromise the filesystem's reliability.

We can not just lock the sb->s_umount semaphore before calling
writeback_inodes_sb(), because that would at least deadlock with
filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem()
is called while we are holding that semaphore in write mode, and that can
trigger a transaction commit, resulting in a deadlock. It would also
trigger the same type of deadlock in the unmount path. Possibly, it could
also introduce some other locking dependencies that lockdep would report.

To fix this call try_to_writeback_inodes_sb() instead of
writeback_inodes_sb(), because that will try to read lock sb->s_umount
and then will only call writeback_inodes_sb() if it was able to lock it.
This is fine because the cases where it can't read lock sb->s_umount
are during a filesystem unmount or during a filesystem freeze - in those
cases sb->s_umount is write locked and sync_filesystem() is called, which
calls writeback_inodes_sb(). In other words, in all cases where we can't
take a read lock on sb->s_umount, writeback is already being triggered
elsewhere.

An alternative would be to call btrfs_start_delalloc_roots() with a
number of pages different from LONG_MAX, for example matching the number
of delalloc bytes we currently have, in which case we would end up
starting all delalloc with filemap_fdatawrite_wbc() and not with an
async flush via filemap_flush() - that is only possible after the rather
recent commit e076ab2 ("btrfs: shrink delalloc pages instead of
full inodes"). However that creates a whole new can of worms due to new
lock dependencies, which lockdep complains, like for example:

[ 8948.247280] ======================================================
[ 8948.247823] WARNING: possible circular locking dependency detected
[ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted
[ 8948.248786] ------------------------------------------------------
[ 8948.249320] kworker/u16:18/933570 is trying to acquire lock:
[ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.250638]
               but task is already holding lock:
[ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.252018]
               which lock already depends on the new lock.

[ 8948.252710]
               the existing dependency chain (in reverse order) is:
[ 8948.253343]
               -> #2 (&root->delalloc_mutex){+.+.}-{3:3}:
[ 8948.253950]        __mutex_lock+0x90/0x900
[ 8948.254354]        start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.254859]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.255408]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.255942]        btrfs_mksubvol+0x380/0x570 [btrfs]
[ 8948.256406]        btrfs_mksnapshot+0x81/0xb0 [btrfs]
[ 8948.256870]        __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs]
[ 8948.257413]        btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
[ 8948.257961]        btrfs_ioctl+0x1196/0x3630 [btrfs]
[ 8948.258418]        __x64_sys_ioctl+0x83/0xb0
[ 8948.258793]        do_syscall_64+0x3b/0xc0
[ 8948.259146]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.259709]
               -> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}:
[ 8948.260330]        __mutex_lock+0x90/0x900
[ 8948.260692]        btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.261234]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.261766]        btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs]
[ 8948.262379]        btrfs_start_pre_rw_mount+0x119/0x180 [btrfs]
[ 8948.262909]        open_ctree+0x1511/0x171e [btrfs]
[ 8948.263359]        btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 8948.263863]        legacy_get_tree+0x30/0x50
[ 8948.264242]        vfs_get_tree+0x28/0xc0
[ 8948.264594]        vfs_kern_mount.part.0+0x71/0xb0
[ 8948.265017]        btrfs_mount+0x11d/0x3a0 [btrfs]
[ 8948.265462]        legacy_get_tree+0x30/0x50
[ 8948.265851]        vfs_get_tree+0x28/0xc0
[ 8948.266203]        path_mount+0x2d4/0xbe0
[ 8948.266554]        __x64_sys_mount+0x103/0x140
[ 8948.266940]        do_syscall_64+0x3b/0xc0
[ 8948.267300]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.267790]
               -> #0 (sb_internal#2){.+.+}-{0:0}:
[ 8948.268322]        __lock_acquire+0x12e8/0x2260
[ 8948.268733]        lock_acquire+0xd7/0x310
[ 8948.269092]        start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.269591]        find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.270087]        btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.270588]        cow_file_range+0x17e/0x490 [btrfs]
[ 8948.271051]        btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.271586]        writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.272071]        __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.272579]        extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.273113]        extent_writepages+0x76/0x130 [btrfs]
[ 8948.273573]        do_writepages+0xd2/0x1c0
[ 8948.273942]        filemap_fdatawrite_wbc+0x68/0x90
[ 8948.274371]        start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.274876]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.275417]        flush_space+0x1f2/0x630 [btrfs]
[ 8948.275863]        btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.276438]        process_one_work+0x252/0x5a0
[ 8948.276829]        worker_thread+0x55/0x3b0
[ 8948.277189]        kthread+0xf2/0x120
[ 8948.277506]        ret_from_fork+0x22/0x30
[ 8948.277868]
               other info that might help us debug this:

[ 8948.278548] Chain exists of:
                 sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex

[ 8948.279601]  Possible unsafe locking scenario:

[ 8948.280102]        CPU0                    CPU1
[ 8948.280508]        ----                    ----
[ 8948.280915]   lock(&root->delalloc_mutex);
[ 8948.281271]                                lock(&fs_info->delalloc_root_mutex);
[ 8948.281915]                                lock(&root->delalloc_mutex);
[ 8948.282487]   lock(sb_internal#2);
[ 8948.282800]
                *** DEADLOCK ***

[ 8948.283333] 4 locks held by kworker/u16:18/933570:
[ 8948.283750]  #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.284609]  #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.285637]  #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.286674]  #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.287596]
              stack backtrace:
[ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1
[ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[ 8948.290298] Call Trace:
[ 8948.290517]  <TASK>
[ 8948.290700]  dump_stack_lvl+0x59/0x73
[ 8948.291026]  check_noncircular+0xf3/0x110
[ 8948.291375]  ? start_transaction+0x228/0x6e0 [btrfs]
[ 8948.291826]  __lock_acquire+0x12e8/0x2260
[ 8948.292241]  lock_acquire+0xd7/0x310
[ 8948.292714]  ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.293241]  ? lock_is_held_type+0xea/0x140
[ 8948.293601]  start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.294055]  ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294518]  find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294957]  ? _raw_spin_unlock+0x29/0x40
[ 8948.295312]  ? btrfs_get_alloc_profile+0x124/0x290 [btrfs]
[ 8948.295813]  btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.296270]  cow_file_range+0x17e/0x490 [btrfs]
[ 8948.296691]  btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.297175]  ? find_lock_delalloc_range+0x247/0x270 [btrfs]
[ 8948.297678]  writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.298123]  __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.298570]  extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.299061]  extent_writepages+0x76/0x130 [btrfs]
[ 8948.299495]  do_writepages+0xd2/0x1c0
[ 8948.299817]  ? sched_clock_cpu+0xd/0x110
[ 8948.300160]  ? lock_release+0x155/0x4a0
[ 8948.300494]  filemap_fdatawrite_wbc+0x68/0x90
[ 8948.300874]  ? do_raw_spin_unlock+0x4b/0xa0
[ 8948.301243]  start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.301706]  ? lock_release+0x155/0x4a0
[ 8948.302055]  btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.302564]  flush_space+0x1f2/0x630 [btrfs]
[ 8948.302970]  btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.303510]  process_one_work+0x252/0x5a0
[ 8948.303860]  ? process_one_work+0x5a0/0x5a0
[ 8948.304221]  worker_thread+0x55/0x3b0
[ 8948.304543]  ? process_one_work+0x5a0/0x5a0
[ 8948.304904]  kthread+0xf2/0x120
[ 8948.305184]  ? kthread_complete_and_exit+0x20/0x20
[ 8948.305598]  ret_from_fork+0x22/0x30
[ 8948.305921]  </TASK>

It all comes from the fact that btrfs_start_delalloc_roots() takes the
delalloc_root_mutex, in the transaction commit path we are holding a
read lock on one of the superblock's freeze semaphores (via
sb_start_intwrite()), the async reclaim task can also do a call to
btrfs_start_delalloc_roots(), which ends up triggering writeback with
calls to filemap_fdatawrite_wbc(), resulting in extent allocation which
in turn can call btrfs_start_transaction(), which will result in taking
the freeze semaphore via sb_start_intwrite(), forming a nasty dependency
on all those locks which can be taken in different orders by different
code paths.

So just adopt the simple approach of calling try_to_writeback_inodes_sb()
at btrfs_start_delalloc_flush().

Link: https://lore.kernel.org/linux-btrfs/20220130005258.GA7465@cuci.nl/
Link: https://lore.kernel.org/linux-btrfs/43acc426-d683-d1b6-729d-c6bc4a2fff4d@gmail.com/
Link: https://lore.kernel.org/linux-btrfs/6833930a-08d7-6fbc-0141-eb9cdfd6bb4d@gmail.com/
Link: https://lore.kernel.org/linux-btrfs/20190322041731.GF16651@hungrycats.org/
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[ add more link reports ]
Signed-off-by: David Sterba <dsterba@suse.com>
guoren83 pushed a commit that referenced this issue Mar 16, 2022
Yonghong Song says:

====================

The patch [1] exposed a bpf_timer initialization bug in function
check_and_init_map_value(). With bug fix here, the patch [1]
can be applied with all selftests passed. Please see individual
patches for fix details.

  [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/

Changelog:
  v3 -> v4:
    . move header file in patch #1 to avoid bpf-next merge conflict
  v2 -> v3:
    . switch patch #1 and patch #2 for better bisecting
  v1 -> v2:
    . add Fixes tag for patch #1
    . rebase against bpf tree
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
guoren83 pushed a commit that referenced this issue Mar 16, 2022
This driver, like several others, uses a chained IRQ for each GPIO bank,
and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result,
a call to irq_set_irq_wake() needs to lock both the upstream and
downstream irq_desc's. Lockdep considers this to be a possible deadlock
when the irq_desc's share lockdep classes, which they do by default:

 ============================================
 WARNING: possible recursive locking detected
 5.17.0-rc3-00394-gc849047c2473 #1 Not tainted
 --------------------------------------------
 init/307 is trying to acquire lock:
 c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 but task is already holding lock:
 c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&irq_desc_lock_class);
   lock(&irq_desc_lock_class);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by init/307:
  #0: c1f29f18 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot+0x90/0x23c
  #1: c20f7760 (&dev->mutex){....}-{3:3}, at: device_shutdown+0xf4/0x224
  #2: c2e804d8 (&dev->mutex){....}-{3:3}, at: device_shutdown+0x104/0x224
  #3: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 stack backtrace:
 CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1
 Hardware name: Allwinner sun8i Family
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x68/0x90
  dump_stack_lvl from __lock_acquire+0x1680/0x31a0
  __lock_acquire from lock_acquire+0x148/0x3dc
  lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
  _raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0
  __irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c
  irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c
    [tail call from sunxi_pinctrl_irq_set_wake]
  irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4
  gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c
  gpio_keys_shutdown from device_shutdown+0x180/0x224
  device_shutdown from __do_sys_reboot+0x134/0x23c
  __do_sys_reboot from ret_fast_syscall+0x0/0x1c

However, this can never deadlock because the upstream and downstream
IRQs are never the same (nor do they even involve the same irqchip).

Silence this erroneous lockdep splat by applying what appears to be the
usual fix of moving the GPIO IRQs to separate lockdep classes.

Fixes: a59c99d ("pinctrl: sunxi: Forward calls to irq_set_irq_wake")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220216040037.22730-1-samuel@sholland.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
guoren83 pushed a commit that referenced this issue Mar 16, 2022
in tunnel mode, if outer interface(ipv4) is less, it is easily to let
inner IPV6 mtu be less than 1280. If so, a Packet Too Big ICMPV6 message
is received. When send again, packets are fragmentized with 1280, they
are still rejected with ICMPV6(Packet Too Big) by xfrmi_xmit2().

According to RFC4213 Section3.2.2:
if (IPv4 path MTU - 20) is less than 1280
	if packet is larger than 1280 bytes
		Send ICMPv6 "packet too big" with MTU=1280
                Drop packet
        else
		Encapsulate but do not set the Don't Fragment
                flag in the IPv4 header.  The resulting IPv4
                packet might be fragmented by the IPv4 layer
                on the encapsulator or by some router along
                the IPv4 path.
	endif
else
	if packet is larger than (IPv4 path MTU - 20)
        	Send ICMPv6 "packet too big" with
                MTU = (IPv4 path MTU - 20).
                Drop packet.
        else
                Encapsulate and set the Don't Fragment flag
                in the IPv4 header.
        endif
endif
Packets should be fragmentized with ipv4 outer interface, so change it.

After it is fragemtized with ipv4, there will be double fragmenation.
No.48 & No.51 are ipv6 fragment packets, No.48 is double fragmentized,
then tunneled with IPv4(No.49& No.50), which obey spec. And received peer
cannot decrypt it rightly.

48              2002::10        2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50)
49   0x0000 (0) 2002::10        2002::11 1304         IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44)
50   0x0000 (0) 2002::10        2002::11 200          ESP (SPI=0x00035000)
51              2002::10        2002::11 180          Echo (ping) request
52   0x56dc     2002::10        2002::11 248          IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50)

xfrm6_noneed_fragment has fixed above issues. Finally, it acted like below:
1   0x6206 192.168.1.138   192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2]
2   0x6206 2002::10        2002::11    88   IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50)
3   0x0000 2002::10        2002::11    248  ICMPv6    Echo (ping) request

Signed-off-by: Lina Wang <lina.wang@mediatek.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
guoren83 pushed a commit that referenced this issue Mar 16, 2022
Ido Schimmel says:

====================
selftests: mlxsw: A couple of fixes

Patch #1 fixes a breakage due to a change in iproute2 output. The real
problem is not iproute2, but the fact that the check was not strict
enough. Fixed by using JSON output instead. Targeting at net so that the
test will pass as part of old and new kernels regardless of iproute2
version.

Patch #2 fixes an issue uncovered by the first one.
====================

Link: https://lore.kernel.org/r/20220302161447.217447-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
guoren83 pushed a commit that referenced this issue Jul 23, 2022
…tion

Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.

Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:

 #1> mkdir -p /sys/fs/cgroup/misc/a/b
 #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
 #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
 #4> PID=$!
 #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
 #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs

the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.

After #3, let's say the whole process was in cset A, and that after #4, the
leader moves to cset B. Then, during #6, the following happens:

 1. cgroup_migrate_add_src() is called on B for the leader.

 2. cgroup_migrate_add_src() is called on A for the other threads.

 3. cgroup_migrate_prepare_dst() is called. It scans the src list.

 4. It notices that B wants to migrate to A, so it tries to A to the dst
    list but realizes that its ->mg_preload_node is already busy.

 5. and then it notices A wants to migrate to A as it's an identity
    migration, it culls it by list_del_init()'ing its ->mg_preload_node and
    putting references accordingly.

 6. The rest of migration takes place with B on the src list but nothing on
    the dst list.

This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.

This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.

This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: shisiyuan <shisiyuan19870131@gmail.com>
Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
Link: https://www.spinics.net/lists/cgroups/msg33313.html
Fixes: f817de9 ("cgroup: prepare migration path for unified hierarchy")
Cc: stable@vger.kernel.org # v3.16+
guoren83 pushed a commit that referenced this issue Jul 23, 2022
Hulk Robot reports incorrect sp->rx_count_cooked value in decode_std_command().
This should be caused by the subtracting from sp->rx_count_cooked before.
It seems that sp->rx_count_cooked value is changed to 0, which bypassed the
previous judgment.

The situation is shown below:

         (Thread 1)			|  (Thread 2)
decode_std_command()		| resync_tnc()
...					|
if (rest == 2)			|
	sp->rx_count_cooked -= 2;	|
else if (rest == 3)			| ...
					| sp->rx_count_cooked = 0;
	sp->rx_count_cooked -= 1;	|
for (i = 0; i < sp->rx_count_cooked; i++) // report error
	checksum += sp->cooked_buf[i];

sp->rx_count_cooked is a shared variable but is not protected by a lock.
The same applies to sp->rx_count. This patch adds a lock to fix the bug.

The fail log is shown below:
=======================================================================
UBSAN: array-index-out-of-bounds in drivers/net/hamradio/6pack.c:925:31
index 400 is out of range for type 'unsigned char [400]'
CPU: 3 PID: 7433 Comm: kworker/u10:1 Not tainted 5.18.0-rc5-00163-g4b97bac0756a #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound flush_to_ldisc
Call Trace:
 <TASK>
 dump_stack_lvl+0xcd/0x134
 ubsan_epilogue+0xb/0x50
 __ubsan_handle_out_of_bounds.cold+0x62/0x6c
 sixpack_receive_buf+0xfda/0x1330
 tty_ldisc_receive_buf+0x13e/0x180
 tty_port_default_receive_buf+0x6d/0xa0
 flush_to_ldisc+0x213/0x3f0
 process_one_work+0x98f/0x1620
 worker_thread+0x665/0x1080
 kthread+0x2e9/0x3a0
 ret_from_fork+0x1f/0x30
 ...

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
guoren83 pushed a commit that referenced this issue Jul 23, 2022
…/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.19, take #2

- Fix a regression with pKVM when kmemleak is enabled

- Add Oliver Upton as an official KVM/arm64 reviewer
guoren83 pushed a commit that referenced this issue Jul 23, 2022
This was missed in c3ed222 ("NFSv4: Fix free of uninitialized
nfs4_label on referral lookup.") and causes a panic when mounting
with '-o trunkdiscovery':

PID: 1604   TASK: ffff93dac3520000  CPU: 3   COMMAND: "mount.nfs"
 #0 [ffffb79140f738f8] machine_kexec at ffffffffaec64bee
 #1 [ffffb79140f73950] __crash_kexec at ffffffffaeda67fd
 #2 [ffffb79140f73a18] crash_kexec at ffffffffaeda76ed
 #3 [ffffb79140f73a30] oops_end at ffffffffaec2658d
 #4 [ffffb79140f73a50] general_protection at ffffffffaf60111e
    [exception RIP: nfs_fattr_init+0x5]
    RIP: ffffffffc0c18265  RSP: ffffb79140f73b08  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff93dac304a800  RCX: 0000000000000000
    RDX: ffffb79140f73bb0  RSI: ffff93dadc8cbb40  RDI: d03ee11cfaf6bd50
    RBP: ffffb79140f73be8   R8: ffffffffc0691560   R9: 0000000000000006
    R10: ffff93db3ffd3df8  R11: 0000000000000000  R12: ffff93dac4040000
    R13: ffff93dac2848e00  R14: ffffb79140f73b60  R15: ffffb79140f73b30
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffb79140f73b08] _nfs41_proc_get_locations at ffffffffc0c73d53 [nfsv4]
 #6 [ffffb79140f73bf0] nfs4_proc_get_locations at ffffffffc0c83e90 [nfsv4]
 #7 [ffffb79140f73c60] nfs4_discover_trunking at ffffffffc0c83fb7 [nfsv4]
 #8 [ffffb79140f73cd8] nfs_probe_fsinfo at ffffffffc0c0f95f [nfs]
 #9 [ffffb79140f73da0] nfs_probe_server at ffffffffc0c1026a [nfs]
    RIP: 00007f6254fce26e  RSP: 00007ffc69496ac8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f6254fce26e
    RDX: 00005600220a82a0  RSI: 00005600220a64d0  RDI: 00005600220a6520
    RBP: 00007ffc69496c50   R8: 00005600220a8710   R9: 003035322e323231
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007ffc69496c50
    R13: 00005600220a8440  R14: 0000000000000010  R15: 0000560020650ef9
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

Fixes: c3ed222 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
guoren83 pushed a commit that referenced this issue Jul 23, 2022
…nline extents

When doing a direct IO read or write, we always return -ENOTBLK when we
find a compressed extent (or an inline extent) so that we fallback to
buffered IO. This however is not ideal in case we are in a NOWAIT context
(io_uring for example), because buffered IO can block and we currently
have no support for NOWAIT semantics for buffered IO, so if we need to
fallback to buffered IO we should first signal the caller that we may
need to block by returning -EAGAIN instead.

This behaviour can also result in short reads being returned to user
space, which although it's not incorrect and user space should be able
to deal with partial reads, it's somewhat surprising and even some popular
applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't
deal with short reads properly (or at all).

The short read case happens when we try to read from a range that has a
non-compressed and non-inline extent followed by a compressed extent.
After having read the first extent, when we find the compressed extent we
return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap to
treat the request as a short read, returning 0 (success) and waiting for
previously submitted bios to complete (this happens at
fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at
btrfs_file_read_iter(), we call filemap_read() to use buffered IO to
read the remaining data, and pass it the number of bytes we were able to
read with direct IO. Than at filemap_read() if we get a page fault error
when accessing the read buffer, we return a partial read instead of an
-EFAULT error, because the number of bytes previously read is greater
than zero.

So fix this by returning -EAGAIN for NOWAIT direct IO when we find a
compressed or an inline extent.

Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Link: https://lore.kernel.org/linux-btrfs/YrrFGO4A1jS0GI0G@atmark-techno.com/
Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
Tested-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
guoren83 pushed a commit that referenced this issue Jul 23, 2022
…ernel/git/at91/linux into arm/fixes

AT91 fixes for 5.19 #2

It contains 2 DT fixes:
- one for SAMA5D2 to fix the i2s1 assigned-clock-parents property
- one for kswitch-d10 (LAN966 based) enforcing proper settings
  on GPIO pins

* tag 'at91-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: dts: at91: sama5d2: Fix typo in i2s1 node
  ARM: dts: kswitch-d10: use open drain mode for coma-mode pins

Link: https://lore.kernel.org/r/20220708151621.860339-1-claudiu.beznea@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
guoren83 pushed a commit that referenced this issue Jul 23, 2022
 into HEAD

 KVM/riscv fixes for 5.19, take #2

- Fix missing PAGE_PFN_MASK

- Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
guoren83 pushed a commit that referenced this issue Jul 23, 2022
On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':

  #0  0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
  #1  syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
  #2  syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
  #3  0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
  #4  syscall__scnprintf_args <snip> at builtin-trace.c:2041
  #5  0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319

That points to the below code in tools/perf/builtin-trace.c:
	/*
	 * If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
	 * arguments, even if the syscall being handled, say "openat", uses only 4 arguments
	 * this breaks syscall__augmented_args() check for augmented args, as we calculate
	 * syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
	 * so when handling, say the openat syscall, we end up getting 6 args for the
	 * raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
	 * thinking that the extra 2 u64 args are the augmented filename, so just check
	 * here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
	 */
	if (evsel != trace->syscalls.events.sys_enter)
		augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);

As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.

Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
The lock length was wrongly set to 0 when fl_end == OFFSET_MAX, thus
failing to lock the whole file when l_start=0 and l_len=0.

This fixes test 2 from cthon04.

Before patch:

$ ./cthon04/lock/tlocklfs -t 2 /mnt

Creating parent/child synchronization pipes.

Test #1 - Test regions of an unlocked file.
        Parent: 1.1  - F_TEST  [               0,               1] PASSED.
        Parent: 1.2  - F_TEST  [               0,          ENDING] PASSED.
        Parent: 1.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
        Parent: 1.4  - F_TEST  [               1,               1] PASSED.
        Parent: 1.5  - F_TEST  [               1,          ENDING] PASSED.
        Parent: 1.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
        Parent: 1.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
        Parent: 1.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
        Parent: 1.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.

Test #2 - Try to lock the whole file.
        Parent: 2.0  - F_TLOCK [               0,          ENDING] PASSED.
        Child:  2.1  - F_TEST  [               0,               1] FAILED!
        Child:  **** Expected EACCES, returned success...
        Child:  **** Probably implementation error.

**  CHILD pass 1 results: 0/0 pass, 0/0 warn, 1/1 fail (pass/total).
        Parent: Child died

** PARENT pass 1 results: 10/10 pass, 0/0 warn, 0/0 fail (pass/total).

After patch:

$ ./cthon04/lock/tlocklfs -t 2 /mnt

Creating parent/child synchronization pipes.

Test #2 - Try to lock the whole file.
        Parent: 2.0  - F_TLOCK [               0,          ENDING] PASSED.
        Child:  2.1  - F_TEST  [               0,               1] PASSED.
        Child:  2.2  - F_TEST  [               0,          ENDING] PASSED.
        Child:  2.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
        Child:  2.4  - F_TEST  [               1,               1] PASSED.
        Child:  2.5  - F_TEST  [               1,          ENDING] PASSED.
        Child:  2.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
        Child:  2.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
        Child:  2.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
        Child:  2.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.
        Parent: 2.10 - F_ULOCK [               0,          ENDING] PASSED.

** PARENT pass 1 results: 2/2 pass, 0/0 warn, 0/0 fail (pass/total).

**  CHILD pass 1 results: 9/9 pass, 0/0 warn, 0/0 fail (pass/total).

Fixes: d80c698 ("cifs: fix signed integer overflow when fl_end is OFFSET_MAX")
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
Petr Machata says:

====================
mlxsw: Fixes for PTP support

This set fixes several issues in mlxsw PTP code.

- Patch #1 fixes compilation warnings.

- Patch #2 adjusts the order of operation during cleanup, thereby
  closing the window after PTP state was already cleaned in the ASIC
  for the given port, but before the port is removed, when the user
  could still in theory make changes to the configuration.

- Patch #3 protects the PTP configuration with a custom mutex, instead
  of relying on RTNL, which is not held in all access paths.

- Patch #4 forbids enablement of PTP only in RX or only in TX. The
  driver implicitly assumed this would be the case, but neglected to
  sanitize the configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
We have been hitting the following lockdep splat with btrfs/187 recently

  WARNING: possible circular locking dependency detected
  5.19.0-rc8+ #775 Not tainted
  ------------------------------------------------------
  btrfs/752500 is trying to acquire lock:
  ffff97e1875a97b8 (btrfs-treloc-02#2){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110

  but task is already holding lock:
  ffff97e1875a9278 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (btrfs-tree-01/1){+.+.}-{3:3}:
	 down_write_nested+0x41/0x80
	 __btrfs_tree_lock+0x24/0x110
	 btrfs_init_new_buffer+0x7d/0x2c0
	 btrfs_alloc_tree_block+0x120/0x3b0
	 __btrfs_cow_block+0x136/0x600
	 btrfs_cow_block+0x10b/0x230
	 btrfs_search_slot+0x53b/0xb70
	 btrfs_lookup_inode+0x2a/0xa0
	 __btrfs_update_delayed_inode+0x5f/0x280
	 btrfs_async_run_delayed_root+0x24c/0x290
	 btrfs_work_helper+0xf2/0x3e0
	 process_one_work+0x271/0x590
	 worker_thread+0x52/0x3b0
	 kthread+0xf0/0x120
	 ret_from_fork+0x1f/0x30

  -> #1 (btrfs-tree-01){++++}-{3:3}:
	 down_write_nested+0x41/0x80
	 __btrfs_tree_lock+0x24/0x110
	 btrfs_search_slot+0x3c3/0xb70
	 do_relocation+0x10c/0x6b0
	 relocate_tree_blocks+0x317/0x6d0
	 relocate_block_group+0x1f1/0x560
	 btrfs_relocate_block_group+0x23e/0x400
	 btrfs_relocate_chunk+0x4c/0x140
	 btrfs_balance+0x755/0xe40
	 btrfs_ioctl+0x1ea2/0x2c90
	 __x64_sys_ioctl+0x88/0xc0
	 do_syscall_64+0x38/0x90
	 entry_SYSCALL_64_after_hwframe+0x63/0xcd

  -> #0 (btrfs-treloc-02#2){+.+.}-{3:3}:
	 __lock_acquire+0x1122/0x1e10
	 lock_acquire+0xc2/0x2d0
	 down_write_nested+0x41/0x80
	 __btrfs_tree_lock+0x24/0x110
	 btrfs_lock_root_node+0x31/0x50
	 btrfs_search_slot+0x1cb/0xb70
	 replace_path+0x541/0x9f0
	 merge_reloc_root+0x1d6/0x610
	 merge_reloc_roots+0xe2/0x260
	 relocate_block_group+0x2c8/0x560
	 btrfs_relocate_block_group+0x23e/0x400
	 btrfs_relocate_chunk+0x4c/0x140
	 btrfs_balance+0x755/0xe40
	 btrfs_ioctl+0x1ea2/0x2c90
	 __x64_sys_ioctl+0x88/0xc0
	 do_syscall_64+0x38/0x90
	 entry_SYSCALL_64_after_hwframe+0x63/0xcd

  other info that might help us debug this:

  Chain exists of:
    btrfs-treloc-02#2 --> btrfs-tree-01 --> btrfs-tree-01/1

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(btrfs-tree-01/1);
				 lock(btrfs-tree-01);
				 lock(btrfs-tree-01/1);
    lock(btrfs-treloc-02#2);

   *** DEADLOCK ***

  7 locks held by btrfs/752500:
   #0: ffff97e292fdf460 (sb_writers#12){.+.+}-{0:0}, at: btrfs_ioctl+0x208/0x2c90
   #1: ffff97e284c02050 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0x55f/0xe40
   #2: ffff97e284c00878 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x236/0x400
   #3: ffff97e292fdf650 (sb_internal#2){.+.+}-{0:0}, at: merge_reloc_root+0xef/0x610
   #4: ffff97e284c02378 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0x1a8/0x5a0
   #5: ffff97e284c023a0 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0x1a8/0x5a0
   #6: ffff97e1875a9278 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110

  stack backtrace:
  CPU: 1 PID: 752500 Comm: btrfs Not tainted 5.19.0-rc8+ #775
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:

   dump_stack_lvl+0x56/0x73
   check_noncircular+0xd6/0x100
   ? lock_is_held_type+0xe2/0x140
   __lock_acquire+0x1122/0x1e10
   lock_acquire+0xc2/0x2d0
   ? __btrfs_tree_lock+0x24/0x110
   down_write_nested+0x41/0x80
   ? __btrfs_tree_lock+0x24/0x110
   __btrfs_tree_lock+0x24/0x110
   btrfs_lock_root_node+0x31/0x50
   btrfs_search_slot+0x1cb/0xb70
   ? lock_release+0x137/0x2d0
   ? _raw_spin_unlock+0x29/0x50
   ? release_extent_buffer+0x128/0x180
   replace_path+0x541/0x9f0
   merge_reloc_root+0x1d6/0x610
   merge_reloc_roots+0xe2/0x260
   relocate_block_group+0x2c8/0x560
   btrfs_relocate_block_group+0x23e/0x400
   btrfs_relocate_chunk+0x4c/0x140
   btrfs_balance+0x755/0xe40
   btrfs_ioctl+0x1ea2/0x2c90
   ? lock_is_held_type+0xe2/0x140
   ? lock_is_held_type+0xe2/0x140
   ? __x64_sys_ioctl+0x88/0xc0
   __x64_sys_ioctl+0x88/0xc0
   do_syscall_64+0x38/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

This isn't necessarily new, it's just tricky to hit in practice.  There
are two competing things going on here.  With relocation we create a
snapshot of every fs tree with a reloc tree.  Any extent buffers that
get initialized here are initialized with the reloc root lockdep key.
However since it is a snapshot, any blocks that are currently in cache
that originally belonged to the fs tree will have the normal tree
lockdep key set.  This creates the lock dependency of

  reloc tree -> normal tree

for the extent buffer locking during the first phase of the relocation
as we walk down the reloc root to relocate blocks.

However this is problematic because the final phase of the relocation is
merging the reloc root into the original fs root.  This involves
searching down to any keys that exist in the original fs root and then
swapping the relocated block and the original fs root block.  We have to
search down to the fs root first, and then go search the reloc root for
the block we need to replace.  This creates the dependency of

  normal tree -> reloc tree

which is why lockdep complains.

Additionally even if we were to fix this particular mismatch with a
different nesting for the merge case, we're still slotting in a block
that has a owner of the reloc root objectid into a normal tree, so that
block will have its lockdep key set to the tree reloc root, and create a
lockdep splat later on when we wander into that block from the fs root.

Unfortunately the only solution here is to make sure we do not set the
lockdep key to the reloc tree lockdep key normally, and then reset any
blocks we wander into from the reloc root when we're doing the merged.

This solves the problem of having mixed tree reloc keys intermixed with
normal tree keys, and then allows us to make sure in the merge case we
maintain the lock order of

  normal tree -> reloc tree

We handle this by setting a bit on the reloc root when we do the search
for the block we want to relocate, and any block we search into or COW
at that point gets set to the reloc tree key.  This works correctly
because we only ever COW down to the parent node, so we aren't resetting
the key for the block we're linking into the fs root.

With this patch we no longer have the lockdep splat in btrfs/187.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
Prior to commit 4149be7, sys_flock() would allocate the file_lock
struct it was going to use to pass parameters, call ->flock() and then call
locks_free_lock() to get rid of it - which had the side effect of calling
locks_release_private() and thus ->fl_release_private().

With commit 4149be7, however, this is no longer the case: the struct
is now allocated on the stack, and locks_free_lock() is no longer called -
and thus any remaining private data doesn't get cleaned up either.

This causes afs flock to cause oops.  Kasan catches this as a UAF by the
list_del_init() in afs_fl_release_private() for the file_lock record
produced by afs_fl_copy_lock() as the original record didn't get delisted.
It can be reproduced using the generic/504 xfstest.

Fix this by reinstating the locks_release_private() call in sys_flock().
I'm not sure if this would affect any other filesystems.  If not, then the
release could be done in afs_flock() instead.

Changes
=======
ver #2)
 - Don't need to call ->fl_release_private() after calling the security
   hook, only after calling ->flock().

Fixes: 4149be7 ("fs/lock: Don't allocate file_lock in flock_make_lock().")
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/166075758809.3532462.13307935588777587536.stgit@warthog.procyon.org.uk/ # v1
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags()
to obtain the value of sk->sk_user_data, but that function is only usable
if the RCU read lock is held, and neither that function nor any of its
callers hold it.

Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
that checks to see if sk->sk_callback_lock() is held and use that here
instead.

Alternatively, making __rcu_dereference_sk_user_data_with_flags() use
rcu_dereference_checked() might suffice.

Without this, the following warning can be occasionally observed:

=============================
WARNING: suspicious RCU usage
6.0.0-rc1-build2+ #563 Not tainted
-----------------------------
include/net/sock.h:592 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
5 locks held by locktest/29873:
 #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121
 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70
 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0
 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd
 #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4

stack backtrace:
CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 bpf_sk_reuseport_detach+0x6d/0xa4
 reuseport_detach_sock+0x75/0xdd
 inet_unhash+0xa5/0x1c0
 tcp_set_state+0x169/0x20f
 ? lockdep_sock_is_held+0x3a/0x3a
 ? __lock_release.isra.0+0x13e/0x220
 ? reacquire_held_locks+0x1bb/0x1bb
 ? hlock_class+0x31/0x96
 ? mark_lock+0x9e/0x1af
 __tcp_close+0x50/0x4b6
 tcp_close+0x28/0x70
 inet_release+0x8e/0xa7
 __sock_release+0x95/0x121
 sock_close+0x14/0x17
 __fput+0x20f/0x36a
 task_work_run+0xa3/0xcc
 exit_to_user_mode_prepare+0x9c/0x14d
 syscall_exit_to_user_mode+0x18/0x44
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: cf8c1e9 ("net: refactor bpf_sk_reuseport_detach()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hawkins Jiawei <yin31149@gmail.com>
Link: https://lore.kernel.org/r/166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
The lag_lock is taken from both process and softirq contexts which results
lockdep warning[0] about potential deadlock. However, just disabling
softirqs by using *_bh spinlock API is not enough since it will cause
warning in some contexts where the lock is obtained with hard irqs
disabled. To fix the issue save current irq state, disable them before
obtaining the lock an re-enable irqs from saved state after releasing it.

[0]:

[Sun Aug  7 13:12:29 2022] ================================
[Sun Aug  7 13:12:29 2022] WARNING: inconsistent lock state
[Sun Aug  7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted
[Sun Aug  7 13:12:29 2022] --------------------------------
[Sun Aug  7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[Sun Aug  7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[Sun Aug  7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at:
[Sun Aug  7 13:12:29 2022]   lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]   _raw_spin_lock+0x2c/0x40
[Sun Aug  7 13:12:29 2022]   mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_nic_enable+0x114/0x470 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_resume+0x105/0x160 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   mlx5e_probe+0xac3/0x14f0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   auxiliary_bus_probe+0x9d/0xe0
[Sun Aug  7 13:12:29 2022]   really_probe+0x1e0/0xaa0
[Sun Aug  7 13:12:29 2022]   __driver_probe_device+0x219/0x480
[Sun Aug  7 13:12:29 2022]   driver_probe_device+0x49/0x130
[Sun Aug  7 13:12:29 2022]   __driver_attach+0x1e4/0x4d0
[Sun Aug  7 13:12:29 2022]   bus_for_each_dev+0x11e/0x1a0
[Sun Aug  7 13:12:29 2022]   bus_add_driver+0x3f4/0x5a0
[Sun Aug  7 13:12:29 2022]   driver_register+0x20f/0x390
[Sun Aug  7 13:12:29 2022]   __auxiliary_driver_register+0x14e/0x260
[Sun Aug  7 13:12:29 2022]   mlx5e_init+0x38/0x90 [mlx5_core]
[Sun Aug  7 13:12:29 2022]   vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb]
[Sun Aug  7 13:12:29 2022]   do_one_initcall+0xc4/0x400
[Sun Aug  7 13:12:29 2022]   do_init_module+0x18a/0x620
[Sun Aug  7 13:12:29 2022]   load_module+0x563a/0x7040
[Sun Aug  7 13:12:29 2022]   __do_sys_finit_module+0x122/0x1d0
[Sun Aug  7 13:12:29 2022]   do_syscall_64+0x3d/0x90
[Sun Aug  7 13:12:29 2022]   entry_SYSCALL_64_after_hwframe+0x46/0xb0
[Sun Aug  7 13:12:29 2022] irq event stamp: 3596508
[Sun Aug  7 13:12:29 2022] hardirqs last  enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100
[Sun Aug  7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100
[Sun Aug  7 13:12:29 2022] softirqs last  enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug  7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug  7 13:12:29 2022]
                           other info that might help us debug this:
[Sun Aug  7 13:12:29 2022]  Possible unsafe locking scenario:

[Sun Aug  7 13:12:29 2022]        CPU0
[Sun Aug  7 13:12:29 2022]        ----
[Sun Aug  7 13:12:29 2022]   lock(lag_lock);
[Sun Aug  7 13:12:29 2022]   <Interrupt>
[Sun Aug  7 13:12:29 2022]     lock(lag_lock);
[Sun Aug  7 13:12:29 2022]
                            *** DEADLOCK ***

[Sun Aug  7 13:12:29 2022] 4 locks held by swapper/0/0:
[Sun Aug  7 13:12:29 2022]  #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60
[Sun Aug  7 13:12:29 2022]  #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570
[Sun Aug  7 13:12:29 2022]  #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0
[Sun Aug  7 13:12:29 2022]
                           stack backtrace:
[Sun Aug  7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1
[Sun Aug  7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[Sun Aug  7 13:12:29 2022] Call Trace:
[Sun Aug  7 13:12:29 2022]  <IRQ>
[Sun Aug  7 13:12:29 2022]  dump_stack_lvl+0x57/0x7d
[Sun Aug  7 13:12:29 2022]  mark_lock.part.0.cold+0x5f/0x92
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? unwind_next_frame+0x1c4/0x1b50
[Sun Aug  7 13:12:29 2022]  ? secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? stack_access_ok+0x1d0/0x1d0
[Sun Aug  7 13:12:29 2022]  ? start_kernel+0x3a7/0x3c5
[Sun Aug  7 13:12:29 2022]  __lock_acquire+0x1260/0x6720
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
[Sun Aug  7 13:12:29 2022]  ? mark_lock.part.0+0xed/0x3060
[Sun Aug  7 13:12:29 2022]  ? stack_trace_save+0x91/0xc0
[Sun Aug  7 13:12:29 2022]  lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
[Sun Aug  7 13:12:29 2022]  _raw_spin_lock+0x2c/0x40
[Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  atomic_notifier_call_chain+0xd7/0x1d0
[Sun Aug  7 13:12:29 2022]  br_switchdev_fdb_notify+0xea/0x100
[Sun Aug  7 13:12:29 2022]  ? br_switchdev_set_port_flag+0x310/0x310
[Sun Aug  7 13:12:29 2022]  fdb_notify+0x11b/0x150
[Sun Aug  7 13:12:29 2022]  br_fdb_update+0x34c/0x570
[Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? br_fdb_add_local+0x50/0x50
[Sun Aug  7 13:12:29 2022]  ? br_allowed_ingress+0x5f/0x1070
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  br_handle_frame_finish+0x786/0x18e0
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
[Sun Aug  7 13:12:29 2022]  ? sctp_inet_bind_verify+0x4d/0x190
[Sun Aug  7 13:12:29 2022]  ? xlog_unpack_data+0x2e0/0x310
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  br_nf_hook_thresh+0x227/0x380 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? setup_pre_routing+0x460/0x460 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter]
[Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter]
[Sun Aug  7 13:12:29 2022]  br_handle_frame+0x8a9/0x1270
[Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
[Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
[Sun Aug  7 13:12:29 2022]  ? bond_handle_frame+0xf9/0xac0 [bonding]
[Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug  7 13:12:29 2022]  __netif_receive_skb_core+0x7c0/0x2c70
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  ? generic_xdp_tx+0x5b0/0x5b0
[Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
[Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
[Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
[Sun Aug  7 13:12:29 2022]  __netif_receive_skb_list_core+0x2d7/0x8a0
[Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
[Sun Aug  7 13:12:29 2022]  ? process_backlog+0x960/0x960
[Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x129/0x400
[Sun Aug  7 13:12:29 2022]  ? kvm_clock_get_cycles+0x14/0x20
[Sun Aug  7 13:12:29 2022]  netif_receive_skb_list_internal+0x5f4/0xd60
[Sun Aug  7 13:12:29 2022]  ? do_xdp_generic+0x150/0x150
[Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  napi_complete_done+0x188/0x710
[Sun Aug  7 13:12:29 2022]  mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug  7 13:12:29 2022]  ? __queue_work+0x53c/0xeb0
[Sun Aug  7 13:12:29 2022]  __napi_poll+0x9f/0x540
[Sun Aug  7 13:12:29 2022]  net_rx_action+0x420/0xb70
[Sun Aug  7 13:12:29 2022]  ? napi_threaded_poll+0x470/0x470
[Sun Aug  7 13:12:29 2022]  ? __common_interrupt+0x79/0x1a0
[Sun Aug  7 13:12:29 2022]  __do_softirq+0x271/0x92c
[Sun Aug  7 13:12:29 2022]  irq_exit_rcu+0x11a/0x170
[Sun Aug  7 13:12:29 2022]  common_interrupt+0x7d/0xa0
[Sun Aug  7 13:12:29 2022]  </IRQ>
[Sun Aug  7 13:12:29 2022]  <TASK>
[Sun Aug  7 13:12:29 2022]  asm_common_interrupt+0x22/0x40
[Sun Aug  7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60
[Sun Aug  7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00
[Sun Aug  7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242
[Sun Aug  7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc
[Sun Aug  7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac
[Sun Aug  7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3
[Sun Aug  7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000
[Sun Aug  7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000
[Sun Aug  7 13:12:29 2022]  ? default_idle_call+0xcc/0x460
[Sun Aug  7 13:12:29 2022]  default_idle_call+0xec/0x460
[Sun Aug  7 13:12:29 2022]  do_idle+0x394/0x450
[Sun Aug  7 13:12:29 2022]  ? arch_cpu_idle_exit+0x40/0x40
[Sun Aug  7 13:12:29 2022]  cpu_startup_entry+0x19/0x20
[Sun Aug  7 13:12:29 2022]  rest_init+0x156/0x250
[Sun Aug  7 13:12:29 2022]  arch_call_rest_init+0xf/0x15
[Sun Aug  7 13:12:29 2022]  start_kernel+0x3a7/0x3c5
[Sun Aug  7 13:12:29 2022]  secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug  7 13:12:29 2022]  </TASK>

Fixes: ff9b752 ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
Add a lock_class_key per mlx5 device to avoid a false positive
"possible circular locking dependency" warning by lockdep, on flows
which lock more than one mlx5 device, such as adding SF.

kernel log:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.19.0-rc8+ #2 Not tainted
 ------------------------------------------------------
 kworker/u20:0/8 is trying to acquire lock:
 ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]

 but task is already holding lock:
 ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}:
        down_write+0x90/0x150
        blocking_notifier_chain_register+0x53/0xa0
        mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
        mlx5_init_one+0x261/0x490 [mlx5_core]
        probe_one+0x430/0x680 [mlx5_core]
        local_pci_probe+0xd6/0x170
        work_for_cpu_fn+0x4e/0xa0
        process_one_work+0x7c2/0x1340
        worker_thread+0x6f6/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
        __lock_acquire+0x2fc7/0x6720
        lock_acquire+0x1c1/0x550
        __mutex_lock+0x12c/0x14b0
        mlx5_init_one+0x2e/0x490 [mlx5_core]
        mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
        auxiliary_bus_probe+0x9d/0xe0
        really_probe+0x1e0/0xaa0
        __driver_probe_device+0x219/0x480
        driver_probe_device+0x49/0x130
        __device_attach_driver+0x1b8/0x280
        bus_for_each_drv+0x123/0x1a0
        __device_attach+0x1a3/0x460
        bus_probe_device+0x1a2/0x260
        device_add+0x9b1/0x1b40
        __auxiliary_device_add+0x88/0xc0
        mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
        blocking_notifier_call_chain+0xd5/0x130
        mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
        process_one_work+0x7c2/0x1340
        worker_thread+0x59d/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&(&notifier->n_head)->rwsem);
                                lock(&dev->intf_state_mutex);
                                lock(&(&notifier->n_head)->rwsem);
   lock(&dev->intf_state_mutex);

  *** DEADLOCK ***

 4 locks held by kworker/u20:0/8:
  #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
  #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
  #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
  #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460

 stack backtrace:
 CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x7d
  check_noncircular+0x278/0x300
  ? print_circular_bug+0x460/0x460
  ? lock_chain_count+0x20/0x20
  ? register_lock_class+0x1880/0x1880
  __lock_acquire+0x2fc7/0x6720
  ? register_lock_class+0x1880/0x1880
  ? register_lock_class+0x1880/0x1880
  lock_acquire+0x1c1/0x550
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  __mutex_lock+0x12c/0x14b0
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? _raw_read_unlock+0x1f/0x30
  ? mutex_lock_io_nested+0x1320/0x1320
  ? __ioremap_caller.constprop.0+0x306/0x490
  ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
  ? iounmap+0x160/0x160
  mlx5_init_one+0x2e/0x490 [mlx5_core]
  mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
  ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
  auxiliary_bus_probe+0x9d/0xe0
  really_probe+0x1e0/0xaa0
  __driver_probe_device+0x219/0x480
  ? auxiliary_match_id+0xe9/0x140
  driver_probe_device+0x49/0x130
  __device_attach_driver+0x1b8/0x280
  ? driver_allows_async_probing+0x140/0x140
  bus_for_each_drv+0x123/0x1a0
  ? bus_for_each_dev+0x1a0/0x1a0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  ? trace_hardirqs_on+0x2d/0x100
  __device_attach+0x1a3/0x460
  ? device_driver_attach+0x1e0/0x1e0
  ? kobject_uevent_env+0x22d/0xf10
  bus_probe_device+0x1a2/0x260
  device_add+0x9b1/0x1b40
  ? dev_set_name+0xab/0xe0
  ? __fw_devlink_link_to_suppliers+0x260/0x260
  ? memset+0x20/0x40
  ? lockdep_init_map_type+0x21a/0x7d0
  __auxiliary_device_add+0x88/0xc0
  ? auxiliary_device_init+0x86/0xa0
  mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
  blocking_notifier_call_chain+0xd5/0x130
  mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
  ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
  ? lock_downgrade+0x6e0/0x6e0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  process_one_work+0x7c2/0x1340
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? pwq_dec_nr_in_flight+0x230/0x230
  ? rwlock_bug.part.0+0x90/0x90
  worker_thread+0x59d/0xec0
  ? process_one_work+0x1340/0x1340
  kthread+0x28f/0x330
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Fixes: 6a32732 ("net/mlx5: SF, Port function state change support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
guoren83 pushed a commit that referenced this issue Aug 28, 2022
To clear the flow table on flow table free, the following sequence
normally happens in order:

  1) gc_step work is stopped to disable any further stats/del requests.
  2) All flow table entries are set to teardown state.
  3) Run gc_step which will queue HW del work for each flow table entry.
  4) Waiting for the above del work to finish (flush).
  5) Run gc_step again, deleting all entries from the flow table.
  6) Flow table is freed.

But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.

To fix the above, this patch flushes the pending work, then it sets the
teardown flag to all flows in the flowtable and it forces a garbage
collector run to queue work to remove the flows from hardware, then it
flushes this new pending work and (finally) it forces another garbage
collector run to remove the entry from the software flowtable.

Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214]  dump_stack+0xbb/0x107
[47773.890818]  print_address_description.constprop.0+0x18/0x140
[47773.892990]  kasan_report.cold+0x7c/0xd8
[47773.894459]  kasan_check_range+0x145/0x1a0
[47773.895174]  down_read+0x99/0x460
[47773.899706]  nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137]  flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372]  process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031]  kasan_save_stack+0x1b/0x40
[47773.922730]  __kasan_kmalloc+0x7a/0x90
[47773.923411]  tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363]  tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207]  tcf_action_init_1+0x45b/0x700
[47773.925987]  tcf_action_init+0x453/0x6b0
[47773.926692]  tcf_exts_validate+0x3d0/0x600
[47773.927419]  fl_change+0x757/0x4a51 [cls_flower]
[47773.928227]  tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303]  kasan_save_stack+0x1b/0x40
[47773.938039]  kasan_set_track+0x1c/0x30
[47773.938731]  kasan_set_free_info+0x20/0x30
[47773.939467]  __kasan_slab_free+0xe7/0x120
[47773.940194]  slab_free_freelist_hook+0x86/0x190
[47773.941038]  kfree+0xce/0x3a0
[47773.941644]  tcf_ct_flow_table_cleanup_work

Original patch description and stack trace by Paul Blakey.

Fixes: c29f74e ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <paulb@nvidia.com>
Tested-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
guoren83 added a commit that referenced this issue Jul 21, 2023
The callee saved fp & ra are xlen size, not long size. This patch
corrects the layout for the struct stackframe.

echo c > /proc/sysrq-trigger

Before the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #2
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

After the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #1
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<c00050c8>] dump_backtrace+0x1e/0x26
[<c086dcae>] show_stack+0x2e/0x3c
[<c0878e00>] dump_stack_lvl+0x40/0x5a
[<c0878e30>] dump_stack+0x16/0x1e
[<c086df7c>] panic+0x10c/0x2a8
[<c04f4c1e>] sysrq_reset_seq_param_set+0x0/0x76
[<c04f52cc>] __handle_sysrq+0x9c/0x19c
[<c04f5946>] write_sysrq_trigger+0x64/0x78
[<c020c7f6>] proc_reg_write+0x4a/0xa2
[<c01acf0a>] vfs_write+0xac/0x308
[<c01ad2b8>] ksys_write+0x62/0xda
[<c01ad33e>] sys_write+0xe/0x16
[<c0879860>] do_trap_ecall_u+0xd8/0xda
[<c00037de>] ret_from_exception+0x0/0x66
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
guoren83 added a commit that referenced this issue Jul 21, 2023
The callee saved fp & ra are xlen size, not long size. This patch
corrects the layout for the struct stackframe.

echo c > /proc/sysrq-trigger

Before the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #2
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

After the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #1
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<c00050c8>] dump_backtrace+0x1e/0x26
[<c086dcae>] show_stack+0x2e/0x3c
[<c0878e00>] dump_stack_lvl+0x40/0x5a
[<c0878e30>] dump_stack+0x16/0x1e
[<c086df7c>] panic+0x10c/0x2a8
[<c04f4c1e>] sysrq_reset_seq_param_set+0x0/0x76
[<c04f52cc>] __handle_sysrq+0x9c/0x19c
[<c04f5946>] write_sysrq_trigger+0x64/0x78
[<c020c7f6>] proc_reg_write+0x4a/0xa2
[<c01acf0a>] vfs_write+0xac/0x308
[<c01ad2b8>] ksys_write+0x62/0xda
[<c01ad33e>] sys_write+0xe/0x16
[<c0879860>] do_trap_ecall_u+0xd8/0xda
[<c00037de>] ret_from_exception+0x0/0x66
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
guoren83 added a commit that referenced this issue Jul 24, 2023
The callee saved fp & ra are xlen size, not long size. This patch
corrects the layout for the struct stackframe.

echo c > /proc/sysrq-trigger

Before the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #2
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

After the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #1
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<c00050c8>] dump_backtrace+0x1e/0x26
[<c086dcae>] show_stack+0x2e/0x3c
[<c0878e00>] dump_stack_lvl+0x40/0x5a
[<c0878e30>] dump_stack+0x16/0x1e
[<c086df7c>] panic+0x10c/0x2a8
[<c04f4c1e>] sysrq_reset_seq_param_set+0x0/0x76
[<c04f52cc>] __handle_sysrq+0x9c/0x19c
[<c04f5946>] write_sysrq_trigger+0x64/0x78
[<c020c7f6>] proc_reg_write+0x4a/0xa2
[<c01acf0a>] vfs_write+0xac/0x308
[<c01ad2b8>] ksys_write+0x62/0xda
[<c01ad33e>] sys_write+0xe/0x16
[<c0879860>] do_trap_ecall_u+0xd8/0xda
[<c00037de>] ret_from_exception+0x0/0x66
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
guoren83 added a commit that referenced this issue Aug 3, 2023
The callee saved fp & ra are xlen size, not long size. This patch
corrects the layout for the struct stackframe.

echo c > /proc/sysrq-trigger

Before the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #2
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

After the patch:

sysrq: Trigger a crash
Kernel panic - not syncing: sysrq triggered crash
CPU: 0 PID: 102 Comm: sh Not tainted 6.3.0-rc1-00084-g9e2ba938797e-dirty #1
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<c00050c8>] dump_backtrace+0x1e/0x26
[<c086dcae>] show_stack+0x2e/0x3c
[<c0878e00>] dump_stack_lvl+0x40/0x5a
[<c0878e30>] dump_stack+0x16/0x1e
[<c086df7c>] panic+0x10c/0x2a8
[<c04f4c1e>] sysrq_reset_seq_param_set+0x0/0x76
[<c04f52cc>] __handle_sysrq+0x9c/0x19c
[<c04f5946>] write_sysrq_trigger+0x64/0x78
[<c020c7f6>] proc_reg_write+0x4a/0xa2
[<c01acf0a>] vfs_write+0xac/0x308
[<c01ad2b8>] ksys_write+0x62/0xda
[<c01ad33e>] sys_write+0xe/0x16
[<c0879860>] do_trap_ecall_u+0xd8/0xda
[<c00037de>] ret_from_exception+0x0/0x66
---[ end Kernel panic - not syncing: sysrq triggered crash ]---

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants