# git rev-parse -q --verify 3a83f677a6eeff65751b29e3648d7c69c3be83f3^{commit}
3a83f677a6eeff65751b29e3648d7c69c3be83f3
already have revision, skipping fetch
# git checkout -q -f -B kisskb 3a83f677a6eeff65751b29e3648d7c69c3be83f3
# git clean -qxdf
# < git log -1
# commit 3a83f677a6eeff65751b29e3648d7c69c3be83f3
# Author: Michael Roth
# Date: Wed Sep 11 17:31:55 2019 -0500
#
# KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag
#
# On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB
# of memory running the following guest configs:
#
# guest A:
#   - 224GB of memory
#   - 56 VCPUs (sockets=1,cores=28,threads=2), where:
#     VCPUs 0-1 are pinned to CPUs 0-3,
#     VCPUs 2-3 are pinned to CPUs 4-7,
#     ...
#     VCPUs 54-55 are pinned to CPUs 108-111
#
# guest B:
#   - 4GB of memory
#   - 4 VCPUs (sockets=1,cores=4,threads=1)
#
# with the following workloads (with KSM and THP enabled in all):
#
# guest A:
#   stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M
#
# guest B:
#   stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M
#
# host:
#   stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M
#
# the below soft-lockup traces were observed after an hour or so and
# persisted until the host was reset (this was found to be reliably
# reproducible for this configuration, for kernels 4.15, 4.18, 5.0,
# and 5.3-rc5):
#
# [ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU
# [ 1253.183319] rcu: 124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941
# [ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709]
# [ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913]
# [ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331]
# [ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338]
# [ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525]
# [ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
# [ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU
# [ 1316.198032] rcu: 124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243
# [ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
# [ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU
# [ 1379.212629] rcu: 124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714
# [ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
# [ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU
# [ 1442.227115] rcu: 124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403
# [ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds.
# [ 1455.111822] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
# [ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# [ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds.
# [ 1455.111905] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
# [ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# [ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds.
# [ 1455.111986] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
# [ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# [ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds.
# [ 1455.112068] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
# [ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# [ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds.
# [ 1455.112159] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
# [ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# [ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds.
# [ 1455.112231] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
# [ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# [ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds.
# [ 1455.112303] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
# [ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# [ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds.
# [ 1455.112392] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1
#
# CPUs 45, 24, and 124 are stuck on spin locks, likely held by
# CPUs 105 and 31.
#
# CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on
# target CPU 42. For instance:
#
# # CPU 105 registers (via xmon)
# R00 = c00000000020b20c   R16 = 00007d1bcd800000
# R01 = c00000363eaa7970   R17 = 0000000000000001
# R02 = c0000000019b3a00   R18 = 000000000000006b
# R03 = 000000000000002a   R19 = 00007d537d7aecf0
# R04 = 000000000000002a   R20 = 60000000000000e0
# R05 = 000000000000002a   R21 = 0801000000000080
# R06 = c0002073fb0caa08   R22 = 0000000000000d60
# R07 = c0000000019ddd78   R23 = 0000000000000001
# R08 = 000000000000002a   R24 = c00000000147a700
# R09 = 0000000000000001   R25 = c0002073fb0ca908
# R10 = c000008ffeb4e660   R26 = 0000000000000000
# R11 = c0002073fb0ca900   R27 = c0000000019e2464
# R12 = c000000000050790   R28 = c0000000000812b0
# R13 = c000207fff623e00   R29 = c0002073fb0ca808
# R14 = 00007d1bbee00000   R30 = c0002073fb0ca800
# R15 = 00007d1bcd600000   R31 = 0000000000000800
# pc  = c00000000020b260 smp_call_function_many+0x3d0/0x460
# cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460
# lr  = c00000000020b20c smp_call_function_many+0x37c/0x460
# msr = 900000010288b033   cr  = 44024824
# ctr = c000000000050790   xer = 0000000000000000   trap = 100
#
# CPU 42 is running normally, doing VCPU work:
#
# # CPU 42 stack trace (via xmon)
# [link register ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv]
# [c000008ed3343820] c000008ed3343850 (unreliable)
# [c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv]
# [c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv]
# [c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm]
# [c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
# [c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm]
# [c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70
# [c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120
# [c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80
# [c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70
# --- Exception: c00 (System Call) at 00007d545cfd7694
# SP (7d53ff7edf50) is in userspace
#
# It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION]
# was set for CPU 42 by at least 1 of the CPUs waiting in
# smp_call_function_many(), but somehow the corresponding
# call_single_queue entries were never processed by CPU 42, causing the
# callers to spin in csd_lock_wait() indefinitely.
#
# Nick Piggin suggested something similar to the following sequence as
# a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE
# IPI messages seems to be most common, but any mix of CALL_FUNCTION and
# !CALL_FUNCTION messages could trigger it):
#
#    CPU
#      X: smp_muxed_ipi_set_message():
#      X: smp_mb()
#      X: message[RESCHEDULE] = 1
#      X: doorbell_global_ipi(42):
#      X: kvmppc_set_host_ipi(42, 1)
#      X: ppc_msgsnd_sync()/smp_mb()
#      X: ppc_msgsnd() -> 42
#     42: doorbell_exception(): // from CPU X
#     42: ppc_msgsync()
#    105: smp_muxed_ipi_set_message():
#    105: smp_mb()
#         // STORE DEFERRED DUE TO RE-ORDERING
#  --105: message[CALL_FUNCTION] = 1
#  | 105: doorbell_global_ipi(42):
#  | 105: kvmppc_set_host_ipi(42, 1)
#  |  42: kvmppc_set_host_ipi(42, 0)
#  |  42: smp_ipi_demux_relaxed()
#  |  42: // returns to executing guest
#  |      // RE-ORDERED STORE COMPLETES
#  ->105: message[CALL_FUNCTION] = 1
#    105: ppc_msgsnd_sync()/smp_mb()
#    105: ppc_msgsnd() -> 42
#     42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
#    105: // hangs waiting on 42 to process messages/call_single_queue
#
# This can be prevented with an smp_mb() at the beginning of
# kvmppc_set_host_ipi(), such that stores to message[] (or other
# state indicated by the host_ipi flag) are ordered vs. the store to
# host_ipi.
#
# However, doing so might still allow for the following scenario (not
# yet observed):
#
#    CPU
#      X: smp_muxed_ipi_set_message():
#      X: smp_mb()
#      X: message[RESCHEDULE] = 1
#      X: doorbell_global_ipi(42):
#      X: kvmppc_set_host_ipi(42, 1)
#      X: ppc_msgsnd_sync()/smp_mb()
#      X: ppc_msgsnd() -> 42
#     42: doorbell_exception(): // from CPU X
#     42: ppc_msgsync()
#         // STORE DEFERRED DUE TO RE-ORDERING
#  -- 42: kvmppc_set_host_ipi(42, 0)
#  |  42: smp_ipi_demux_relaxed()
#  | 105: smp_muxed_ipi_set_message():
#  | 105: smp_mb()
#  | 105: message[CALL_FUNCTION] = 1
#  | 105: doorbell_global_ipi(42):
#  | 105: kvmppc_set_host_ipi(42, 1)
#  |      // RE-ORDERED STORE COMPLETES
#  -> 42: kvmppc_set_host_ipi(42, 0)
#     42: // returns to executing guest
#    105: ppc_msgsnd_sync()/smp_mb()
#    105: ppc_msgsnd() -> 42
#     42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
#    105: // hangs waiting on 42 to process messages/call_single_queue
#
# Fixing this scenario would require an smp_mb() *after* clearing the
# host_ipi flag in kvmppc_set_host_ipi() to order the store vs.
# subsequent processing of IPI messages.
#
# To handle both cases, this patch splits kvmppc_set_host_ipi() into
# separate set/clear functions, where we execute smp_mb() prior to
# setting the host_ipi flag, and after clearing the host_ipi flag. These
# functions pair with each other to synchronize the sender and receiver
# sides.
#
# With that change in place the above workload ran for 20 hours without
# triggering any lock-ups.
#
# Fixes: 755563bc79c7 ("powerpc/powernv: Fixes for hypervisor doorbell handling")
# v4.0
# Signed-off-by: Michael Roth
# Acked-by: Paul Mackerras
# Signed-off-by: Michael Ellerman
# Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-ld --version
# < git log --format=%s --max-count=1 3a83f677a6eeff65751b29e3648d7c69c3be83f3
# < make -s -j 8 ARCH=powerpc O=/kisskb/build/powerpc-fixes_44x_ebony_defconfig_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux- 44x/ebony_defconfig
# make -s -j 8 ARCH=powerpc O=/kisskb/build/powerpc-fixes_44x_ebony_defconfig_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-
INFO: Uncompressed kernel (size 0x4f04c4) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x500000)
INFO: Uncompressed kernel (size 0x4f04c4) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x500000)
Image Name:   Linux-5.3.0-g3a83f677a6ee
Created:      Tue Sep 24 15:22:26 2019
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:    2544431 Bytes = 2484.80 KiB = 2.43 MiB
Load Address: 00500000
Entry Point:  005000c4
Completed OK
# rm -rf /kisskb/build/powerpc-fixes_44x_ebony_defconfig_powerpc-gcc5
# Build took: 0:00:56.945962
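
The commit message above describes the fix as splitting kvmppc_set_host_ipi() into separate set/clear helpers, with smp_mb() executed before setting the host_ipi flag and after clearing it. A minimal sketch of that barrier placement follows; the clear-side name kvmppc_clear_host_ipi() and the paca_ptrs[] accessor are assumptions for illustration and may not match the applied patch exactly.

    /* Sketch only, not the verbatim patch. */
    static inline void kvmppc_set_host_ipi(int cpu)
    {
            /*
             * Order prior stores, e.g. message[CALL_FUNCTION] = 1 in
             * smp_muxed_ipi_set_message(), before the store to host_ipi.
             * Pairs with the barrier in kvmppc_clear_host_ipi().
             */
            smp_mb();
            paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;   /* assumed accessor */
    }

    static inline void kvmppc_clear_host_ipi(int cpu)
    {
            paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;   /* assumed accessor */
            /*
             * Order the clearing store before subsequent loads of IPI
             * messages in smp_ipi_demux_relaxed(). Pairs with the barrier
             * in kvmppc_set_host_ipi().
             */
            smp_mb();
    }

With helpers shaped like these, a sender such as doorbell_global_ipi() sets the flag only after publishing its message, and the receiver clears the flag before demultiplexing messages, which closes both reordering windows shown in the sequences above.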