# git rev-parse -q --verify 801980f6497946048709b9b09771a1729551d705^{commit}
801980f6497946048709b9b09771a1729551d705
already have revision, skipping fetch
# git checkout -q -f -B kisskb 801980f6497946048709b9b09771a1729551d705
# git clean -qxdf
# < git log -1
# commit 801980f6497946048709b9b09771a1729551d705
# Author: Michael Roth
# Date: Tue Aug 11 11:15:44 2020 -0500
#
# powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
#
# For a power9 KVM guest with XIVE enabled, running a test loop
# where we hotplug 384 vcpus and then unplug them, the following traces
# can be seen (generally within a few loops) either from the unplugged
# vcpu:
#
# cpu 65 (hwid 65) Ready to die...
# Querying DEAD? cpu 66 (66) shows 2
# list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
# ------------[ cut here ]------------
# kernel BUG at lib/list_debug.c:56!
# Oops: Exception in kernel mode, sig: 5 [#1]
# LE SMP NR_CPUS=2048 NUMA pSeries
# Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
# CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
# NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
# REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le)
# MSR: 800000000282b033 CR: 28000842 XER: 20040000
# ...
# NIP __list_del_entry_valid+0xac/0x100
# LR __list_del_entry_valid+0xa8/0x100
# Call Trace:
# __list_del_entry_valid+0xa8/0x100 (unreliable)
# free_pcppages_bulk+0x1f8/0x940
# free_unref_page+0xd0/0x100
# xive_spapr_cleanup_queue+0x148/0x1b0
# xive_teardown_cpu+0x1bc/0x240
# pseries_mach_cpu_die+0x78/0x2f0
# cpu_die+0x48/0x70
# arch_cpu_idle_dead+0x20/0x40
# do_idle+0x2f4/0x4c0
# cpu_startup_entry+0x38/0x40
# start_secondary+0x7bc/0x8f0
# start_secondary_prolog+0x10/0x14
#
# or on the worker thread handling the unplug:
#
# pseries-hotplug-cpu: Attempting to remove CPU , drc index: 1000013a
# Querying DEAD? cpu 314 (314) shows 2
# BUG: Bad page state in process kworker/u768:3 pfn:95de1
# cpu 314 (hwid 314) Ready to die...
# page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
# flags: 0x5ffffc00000000()
# raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
# raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
# page dumped because: nonzero mapcount
# Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
# CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
# Workqueue: pseries hotplug workque pseries_hp_work_fn
# Call Trace:
# dump_stack+0xb0/0xf4 (unreliable)
# bad_page+0x12c/0x1b0
# free_pcppages_bulk+0x5bc/0x940
# page_alloc_cpu_dead+0x118/0x120
# cpuhp_invoke_callback.constprop.5+0xb8/0x760
# _cpu_down+0x188/0x340
# cpu_down+0x5c/0xa0
# cpu_subsys_offline+0x24/0x40
# device_offline+0xf0/0x130
# dlpar_offline_cpu+0x1c4/0x2a0
# dlpar_cpu_remove+0xb8/0x190
# dlpar_cpu_remove_by_index+0x12c/0x150
# dlpar_cpu+0x94/0x800
# pseries_hp_work_fn+0x128/0x1e0
# process_one_work+0x304/0x5d0
# worker_thread+0xcc/0x7a0
# kthread+0x1ac/0x1c0
# ret_from_kernel_thread+0x5c/0x80
#
# The latter trace is due to the following sequence:
#
# page_alloc_cpu_dead
# drain_pages
# drain_pages_zone
# free_pcppages_bulk
#
# where drain_pages() in this case is called under the assumption that
# the unplugged cpu is no longer executing. To ensure that is the case,
# and early call is made to __cpu_die()->pseries_cpu_die(), which runs a
# loop that waits for the cpu to reach a halted state by polling its
# status via query-cpu-stopped-state RTAS calls. It only polls for 25
# iterations before giving up, however, and in the trace above this
# results in the following being printed only .1 seconds after the
# hotplug worker thread begins processing the unplug request:
#
# pseries-hotplug-cpu: Attempting to remove CPU , drc index: 1000013a
# Querying DEAD? cpu 314 (314) shows 2
#
# At that point the worker thread assumes the unplugged CPU is in some
# unknown/dead state and procedes with the cleanup, causing the race
# with the XIVE cleanup code executed by the unplugged CPU.
#
# Fix this by waiting indefinitely, but also making an effort to avoid
# spurious lockup messages by allowing for rescheduling after polling
# the CPU status and printing a warning if we wait for longer than 120s.
#
# Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
# Suggested-by: Michael Ellerman
# Signed-off-by: Michael Roth
# Tested-by: Greg Kurz
# Reviewed-by: Thiago Jung Bauermann
# Reviewed-by: Greg Kurz
# [mpe: Trim oopses in change log slightly for readability]
# Signed-off-by: Michael Ellerman
# Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
# < /opt/cross/kisskb/korg/gcc-4.9.4-nolibc/mips-linux/bin/mips-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-4.9.4-nolibc/mips-linux/bin/mips-linux-ld --version
# < git log --format=%s --max-count=1 801980f6497946048709b9b09771a1729551d705
# < make -s -j 24 ARCH=mips O=/kisskb/build/powerpc-fixes_mips-defconfig_mips-gcc4.9 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-4.9.4-nolibc/mips-linux/bin/mips-linux- defconfig
# < make -s -j 24 ARCH=mips O=/kisskb/build/powerpc-fixes_mips-defconfig_mips-gcc4.9 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-4.9.4-nolibc/mips-linux/bin/mips-linux- help
# make -s -j 24 ARCH=mips O=/kisskb/build/powerpc-fixes_mips-defconfig_mips-gcc4.9 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-4.9.4-nolibc/mips-linux/bin/mips-linux- olddefconfig
# make -s -j 24 ARCH=mips O=/kisskb/build/powerpc-fixes_mips-defconfig_mips-gcc4.9 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-4.9.4-nolibc/mips-linux/bin/mips-linux-
FIT description: Linux 5.9.0-rc1-g801980f64979
Created: Wed Aug 19 07:14:44 2020
 Image 0 (kernel@0)
  Description: Linux 5.9.0-rc1-g801980f64979
  Created: Wed Aug 19 07:14:44 2020
  Type: Kernel Image
  Compression: gzip compressed
  Data Size: 5079323 Bytes = 4960.28 KiB = 4.84 MiB
  Architecture: MIPS
  OS: Linux
  Load Address: 0x80100000
  Entry Point: 0x8094e1e0
  Hash algo: sha1
  Hash value: 30aed41fc8803e73c94c43323f91d9e9260a5cdd
 Image 1 (fdt@boston)
  Description: img,boston Device Tree
  Created: Wed Aug 19 07:14:44 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 3793 Bytes = 3.70 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 4799f50d688573234da6e9d7701234d394759ef4
 Image 2 (fdt@ni169445)
  Description: NI 169445 device tree
  Created: Wed Aug 19 07:14:44 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 1871 Bytes = 1.83 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 51b89b31605ee62038c8468c429af091dfc75ec7
 Image 3 (fdt@ocelot_pcb123)
  Description: MSCC Ocelot PCB123 Device Tree
  Created: Wed Aug 19 07:14:44 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 4639 Bytes = 4.53 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: be2724f58b66c316a61da7c11ced633f4e7e86c7
 Image 4 (fdt@ocelot_pcb120)
  Description: MSCC Ocelot PCB120 Device Tree
  Created: Wed Aug 19 07:14:44 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 5398 Bytes = 5.27 KiB = 0.01 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 2a757b83ef9e9d35fb6c43bdbde70d311b3e4555
 Image 5 (fdt@xilfpga)
  Description: MIPSfpga (xilfpga) Device Tree
  Created: Wed Aug 19 07:14:44 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 2708 Bytes = 2.64 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 63d058b780f65e22da30f0a183433765f1807f1d
 Default Configuration: 'conf@default'
 Configuration 0 (conf@default)
  Description: Generic Linux kernel
  Kernel: kernel@0
 Configuration 1 (conf@boston)
  Description: Boston Linux kernel
  Kernel: kernel@0
  FDT: fdt@boston
 Configuration 2 (conf@ni169445)
  Description: NI 169445 Linux Kernel
  Kernel: kernel@0
  FDT: fdt@ni169445
 Configuration 3 (conf@ocelot_pcb123)
  Description: Ocelot Linux kernel
  Kernel: kernel@0
  FDT: fdt@ocelot_pcb123
 Configuration 4 (conf@ocelot_pcb120)
  Description: Ocelot Linux kernel
  Kernel: kernel@0
  FDT: fdt@ocelot_pcb120
 Configuration 5 (conf@xilfpga)
  Description: MIPSfpga Linux kernel
  Kernel: kernel@0
  FDT: fdt@xilfpga
Completed OK
# rm -rf /kisskb/build/powerpc-fixes_mips-defconfig_mips-gcc4.9
# Build took: 0:02:06.564376