# git rev-parse -q --verify 72cdd117c449896c707fc6cfe5b90978160697d0^{commit}
72cdd117c449896c707fc6cfe5b90978160697d0
already have revision, skipping fetch
# git checkout -q -f -B kisskb 72cdd117c449896c707fc6cfe5b90978160697d0
# git clean -qxdf
# < git log -1
# commit 72cdd117c449896c707fc6cfe5b90978160697d0
# Author: Scott Cheloha
# Date: Wed Sep 16 09:51:22 2020 -0500
#
#     pseries/hotplug-memory: hot-add: skip redundant LMB lookup
#
#     During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
#     to determine which node id (nid) to use when later calling __add_memory().
#
#     This is wasteful. On pseries, memory_add_physaddr_to_nid() finds an
#     appropriate nid for a given address by looking up the LMB containing the
#     address and then passing that LMB to of_drconf_to_nid_single() to get the
#     nid. In dlpar_add_lmb() we get this address from the LMB itself.
#
#     In short, we have a pointer to an LMB and then we are searching for
#     that LMB *again* in order to find its nid.
#
#     If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we
#     can skip the redundant lookup. The only error handling we need to
#     duplicate from memory_add_physaddr_to_nid() is the fallback to the
#     default nid when drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or
#     an invalid nid.
#
#     Skipping the extra lookup makes hot-add operations faster, especially
#     on machines with many LMBs.
#
#     Consider an LPAR with 126976 LMBs. In one test, hot-adding 126000
#     LMBs on an upatched kernel took ~3.5 hours while a patched kernel
#     completed the same operation in ~2 hours:
#
#     Unpatched (12450 seconds):
#     Sep 9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000
#     Sep 9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
#     [...]
#     Sep 9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
#
#     Patched (7065 seconds):
#     Sep 8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000
#     Sep 8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
#     [...]
#     Sep 8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
#
#     It should be noted that the speedup grows more substantial when
#     hot-adding LMBs at the end of the drconf range. This is because we
#     are skipping a linear LMB search.
#
#     To see the distinction, consider smaller hot-add test on the same
#     LPAR. A perf-stat run with 10 iterations showed that hot-adding 4096
#     LMBs completed less than 1 second faster on a patched kernel:
#
#     Unpatched:
#     Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
#
#          104,753.42 msec task-clock         # 0.992 CPUs utilized ( +- 0.55% )
#               4,708 context-switches        # 0.045 K/sec ( +- 0.69% )
#               2,444 cpu-migrations          # 0.023 K/sec ( +- 1.25% )
#                 394 page-faults             # 0.004 K/sec ( +- 0.22% )
#     445,902,503,057 cycles                  # 4.257 GHz ( +- 0.55% ) (66.67%)
#       8,558,376,740 stalled-cycles-frontend # 1.92% frontend cycles idle ( +- 0.88% ) (49.99%)
#     300,346,181,651 stalled-cycles-backend  # 67.36% backend cycles idle ( +- 0.76% ) (50.01%)
#     258,091,488,691 instructions            # 0.58 insn per cycle
#                                             # 1.16 stalled cycles per insn ( +- 0.22% ) (66.67%)
#      70,568,169,256 branches                # 673.660 M/sec ( +- 0.17% ) (50.01%)
#       3,100,725,426 branch-misses           # 4.39% of all branches ( +- 0.20% ) (49.99%)
#
#     105.583 +- 0.589 seconds time elapsed ( +- 0.56% )
#
#     Patched:
#     Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
#
#          104,055.69 msec task-clock         # 0.993 CPUs utilized ( +- 0.32% )
#               4,606 context-switches        # 0.044 K/sec ( +- 0.20% )
#               2,463 cpu-migrations          # 0.024 K/sec ( +- 0.93% )
#                 394 page-faults             # 0.004 K/sec ( +- 0.25% )
#     442,951,129,921 cycles                  # 4.257 GHz ( +- 0.32% ) (66.66%)
#       8,710,413,329 stalled-cycles-frontend # 1.97% frontend cycles idle ( +- 0.47% ) (50.06%)
#     299,656,905,836 stalled-cycles-backend  # 67.65% backend cycles idle ( +- 0.39% ) (50.02%)
#     252,731,168,193 instructions            # 0.57 insn per cycle
#                                             # 1.19 stalled cycles per insn ( +- 0.20% ) (66.66%)
#      68,902,851,121 branches                # 662.173 M/sec ( +- 0.13% ) (49.94%)
#       3,100,242,882 branch-misses           # 4.50% of all branches ( +- 0.15% ) (49.98%)
#
#     104.829 +- 0.325 seconds time elapsed ( +- 0.31% )
#
#     This is consistent. An add-by-count hot-add operation adds LMBs
#     greedily, so LMBs near the start of the drconf range are considered
#     first. On an otherwise idle LPAR with so many LMBs we would expect to
#     find the LMBs we need near the start of the drconf range, hence the
#     smaller speedup.
#
#     Signed-off-by: Scott Cheloha
#     Reviewed-by: Laurent Dufour
#     Signed-off-by: Michael Ellerman
#     Link: https://lore.kernel.org/r/20200916145122.3408129-1-cheloha@linux.ibm.com
# < /opt/cross/kisskb/br-mipsel-o32-full-2016.08-613-ge98b4dd/bin/mipsel-linux-gcc --version
# < /opt/cross/kisskb/br-mipsel-o32-full-2016.08-613-ge98b4dd/bin/mipsel-linux-ld --version
# < git log --format=%s --max-count=1 72cdd117c449896c707fc6cfe5b90978160697d0
# < make -s -j 48 ARCH=mips O=/kisskb/build/powerpc-next_mips-defconfig_mipsel CROSS_COMPILE=/opt/cross/kisskb/br-mipsel-o32-full-2016.08-613-ge98b4dd/bin/mipsel-linux- defconfig
# < make -s -j 48 ARCH=mips O=/kisskb/build/powerpc-next_mips-defconfig_mipsel CROSS_COMPILE=/opt/cross/kisskb/br-mipsel-o32-full-2016.08-613-ge98b4dd/bin/mipsel-linux- help
# make -s -j 48 ARCH=mips O=/kisskb/build/powerpc-next_mips-defconfig_mipsel CROSS_COMPILE=/opt/cross/kisskb/br-mipsel-o32-full-2016.08-613-ge98b4dd/bin/mipsel-linux- olddefconfig
# make -s -j 48 ARCH=mips O=/kisskb/build/powerpc-next_mips-defconfig_mipsel CROSS_COMPILE=/opt/cross/kisskb/br-mipsel-o32-full-2016.08-613-ge98b4dd/bin/mipsel-linux-
FIT description: Linux 5.9.0-rc2-g72cdd117c449
Created: Wed Oct 7 04:53:50 2020
 Image 0 (kernel@0)
  Description: Linux 5.9.0-rc2-g72cdd117c449
  Created: Wed Oct 7 04:53:50 2020
  Type: Kernel Image
  Compression: gzip compressed
  Data Size: 5076930 Bytes = 4957.94 KiB = 4.84 MiB
  Architecture: MIPS
  OS: Linux
  Load Address: 0x80100000
  Entry Point: 0x80955b00
  Hash algo: sha1
  Hash value: c51a43ee93574bd45afdf162527684d38b4a5231
 Image 1 (fdt@boston)
  Description: img,boston Device Tree
  Created: Wed Oct 7 04:53:50 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 3793 Bytes = 3.70 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 4799f50d688573234da6e9d7701234d394759ef4
 Image 2 (fdt@ni169445)
  Description: NI 169445 device tree
  Created: Wed Oct 7 04:53:50 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 1871 Bytes = 1.83 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 51b89b31605ee62038c8468c429af091dfc75ec7
 Image 3 (fdt@ocelot_pcb123)
  Description: MSCC Ocelot PCB123 Device Tree
  Created: Wed Oct 7 04:53:50 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 4639 Bytes = 4.53 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: be2724f58b66c316a61da7c11ced633f4e7e86c7
 Image 4 (fdt@ocelot_pcb120)
  Description: MSCC Ocelot PCB120 Device Tree
  Created: Wed Oct 7 04:53:50 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 5398 Bytes = 5.27 KiB = 0.01 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 2a757b83ef9e9d35fb6c43bdbde70d311b3e4555
 Image 5 (fdt@xilfpga)
  Description: MIPSfpga (xilfpga) Device Tree
  Created: Wed Oct 7 04:53:50 2020
  Type: Flat Device Tree
  Compression: uncompressed
  Data Size: 2708 Bytes = 2.64 KiB = 0.00 MiB
  Architecture: MIPS
  Hash algo: sha1
  Hash value: 63d058b780f65e22da30f0a183433765f1807f1d
 Default Configuration: 'conf@default'
 Configuration 0 (conf@default)
  Description: Generic Linux kernel
  Kernel: kernel@0
 Configuration 1 (conf@boston)
  Description: Boston Linux kernel
  Kernel: kernel@0
  FDT: fdt@boston
 Configuration 2 (conf@ni169445)
  Description: NI 169445 Linux Kernel
  Kernel: kernel@0
  FDT: fdt@ni169445
 Configuration 3 (conf@ocelot_pcb123)
  Description: Ocelot Linux kernel
  Kernel: kernel@0
  FDT: fdt@ocelot_pcb123
 Configuration 4 (conf@ocelot_pcb120)
  Description: Ocelot Linux kernel
  Kernel: kernel@0
  FDT: fdt@ocelot_pcb120
 Configuration 5 (conf@xilfpga)
  Description: MIPSfpga Linux kernel
  Kernel: kernel@0
  FDT: fdt@xilfpga
Completed OK
# rm -rf /kisskb/build/powerpc-next_mips-defconfig_mipsel
# Build took: 0:01:42.423006