# git rev-parse -q --verify 679b68c4e8656c1aef21b7ecfc280db6b54ba2f8^{commit}
679b68c4e8656c1aef21b7ecfc280db6b54ba2f8
already have revision, skipping fetch
# git checkout -q -f -B kisskb 679b68c4e8656c1aef21b7ecfc280db6b54ba2f8
# git clean -qxdf
# < git log -1
# commit 679b68c4e8656c1aef21b7ecfc280db6b54ba2f8
# Author: Srikar Dronamraju
# Date:   Thu Dec 5 14:02:17 2019 +0530
#
#     powerpc/vcpu: Assume dedicated processors as non-preempt
#
#     With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on
#     pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule
#     tasks on wakeup. This leads to wrong choice of CPU, which in-turn
#     leads to larger wakeup latencies. Eventually, it leads to performance
#     regression in latency sensitive benchmarks like soltp, schbench etc.
#
#     On Powerpc, vcpu_is_preempted() only looks at yield_count. If the
#     yield_count is odd, the vCPU is assumed to be preempted. However
#     yield_count is increased whenever the LPAR enters CEDE state (idle).
#     So any CPU that has entered CEDE state is assumed to be preempted.
#
#     Even if vCPU of dedicated LPAR is preempted/donated, it should have
#     right of first-use since they are supposed to own the vCPU.
#
#     On a Power9 System with 32 cores
#      # lscpu
#     Architecture:        ppc64le
#     Byte Order:          Little Endian
#     CPU(s):              128
#     On-line CPU(s) list: 0-127
#     Thread(s) per core:  8
#     Core(s) per socket:  1
#     Socket(s):           16
#     NUMA node(s):        2
#     Model:               2.2 (pvr 004e 0202)
#     Model name:          POWER9 (architected), altivec supported
#     Hypervisor vendor:   pHyp
#     Virtualization type: para
#     L1d cache:           32K
#     L1i cache:           32K
#     L2 cache:            512K
#     L3 cache:            10240K
#     NUMA node0 CPU(s):   0-63
#     NUMA node1 CPU(s):   64-127
#
#       # perf stat -a -r 5 ./schbench
#     v5.4                                v5.4 + patch
#     Latency percentiles (usec)          Latency percentiles (usec)
#             50.0000th: 45                       50.0000th: 39
#             75.0000th: 62                       75.0000th: 53
#             90.0000th: 71                       90.0000th: 67
#             95.0000th: 77                       95.0000th: 76
#            *99.0000th: 91                      *99.0000th: 89
#             99.5000th: 707                      99.5000th: 93
#             99.9000th: 6920                     99.9000th: 118
#             min=0, max=10048                    min=0, max=211
#     Latency percentiles (usec)          Latency percentiles (usec)
#             50.0000th: 45                       50.0000th: 34
#             75.0000th: 61                       75.0000th: 45
#             90.0000th: 72                       90.0000th: 53
#             95.0000th: 79                       95.0000th: 56
#            *99.0000th: 691                     *99.0000th: 61
#             99.5000th: 3972                     99.5000th: 63
#             99.9000th: 8368                     99.9000th: 78
#             min=0, max=16606                    min=0, max=228
#     Latency percentiles (usec)          Latency percentiles (usec)
#             50.0000th: 45                       50.0000th: 34
#             75.0000th: 61                       75.0000th: 45
#             90.0000th: 71                       90.0000th: 53
#             95.0000th: 77                       95.0000th: 57
#            *99.0000th: 106                     *99.0000th: 63
#             99.5000th: 2364                     99.5000th: 68
#             99.9000th: 7480                     99.9000th: 100
#             min=0, max=10001                    min=0, max=134
#     Latency percentiles (usec)          Latency percentiles (usec)
#             50.0000th: 45                       50.0000th: 34
#             75.0000th: 62                       75.0000th: 46
#             90.0000th: 72                       90.0000th: 53
#             95.0000th: 78                       95.0000th: 56
#            *99.0000th: 93                      *99.0000th: 61
#             99.5000th: 108                      99.5000th: 64
#             99.9000th: 6792                     99.9000th: 85
#             min=0, max=17681                    min=0, max=121
#     Latency percentiles (usec)          Latency percentiles (usec)
#             50.0000th: 46                       50.0000th: 33
#             75.0000th: 62                       75.0000th: 44
#             90.0000th: 73                       90.0000th: 51
#             95.0000th: 79                       95.0000th: 54
#            *99.0000th: 113                     *99.0000th: 61
#             99.5000th: 2724                     99.5000th: 64
#             99.9000th: 6184                     99.9000th: 82
#             min=0, max=9887                     min=0, max=121
#
#      Performance counter stats for 'system wide' (5 runs):
#
#     context-switches    43,373  ( +-  0.40% )   44,597 ( +-  0.55% )
#     cpu-migrations       1,211  ( +-  5.04% )      220 ( +-  6.23% )
#     page-faults         15,983  ( +-  5.21% )   15,360 ( +-  3.38% )
#
#     Waiman Long suggested using static_keys.
#
#     Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs")
#     Cc: stable@vger.kernel.org # v4.18+
#     Reported-by: Parth Shah
#     Reported-by: Ihor Pasichnyk
#     Tested-by: Juri Lelli
#     Acked-by: Waiman Long
#     Reviewed-by: Gautham R. Shenoy
#     Signed-off-by: Srikar Dronamraju
#     Acked-by: Phil Auld
#     Reviewed-by: Vaidyanathan Srinivasan
#     Tested-by: Parth Shah
#     Signed-off-by: Michael Ellerman
#     Link: https://lore.kernel.org/r/20191205083218.25824-1-srikar@linux.vnet.ibm.com
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-ld --version
# < git log --format=%s --max-count=1 679b68c4e8656c1aef21b7ecfc280db6b54ba2f8
# < make -s -j 48 ARCH=powerpc O=/kisskb/build/powerpc-fixes_83xx_mpc834x_itxgp_defconfig_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux- 83xx/mpc834x_itxgp_defconfig
# make -s -j 48 ARCH=powerpc O=/kisskb/build/powerpc-fixes_83xx_mpc834x_itxgp_defconfig_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-
INFO: Uncompressed kernel (size 0x5a1638) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x600000)
INFO: Uncompressed kernel (size 0x5a1638) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x600000)
INFO: Uncompressed kernel (size 0x590f50) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x600000)
Image Name:   Linux-5.5.0-rc1-g679b68c4e865
Created:      Thu Dec 12 00:30:13 2019
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:    2860866 Bytes = 2793.81 KiB = 2.73 MiB
Load Address: 00000000
Entry Point:  00000000
Image Name:   Linux-5.5.0-rc1-g679b68c4e865
Created:      Thu Dec 12 00:30:13 2019
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:    2890382 Bytes = 2822.64 KiB = 2.76 MiB
Load Address: 00600000
Entry Point:  00600294
Image Name:   Linux-5.5.0-rc1-g679b68c4e865
Created:      Thu Dec 12 00:30:13 2019
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:    2890221 Bytes = 2822.48 KiB = 2.76 MiB
Load Address: 00600000
Entry Point:  00600294
Completed OK
# rm -rf /kisskb/build/powerpc-fixes_83xx_mpc834x_itxgp_defconfig_powerpc-gcc5
# Build took: 0:00:43.825710
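
Note on the commit message in this log: it describes a decision rule (an odd yield_count is read as "preempted", and the patch makes dedicated processors always count as non-preempted). The following is a minimal Python sketch of that rule as described in the message, not the kernel's C implementation. The names VcpuState, is_preempted_before, is_preempted_after, and the shared_processor flag are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VcpuState:
    """Hypothetical stand-in for the per-vCPU state visible to the guest."""
    yield_count: int


def is_preempted_before(vcpu: VcpuState) -> bool:
    # Behaviour described for the unpatched kernel: only yield_count is
    # consulted, and an odd value is taken to mean "preempted". Because
    # yield_count also increases when the vCPU enters CEDE (idle),
    # an idle vCPU can be reported as preempted.
    return vcpu.yield_count % 2 == 1


def is_preempted_after(vcpu: VcpuState, shared_processor: bool) -> bool:
    # Behaviour described for the patched kernel: a dedicated (non-shared)
    # processor is always treated as not preempted; the yield_count test
    # applies only to shared processors.
    if not shared_processor:
        return False
    return vcpu.yield_count % 2 == 1
```

With this model, a dedicated vCPU that has gone idle and left yield_count odd is reported as preempted by the first function and as available by the second, which matches the wakeup-placement problem the commit message sets out to fix.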