# git rev-parse -q --verify d7d40595a2568d199396c863460cecd5ae676c34^{commit}
d7d40595a2568d199396c863460cecd5ae676c34
already have revision, skipping fetch
# git checkout -q -f -B kisskb d7d40595a2568d199396c863460cecd5ae676c34
# git clean -qxdf
# < git log -1
# commit d7d40595a2568d199396c863460cecd5ae676c34
# Merge: c05e1251efb4 da1d8d81d961
# Author: Michael Ellerman
# Date: Mon Sep 14 23:12:44 2020 +1000
#
# Merge coregroup support into next
#
# From Srikar's cover letter, with some reformatting:
#
# Cleanup of existing powerpc topologies and add coregroup support on
# powerpc. Coregroup is a group of (subset of) cores of a DIE that share
# a resource.
#
# Summary of some of the testing done with coregroup patchset.
#
# It includes ebizzy, schbench, perf bench sched pipe and topology
# verification. On the left side are results from powerpc/next tree and
# on the right are the results with the patchset applied. Topological
# verification clearly shows that there is no change in topology with
# and without the patches on all the 3 class of systems that were
# tested.
#
# Power 9 PowerNV (2 Node/ 160 Cpu System)
# ----------------------------------------
#
# Baseline    Baseline + Coregroup Support
#
# N Min Max Median Avg Stddev    N Min Max Median Avg Stddev
# 100 993884 1276090 1173476 1165914 54867.201    100 910470 1279820 1171095 1162091 67363.28
#
# ^ ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
#
# schbench (latency hence lower is better)
# Latency percentiles (usec)    Latency percentiles (usec)
# 50.0th: 455    50.0th: 454
# 75.0th: 533    75.0th: 543
# 90.0th: 683    90.0th: 701
# 95.0th: 743    95.0th: 737
# *99.0th: 815    *99.0th: 805
# 99.5th: 839    99.5th: 835
# 99.9th: 913    99.9th: 893
# min=0, max=1011    min=0, max=2833
#
# perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark:    Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes    Executed 1000000 pipe operations between two processes
#
# Total time: 6.083 [sec]    Total time: 6.303 [sec]
#
# 6.083576 usecs/op    6.303318 usecs/op
# 164377 ops/sec    158646 ops/sec
#
# Power 9 LPAR (2 Node/ 128 Cpu System)
# -------------------------------------
#
# Baseline    Baseline + Coregroup Support
#
# N Min Max Median Avg Stddev    N Min Max Median Avg Stddev
# 100 1058029 1295393 1200414 1188306.7 56786.538    100 943264 1287619 1180522 1168473.2 64469.955
#
# ^ ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
#
# schbench (latency hence lower is better)
# Latency percentiles (usec)    Latency percentiles (usec)
# 50.0000th: 34    50.0000th: 39
# 75.0000th: 46    75.0000th: 52
# 90.0000th: 53    90.0000th: 68
# 95.0000th: 56    95.0000th: 77
# *99.0000th: 61    *99.0000th: 89
# 99.5000th: 63    99.5000th: 94
# 99.9000th: 81    99.9000th: 169
# min=0, max=8405    min=0, max=23674
#
# perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark:    Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes    Executed 1000000 pipe operations between two processes
#
# Total time: 8.768 [sec]    Total time: 5.217 [sec]
#
# 8.768400 usecs/op    5.217625 usecs/op
# 114045 ops/sec    191658 ops/sec
#
# Power 8 LPAR (8 Node/ 256 Cpu System)
# -------------------------------------
#
# Baseline    Baseline + Coregroup Support
#
# N Min Max Median Avg Stddev    N Min Max Median Avg Stddev
# 100 1267615 1965234 1707423 1689137.6 144363.29    100 1175357 1924262 1691104 1664792.1 145876.4
#
# ^ ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
#
# schbench (latency hence lower is better)
# Latency percentiles (usec)    Latency percentiles (usec)
# 50.0th: 37    50.0th: 36
# 75.0th: 51    75.0th: 48
# 90.0th: 59    90.0th: 55
# 95.0th: 63    95.0th: 59
# *99.0th: 71    *99.0th: 67
# 99.5th: 75    99.5th: 72
# 99.9th: 105    99.9th: 170
# min=0, max=18560    min=0, max=27031
#
# perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark:    Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes    Executed 1000000 pipe operations between two processes
#
# Total time: 6.013 [sec]    Total time: 5.930 [sec]
#
# 6.013963 usecs/op    5.930724 usecs/op
# 166279 ops/sec    168613 ops/sec
#
# Topology verification on Power9
# Power9 / powernv / SMT4
#
# $ tail /proc/cpuinfo
# cpu : POWER9, altivec supported
# clock : 3600.000000MHz
# revision : 2.2 (pvr 004e 1202)
#
# timebase : 512000000
# platform : PowerNV
# model : 9006-22P
# machine : PowerNV 9006-22P
# firmware : OPAL
# MMU : Radix
#
# Baseline    Baseline + Coregroup Support
#
# lscpu    lscpu
# ------    ------
# Architecture: ppc64le    Architecture: ppc64le
# Byte Order: Little Endian    Byte Order: Little Endian
# CPU(s): 160    CPU(s): 160
# On-line CPU(s) list: 0-159    On-line CPU(s) list: 0-159
# Thread(s) per core: 4    Thread(s) per core: 4
# Core(s) per socket: 20    Core(s) per socket: 20
# Socket(s): 2    Socket(s): 2
# NUMA node(s): 2    NUMA node(s): 2
# Model: 2.2 (pvr 004e 1202)    Model: 2.2 (pvr 004e 1202)
# Model name: POWER9, altivec supported    Model name: POWER9, altivec supported
# CPU max MHz: 3800.0000    CPU max MHz: 3800.0000
# CPU min MHz: 2166.0000    CPU min MHz: 2166.0000
# L1d cache: 32K    L1d cache: 32K
# L1i cache: 32K    L1i cache: 32K
# L2 cache: 512K    L2 cache: 512K
# L3 cache: 10240K    L3 cache: 10240K
# NUMA node0 CPU(s): 0-79    NUMA node0 CPU(s): 0-79
# NUMA node8 CPU(s): 80-159    NUMA node8 CPU(s): 80-159
#
# grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name    grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
# -----------------------------------------------------    -----------------------------------------------------
# /proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT    /proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT
# /proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE    /proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE
# /proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE    /proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE
# /proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA    /proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA
#
# grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags    grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
# ------------------------------------------------------    ------------------------------------------------------
# /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391    /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391
# /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327    /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327
# /proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071    /proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071
# /proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801    /proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801
#
# Baseline
#
# head /proc/schedstat
# --------------------
# version 15
# timestamp 4295043536
# cpu0 0 0 0 0 0 0 9597119314 2408913694 11897
# domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# cpu1 0 0 0 0 0 0 4941435230 11106132 1583
# domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#
# Baseline + Coregroup Support
#
# head /proc/schedstat
# --------------------
# version 15
# timestamp 4296311826
# cpu0 0 0 0 0 0 0 3353674045024 3781680865826 297483
# domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# cpu1 0 0 0 0 0 0 3337873293332 4231590033856 229090
# domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#
# Post sudo ppc64_cpu --smt=1    Post sudo ppc64_cpu --smt=1
# ---------------------    ---------------------
# grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name    grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
# -----------------------------------------------------    -----------------------------------------------------
# /proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE    /proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE
# /proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE    /proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE
# /proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA    /proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA
#
# grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags    grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
# ------------------------------------------------------    ------------------------------------------------------
# /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327    /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327
# /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071    /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071
# /proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801    /proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801
#
# Baseline:
#
# head /proc/schedstat
# --------------------
# version 15
# timestamp 4295046242
# cpu0 0 0 0 0 0 0 10978610020 2658997390 13068
# domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# cpu4 0 0 0 0 0 0 5408663896 95701034 7697
# domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#
# Baseline + Coregroup Support
#
# head /proc/schedstat
# --------------------
# version 15
# timestamp 4296314905
# cpu0 0 0 0 0 0 0 3355392013536 3781975150576 298723
# domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# cpu4 0 0 0 0 0 0 3351637920996 4427329763050 256776
# domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#
# Similar verification was done on Power 8 (8 Node 256 CPU LPAR) and
# Power 9 (2 node 128 Cpu LPAR) and they showed the topology before and
# after the patch to be identical. If Interested, I could provide the
# same.
#
# On Power 9 (with device-tree enablement to show coregroups):
#
# $ tail /proc/cpuinfo
# processor : 127
# cpu : POWER9 (architected), altivec supported
# clock : 3000.000000MHz
# revision : 2.2 (pvr 004e 0202)
#
# timebase : 512000000
# platform : pSeries
# model : IBM,9008-22L
# machine : CHRP IBM,9008-22L
# MMU : Hash
#
# Before patchset:
#
# $ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
# SMT
# CACHE
# DIE
# NUMA
#
# $ head /proc/schedstat
# version 15
# timestamp 4318242208
# cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
# domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# cpu1 0 0 0 0 0 0 24177439200 413887604 75393
# domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#
# After patchset:
#
# $ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
# SMT
# CACHE
# MC
# DIE
# NUMA
#
# $ head /proc/schedstat
# version 15
# timestamp 4318242208
# cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
# domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain2 00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain3 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# domain4 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# cpu1 0 0 0 0 0 0 24177439200 413887604 75393
# domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-ld --version
# < git log --format=%s --max-count=1 d7d40595a2568d199396c863460cecd5ae676c34
# < make -s -j 48 ARCH=powerpc O=/kisskb/build/powerpc-next_powernv_defconfig+STRICT_RWX_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux- powernv_defconfig
# Added to kconfig CONFIG_RELOCATABLE=n
# Added to kconfig CONFIG_STRICT_KERNEL_RWX=y
# < make -s -j 48 ARCH=powerpc O=/kisskb/build/powerpc-next_powernv_defconfig+STRICT_RWX_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux- help
# make -s -j 48 ARCH=powerpc O=/kisskb/build/powerpc-next_powernv_defconfig+STRICT_RWX_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux- olddefconfig
.config:3982:warning: override: reassigning to symbol RELOCATABLE
# make -s -j 48 ARCH=powerpc O=/kisskb/build/powerpc-next_powernv_defconfig+STRICT_RWX_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-
Completed OK
# rm -rf /kisskb/build/powerpc-next_powernv_defconfig+STRICT_RWX_powerpc-gcc5
# Build took: 0:03:08.117833
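
The topology checks in the commit message above compare the cpumask field on each `domainN` line of /proc/schedstat. The following is a minimal sketch of how such a mask could be decoded into CPU numbers. It is not part of the build or of the patchset; it assumes the mask is a comma-separated list of 32-bit hex words with the most significant word first, and the function name `cpumask_to_cpus` is hypothetical.

```python
from typing import List


def cpumask_to_cpus(mask: str) -> List[int]:
    """Decode a schedstat-style cpumask into a sorted list of CPU numbers.

    Assumption: `mask` is comma-separated, zero-padded 32-bit hex words,
    most significant word first, e.g. "00000000,00000055".
    """
    # Dropping the commas yields one big hex number; bit N set means CPU N.
    value = int(mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    # Masks copied from the log above.
    for label, mask in [
        ("SMT4 domain0", "00000000,00000000,00000000,00000000,0000000f"),
        ("pSeries cpu0 domain0", "00000000,00000000,00000000,00000055"),
        ("pSeries cpu1 domain0", "00000000,00000000,00000000,000000aa"),
        ("MC domain (after patchset)", "00000000,00000000,00000000,ffffffff"),
    ]:
        cpus = cpumask_to_cpus(mask)
        print(f"{label}: {len(cpus)} CPUs, first={cpus[0]}, last={cpus[-1]}")
```

Read this way, the `MC` domain that appears after the patchset spans CPUs 0-31, which sits between the 8-CPU `CACHE` domain and the 64-CPU `DIE` domain in the same listing.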