# git rev-parse -q --verify 0df82189bc42037678fa590a77ed0116f428c90d^{commit}
0df82189bc42037678fa590a77ed0116f428c90d
already have revision, skipping fetch
# git checkout -q -f -B kisskb 0df82189bc42037678fa590a77ed0116f428c90d
# git clean -qxdf
# < git log -1
# commit 0df82189bc42037678fa590a77ed0116f428c90d
# Merge: b72b5fecc1b8 f9fa0778ee73
# Author: Linus Torvalds
# Date:   Thu Feb 23 10:29:51 2023 -0800
#
#     Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
#
#     Pull perf tools updates from Arnaldo Carvalho de Melo:
#      "Miscellaneous:
#
#        - Add Ian Rogers to MAINTAINERS as a perf tools reviewer.
#
#        - Add support for retire latency feature (pipeline stall of a
#          instruction compared to the previous one, in cycles) present on
#          some Intel processors.
#
#        - Add 'perf c2c' report option to show false sharing with adjacent
#          cachelines, to be used in machines with cacheline prefetching,
#          where accesses to a cacheline brings the next one too.
#
#        - Skip 'perf test bpf' when the required kernel-debuginfo package
#          isn't installed.
#
#        - Avoid d3-flame-graph package dependency in 'perf script flamegraph',
#          making this feature more generally available.
#
#        - Add JSON metric events to present CPI stall cycles in Power10.
#
#        - Assorted improvements/refactorings on the JSON metrics parsing
#          code.
#
#       perf lock contention:
#
#        - Add -o/--lock-owner option:
#
#            $ sudo ./perf lock contention -abo -- ./perf bench sched pipe
#            # Running 'sched/pipe' benchmark:
#            # Executed 1000000 pipe operations between two processes
#
#                 Total time: 4.766 [sec]
#
#                   4.766540 usecs/op
#                     209795 ops/sec
#             contended   total wait     max wait     avg wait          pid   owner
#
#                   403    565.32 us     26.81 us      1.40 us           -1   Unknown
#                     4     27.99 us      8.57 us      7.00 us      1583145   sched-pipe
#                     1      8.25 us      8.25 us      8.25 us      1583144   sched-pipe
#                     1      2.03 us      2.03 us      2.03 us         5068   chrome
#
#          The owner is unknown in most cases. Filtering only for the
#          mutex locks, it will more likely get the owners.
#
#        - -S/--callstack-filter is to limit display entries having the given
#          string in the callstack:
#
#            $ sudo ./perf lock contention -abv -S net sleep 1
#            ...
#             contended   total wait     max wait     avg wait         type   caller
#
#                     5     70.20 us     16.13 us     14.04 us     spinlock   __dev_queue_xmit+0xb6d
#                            0xffffffffa5dd1c60  _raw_spin_lock+0x30
#                            0xffffffffa5b8f6ed  __dev_queue_xmit+0xb6d
#                            0xffffffffa5cd8267  ip6_finish_output2+0x2c7
#                            0xffffffffa5cdac14  ip6_finish_output+0x1d4
#                            0xffffffffa5cdb477  ip6_xmit+0x457
#                            0xffffffffa5d1fd17  inet6_csk_xmit+0xd7
#                            0xffffffffa5c5f4aa  __tcp_transmit_skb+0x54a
#                            0xffffffffa5c6467d  tcp_keepalive_timer+0x2fd
#
#          Please note that to have the -b option (BPF) working above one has
#          to build with BUILD_BPF_SKEL=1.
#
#        - Add more 'perf test' entries to test these new features.
#
#       perf script:
#
#        - Add 'cgroup' field for 'perf script' output:
#
#            $ perf record --all-cgroups -- true
#            $ perf script -F comm,pid,cgroup
#                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
#                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
#                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
#                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
#
#        - Add support for showing branch speculation information in 'perf
#          script' and in the 'perf report' raw dump (-D).
#
#       perf record:
#
#        - Fix 'perf record' segfault with --overwrite and --max-size.
#
#       perf test/bench:
#
#        - Switch basic BPF filtering test to use syscall tracepoint to avoid
#          the variable number of probes inserted when using the previous
#          probe point (do_epoll_wait) that happens on different CPU
#          architectures.
#
#        - Fix DWARF unwind test by adding non-inline to expected function in
#          a backtrace.
#
#        - Use 'grep -c' where the longer form 'grep | wc -l' was being used.
#
#        - Add getpid and execve benchmarks to 'perf bench syscall'.
#
#       Intel PT:
#
#        - Add support for synthesizing "cycle" events from Intel PT traces as
#          we support "instruction" events when Intel PT CYC packets are
#          available. This enables much more accurate profiles than when using
#          the regular 'perf record -e cycles' (the default) when the workload
#          lasts for very short periods (<10ms).
#
#        - .plt symbol handling improvements, better handling IBT (in the past
#          MPX) done in the context of decoding Intel PT processor traces,
#          IFUNC symbols on x86_64, static executables, understanding .plt.got
#          symbols on x86_64.
#
#        - Add a 'perf test' to test symbol resolution, part of the .plt
#          improvements series, this tests things like symbol size in contexts
#          where only the symbol start is available (kallsyms), etc.
#
#        - Better handle auxtrace/Intel PT data when using pipe mode (perf
#          record sleep 1|perf report).
#
#        - Fix symbol lookup with kcore with multiple segments match stext,
#          getting the symbol resolution to just show DSOs as unknown.
#
#       ARM:
#
#        - Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace
#          Macrocell v4).
#
#        - Ensure ARM64 CoreSight timestamps don't go backwards.
#
#        - Document that ARM64 SPE (Statistical Profiling Extension) is used
#          with 'perf c2c/mem'.
#
#        - Add raw decoding for ARM64 SPEv1.2 previous branch address.
#
#        - Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1,
#          TLB, cache, branch, PE utilization and instruction mix metrics.
#
#        - Update decoder code for OpenCSD version 1.4, on ARM64 systems.
#
#        - Fix command line auto-complete of CPU events on aarch64.
#
#       Build:
#
#        - Fix 'perf probe' and 'perf test' when libtraceevent isn't linked,
#          as several tests use tracepoints, those should be skipped.
#
#        - More fallout fixes for the removal of tools/lib/traceevent/.
#
#        - Fix build error when linking with libpfm"
#
#     * tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits)
#       perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
#       perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc
#       perf intel-pt: Synthesize cycle events
#       perf c2c: Add report option to show false sharing in adjacent cachelines
#       perf record: Fix segfault with --overwrite and --max-size
#       perf stat: Avoid merging/aggregating metric counts twice
#       perf tools: Fix perf tool build error in util/pfm.c
#       perf tools: Fix auto-complete on aarch64
#       perf lock contention: Support old rw_semaphore type
#       perf lock contention: Add -o/--lock-owner option
#       perf lock contention: Fix to save callstack for the default modified
#       perf test bpf: Skip test if kernel-debuginfo is not present
#       perf probe: Update the exit error codes in function try_to_find_probe_trace_event
#       perf script: Fix missing Retire Latency fields option documentation
#       perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT
#       perf test x86: Support the retire_lat (Retire Latency) sample_type check
#       perf test bpf: Check for libtraceevent support
#       perf script: Support Retire Latency
#       perf report: Support Retire Latency
#       perf lock contention: Support filters for different aggregation
#       ...
# < /opt/cross/kisskb/korg/gcc-11.3.0-nolibc/x86_64-linux/bin/x86_64-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-11.3.0-nolibc/x86_64-linux/bin/x86_64-linux-ld --version
# < git log --format=%s --max-count=1 0df82189bc42037678fa590a77ed0116f428c90d
# < make -s -j 160 ARCH=x86_64 O=/kisskb/build/linus_x86_64-allnoconfig_x86_64-gcc11 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-11.3.0-nolibc/x86_64-linux/bin/x86_64-linux- allnoconfig
# < make -s -j 160 ARCH=x86_64 O=/kisskb/build/linus_x86_64-allnoconfig_x86_64-gcc11 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-11.3.0-nolibc/x86_64-linux/bin/x86_64-linux- help
# make -s -j 160 ARCH=x86_64 O=/kisskb/build/linus_x86_64-allnoconfig_x86_64-gcc11 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-11.3.0-nolibc/x86_64-linux/bin/x86_64-linux- olddefconfig
# make -s -j 160 ARCH=x86_64 O=/kisskb/build/linus_x86_64-allnoconfig_x86_64-gcc11 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-11.3.0-nolibc/x86_64-linux/bin/x86_64-linux-
Completed OK
# rm -rf /kisskb/build/linus_x86_64-allnoconfig_x86_64-gcc11
# Build took: 0:00:52.954016