# git rev-parse -q --verify d465bff130bf4ca17b6980abe51164ace1e0cba4^{commit}
d465bff130bf4ca17b6980abe51164ace1e0cba4
already have revision, skipping fetch
# git checkout -q -f -B kisskb d465bff130bf4ca17b6980abe51164ace1e0cba4
# git clean -qxdf
# < git log -1
# commit d465bff130bf4ca17b6980abe51164ace1e0cba4
# Merge: 041bc24d867a d79310700590
# Author: Linus Torvalds
# Date:   Tue Oct 11 15:02:25 2022 -0700
#
#     Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
#
#     Pull perf tools updates from Arnaldo Carvalho de Melo:
#
#      - Add support for AMD on 'perf mem' and 'perf c2c', the kernel
#        enablement patches went via tip.
#
#        Example:
#
#          $ sudo perf mem record -- -c 10000
#          ^C[ perf record: Woken up 227 times to write data ]
#          [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
#
#          $ sudo perf mem report -F mem,sample,snoop
#          Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
#          Memory access                  Samples  Snoop
#          N/A                             700620  N/A
#          L1 hit                          126675  N/A
#          L2 hit                             424  N/A
#          L3 hit                             664  HitM
#          L3 hit                              10  N/A
#          Local RAM hit                        2  N/A
#          Remote RAM (1 hop) hit            8558  N/A
#          Remote Cache (1 hop) hit             3  N/A
#          Remote Cache (1 hop) hit             2  HitM
#          Remote Cache (2 hops) hit           10  HitM
#          Remote Cache (2 hops) hit            6  N/A
#          Uncached hit                         4  N/A
#          $
#
#      - "perf lock" improvements:
#
#         - Add -E/--entries option to limit the number of entries to
#           display, say to ask for just the top 5 contended locks.
#
#         - Add -q/--quiet option to suppress header and debug messages.
#
#         - Add a 'perf test' kernel lock contention entry to test 'perf
#           lock'.
#
#      - "perf lock contention" improvements:
#
#         - Ask BPF's bpf_get_stackid() to skip some callchain entries.
#
#           The ones closer to the tooling are bpf related and not that
#           interesting, the ones calling the locking function are the ones
#           we're interested in, example of a full, unskipped callstack:
#
#         - Allow changing the callstack depth and number of entries to skip.
#
#           1     10.74 us     10.74 us     10.74 us   spinlock   __bpf_trace_contention_begin+0xb
#               0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
#               0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
#               0xffffffffbb8b8e75  bpf_trace_run2+0x35
#               0xffffffffbb7eab9b  __bpf_trace_contention_begin+0xb
#               0xffffffffbb7ebe75  queued_spin_lock_slowpath+0x1f5
#               0xffffffffbc1c26ff  _raw_spin_lock+0x1f
#               0xffffffffbb841015  tick_do_update_jiffies64+0x25
#               0xffffffffbb8409ee  tick_irq_enter+0x9e
#
#         - Show full callstack in verbose mode (-v option), sometimes this
#           is desirable instead of showing just one callstack entry.
#
#      - Allow multiple time ranges in 'perf record --delay' to help in
#        reducing the amount of data collected from hardware tracing (Intel
#        PT, etc) when there is a rough idea of periods of time where events
#        of interest take time.
#
#      - Add Intel PT to record only decoder debug messages when error
#        happens.
#
#      - Improve layout of Intel PT man page.
#
#      - Add new branch types: alignment, data and inst faults and arch
#        specific ones, such as fiq, debug_halt, debug_exit, debug_inst and
#        debug_data on arm64.
#
#        Kernel enablement went thru the tip tree.
#
#      - Fix 'perf probe' error log check in 'perf test' when no debuginfo is
#        available.
#
#      - Fix 'perf stat' aggregation mode logic, it should be looking at the
#        CPU not at the core number.
#
#      - Fix flags parsing in 'perf trace' filters.
#
#      - Introduce compact encoding of CPU range encoding on perf.data, to
#        avoid having a bitmap with all the CPUs.
#
#      - Improvements to the 'perf stat' metrics, including adding
#        "core_wide", and computing "smt" from the CPU topology.
#
#      - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format,
#        that allows tooling to ask for the precise number of lost samples for
#        a given event.
#
#      - Add 'addr' sort key to see just the address of sampled instructions:
#
#          $ perf record -o- true | perf report -i- -s addr
#          [ perf record: Woken up 1 times to write data ]
#          [ perf record: Captured and wrote 0.000 MB - ]
#          # Samples: 12 of event 'cycles:u'
#          # Event count (approx.): 252512
#          #
#          # Overhead  Address
#          # ........  ..................
#              42.96%  0x7f96f08443d7
#              29.55%  0x7f96f0859b50
#              14.76%  0x7f96f0852e02
#               8.30%  0x7f96f0855028
#               4.43%  0xffffffff8de01087
#
#        perf annotate: Toggle full address <-> offset display
#
#      - Add 'f' hotkey to the 'perf annotate' TUI interface when in
#        'disassembler output' mode ('o' hotkey) to toggle showing full
#        virtual address or just the offset.
#
#      - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for
#        pre-existing threads, at the start of a 'perf record' session,
#        speeding up that record startup phase.
#
#      - Add a command line option to specify build ids in 'perf inject'.
#
#      - Update JSON event files for the Intel alderlake, broadwell,
#        broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake,
#        icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids,
#        skylake, skylakex, and tigerlake processors.
#
#      - Update vendor JSON event files for the ARM Neoverse V1 and E1
#        platforms.
#
#      - Add a 'perf test' entry for 'perf mem' where a struct has false
#        sharing and this gets detected in the 'perf mem' output, tested with
#        Intel, AMD and ARM64 systems.
#
#      - Add a 'perf test' entry to test the resolution of java symbols, where
#        an output like this is expected:
#
#            8.18%  jshell    jitted-50116-29.so    [.] Interpreter
#            0.75%  Thread-1  jitted-83602-1670.so  [.] jdk.internal.jimage.BasicImageReader.getString(int)
#
#      - Add tests for the ARM64 CoreSight hardware tracing feature, with
#        specially crafted pureloop, memcpy, thread loop and unroll tread that
#        then gets traced and the output compared with expected output.
#
#        Documentation explaining it is also included.
#
#      - Add per thread Intel PT 'perf test' entry to check that
#        PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a
#        mixture of per thread and per CPU events and mmaps, verify that this
#        gets all recorded correctly.
#
#      - Introduce pthread mutex wrappers to allow for building with clang's
#        -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by"
#        "lockable", "exclusive_lock_function", "exclusive_trylock_function",
#        "exclusive_locks_required", and "no_thread_safety_analysis" compiler
#        function attributes.
#
#      - Fix empty version number when building outside of a git repo.
#
#      - Improve feature detection display when multiple versions of a feature
#        are present, such as for binutils libbfd, that has a mix of possible
#        ways to detect according to the Linux distribution.
#
#        Previously in some cases we had:
#
#          Auto-detecting system features
#
#          ...                    libbfd: [ on ]
#          ...            libbfd-liberty: [ on ]
#          ...          libbfd-liberty-z: [ on ]
#
#
#        Now for this case we show just the main feature:
#
#          Auto-detecting system features
#
#          ...                    libbfd: [ on ]
#
#
#      - Remove some unused structs, variables, macros, function prototypes
#        and includes from various places.
#
#     * tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
#       perf script: Add missing fields in usage hint
#       perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB
#       perf mem/c2c: Avoid printing empty lines for unsupported events
#       perf mem/c2c: Add load store event mappings for AMD
#       perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
#       perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}
#       perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
#       tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
#       perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
#       perf test coresight: Add relevant documentation about ARM64 CoreSight testing
#       perf test: Add git ignore for tmp and output files of ARM CoreSight tests
#       perf test coresight: Add unroll thread test shell script
#       perf test coresight: Add unroll thread test tool
#       perf test coresight: Add thread loop test shell scripts
#       perf test coresight: Add thread loop test tool
#       perf test coresight: Add memcpy thread test shell script
#       perf test coresight: Add memcpy thread test tool
#       perf test: Add git ignore for perf data generated by the ARM CoreSight tests
#       perf test: Add arm64 asm pureloop test shell script
#       perf test: Add asm pureloop test tool
#       ...
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/sparc64-linux/bin/sparc64-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/sparc64-linux/bin/sparc64-linux-ld --version
# < git log --format=%s --max-count=1 d465bff130bf4ca17b6980abe51164ace1e0cba4
# < make -s -j 32 ARCH=sparc64 O=/kisskb/build/linus_sparc64-allnoconfig_sparc64-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/sparc64-linux/bin/sparc64-linux- allnoconfig
# < make -s -j 32 ARCH=sparc64 O=/kisskb/build/linus_sparc64-allnoconfig_sparc64-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/sparc64-linux/bin/sparc64-linux- help
# make -s -j 32 ARCH=sparc64 O=/kisskb/build/linus_sparc64-allnoconfig_sparc64-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/sparc64-linux/bin/sparc64-linux- olddefconfig
# make -s -j 32 ARCH=sparc64 O=/kisskb/build/linus_sparc64-allnoconfig_sparc64-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/sparc64-linux/bin/sparc64-linux-
:1517:2: warning: #warning syscall clone3 not implemented [-Wcpp]
kernel: arch/sparc/boot/image is ready
kernel: arch/sparc/boot/zImage is ready
Completed OK
# rm -rf /kisskb/build/linus_sparc64-allnoconfig_sparc64-gcc5
# Build took: 0:00:21.403388