# git rev-parse -q --verify 3840cbe24cf060ea05a585ca497814609f5d47d1^{commit}
3840cbe24cf060ea05a585ca497814609f5d47d1
already have revision, skipping fetch
# git checkout -q -f -B kisskb 3840cbe24cf060ea05a585ca497814609f5d47d1
# git clean -qxdf
# < git log -1
# commit 3840cbe24cf060ea05a585ca497814609f5d47d1
# Author: Johannes Weiner
# Date: Thu Oct 3 07:29:05 2024 -0400
#
#     sched: psi: fix bogus pressure spikes from aggregation race
#
#     Brandon reports sporadic, non-sensical spikes in cumulative pressure
#     time (total=) when reading cpu.pressure at a high rate. This is due to
#     a race condition between reader aggregation and tasks changing states.
#
#     While it affects all states and all resources captured by PSI, in
#     practice it most likely triggers with CPU pressure, since scheduling
#     events are so frequent compared to other resource events.
#
#     The race context is the live snooping of ongoing stalls during a
#     pressure read. The read aggregates per-cpu records for stalls that
#     have concluded, but will also incorporate ad-hoc the duration of any
#     active state that hasn't been recorded yet. This is important to get
#     timely measurements of ongoing stalls. Those ad-hoc samples are
#     calculated on-the-fly up to the current time on that CPU; since the
#     stall hasn't concluded, it's expected that this is the minimum amount
#     of stall time that will enter the per-cpu records once it does.
#
#     The problem is that the path that concludes the state uses a CPU clock
#     read that is not synchronized against aggregators; the clock is read
#     outside of the seqlock protection. This allows aggregators to race and
#     snoop a stall with a longer duration than will actually be recorded.
#
#     With the recorded stall time being less than the last snapshot
#     remembered by the aggregator, a subsequent sample will underflow and
#     observe a bogus delta value, resulting in an erratic jump in pressure.
#
#     Fix this by moving the clock read of the state change into the seqlock
#     protection. This ensures no aggregation can snoop live stalls past the
#     time that's recorded when the state concludes.
#
#     Reported-by: Brandon Duffany
#     Link: https://bugzilla.kernel.org/show_bug.cgi?id=219194
#     Link: https://lore.kernel.org/lkml/20240827121851.GB438928@cmpxchg.org/
#     Fixes: df77430639c9 ("psi: Reduce calls to sched_clock() in psi")
#     Cc: stable@vger.kernel.org
#     Signed-off-by: Johannes Weiner
#     Reviewed-by: Chengming Zhou
#     Signed-off-by: Linus Torvalds
# < /opt/cross/kisskb/korg/gcc-13.1.0-nolibc/riscv64-linux/bin/riscv64-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-13.1.0-nolibc/riscv64-linux/bin/riscv64-linux-ld --version
# < git log --format=%s --max-count=1 3840cbe24cf060ea05a585ca497814609f5d47d1
# make -s -j 160 ARCH=riscv O=/kisskb/build/linus_defconfig_riscv-gcc13 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-13.1.0-nolibc/riscv64-linux/bin/riscv64-linux- defconfig
# < make -s -j 160 ARCH=riscv O=/kisskb/build/linus_defconfig_riscv-gcc13 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-13.1.0-nolibc/riscv64-linux/bin/riscv64-linux- help
# make -s -j 160 ARCH=riscv O=/kisskb/build/linus_defconfig_riscv-gcc13 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-13.1.0-nolibc/riscv64-linux/bin/riscv64-linux- olddefconfig
# make -s -j 160 ARCH=riscv O=/kisskb/build/linus_defconfig_riscv-gcc13 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-13.1.0-nolibc/riscv64-linux/bin/riscv64-linux-
Completed OK
# rm -rf /kisskb/build/linus_defconfig_riscv-gcc13
# Build took: 0:01:46.219320