# git rev-parse -q --verify 30bac164aca750892b93eef350439a0562a68647^{commit}
30bac164aca750892b93eef350439a0562a68647
already have revision, skipping fetch
# git checkout -q -f -B kisskb 30bac164aca750892b93eef350439a0562a68647
# git clean -qxdf
# < git log -1
# commit 30bac164aca750892b93eef350439a0562a68647
# Author: Linus Torvalds
# Date: Thu Jan 24 09:04:37 2019 +1300
#
# Revert "Change mincore() to count "mapped" pages rather than "cached" pages"
#
# This reverts commit 574823bfab82d9d8fa47f422778043fbb4b4f50e.
#
# It turns out that my hope that we could just remove the code that
# exposes the cache residency status from mincore() was too optimistic.
#
# There are various random users that want it, and one example would be
# the Netflix database cluster maintenance. To quote Josh Snyder:
#
# "For Netflix, losing accurate information from the mincore syscall
# would lengthen database cluster maintenance operations from days to
# months. We rely on cross-process mincore to migrate the contents of a
# page cache from machine to machine, and across reboots.
#
# To do this, I wrote and maintain happycache [1], a page cache
# dumper/loader tool. It is quite similar in architecture to pgfincore,
# except that it is agnostic to workload. The gist of happycache's
# operation is "produce a dump of residence status for each page, do
# some operation, then reload exactly the same pages which were present
# before." happycache is entirely dependent on accurate reporting of the
# in-core status of file-backed pages, as accessed by another process.
#
# We primarily use happycache with Cassandra, which (like Postgres +
# pgfincore) relies heavily on OS page cache to reduce disk accesses.
# Because our workloads never experience a cold page cache, we are able
# to provision hardware for a peak utilization level that is far lower
# than the hypothetical "every query is a cache miss" peak.
#
# A database warmed by happycache can be ready for service in seconds
# (bounded only by the performance of the drives and the I/O subsystem),
# with no period of in-service degradation. By contrast, putting a
# database in service without a page cache entails a potentially
# unbounded period of degradation (at Netflix, the time to populate a
# single node's cache via natural cache misses varies by workload from
# hours to weeks). If a single node upgrade were to take weeks, then
# upgrading an entire cluster would take months. Since we want to apply
# security upgrades (and other things) on a somewhat tighter schedule,
# we would have to develop more complex solutions to provide the same
# functionality already provided by mincore.
#
# At the bottom line, happycache is designed to benignly exploit the
# same information leak documented in the paper [2]. I think it makes
# perfect sense to remove cross-process mincore functionality from
# unprivileged users, but not to remove it entirely"
#
# We do have an alternate approach that limits the cache residency
# reporting only to processes that have write permissions to the file, so
# we can fix the original information leak issue that way. It involves
# _adding_ code rather than removing it, which is sad, but hey, at least
# we haven't found any users that would find the restrictions
# unacceptable.
#
# So revert the optimistic first approach to make room for that alternate
# fix instead.
#
# Reported-by: Josh Snyder
# Cc: Jiri Kosina
# Cc: Dominique Martinet
# Cc: Andy Lutomirski
# Cc: Dave Chinner
# Cc: Kevin Easton
# Cc: Matthew Wilcox
# Cc: Cyril Hrubis
# Cc: Vlastimil Babka
# Cc: Tejun Heo
# Cc: Kirill A. Shutemov
# Cc: Daniel Gruss
# Signed-off-by: Linus Torvalds
# < /opt/cross/kisskb/gcc-4.6.3-nolibc/arm-unknown-linux-gnueabi/bin/arm-unknown-linux-gnueabi-gcc --version
# < /opt/cross/kisskb/gcc-4.6.3-nolibc/arm-unknown-linux-gnueabi/bin/arm-unknown-linux-gnueabi-ld --version
# < git log --format=%s --max-count=1 30bac164aca750892b93eef350439a0562a68647
# < make -s -j 8 ARCH=arm O=/kisskb/build/linus_colibri_pxa300_defconfig_arm CROSS_COMPILE=/opt/cross/kisskb/gcc-4.6.3-nolibc/arm-unknown-linux-gnueabi/bin/arm-unknown-linux-gnueabi- colibri_pxa300_defconfig
# make -s -j 8 ARCH=arm O=/kisskb/build/linus_colibri_pxa300_defconfig_arm CROSS_COMPILE=/opt/cross/kisskb/gcc-4.6.3-nolibc/arm-unknown-linux-gnueabi/bin/arm-unknown-linux-gnueabi-
/kisskb/src/fs/proc/inode.c: In function 'proc_reg_open':
/kisskb/src/include/linux/list.h:65:12: warning: 'pdeo' may be used uninitialized in this function [-Wuninitialized]
/kisskb/src/fs/proc/inode.c:339:21: note: 'pdeo' was declared here
/kisskb/src/kernel/printk/printk.c: In function 'devkmsg_sysctl_set_loglvl':
/kisskb/src/kernel/printk/printk.c:186:16: warning: 'old' may be used uninitialized in this function [-Wuninitialized]
/kisskb/src/net/ipv6/ip6_output.c: In function '__ip6_append_data':
/kisskb/src/include/linux/skbuff.h:1338:6: warning: 'extra_uref' may be used uninitialized in this function [-Wuninitialized]
/kisskb/src/net/ipv6/ip6_output.c:1270:14: note: 'extra_uref' was declared here
/kisskb/src/net/ipv4/ip_output.c: In function '__ip_append_data':
/kisskb/src/include/linux/skbuff.h:1338:6: warning: 'extra_uref' may be used uninitialized in this function [-Wuninitialized]
/kisskb/src/net/ipv4/ip_output.c:885:14: note: 'extra_uref' was declared here
Completed OK
# rm -rf /kisskb/build/linus_colibri_pxa300_defconfig_arm
# Build took: 0:01:02.309975