# git rev-parse -q --verify 10dce8af34226d90fa56746a934f8da5dcdba3df^{commit}
10dce8af34226d90fa56746a934f8da5dcdba3df
already have revision, skipping fetch
# git checkout -q -f -B kisskb 10dce8af34226d90fa56746a934f8da5dcdba3df
# git clean -qxdf
# < git log -1
# commit 10dce8af34226d90fa56746a934f8da5dcdba3df
# Author: Kirill Smelkov
# Date:   Tue Mar 26 22:20:43 2019 +0000
#
#     fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
#
#     Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
#     locking for file.f_pos access and in particular made concurrent read and
#     write not possible - now both those functions take f_pos lock for the
#     whole run, and so if e.g. a read is blocked waiting for data, write will
#     deadlock waiting for that read to complete.
#
#     This caused regression for stream-like files where previously read and
#     write could run simultaneously, but after that patch could not do so
#     anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
#     to /proc/xen/xenbus") which fixes such regression for particular case of
#     /proc/xen/xenbus.
#
#     The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
#     safety for read/write/lseek and added the locking to file descriptors of
#     all regular files. In 2014 that thread-safety problem was not new as it
#     was already discussed earlier in 2006.
#
#     However even though 2006'th version of Linus's patch was adding f_pos
#     locking "only for files that are marked seekable with FMODE_LSEEK (thus
#     avoiding the stream-like objects like pipes and sockets)", the 2014
#     version - the one that actually made it into the tree as 9c225f2655e3 -
#     is doing so irregardless of whether a file is seekable or not.
#
#     See
#
#         https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
#         https://lwn.net/Articles/180387
#         https://lwn.net/Articles/180396
#
#     for historic context.
#
#     The reason that it did so is, probably, that there are many files that
#     are marked non-seekable, but e.g. their read implementation actually
#     depends on knowing current position to correctly handle the read. Some
#     examples:
#
#         kernel/power/user.c             snapshot_read
#         fs/debugfs/file.c               u32_array_read
#         fs/fuse/control.c               fuse_conn_waiting_read + ...
#         drivers/hwmon/asus_atk0110.c    atk_debugfs_ggrp_read
#         arch/s390/hypfs/inode.c         hypfs_read_iter
#         ...
#
#     Despite that, many nonseekable_open users implement read and write with
#     pure stream semantics - they don't depend on passed ppos at all. And for
#     those cases where read could wait for something inside, it creates a
#     situation similar to xenbus - the write could be never made to go until
#     read is done, and read is waiting for some, potentially external, event,
#     for potentially unbounded time -> deadlock.
#
#     Besides xenbus, there are 14 such places in the kernel that I've found
#     with semantic patch (see below):
#
#         drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
#         drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
#         drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
#         drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
#         net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
#         drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
#         drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
#         drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
#         net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
#         drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
#         drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
#         drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
#         drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
#         drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
#
#     In addition to the cases above another regression caused by f_pos
#     locking is that now FUSE filesystems that implement open with
#     FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
#     stream-like files - for the same reason as above e.g. read can deadlock
#     write locking on file.f_pos in the kernel.
#
#     FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
#     implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
#     in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
#     write routines not depending on current position at all, and with both
#     read and write being potentially blocking operations:
#
#     See
#
#         https://github.com/libfuse/osspd
#         https://lwn.net/Articles/308445
#
#         https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
#         https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
#         https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
#
#     Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
#     "somewhat pipe-like files ..." with read handler not using offset.
#     However that test implements only read without write and cannot exercise
#     the deadlock scenario:
#
#         https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
#         https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
#         https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216
#
#     I've actually hit the read vs write deadlock for real while implementing
#     my FUSE filesystem where there is /head/watch file, for which open
#     creates separate bidirectional socket-like stream in between filesystem
#     and its user with both read and write being later performed
#     simultaneously. And there it is semantically not easy to split the
#     stream into two separate read-only and write-only channels:
#
#         https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
#
#     Let's fix this regression. The plan is:
#
#     1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
#        doing so would break many in-kernel nonseekable_open users which
#        actually use ppos in read/write handlers.
#
#     2. Add stream_open() to kernel to open stream-like non-seekable file
#        descriptors. Read and write on such file descriptors would never use
#        nor change ppos. And with that property on stream-like files read and
#        write will be running without taking f_pos lock - i.e. read and write
#        could be running simultaneously.
#
#     3. With semantic patch search and convert to stream_open all in-kernel
#        nonseekable_open users for which read and write actually do not
#        depend on ppos and where there is no other methods in file_operations
#        which assume @offset access.
#
#     4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
#        stream_open if that bit is present in filesystem open reply.
#
#        It was tempting to change fs/fuse/ open handler to use stream_open
#        instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
#        grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
#        and in particular GVFS which actually uses offset in its read and
#        write handlers
#
#            https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
#            https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
#            https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
#            https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
#
#        so if we would do such a change it will break a real user.
#
#     5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
#        from v3.14+ (the kernel where 9c225f2655 first appeared).
#
#        This will allow to patch OSSPD and other FUSE filesystems that
#        provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
#        in their open handler and this way avoid the deadlock on all kernel
#        versions. This should work because fs/fuse/ ignores unknown open
#        flags returned from a filesystem and so passing FOPEN_STREAM to a
#        kernel that is not aware of this flag cannot hurt. In turn the kernel
#        that is not aware of FOPEN_STREAM will be < v3.14 where just
#        FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
#        write deadlock.
#
#     This patch adds stream_open, converts /proc/xen/xenbus to it and adds
#     semantic patch to automatically locate in-kernel places that are either
#     required to be converted due to read vs write deadlock, or that are just
#     safe to be converted because read and write do not use ppos and there
#     are no other funky methods in file_operations.
#
#     Regarding semantic patch I've verified each generated change manually -
#     that it is correct to convert - and each other nonseekable_open instance
#     left - that it is either not correct to convert there, or that it is not
#     converted due to current stream_open.cocci limitations.
#
#     The script also does not convert files that should be valid to convert,
#     but that currently have .llseek = noop_llseek or generic_file_llseek for
#     unknown reason despite file being opened with nonseekable_open (e.g.
#     drivers/input/mousedev.c)
#
#     Cc: Michael Kerrisk
#     Cc: Yongzhi Pan
#     Cc: Jonathan Corbet
#     Cc: David Vrabel
#     Cc: Juergen Gross
#     Cc: Miklos Szeredi
#     Cc: Tejun Heo
#     Cc: Kirill Tkhai
#     Cc: Arnd Bergmann
#     Cc: Christoph Hellwig
#     Cc: Greg Kroah-Hartman
#     Cc: Julia Lawall
#     Cc: Nikolaus Rath
#     Cc: Han-Wen Nienhuys
#     Signed-off-by: Kirill Smelkov
#     Signed-off-by: Linus Torvalds
# < /opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux-gcc --version
# < /opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux-ld --version
# < git log --format=%s --max-count=1 10dce8af34226d90fa56746a934f8da5dcdba3df
# < make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- allmodconfig
# Added to kconfig CONFIG_64BIT=n
# Added to kconfig CONFIG_BUILD_DOCSRC=n
# Added to kconfig CONFIG_HAVE_FTRACE_MCOUNT_RECORD=n
# Added to kconfig CONFIG_SAMPLES=n
# Added to kconfig CONFIG_MODULE_SIG=n
# yes \n | make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- oldconfig
yes: standard output: Broken pipe
# make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- 
:1478:2: warning: #warning syscall pidfd_send_signal not implemented [-Wcpp]
:1481:2: warning: #warning syscall io_uring_setup not implemented [-Wcpp]
:1484:2: warning: #warning syscall io_uring_enter not implemented [-Wcpp]
:1487:2: warning: #warning syscall io_uring_register not implemented [-Wcpp]
/kisskb/src/mm/mprotect.c: In function 'change_pte_range':
/kisskb/src/mm/mprotect.c:42:20: warning: unused variable 'mm' [-Wunused-variable]
  struct mm_struct *mm = vma->vm_mm;
                    ^
In file included from /kisskb/src/arch/sparc/include/asm/cmpxchg.h:7:0,
                 from /kisskb/src/arch/sparc/include/asm/atomic_32.h:17,
                 from /kisskb/src/arch/sparc/include/asm/atomic.h:7,
                 from /kisskb/src/include/linux/atomic.h:7,
                 from /kisskb/src/include/asm-generic/bitops/lock.h:5,
                 from /kisskb/src/arch/sparc/include/asm/bitops_32.h:102,
                 from /kisskb/src/arch/sparc/include/asm/bitops.h:7,
                 from /kisskb/src/include/linux/bitops.h:19,
                 from /kisskb/src/include/linux/kernel.h:12,
                 from /kisskb/src/include/linux/list.h:9,
                 from /kisskb/src/include/linux/wait.h:7,
                 from /kisskb/src/include/linux/wait_bit.h:8,
                 from /kisskb/src/include/linux/fs.h:6,
                 from /kisskb/src/fs/ocfs2/file.c:27:
/kisskb/src/fs/ocfs2/file.c: In function 'ocfs2_file_write_iter':
/kisskb/src/arch/sparc/include/asm/cmpxchg_32.h:28:22: warning: value computed is not used [-Wunused-value]
 #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
                      ^
/kisskb/src/fs/ocfs2/file.c:2386:3: note: in expansion of macro 'xchg'
   xchg(&iocb->ki_complete, saved_ki_complete);
   ^
/kisskb/src/net/core/sysctl_net_core.c:294:1: warning: 'proc_dointvec_minmax_bpf_restricted' defined but not used [-Wunused-function]
 proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
 ^
/kisskb/src/drivers/char/tpm/tpm2-cmd.c: In function 'tpm2_unseal_trusted':
/kisskb/src/drivers/char/tpm/tpm2-cmd.c:670:2: warning: 'blob_handle' may be used uninitialized in this function [-Wmaybe-uninitialized]
  tpm2_flush_context(chip, blob_handle);
  ^
/kisskb/src/drivers/input/joystick/analog.c:172:2: warning: #warning Precise timer not defined for this architecture. [-Wcpp]
 #warning Precise timer not defined for this architecture.
  ^
/kisskb/src/drivers/i2c/busses/i2c-sh_mobile.c: In function 'sh_mobile_i2c_isr':
/kisskb/src/drivers/i2c/busses/i2c-sh_mobile.c:399:26: warning: 'data' may be used uninitialized in this function [-Wmaybe-uninitialized]
   pd->msg->buf[real_pos] = data;
                          ^
/kisskb/src/drivers/i2c/busses/i2c-sh_mobile.c:372:16: note: 'data' was declared here
  unsigned char data;
                ^
/kisskb/src/drivers/i2c/i2c-core-base.c: In function 'i2c_generic_scl_recovery':
/kisskb/src/drivers/i2c/i2c-core-base.c:235:5: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (ret == -EOPNOTSUPP)
     ^
/kisskb/src/drivers/tty/serial/sunzilog.c:1132:13: warning: 'sunzilog_putchar' defined but not used [-Wunused-function]
 static void sunzilog_putchar(struct uart_port *port, int ch)
             ^
In file included from /kisskb/src/include/linux/rwsem.h:16:0,
                 from /kisskb/src/include/linux/notifier.h:15,
                 from /kisskb/src/include/linux/clk.h:17,
                 from /kisskb/src/drivers/tty/serial/sh-sci.c:24:
/kisskb/src/drivers/tty/serial/sh-sci.c: In function 'sci_dma_rx_submit':
/kisskb/src/include/linux/spinlock.h:279:3: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
   _raw_spin_unlock_irqrestore(lock, flags); \
   ^
/kisskb/src/drivers/tty/serial/sh-sci.c:1353:16: note: 'flags' was declared here
  unsigned long flags;
                ^
In file included from /kisskb/src/include/linux/printk.h:7:0,
                 from /kisskb/src/include/linux/kernel.h:15,
                 from /kisskb/src/include/linux/list.h:9,
                 from /kisskb/src/include/linux/rculist.h:10,
                 from /kisskb/src/include/linux/sched/signal.h:5,
                 from /kisskb/src/drivers/net/usb/hso.c:55:
/kisskb/src/drivers/net/usb/hso.c: In function 'hso_serial_set_termios':
/kisskb/src/include/linux/kern_levels.h:5:18: warning: format '%d' expects argument of type 'int', but argument 4 has type 'tcflag_t {aka long unsigned int}' [-Wformat=]
 #define KERN_SOH "\001" /* ASCII Start Of Header */
                  ^
/kisskb/src/include/linux/kern_levels.h:14:19: note: in expansion of macro 'KERN_SOH'
 #define KERN_INFO KERN_SOH "6" /* informational */
                   ^
/kisskb/src/include/linux/printk.h:309:9: note: in expansion of macro 'KERN_INFO'
  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         ^
/kisskb/src/drivers/net/usb/hso.c:115:3: note: in expansion of macro 'pr_info'
   pr_info("[%d:%s] " fmt, \
   ^
/kisskb/src/drivers/net/usb/hso.c:1406:3: note: in expansion of macro 'hso_dbg'
   hso_dbg(0x16, "Termios called with: cflags new[%d] - old[%d]\n",
   ^
/kisskb/src/include/linux/kern_levels.h:5:18: warning: format '%d' expects argument of type 'int', but argument 5 has type 'tcflag_t {aka long unsigned int}' [-Wformat=]
 #define KERN_SOH "\001" /* ASCII Start Of Header */
                  ^
/kisskb/src/include/linux/kern_levels.h:14:19: note: in expansion of macro 'KERN_SOH'
 #define KERN_INFO KERN_SOH "6" /* informational */
                   ^
/kisskb/src/include/linux/printk.h:309:9: note: in expansion of macro 'KERN_INFO'
  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         ^
/kisskb/src/drivers/net/usb/hso.c:115:3: note: in expansion of macro 'pr_info'
   pr_info("[%d:%s] " fmt, \
   ^
/kisskb/src/drivers/net/usb/hso.c:1406:3: note: in expansion of macro 'hso_dbg'
   hso_dbg(0x16, "Termios called with: cflags new[%d] - old[%d]\n",
   ^
/kisskb/src/drivers/scsi/myrs.c: In function 'myrs_log_event':
/kisskb/src/drivers/scsi/myrs.c:821:24: warning: 'sshdr.sense_key' may be used uninitialized in this function [-Wmaybe-uninitialized]
  struct scsi_sense_hdr sshdr;
                        ^
In file included from /kisskb/src/arch/sparc/include/asm/cmpxchg.h:7:0,
                 from /kisskb/src/arch/sparc/include/asm/atomic_32.h:17,
                 from /kisskb/src/arch/sparc/include/asm/atomic.h:7,
                 from /kisskb/src/include/linux/atomic.h:7,
                 from /kisskb/src/include/asm-generic/bitops/lock.h:5,
                 from /kisskb/src/arch/sparc/include/asm/bitops_32.h:102,
                 from /kisskb/src/arch/sparc/include/asm/bitops.h:7,
                 from /kisskb/src/include/linux/bitops.h:19,
                 from /kisskb/src/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:11:
/kisskb/src/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function 'ixgbevf_xdp_setup':
/kisskb/src/arch/sparc/include/asm/cmpxchg_32.h:28:22: warning: value computed is not used [-Wunused-value]
 #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
                      ^
/kisskb/src/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:4471:4: note: in expansion of macro 'xchg'
    xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
    ^
WARNING: EXPORT symbol "__lshrdi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_write_enter" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__ashldi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__ndelay" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__udelay" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__ashrdi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_read_try" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__muldi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__divdi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "bzero_1page" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_read_enter" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_read_exit" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "empty_zero_page" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__copy_1page" [vmlinux] version generation failed, symbol will not be versioned.
arch/sparc/kernel/head_32.o: In function `current_pc':
arch/sparc/kernel/head_32.o:(.head.text+0x5040): relocation truncated to fit: R_SPARC_WDISP22 against `.init.text'
arch/sparc/kernel/head_32.o: In function `halt_notsup':
arch/sparc/kernel/head_32.o:(.head.text+0x5100): relocation truncated to fit: R_SPARC_WDISP22 against `.init.text'
arch/sparc/kernel/head_32.o: In function `leon_init':
arch/sparc/kernel/head_32.o:(.init.text+0xa4): relocation truncated to fit: R_SPARC_WDISP22 against symbol `leon_smp_cpu_startup' defined in .text section in arch/sparc/kernel/trampoline_32.o
arch/sparc/kernel/process_32.o:(.fixup+0x4): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/process_32.o:(.fixup+0xc): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x4): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x10): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x1c): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x28): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x34): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x40): additional relocation overflows omitted from the output
make[1]: *** [/kisskb/src/Makefile:1029: vmlinux] Error 1
make: *** [Makefile:169: sub-make] Error 2
Command 'make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- ' returned non-zero exit status 2
# rm -rf /kisskb/build/linus_sparc-allmodconfig_sparc64
# Build took: 0:09:04.486289