# git rev-parse -q --verify c228d294f2040c3a5f5965ff04d4947d0bf6e7da^{commit}
c228d294f2040c3a5f5965ff04d4947d0bf6e7da
already have revision, skipping fetch
# git checkout -q -f -B kisskb c228d294f2040c3a5f5965ff04d4947d0bf6e7da
# git clean -qxdf
# < git log -1
# commit c228d294f2040c3a5f5965ff04d4947d0bf6e7da
# Author: Linus Torvalds
# Date: Thu Jan 31 11:10:20 2019 -0800
#
# x86: explicitly align IO accesses in memcpy_{to,from}io
#
# In commit 170d13ca3a2f ("x86: re-introduce non-generic memcpy_{to,from}io")
# I made our copy from IO space use a separate copy routine rather than
# rely on the generic memcpy. I did that because our generic memory copy
# isn't actually well-defined when it comes to internal access ordering or
# alignment, and will in fact depend on various CPUID flags.
#
# In particular, the default memcpy() for a modern Intel CPU will
# generally be just a "rep movsb", which works reasonably well for
# medium-sized memory copies of regular RAM, since the CPU will turn it
# into fairly optimized microcode.
#
# However, for non-cached memory and IO, "rep movs" ends up being
# horrendously slow and will just do the architectural "one byte at a
# time" accesses implied by the movsb.
#
# At the other end of the spectrum, if you _don't_ end up using the "rep
# movsb" code, you'd likely fall back to the software copy, which does
# overlapping accesses for the tail, and may copy things backwards.
# Again, for regular memory that's fine, for IO memory not so much.
#
# The thinking was that clearly nobody really cared (because things
# worked), but some people had seen horrible performance due to the byte
# accesses, so let's just revert back to our long ago version that dod
# "rep movsl" for the bulk of the copy, and then fixed up the potentially
# last few bytes of the tail with "movsw/b".
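The bulk-then-tail strategy described in the paragraph above can be modelled without any hardware. The following is a hypothetical user-space sketch, not the kernel's code: `plan_movsl_tail()`, `is_naturally_aligned()` and `struct io_access` are illustrative names, and the function only records which (offset, width) accesses a "rep movsl" bulk followed by a "movsw"/"movsb" tail would issue for a copy that starts at a given source offset. Note that the source alignment never enters the calculation.

```c
#include <assert.h>
#include <stddef.h>

/* One simulated IO access: byte offset into the device window and its width. */
struct io_access {
    size_t off;
    size_t width;
};

/*
 * Hypothetical model of a "rep movsl" bulk copy followed by a "movsw"/"movsb"
 * tail fix-up. It only records the accesses such a copy would issue; the
 * source alignment is never consulted.
 * plan[] must have room for at least n / 4 + 2 entries.
 * Returns the number of entries written.
 */
static size_t plan_movsl_tail(size_t off, size_t n, struct io_access *plan)
{
    size_t k = 0;

    assert(plan != NULL);
    for (size_t i = 0; i < n / 4; i++) {   /* "rep movsl": n / 4 dword accesses */
        plan[k++] = (struct io_access){ off, 4 };
        off += 4;
    }
    if (n & 2) {                            /* "movsw": one 2-byte access */
        plan[k++] = (struct io_access){ off, 2 };
        off += 2;
    }
    if (n & 1)                              /* "movsb": one 1-byte access */
        plan[k++] = (struct io_access){ off, 1 };
    return k;
}

/* An access is naturally aligned when its offset is a multiple of its width. */
static int is_naturally_aligned(const struct io_access *a)
{
    return a->off % a->width == 0;
}
```

For the two TPM reads discussed later in the commit message, `plan_movsl_tail(0, 6, plan)` yields a 4-byte access at offset 0 and a 2-byte access at offset 4, both aligned, while `plan_movsl_tail(6, 10, plan)` starts with a 4-byte access at offset 6, which is only 2-byte aligned.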
#
# Interestingly (and perhaps not entirely surprisingly), while that was
# our original memory copy implementation, and had been used before for
# IO, in the meantime many new users of memcpy_*io() had come about. And
# while the access patterns for the memory copy weren't well-defined (so
# arguably _any_ access pattern should work), in practice the "rep movsb"
# case had been very common for the last several years.
#
# In particular Jarkko Sakkinen reported that the memcpy_*io() change
# resuled in weird errors from his Geminilake NUC TPM module.
#
# And it turns out that the TPM TCG accesses according to spec require
# that the accesses be
#
# (a) done strictly sequentially
#
# (b) be naturally aligned
#
# otherwise the TPM chip will abort the PCI transaction.
#
# And, in fact, the tpm_crb.c driver did this:
#
# memcpy_fromio(buf, priv->rsp, 6);
# ...
# memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);
#
# which really should never have worked in the first place, but back
# before commit 170d13ca3a2f it *happened* to work, because the
# memcpy_fromio() would be expanded to a regular memcpy, and
#
# (a) gcc would expand the first memcpy in-line, and turn it into a
# 4-byte and a 2-byte read, and they happened to be in the right
# order, and the alignment was right.
#
# (b) gcc would call "memcpy()" for the second one, and the machines that
# had this TPM chip also apparently ended up always having ERMS
# ("Enhanced REP MOVSB/STOSB instructions"), so we'd use the "rep
# movbs" for that copy.
#
# In other words, basically by pure luck, the code happened to use the
# right access sizes in the (two different!) memcpy() implementations to
# make it all work.
#
# But after commit 170d13ca3a2f, both of the memcpy_fromio() calls
# resulted in a call to the routine with the consistent memory accesses,
# and in both cases it started out transferring with 4-byte accesses.
# Which worked for the first copy, but resulted in the second copy doing a
# 32-bit read at an address that was only 2-byte aligned.
#
# Jarkko is actually fixing the fragile code in the TPM driver, but since
# this is an excellent example of why we absolutely must not use a generic
# memcpy for IO accesses, _and_ an IO-specific one really should strive to
# align the IO accesses, let's do exactly that.
#
# Side note: Jarkko also noted that the driver had been used on ARM
# platforms, and had worked. That was because on 32-bit ARM, memcpy_*io()
# ends up always doing byte accesses, and on 64-bit ARM it first does byte
# accesses to align to 8-byte boundaries, and then does 8-byte accesses
# for the bulk.
#
# So ARM actually worked by design, and the x86 case worked by pure luck.
#
# We *might* want to make x86-64 do the 8-byte case too. That should be a
# pretty straightforward extension, but let's do one thing at a time. And
# generally MMIO accesses aren't really all that performance-critical, as
# shown by the fact that for a long time we just did them a byte at a
# time, and very few people ever noticed.
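An alignment-aware copy in the spirit of what the paragraphs above ask for might look like the following. This is a hedged user-space sketch with hypothetical names (`copy_fromio_aligned()`, `io_read()`, `trace[]`), not the kernel's implementation; `io_read()` reads from an ordinary buffer where a driver would use `readb()`/`readw()`/`readl()`. It peels off at most one 1-byte and one 2-byte access to bring the source offset to a 4-byte boundary, then issues naturally aligned 4-byte accesses for the bulk and 2/1-byte accesses for the tail, all in ascending address order. It does not attempt the 8-byte variant mentioned for x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trace of simulated IO reads: offset and width of each one. */
struct io_access { size_t off; size_t width; };

static struct io_access trace[64];
static size_t trace_len;

/*
 * Stand-in for a single 1-, 2- or 4-byte MMIO load. Here it just copies from
 * an ordinary buffer and records the access so its size and alignment can be
 * inspected afterwards.
 */
static void io_read(void *dst, const uint8_t *dev, size_t off, size_t width)
{
    assert(width == 1 || width == 2 || width == 4);
    assert(trace_len < sizeof(trace) / sizeof(trace[0]));
    memcpy(dst, dev + off, width);
    trace[trace_len++] = (struct io_access){ off, width };
}

/* Copy n bytes starting at device offset off using only aligned accesses. */
static void copy_fromio_aligned(uint8_t *to, const uint8_t *dev, size_t off, size_t n)
{
    if (n && (off & 1)) {              /* odd offset: one byte access */
        io_read(to, dev, off, 1);
        to += 1; off += 1; n -= 1;
    }
    if (n > 1 && (off & 2)) {          /* 2 mod 4: one 2-byte access */
        io_read(to, dev, off, 2);
        to += 2; off += 2; n -= 2;
    }
    while (n >= 4) {                   /* bulk: aligned 4-byte accesses */
        io_read(to, dev, off, 4);
        to += 4; off += 4; n -= 4;
    }
    if (n & 2) {                       /* tail: aligned 2-byte access */
        io_read(to, dev, off, 2);
        to += 2; off += 2;
    }
    if (n & 1)                         /* tail: last byte */
        io_read(to, dev, off, 1);
}
```

Replaying the TPM case, copying bytes 0..5 and then bytes 6..15 produces the accesses 4@0, 2@4, 2@6, 4@8 and 4@12, each naturally aligned and issued strictly in ascending order.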
#
# Reported-and-tested-by: Jarkko Sakkinen
# Tested-by: Jerry Snitselaar
# Cc: David Laight
# Fixes: 170d13ca3a2f ("x86: re-introduce non-generic memcpy_{to,from}io")
# Signed-off-by: Linus Torvalds
# < /opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux-gcc --version
# < /opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux-ld --version
# < git log --format=%s --max-count=1 c228d294f2040c3a5f5965ff04d4947d0bf6e7da
# < make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- allmodconfig
# Added to kconfig CONFIG_64BIT=n
# Added to kconfig CONFIG_BUILD_DOCSRC=n
# Added to kconfig CONFIG_HAVE_FTRACE_MCOUNT_RECORD=n
# Added to kconfig CONFIG_SAMPLES=n
# Added to kconfig CONFIG_MODULE_SIG=n
# yes \n | make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- oldconfig
yes: standard output: Broken pipe
# make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- 
:1336:2: warning: #warning syscall rseq not implemented [-Wcpp]
/kisskb/src/net/core/sysctl_net_core.c:285:1: warning: 'proc_dointvec_minmax_bpf_restricted' defined but not used [-Wunused-function]
proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
^
In file included from /kisskb/src/arch/sparc/include/asm/cmpxchg.h:7:0,
                 from /kisskb/src/arch/sparc/include/asm/atomic_32.h:17,
                 from /kisskb/src/arch/sparc/include/asm/atomic.h:7,
                 from /kisskb/src/include/linux/atomic.h:7,
                 from /kisskb/src/include/asm-generic/bitops/lock.h:5,
                 from /kisskb/src/arch/sparc/include/asm/bitops_32.h:102,
                 from /kisskb/src/arch/sparc/include/asm/bitops.h:7,
                 from /kisskb/src/include/linux/bitops.h:19,
                 from /kisskb/src/include/linux/kernel.h:11,
                 from /kisskb/src/include/linux/list.h:9,
                 from /kisskb/src/include/linux/wait.h:7,
                 from /kisskb/src/include/linux/wait_bit.h:8,
                 from /kisskb/src/include/linux/fs.h:6,
                 from /kisskb/src/fs/ocfs2/file.c:27:
/kisskb/src/fs/ocfs2/file.c: In function 'ocfs2_file_write_iter':
/kisskb/src/arch/sparc/include/asm/cmpxchg_32.h:28:22: warning: value computed is not used [-Wunused-value]
#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
^
/kisskb/src/fs/ocfs2/file.c:2386:3: note: in expansion of macro 'xchg'
xchg(&iocb->ki_complete, saved_ki_complete);
^
/kisskb/src/drivers/char/tpm/tpm2-cmd.c: In function 'tpm2_unseal_trusted':
/kisskb/src/drivers/char/tpm/tpm2-cmd.c:668:2: warning: 'blob_handle' may be used uninitialized in this function [-Wmaybe-uninitialized]
tpm2_flush_context_cmd(chip, blob_handle, TPM_TRANSMIT_UNLOCKED);
^
/kisskb/src/drivers/i2c/i2c-core-base.c: In function 'i2c_generic_scl_recovery':
/kisskb/src/drivers/i2c/i2c-core-base.c:235:5: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (ret == -EOPNOTSUPP)
^
/kisskb/src/drivers/input/joystick/analog.c:172:2: warning: #warning Precise timer not defined for this architecture. [-Wcpp]
#warning Precise timer not defined for this architecture.
^
In file included from /kisskb/src/include/linux/printk.h:7:0,
                 from /kisskb/src/include/linux/kernel.h:14,
                 from /kisskb/src/include/linux/list.h:9,
                 from /kisskb/src/include/linux/rculist.h:10,
                 from /kisskb/src/include/linux/sched/signal.h:5,
                 from /kisskb/src/drivers/net/usb/hso.c:55:
/kisskb/src/drivers/net/usb/hso.c: In function 'hso_serial_set_termios':
/kisskb/src/include/linux/kern_levels.h:5:18: warning: format '%d' expects argument of type 'int', but argument 4 has type 'tcflag_t {aka long unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
/kisskb/src/include/linux/kern_levels.h:14:19: note: in expansion of macro 'KERN_SOH'
#define KERN_INFO KERN_SOH "6" /* informational */
^
/kisskb/src/include/linux/printk.h:310:9: note: in expansion of macro 'KERN_INFO'
printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
^
/kisskb/src/drivers/net/usb/hso.c:115:3: note: in expansion of macro 'pr_info'
pr_info("[%d:%s] " fmt, \
^
/kisskb/src/drivers/net/usb/hso.c:1406:3: note: in expansion of macro 'hso_dbg'
hso_dbg(0x16, "Termios called with: cflags new[%d] - old[%d]\n",
^
/kisskb/src/include/linux/kern_levels.h:5:18: warning: format '%d' expects argument of type 'int', but argument 5 has type 'tcflag_t {aka long unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
/kisskb/src/include/linux/kern_levels.h:14:19: note: in expansion of macro 'KERN_SOH'
#define KERN_INFO KERN_SOH "6" /* informational */
^
/kisskb/src/include/linux/printk.h:310:9: note: in expansion of macro 'KERN_INFO'
printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
^
/kisskb/src/drivers/net/usb/hso.c:115:3: note: in expansion of macro 'pr_info'
pr_info("[%d:%s] " fmt, \
^
/kisskb/src/drivers/net/usb/hso.c:1406:3: note: in expansion of macro 'hso_dbg'
hso_dbg(0x16, "Termios called with: cflags new[%d] - old[%d]\n",
^
/kisskb/src/drivers/tty/serial/sunzilog.c:1132:13: warning: 'sunzilog_putchar' defined but not used [-Wunused-function]
static void sunzilog_putchar(struct uart_port *port, int ch)
^
In file included from /kisskb/src/include/linux/rwsem.h:16:0,
                 from /kisskb/src/include/linux/notifier.h:15,
                 from /kisskb/src/include/linux/clk.h:17,
                 from /kisskb/src/drivers/tty/serial/sh-sci.c:24:
/kisskb/src/drivers/tty/serial/sh-sci.c: In function 'sci_submit_rx':
/kisskb/src/include/linux/spinlock.h:279:3: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
_raw_spin_unlock_irqrestore(lock, flags); \
^
/kisskb/src/drivers/tty/serial/sh-sci.c:1338:16: note: 'flags' was declared here
unsigned long flags;
^
/kisskb/src/drivers/scsi/myrs.c: In function 'myrs_log_event':
/kisskb/src/drivers/scsi/myrs.c:821:24: warning: 'sshdr.sense_key' may be used uninitialized in this function [-Wmaybe-uninitialized]
struct scsi_sense_hdr sshdr;
^
In file included from /kisskb/src/arch/sparc/include/asm/cmpxchg.h:7:0,
                 from /kisskb/src/arch/sparc/include/asm/atomic_32.h:17,
                 from /kisskb/src/arch/sparc/include/asm/atomic.h:7,
                 from /kisskb/src/include/linux/atomic.h:7,
                 from /kisskb/src/include/asm-generic/bitops/lock.h:5,
                 from /kisskb/src/arch/sparc/include/asm/bitops_32.h:102,
                 from /kisskb/src/arch/sparc/include/asm/bitops.h:7,
                 from /kisskb/src/include/linux/bitops.h:19,
                 from /kisskb/src/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:11:
/kisskb/src/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function 'ixgbevf_xdp_setup':
/kisskb/src/arch/sparc/include/asm/cmpxchg_32.h:28:22: warning: value computed is not used [-Wunused-value]
#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
^
/kisskb/src/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:4471:4: note: in expansion of macro 'xchg'
xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
^
WARNING: EXPORT symbol "__lshrdi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_write_enter" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__ashldi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__ndelay" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__udelay" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__ashrdi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_read_try" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__muldi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__divdi3" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "bzero_1page" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_read_enter" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "___rw_read_exit" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "empty_zero_page" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "__copy_1page" [vmlinux] version generation failed, symbol will not be versioned.
arch/sparc/kernel/head_32.o: In function `current_pc':
arch/sparc/kernel/head_32.o:(.head.text+0x5040): relocation truncated to fit: R_SPARC_WDISP22 against `.init.text'
arch/sparc/kernel/head_32.o: In function `halt_notsup':
arch/sparc/kernel/head_32.o:(.head.text+0x5100): relocation truncated to fit: R_SPARC_WDISP22 against `.init.text'
arch/sparc/kernel/head_32.o: In function `leon_init':
arch/sparc/kernel/head_32.o:(.init.text+0xa4): relocation truncated to fit: R_SPARC_WDISP22 against symbol `leon_smp_cpu_startup' defined in .text section in arch/sparc/kernel/trampoline_32.o
arch/sparc/kernel/process_32.o:(.fixup+0x4): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/process_32.o:(.fixup+0xc): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x4): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x10): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x1c): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x28): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x34): relocation truncated to fit: R_SPARC_WDISP22 against `.text'
arch/sparc/kernel/signal_32.o:(.fixup+0x40): additional relocation overflows omitted from the output
make[1]: *** [/kisskb/src/Makefile:1021: vmlinux] Error 1
make: *** [Makefile:152: sub-make] Error 2
Command 'make -s -j 48 ARCH=sparc O=/kisskb/build/linus_sparc-allmodconfig_sparc64 CROSS_COMPILE=/opt/cross/kisskb/br-sparc64-full-2016.08-613-ge98b4dd/bin/sparc64-linux- ' returned non-zero exit status 2
# rm -rf /kisskb/build/linus_sparc-allmodconfig_sparc64
# Build took: 0:08:51.469300
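The build fails at link time with `R_SPARC_WDISP22` overflows. In the SPARC V8 branch format this relocation fills a signed 22-bit field counting 4-byte words, so a branch can only reach roughly ±8 MiB from itself. With `CONFIG_64BIT=n` forced on an allmodconfig, a plausible reading is that the 32-bit image grew large enough that some branches from `.head.text`, `.init.text` and `.fixup` can no longer reach their targets, although this log alone does not prove that image size is the cause. The helpers below, `fits_wdisp22()` and `encode_wdisp22()`, are hypothetical names used only to illustrate the range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can a byte displacement (target - branch address) be
 * stored in an R_SPARC_WDISP22 field? The field is a signed 22-bit count of
 * 4-byte words, so the reachable range is [-2^23, 2^23 - 4] bytes.
 */
static bool fits_wdisp22(int64_t disp)
{
    const int64_t lo = -((int64_t)1 << 23);     /* -8 MiB */
    const int64_t hi = ((int64_t)1 << 23) - 4;  /* +8 MiB minus one word */

    if (disp % 4 != 0)                          /* branch targets are word aligned */
        return false;
    return disp >= lo && disp <= hi;
}

/*
 * Hypothetical helper: the 22-bit value that would be stored for disp.
 * Only meaningful when fits_wdisp22(disp) is true; otherwise the linker
 * reports "relocation truncated to fit", as in the log above.
 */
static uint32_t encode_wdisp22(int64_t disp)
{
    assert(fits_wdisp22(disp));
    return (uint32_t)(disp / 4) & 0x3fffffu;
}
```

A displacement of exactly +8 MiB already fails the check, while -8 MiB is the most negative value that still fits, reflecting the asymmetric range of a two's-complement field.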