# git rev-parse -q --verify 7f92891778dff62303c070ac81de7b7d80de331a^{commit}
7f92891778dff62303c070ac81de7b7d80de331a
already have revision, skipping fetch
# git checkout -q -f -B kisskb 7f92891778dff62303c070ac81de7b7d80de331a
# git clean -qxdf
# < git log -1
# commit 7f92891778dff62303c070ac81de7b7d80de331a
# Author: Alexey Kardashevskiy
# Date:   Thu Dec 20 12:10:36 2018 +1100
#
#     vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver
#
#     POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
#     pluggable PCIe devices but still have PCIe links which are used
#     for config space and MMIO. In addition to that the GPUs have 6 NVLinks
#     which are connected to other GPUs and the POWER9 CPU. POWER9 chips
#     have a special unit on a die called an NPU which is an NVLink2 host bus
#     adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each.
#     These systems also support ATS (address translation services) which is
#     a part of the NVLink2 protocol. Such GPUs also share on-board RAM
#     (16GB or 32GB) to the system via the same NVLink2 so a CPU has
#     cache-coherent access to a GPU RAM.
#
#     This exports GPU RAM to the userspace as a new VFIO device region. This
#     preregisters the new memory as device memory as it might be used for DMA.
#     This inserts pfns from the fault handler as the GPU memory is not onlined
#     until the vendor driver is loaded and trained the NVLinks so doing this
#     earlier causes low level errors which we fence in the firmware so
#     it does not hurt the host system but still better be avoided; for the same
#     reason this does not map GPU RAM into the host kernel (usual thing for
#     emulated access otherwise).
#
#     This exports an ATSD (Address Translation Shootdown) register of NPU which
#     allows TLB invalidations inside GPU for an operating system. The register
#     conveniently occupies a single 64k page. It is also presented to
#     the userspace as a new VFIO device region. One NPU has 8 ATSD registers,
#     each of them can be used for TLB invalidation in a GPU linked to this NPU.
#     This allocates one ATSD register per an NVLink bridge allowing passing
#     up to 6 registers. Due to the host firmware bug (just recently fixed),
#     only 1 ATSD register per NPU was actually advertised to the host system
#     so this passes that alone register via the first NVLink bridge device in
#     the group which is still enough as QEMU collects them all back and
#     presents to the guest via vPHB to mimic the emulated NPU PHB on the host.
#
#     In order to provide the userspace with the information about GPU-to-NVLink
#     connections, this exports an additional capability called "tgt"
#     (which is an abbreviated host system bus address). The "tgt" property
#     tells the GPU its own system address and allows the guest driver to
#     conglomerate the routing information so each GPU knows how to get directly
#     to the other GPUs.
#
#     For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
#     know LPID (a logical partition ID or a KVM guest hardware ID in other
#     words) and PID (a memory context ID of a userspace process, not to be
#     confused with a linux pid). This assigns a GPU to LPID in the NPU and
#     this is why this adds a listener for KVM on an IOMMU group. A PID comes
#     via NVLink from a GPU and NPU uses a PID wildcard to pass it through.
#
#     This requires coherent memory and ATSD to be available on the host as
#     the GPU vendor only supports configurations with both features enabled
#     and other configurations are known not to work. Because of this and
#     because of the ways the features are advertised to the host system
#     (which is a device tree with very platform specific properties),
#     this requires enabled POWERNV platform.
#
#     The V100 GPUs do not advertise any of these capabilities via the config
#     space and there are more than just one device ID so this relies on
#     the platform to tell whether these GPUs have special abilities such as
#     NVLinks.
#
#     Signed-off-by: Alexey Kardashevskiy
#     Acked-by: Alex Williamson
#     Signed-off-by: Michael Ellerman
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc --version
# < /opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-ld --version
# < git log --format=%s --max-count=1 7f92891778dff62303c070ac81de7b7d80de331a
# < make -s -j 120 ARCH=powerpc O=/kisskb/build/powerpc-next_mpc8272_ads_defconfig_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux- mpc8272_ads_defconfig
# make -s -j 120 ARCH=powerpc O=/kisskb/build/powerpc-next_mpc8272_ads_defconfig_powerpc-gcc5 CROSS_COMPILE=/opt/cross/kisskb/korg/gcc-5.5.0-nolibc/powerpc64-linux/bin/powerpc64-linux-
/kisskb/src/arch/powerpc/kernel/head_32.S: Assembler messages:
/kisskb/src/arch/powerpc/kernel/head_32.S:994: Warning: invalid register expression
INFO: Uncompressed kernel (size 0x5638a4) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x600000)
INFO: Uncompressed kernel (size 0x553260) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x600000)
Image Name:   Linux-4.20.0-rc2-g7f92891778df
Created:      Fri Dec 21 23:05:18 2018
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:    2803080 Bytes = 2737.38 KiB = 2.67 MiB
Load Address: 00600000
Entry Point:  00600894
Image Name:   Linux-4.20.0-rc2-g7f92891778df
Created:      Fri Dec 21 23:05:18 2018
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:    2771215 Bytes = 2706.26 KiB = 2.64 MiB
Load Address: 00000000
Entry Point:  00000000
Completed OK
# rm -rf /kisskb/build/powerpc-next_mpc8272_ads_defconfig_powerpc-gcc5
# Build took: 0:00:33.910210