|author||Linus Torvalds <firstname.lastname@example.org>||2018-10-25 06:50:48 -0700|
|committer||Linus Torvalds <email@example.com>||2018-10-25 06:50:48 -0700|
Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas: - Fix ASPM link_state teardown on removal (Lukas Wunner) - Fix misleading _OSC ASPM message (Sinan Kaya) - Make _OSC optional for PCI (Sinan Kaya) - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set (Patrick Talbert) - Remove x86 and arm64 node-local allocation for host bridge structures (Punit Agrawal) - Pay attention to device-specific _PXM node values (Jonathan Cameron) - Support new Immediate Readiness bit (Felipe Balbi) - Differentiate between pciehp surprise and safe removal (Lukas Wunner) - Remove unnecessary pciehp includes (Lukas Wunner) - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner) - Tolerate PCIe Slot Presence Detect being hardwired to zero to workaround broken hardware, e.g., the Wilocity switch/wireless device (Lukas Wunner) - Unify pciehp controller & slot structs (Lukas Wunner) - Constify hotplug_slot_ops (Lukas Wunner) - Drop hotplug_slot_info (Lukas Wunner) - Embed hotplug_slot struct into users instead of allocating it separately (Lukas Wunner) - Initialize PCIe port service drivers directly instead of relying on initcall ordering (Keith Busch) - Restore PCI config state after a slot reset (Keith Busch) - Save/restore DPC config state along with other PCI config state (Keith Busch) - Reference count devices during AER handling to avoid race issue with concurrent hot removal (Keith Busch) - If an Upstream Port reports ERR_FATAL, don't try to read the Port's config space because it is probably unreachable (Keith Busch) - During error handling, use slot-specific reset instead of secondary bus reset to avoid link up/down issues on hotplug ports (Keith Busch) - Restore previous AER/DPC handling that does not remove and re-enumerate devices on ERR_FATAL (Keith Busch) - Notify all drivers that may be affected by error recovery resets (Keith Busch) - Always generate error recovery uevents, even if a driver doesn't have error callbacks (Keith Busch) - Make PCIe link active reporting detection generic (Keith Busch) - Support D3cold in PCIe hierarchies during system sleep and runtime, including hotplug and Thunderbolt ports (Mika Westerberg) - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots are empty or occupied (Jon Derrick) - Remove duplicated include from pci/pcie/err.c and unused variable from cpqphp (YueHaibing) - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza Pawandeep) - Uninline PCI bus accessors for better ftracing (Keith Busch) - Remove unused AER Root Port .error_resume method (Keith Busch) - Use kfifo in AER instead of a local version (Keith Busch) - Use threaded IRQ in AER bottom half (Keith Busch) - Use managed resources in AER core (Keith Busch) - Reuse pcie_port_find_device() for AER injection (Keith Busch) - Abstract AER interrupt handling to disconnect error injection (Keith Busch) - Refactor AER injection callbacks to simplify future improvments (Keith Busch) - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski) - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko) - Add switch fall-through annotations (Gustavo A. R. Silva) - Remove unused Switchtec quirk variable (Joshua Abraham) - Fix pci.c kernel-doc warning (Randy Dunlap) - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig) - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng) - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe) - Update Switchtec NTB documentation (Wesley Yung) - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz) - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang) - Add PCI support for peer-to-peer DMA (Logan Gunthorpe) - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe) - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe) - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe) - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe) - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe) - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe) - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe) - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe) - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe) - Cache VF config space size to optimize enumeration of many VFs (KarimAllah Ahmed) - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas) - Fix VMD AERSID quirk Device ID matching (Jon Derrick) - Fix Cadence PHY handling during probe (Alan Douglas) - Signal Cadence Endpoint interrupts via AXI region 0 instead of last region (Alan Douglas) - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan Douglas) - Remove redundant controller tests for "device_type == pci" (Rob Herring) - Document R-Car E3 (R8A77990) bindings (Tho Vu) - Add device tree support for R-Car r8a7744 (Biju Das) - Drop unused mvebu PCIe capability code (Thomas Petazzoni) - Add shared PCI bridge emulation code (Thomas Petazzoni) - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni) - Add aardvark Root Port emulation (Thomas Petazzoni) - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach) - Add initial power management for i.MX7 (Leonard Crestez) - Add PME_Turn_Off support for i.MX7 (Leonard Crestez) - Fix qcom runtime power management error handling (Bjorn Andersson) - Update TI dra7xx unaligned access errata workaround for host mode as well as endpoint mode (Vignesh R) - Fix kirin section mismatch warning (Nathan Chancellor) - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare) - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I) - Update Keystone to use MRRS quirk for host bridge instead of open coding (Kishon Vijay Abraham I) - Refactor Keystone link establishment (Kishon Vijay Abraham I) - Simplify and speed up Keystone link training (Kishon Vijay Abraham I) - Remove unused Keystone host_init argument (Kishon Vijay Abraham I) - Merge Keystone driver files into one (Kishon Vijay Abraham I) - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay Abraham I) - Rename Keystone functions for uniformity (Kishon Vijay Abraham I) - Add Keystone device control module DT binding (Kishon Vijay Abraham I) - Use SYSCON API to get Keystone control module device IDs (Kishon Vijay Abraham I) - Clean up Keystone PHY handling (Kishon Vijay Abraham I) - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I) - Clean up Keystone config space access checks (Kishon Vijay Abraham I) - Get Keystone outbound window count from DT (Kishon Vijay Abraham I) - Clean up Keystone outbound window configuration (Kishon Vijay Abraham I) - Clean up Keystone DBI setup (Kishon Vijay Abraham I) - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I) - Fix Keystone IRQ status checking (Kishon Vijay Abraham I) - Add debug messages for all Keystone errors (Kishon Vijay Abraham I) - Clean up Keystone includes and macros (Kishon Vijay Abraham I) - Fix Mediatek unchecked return value from devm_pci_remap_iospace() (Gustavo A. R. Silva) - Fix Mediatek endpoint/port matching logic (Honghui Zhang) - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui Zhang) - Remove redundant Mediatek PM domain check (Honghui Zhang) - Convert Mediatek to pci_host_probe() (Honghui Zhang) - Fix Mediatek MSI enablement (Honghui Zhang) - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang) - Add Mediatek loadable module support (Honghui Zhang) - Detach VMD resources after stopping root bus to prevent orphan resources (Jon Derrick) - Convert pcitest build process to that used by other tools (iio, perf, etc) (Gustavo Pimentel) * tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits) PCI/AER: Refactor error injection fallbacks PCI/AER: Abstract AER interrupt handling PCI/AER: Reuse existing pcie_port_find_device() interface PCI/AER: Use managed resource allocations PCI: pcie: Remove redundant 'default n' from Kconfig PCI: aardvark: Implement emulated root PCI bridge config space PCI: mvebu: Convert to PCI emulated bridge config space PCI: mvebu: Drop unused PCI express capability code PCI: Introduce PCI bridge emulated config space common logic PCI: vmd: Detach resources after stopping root bus nvmet: Optionally use PCI P2P memory nvmet: Introduce helper functions to allocate and free request SGLs nvme-pci: Add support for P2P memory in requests nvme-pci: Use PCI p2pmem subsystem to manage the CMB IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() block: Add PCI P2P flag for request queue PCI/P2PDMA: Add P2P DMA driver writer's documentation docs-rst: Add a new directory for PCI documentation PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset ...
Diffstat (limited to 'Documentation/driver-api')
|-rw-r--r--||Documentation/driver-api/pci/pci.rst (renamed from Documentation/driver-api/pci.rst)||0|
4 files changed, 168 insertions, 1 deletions
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 1f0cdba2b2bf..909f991b4c0d 100644
@@ -30,7 +30,7 @@ available subsections can be seen below.
diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
new file mode 100644
@@ -0,0 +1,22 @@
+.. SPDX-License-Identifier: GPL-2.0
+The Linux PCI driver implementer's API guide
+.. class:: toc-title
+ Table of contents
+ :maxdepth: 2
+.. only:: subproject and html
+ * :ref:`genindex`
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
new file mode 100644
@@ -0,0 +1,145 @@
+.. SPDX-License-Identifier: GPL-2.0
+PCI Peer-to-Peer DMA Support
+The PCI bus has pretty decent support for performing DMA transfers
+between two devices on the bus. This type of transaction is henceforth
+called Peer-to-Peer (or P2P). However, there are a number of issues that
+make P2P transactions tricky to do in a perfectly safe way.
+One of the biggest issues is that PCI doesn't require forwarding
+transactions between hierarchy domains, and in PCIe, each Root Port
+defines a separate hierarchy domain. To make things worse, there is no
+simple way to determine if a given Root Complex supports this or not.
+(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
+only supports doing P2P when the endpoints involved are all behind the
+same PCI bridge, as such devices are all in the same PCI hierarchy
+domain, and the spec guarantees that all transactions within the
+hierarchy will be routable, but it does not require routing
+The second issue is that to make use of existing interfaces in Linux,
+memory that is used for P2P transactions needs to be backed by struct
+pages. However, PCI BARs are not typically cache coherent so there are
+a few corner case gotchas with these pages so developers need to
+be careful about what they do with them.
+Driver Writer's Guide
+In a given P2P implementation there may be three or more different
+types of kernel drivers in play:
+* Provider - A driver which provides or publishes P2P resources like
+ memory or doorbell registers to other drivers.
+* Client - A driver which makes use of a resource by setting up a
+ DMA transaction to or from it.
+* Orchestrator - A driver which orchestrates the flow of data between
+ clients and providers.
+In many cases there could be overlap between these three types (i.e.,
+it may be typical for a driver to be both a provider and a client).
+For example, in the NVMe Target Copy Offload implementation:
+* The NVMe PCI driver is both a client, provider and orchestrator
+ in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
+ resource (provider), it accepts P2P memory pages as buffers in requests
+ to be used directly (client) and it can also make use of the CMB as
+ submission queue entries (orchastrator).
+* The RDMA driver is a client in this arrangement so that an RNIC
+ can DMA directly to the memory exposed by the NVMe device.
+* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
+ to the P2P memory (CMB) and then to the NVMe device (and vice versa).
+This is currently the only arrangement supported by the kernel but
+one could imagine slight tweaks to this that would allow for the same
+functionality. For example, if a specific RNIC added a BAR with some
+memory behind it, its driver could add support as a P2P provider and
+then the NVMe Target could use the RNIC's memory instead of the CMB
+in cases where the NVMe cards in use do not have CMB support.
+A provider simply needs to register a BAR (or a portion of a BAR)
+as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
+This will register struct pages for all the specified memory.
+After that it may optionally publish all of its resources as
+P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
+any orchestrator drivers to find and use the memory. When marked in
+this way, the resource must be regular memory with no side effects.
+For the time being this is fairly rudimentary in that all resources
+are typically going to be P2P memory. Future work will likely expand
+this to include other types of resources like doorbells.
+A client driver typically only has to conditionally change its DMA map
+routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
+of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
+way does not need to be unmapped.
+The client may also, optionally, make use of
+:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
+functions and when to use the regular mapping functions. In some
+situations, it may be more appropriate to use a flag to indicate a
+given request is P2P memory and map appropriately. It is important to
+ensure that struct pages that back P2P memory stay out of code that
+does not have support for them as other code may treat the pages as
+regular memory which may not be appropriate.
+The first task an orchestrator driver must do is compile a list of
+all client devices that will be involved in a given transaction. For
+example, the NVMe Target driver creates a list including the namespace
+block device and the RNIC in use. If the orchestrator has access to
+a specific P2P provider to use it may check compatibility using
+:c:func:`pci_p2pdma_distance()` otherwise it may find a memory provider
+that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
+If more than one provider is supported, the one nearest to all the clients will
+be chosen first. If more than one provider is an equal distance away, the
+one returned will be chosen at random (it is not an arbitrary but
+truely random). This function returns the PCI device to use for the provider
+with a reference taken and therefore when it's no longer needed it should be
+returned with pci_dev_put().
+Once a provider is selected, the orchestrator can then use
+:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
+allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
+and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
+allocating scatter-gather lists with P2P memory.
+Struct Page Caveats
+Driver writers should be very careful about not passing these special
+struct pages to code that isn't prepared for it. At this time, the kernel
+interfaces do not have any checks for ensuring this. This obviously
+precludes passing these pages to userspace.
+P2P memory is also technically IO memory but should never have any side
+effects behind it. Thus, the order of loads and stores should not be important
+and ioreadX(), iowriteX() and friends should not be necessary.
+However, as the memory is not cache coherent, if access ever needs to
+be protected by a spinlock then :c:func:`mmiowb()` must be used before
+unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
+P2P DMA Support Library
+.. kernel-doc:: drivers/pci/p2pdma.c