Commit 49cc0c4

Merge branch 'sun4v-64bit-DMA'
Tushar Dave says:

====================
sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU

ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs.

Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA.

For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource.

ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU.

ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU.

The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC.

Thanks.
-Tushar

v2->v3:
- Patch #5 addresses comment by Joe Perches.
-- use %s, __func__ instead of embedding the function name.

v1->v2:
- Patch #2 addresses comments by Dave M.
-- use page allocator to allocate IOTSB.
-- use true/false with boolean variables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 parents 87a349f + d30a6b8 commit 49cc0c4
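
The sizing argument in the commit message can be checked with a few lines of C. This is only a sketch: the helper names `legacy_iommu_entries` and `rx_dma_entries` are hypothetical, the figures (2G DVMA space, 8K IOMMU page, 4 ports, 128 queue pairs, 512 descriptors) are taken from the message, and it assumes one IOMMU table entry per premapped RX descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Entries the legacy IOMMU table can hand out: (DVMA space / page size) - 1.
 * The "- 1" follows the commit message; the reason for it is not stated there.
 */
static uint64_t legacy_iommu_entries(uint64_t dvma_bytes, uint64_t page_bytes)
{
	assert(page_bytes != 0);
	return dvma_bytes / page_bytes - 1;
}

/* RX DMA entries a NIC premaps: ports * queue pairs * descriptors per queue. */
static uint64_t rx_dma_entries(uint64_t ports, uint64_t queue_pairs,
			       uint64_t descriptors)
{
	return ports * queue_pairs * descriptors;
}
```

With the quoted figures, `rx_dma_entries(4, 128, 512)` is 262144 while `legacy_iommu_entries(2ULL << 30, 8ULL << 10)` is 262143, so the four i40e ports alone need one entry more than the legacy table can provide.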

8 files changed

Lines changed: 849 additions & 60 deletions

File tree

arch/sparc/Kconfig

Lines changed: 22 additions & 0 deletions
@@ -89,6 +89,14 @@ config ARCH_DEFCONFIG
 config ARCH_PROC_KCORE_TEXT
 	def_bool y

+config ARCH_ATU
+	bool
+	default y if SPARC64
+
+config ARCH_DMA_ADDR_T_64BIT
+	bool
+	default y if ARCH_ATU
+
 config IOMMU_HELPER
 	bool
 	default y if SPARC64
@@ -304,6 +312,20 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
 	def_bool y if SPARC64

+config FORCE_MAX_ZONEORDER
+	int "Maximum zone order"
+	default "13"
+	help
+	  The kernel memory allocator divides physically contiguous memory
+	  blocks into "zones", where each zone is a power of two number of
+	  pages. This option selects the largest power of two that the kernel
+	  keeps in the memory allocator. If you need to allocate very large
+	  blocks of physically contiguous memory, then you may need to
+	  increase this value.
+
+	  This config option is actually maximum order plus one. For example,
+	  a value of 13 means that the largest free memory block is 2^12 pages.
+
 source "mm/Kconfig"

 if SPARC64
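
The "maximum order plus one" rule in the FORCE_MAX_ZONEORDER help text is easy to misread, so here is a small worked sketch. The helper names are hypothetical, and the 8K page size is an assumption borrowed from the 8K figure in the commit message rather than something this Kconfig hunk states.

```c
#include <assert.h>
#include <stdint.h>

/* FORCE_MAX_ZONEORDER is "maximum order plus one", so subtract one. */
static unsigned int max_block_order(unsigned int force_max_zoneorder)
{
	assert(force_max_zoneorder >= 1);
	return force_max_zoneorder - 1;
}

/* Largest physically contiguous block, in bytes, for a given page size. */
static uint64_t max_block_bytes(unsigned int force_max_zoneorder,
				uint64_t page_bytes)
{
	return ((uint64_t)1 << max_block_order(force_max_zoneorder)) * page_bytes;
}
```

Under that assumption, the default of 13 gives a largest block of 2^12 pages, and `max_block_bytes(13, 8192)` evaluates to 33554432 bytes (32 MiB).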

arch/sparc/include/asm/hypervisor.h

Lines changed: 343 additions & 0 deletions
@@ -2335,6 +2335,348 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  */
 #define HV_FAST_PCI_MSG_SETVALID	0xd3

+/* PCI IOMMU v2 definitions and services
+ *
+ * While the PCI IO definitions above are valid, IOMMU v2 adds new PCI IO
+ * definitions and services.
+ *
+ *	CTE		Clump Table Entry. First level table entry in the ATU.
+ *
+ *	pci_device_list
+ *			A 32-bit aligned list of pci_devices.
+ *
+ *	pci_device_listp
+ *			Real address of a pci_device_list. 32-bit aligned.
+ *
+ *	iotte		IOMMU translation table entry.
+ *
+ *	iotte_attributes
+ *			IO Attributes for IOMMU v2 mappings. In addition to
+ *			read and write, IOMMU v2 supports relaxed ordering.
+ *
+ *	io_page_list	A 64-bit aligned list of real addresses. Each real
+ *			address in an io_page_list must be properly aligned
+ *			to the pagesize of the given IOTSB.
+ *
+ *	io_page_list_p	Real address of an io_page_list, 64-bit aligned.
+ *
+ *	IOTSB		IO Translation Storage Buffer. An aligned table of
+ *			IOTTEs. Each IOTSB has a pagesize, table size, and
+ *			virtual address associated with it that must match
+ *			a pagesize and table size supported by the underlying
+ *			hardware implementation. The alignment requirements
+ *			for an IOTSB depend on the pagesize used for that IOTSB.
+ *			Each IOTTE in an IOTSB maps one pagesize-sized page.
+ *			The size of the IOTSB dictates how large of a virtual
+ *			address space the IOTSB is capable of mapping.
+ *
+ *	iotsb_handle	An opaque identifier for an IOTSB. A devhandle plus
+ *			iotsb_handle represents a binding of an IOTSB to a
+ *			PCI root complex.
+ *
+ *	iotsb_index	Zero-based IOTTE number within an IOTSB.
+ */
+
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+	(((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
+/* pci_iotsb_conf()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_CONF
+ * ARG0:	devhandle
+ * ARG1:	r_addr
+ * ARG2:	size
+ * ARG3:	pagesize
+ * ARG4:	iova
+ * RET0:	status
+ * RET1:	iotsb_handle
+ * ERRORS:	EINVAL		Invalid devhandle, size, iova, or pagesize
+ *		EBADALIGN	r_addr is not properly aligned
+ *		ENORADDR	r_addr is not a valid real address
+ *		ETOOMANY	No further IOTSBs may be configured
+ *		EBUSY		Duplicate devhandle, r_addr, iova combination
+ *
+ * Create an IOTSB suitable for the PCI root complex identified by devhandle,
+ * for the DMA virtual address defined by the argument iova.
+ *
+ * r_addr is the properly aligned base address of the IOTSB and size is the
+ * IOTSB (table) size in bytes. The IOTSB is required to be zeroed prior to
+ * being configured. If it contains any values other than zeros then the
+ * behavior is undefined.
+ *
+ * pagesize is the size of each page in the IOTSB. Note that the combination of
+ * size (table size) and pagesize must be valid.
+ *
+ * iova is the DMA virtual address this IOTSB will map.
+ *
+ * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
+ * Once configured, privileged access to the IOTSB memory is prohibited and
+ * creates undefined behavior. The only permitted access is indirect via these
+ * services.
+ */
+#define HV_FAST_PCI_IOTSB_CONF		0x190
+
+/* pci_iotsb_info()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_INFO
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * RET0:	status
+ * RET1:	r_addr
+ * RET2:	size
+ * RET3:	pagesize
+ * RET4:	iova
+ * RET5:	#bound
+ * ERRORS:	EINVAL	Invalid devhandle or iotsb_handle
+ *
+ * This service returns configuration information about an IOTSB previously
+ * created with pci_iotsb_conf.
+ *
+ * iotsb_handle value 0 may be used with this service to inquire about the
+ * legacy IOTSB that may or may not exist. If the service succeeds, the return
+ * values describe the legacy IOTSB and I/O virtual addresses mapped by that
+ * table. However, the table base address r_addr may contain the value -1 which
+ * indicates a memory range that cannot be accessed or be reclaimed.
+ *
+ * The return value #bound contains the number of PCI devices that iotsb_handle
+ * is currently bound to.
+ */
+#define HV_FAST_PCI_IOTSB_INFO		0x191
+
+/* pci_iotsb_unconf()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_UNCONF
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * RET0:	status
+ * ERRORS:	EINVAL	Invalid devhandle or iotsb_handle
+ *		EBUSY	The IOTSB is bound and may not be unconfigured
+ *
+ * This service unconfigures the IOTSB identified by the devhandle and
+ * iotsb_handle arguments, previously created with pci_iotsb_conf.
+ * The IOTSB must not be currently bound to any device or the service will fail.
+ *
+ * If the call succeeds, iotsb_handle is no longer valid.
+ */
+#define HV_FAST_PCI_IOTSB_UNCONF	0x192
+
+/* pci_iotsb_bind()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_BIND
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	pci_device
+ * RET0:	status
+ * ERRORS:	EINVAL	Invalid devhandle, iotsb_handle, or pci_device
+ *		EBUSY	A PCI function is already bound to an IOTSB at the same
+ *			address range as specified by devhandle, iotsb_handle.
+ *
+ * This service binds the PCI function specified by the argument pci_device to
+ * the IOTSB specified by the arguments devhandle and iotsb_handle.
+ *
+ * The PCI device function is bound to the specified IOTSB with the IOVA range
+ * specified when the IOTSB was configured via pci_iotsb_conf. If the function
+ * is already bound then it is unbound first.
+ */
+#define HV_FAST_PCI_IOTSB_BIND		0x193
+
+/* pci_iotsb_unbind()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_UNBIND
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	pci_device
+ * RET0:	status
+ * ERRORS:	EINVAL	Invalid devhandle, iotsb_handle, or pci_device
+ *		ENOMAP	The PCI function was not bound to the specified IOTSB
+ *
+ * This service unbinds the PCI device specified by the argument pci_device
+ * from the IOTSB identified by the arguments devhandle and iotsb_handle.
+ *
+ * If the PCI device is not bound to the specified IOTSB then this service will
+ * fail with status ENOMAP.
+ */
+#define HV_FAST_PCI_IOTSB_UNBIND	0x194
+
+/* pci_iotsb_get_binding()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_GET_BINDING
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	iova
+ * RET0:	status
+ * RET1:	iotsb_handle
+ * ERRORS:	EINVAL	Invalid devhandle, pci_device, or iova
+ *		ENOMAP	The PCI function is not bound to an IOTSB at iova
+ *
+ * This service returns the IOTSB binding, iotsb_handle, for a given pci_device
+ * and DMA virtual address, iova.
+ *
+ * iova must be the base address of a DMA virtual address range as defined by
+ * the iommu-address-ranges property in the root complex device node defined
+ * by the argument devhandle.
+ */
+#define HV_FAST_PCI_IOTSB_GET_BINDING	0x195
+
+/* pci_iotsb_map()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_MAP
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	index_count
+ * ARG3:	iotte_attributes
+ * ARG4:	io_page_list_p
+ * RET0:	status
+ * RET1:	#mapped
+ * ERRORS:	EINVAL		Invalid devhandle, iotsb_handle, #iottes,
+ *				iotsb_index or iotte_attributes
+ *		EBADALIGN	Improperly aligned io_page_list_p or I/O page
+ *				address in the I/O page list.
+ *		ENORADDR	Invalid io_page_list_p or I/O page address in
+ *				the I/O page list.
+ *
+ * This service creates and flushes mappings in the IOTSB defined by the
+ * arguments devhandle, iotsb_handle.
+ *
+ * The index_count argument consists of two fields. Bits 63:48 contain #iottes
+ * and bits 47:0 contain iotsb_index.
+ *
+ * The first mapping is created in the IOTSB index specified by iotsb_index.
+ * Subsequent mappings are created at iotsb_index+1 and so on.
+ *
+ * The attributes of each mapping are defined by the argument iotte_attributes.
+ *
+ * The io_page_list_p specifies the real address of the 64-bit-aligned list of
+ * #iottes I/O page addresses. Each page address must be a properly aligned
+ * real address of a page to be mapped in the IOTSB. The first entry in the I/O
+ * page list contains the real address of the first page, the 2nd entry for the
+ * 2nd page, and so on.
+ *
+ * #iottes must be greater than zero.
+ *
+ * The return value #mapped is the actual number of mappings created, which may
+ * be less than or equal to the argument #iottes. If the function returns
+ * successfully with a #mapped value less than the requested #iottes then the
+ * caller should continue to invoke the service with updated iotsb_index,
+ * #iottes, and io_page_list_p arguments until all pages are mapped.
+ *
+ * This service must not be used to demap a mapping. In other words, all
+ * mappings must be valid and have one or both of the RW attribute bits set.
+ *
+ * Note:
+ * It is implementation-defined whether I/O page real address validity checking
+ * is done at the time mappings are established or deferred until they are
+ * accessed.
+ */
+#define HV_FAST_PCI_IOTSB_MAP		0x196
+
+/* pci_iotsb_map_one()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_MAP_ONE
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	iotsb_index
+ * ARG3:	iotte_attributes
+ * ARG4:	r_addr
+ * RET0:	status
+ * ERRORS:	EINVAL		Invalid devhandle, iotsb_handle, iotsb_index
+ *				or iotte_attributes
+ *		EBADALIGN	Improperly aligned r_addr
+ *		ENORADDR	Invalid r_addr
+ *
+ * This service creates and flushes a single mapping in the IOTSB defined by the
+ * arguments devhandle, iotsb_handle.
+ *
+ * The mapping for the page at r_addr is created at the IOTSB index specified by
+ * iotsb_index with the attributes iotte_attributes.
+ *
+ * This service must not be used to demap a mapping. In other words, the mapping
+ * must be valid and have one or both of the RW attribute bits set.
+ *
+ * Note:
+ * It is implementation-defined whether I/O page real address validity checking
+ * is done at the time mappings are established or deferred until they are
+ * accessed.
+ */
+#define HV_FAST_PCI_IOTSB_MAP_ONE	0x197
+
+/* pci_iotsb_demap()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_DEMAP
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	iotsb_index
+ * ARG3:	#iottes
+ * RET0:	status
+ * RET1:	#unmapped
+ * ERRORS:	EINVAL	Invalid devhandle, iotsb_handle, iotsb_index or #iottes
+ *
+ * This service unmaps and flushes up to #iottes mappings starting at index
+ * iotsb_index from the IOTSB defined by the arguments devhandle, iotsb_handle.
+ *
+ * #iottes must be greater than zero.
+ *
+ * The actual number of IOTTEs unmapped is returned in #unmapped and may be less
+ * than or equal to the requested number of IOTTEs, #iottes.
+ *
+ * If #unmapped is less than #iottes, the caller should continue to invoke this
+ * service with updated iotsb_index and #iottes arguments until all pages are
+ * demapped.
+ */
+#define HV_FAST_PCI_IOTSB_DEMAP		0x198
+
+/* pci_iotsb_getmap()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_GETMAP
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	iotsb_index
+ * RET0:	status
+ * RET1:	r_addr
+ * RET2:	iotte_attributes
+ * ERRORS:	EINVAL	Invalid devhandle, iotsb_handle, or iotsb_index
+ *		ENOMAP	No mapping was found
+ *
+ * This service returns the mapping specified by index iotsb_index from the
+ * IOTSB defined by the arguments devhandle, iotsb_handle.
+ *
+ * Upon success, the real address of the mapping shall be returned in
+ * r_addr and the IOTTE mapping attributes shall be returned in
+ * iotte_attributes.
+ *
+ * The return value iotte_attributes may not include optional features used in
+ * the call to create the mapping.
+ */
+#define HV_FAST_PCI_IOTSB_GETMAP	0x199
+
+/* pci_iotsb_sync_mappings()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_SYNC_MAPPINGS
+ * ARG0:	devhandle
+ * ARG1:	iotsb_handle
+ * ARG2:	iotsb_index
+ * ARG3:	#iottes
+ * RET0:	status
+ * RET1:	#synced
+ * ERRORS:	EINVAL	Invalid devhandle, iotsb_handle, iotsb_index, or #iottes
+ *
+ * This service synchronizes #iottes mappings starting at index iotsb_index in
+ * the IOTSB defined by the arguments devhandle, iotsb_handle.
+ *
+ * #iottes must be greater than zero.
+ *
+ * The actual number of IOTTEs synchronized is returned in #synced, which may
+ * be less than or equal to the requested number, #iottes.
+ *
+ * If, upon a successful return, #synced is less than #iottes, the caller should
+ * continue to invoke this service with updated iotsb_index and #iottes
+ * arguments until all pages are synchronized.
+ */
+#define HV_FAST_PCI_IOTSB_SYNC_MAPPINGS	0x19a
+
 /* Logical Domain Channel services. */

 #define LDC_CHANNEL_DOWN		0
@@ -2993,6 +3335,7 @@ unsigned long sun4v_m7_set_perfreg(unsigned long reg_num,
 #define HV_GRP_SDIO			0x0108
 #define HV_GRP_SDIO_ERR			0x0109
 #define HV_GRP_REBOOT_DATA		0x0110
+#define HV_GRP_ATU			0x0111
 #define HV_GRP_M7_PERF			0x0114
 #define HV_GRP_NIAG_PERF		0x0200
 #define HV_GRP_FIRE_PERF		0x0201
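
The pci_iotsb_map() comment describes two things a caller has to get right: packing #iottes and iotsb_index into index_count, and re-invoking the service when #mapped comes back smaller than requested. The sketch below illustrates both against a mock. Everything prefixed `mock_` or `MOCK_`, and the helper `map_all`, is hypothetical and not part of this commit; the mock ignores devhandle, iotsb_handle and iotte_attributes, uses 1 as a stand-in for any error status, and deliberately maps at most three entries per call so that the retry path is exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as HV_PCI_IOTSB_INDEX_COUNT: bits 63:48 #iottes,
 * bits 47:0 iotsb_index (uint64_t used in place of the kernel's u64).
 */
#define INDEX_COUNT(iottes, index) \
	(((uint64_t)(iottes) << 48) | ((uint64_t)(index)))

#define MOCK_TSB_ENTRIES	64	/* hypothetical table size */
#define MOCK_MAX_PER_CALL	3	/* hypothetical: force partial completion */

static uint64_t mock_tsb[MOCK_TSB_ENTRIES];

/* Hypothetical stand-in for the pci_iotsb_map() service. Returns 0 on
 * success and stores the number of entries actually mapped in *mapped.
 */
static unsigned long mock_iotsb_map(uint64_t index_count,
				    const uint64_t *io_page_list,
				    uint64_t *mapped)
{
	uint64_t count = index_count >> 48;
	uint64_t index = index_count & 0xffffffffffffULL;
	uint64_t n = count < MOCK_MAX_PER_CALL ? count : MOCK_MAX_PER_CALL;
	uint64_t i;

	if (count == 0 || index + count > MOCK_TSB_ENTRIES)
		return 1;	/* stands in for EINVAL */
	for (i = 0; i < n; i++)
		mock_tsb[index + i] = io_page_list[i];
	*mapped = n;
	return 0;
}

/* Keep calling the service with updated iotsb_index, #iottes and page list
 * pointer until every page is mapped, as the header comment asks.
 */
static unsigned long map_all(uint64_t index, uint64_t count,
			     const uint64_t *pages)
{
	while (count > 0) {
		/* #iottes is a 16-bit field, so cap each request. */
		uint64_t chunk = count > 0xffff ? 0xffff : count;
		uint64_t mapped = 0;
		unsigned long status;

		assert(index <= 0xffffffffffffULL);
		status = mock_iotsb_map(INDEX_COUNT(chunk, index), pages, &mapped);
		if (status != 0)
			return status;
		if (mapped == 0)
			return 1;	/* no progress: bail out instead of spinning */
		index += mapped;
		pages += mapped;
		count -= mapped;
	}
	return 0;
}
```

The same loop shape would apply to pci_iotsb_demap() and pci_iotsb_sync_mappings(), which report partial completion through #unmapped and #synced in the same way.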
