Intel DSA is a high-performance data copy and transformation accelerator that will be integrated in future Intel® processors. It targets the streaming data movement and transformation operations common in high-performance storage, networking, persistent memory, and various data processing applications.
The goal is to provide higher overall system performance for data mover and transformation operations, while freeing up CPU cycles for higher-level functions. Intel DSA enables high-performance data mover capability to/from volatile memory, persistent memory, memory-mapped I/O, and through a Non-Transparent Bridge (NTB) device to/from remote volatile and persistent memory on another node in a cluster. Enumeration and configuration are done with a PCI Express compatible programming interface to the Operating System (OS) and can be controlled through a device driver.
Besides the basic data mover operations, Intel DSA supports a set of transformation operations on memory. For example:
- Generate and test a CRC checksum or Data Integrity Field (DIF) to support storage and networking applications.
- Memory compare and delta generate/merge to support VM migration, VM fast checkpointing, and software-managed memory deduplication usages.
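To make the delta generate/merge idea concrete, the following is a minimal software sketch in C of what such a pair of operations computes. It is not the device's algorithm or record format: the 8-byte comparison unit, the `delta_entry_t` layout, and the function names `delta_create` and `delta_apply` are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DELTA_UNIT 8  /* comparison granularity in bytes (assumed for this sketch) */

/* Illustrative delta entry: where a unit differs and its new contents. */
typedef struct {
    uint32_t offset;            /* byte offset of the differing unit */
    uint8_t  data[DELTA_UNIT];  /* bytes taken from the new buffer */
} delta_entry_t;

/* Compare two equal-length buffers unit by unit and record the differences.
 * Returns the number of entries written, or (size_t)-1 if more than
 * max_entries would be needed. len must be a multiple of DELTA_UNIT. */
static size_t delta_create(const uint8_t *old_buf, const uint8_t *new_buf,
                           size_t len, delta_entry_t *out, size_t max_entries)
{
    assert(len % DELTA_UNIT == 0);
    size_t n = 0;
    for (size_t off = 0; off < len; off += DELTA_UNIT) {
        if (memcmp(old_buf + off, new_buf + off, DELTA_UNIT) != 0) {
            if (n == max_entries)
                return (size_t)-1;  /* delta does not fit in the output array */
            out[n].offset = (uint32_t)off;
            memcpy(out[n].data, new_buf + off, DELTA_UNIT);
            n++;
        }
    }
    return n;
}

/* Merge a delta into a copy of the old buffer to reproduce the new buffer. */
static void delta_apply(uint8_t *buf, const delta_entry_t *delta, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memcpy(buf + delta[i].offset, delta[i].data, DELTA_UNIT);
}
```

In a VM migration or checkpointing flow, only the delta entries would need to be transferred or stored; the receiver applies them to its stale copy of the page.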
Figure 3-1 illustrates the high-level blocks within the device at a conceptual level. The I/O fabric interface is used for receiving downstream work requests from clients and for upstream read, write, and address translation operations.
Each device contains the following basic components:
- Work Queues (WQ) - On-device storage used to queue descriptors for the device. Requests are added to a WQ by using new instructions to write to the memory-mapped “portal” associated with each WQ.
- Groups - Abstract container that can include one or more engines and work queues.
- Engines - Pull work submitted to the WQs and process it.
Two types of WQs are supported:
- Dedicated WQ (DWQ) - A single client owns this exclusively and can submit work to it.
- Shared WQ (SWQ) - Multiple clients can submit work to the SWQ.
A client using a DWQ submits work descriptors using the MOVDIR64B instruction. This is a posted write, so the instruction gives no indication of whether the descriptor was accepted. The client must therefore track the number of descriptors it has submitted and ensure that it does not exceed the configured work queue length, because any additional descriptors would be dropped.
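The bookkeeping this requires can be sketched as a simple in-flight counter. The example below is a minimal illustration in C, not driver code: `dwq_client_t`, `dwq_submit`, `dwq_complete`, and the `USE_MOVDIR64B` build switch are hypothetical names, and the descriptor is modeled only as an opaque 64-byte block. When `USE_MOVDIR64B` is defined (and the compiler is given `-mmovdir64b`), the write uses the `_movdir64b` intrinsic; otherwise a `memcpy` stands in so the logic can be exercised without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#ifdef USE_MOVDIR64B
#include <immintrin.h>
#endif

/* Opaque 64-byte descriptor, matching the MOVDIR64B payload size.
 * The field layout is deliberately not modeled here. */
typedef struct { _Alignas(64) uint8_t bytes[64]; } dwq_desc_t;

/* Per-client state for one dedicated work queue (hypothetical). */
typedef struct {
    void    *portal;     /* memory-mapped WQ portal, 64-byte aligned */
    unsigned wq_size;    /* configured work queue length */
    unsigned in_flight;  /* submitted but not yet completed */
} dwq_client_t;

static void portal_write(void *portal, const dwq_desc_t *desc)
{
#ifdef USE_MOVDIR64B
    _movdir64b(portal, desc);            /* 64-byte posted write */
#else
    memcpy(portal, desc, sizeof *desc);  /* software stand-in */
#endif
}

/* Submit only if the queue has room; a posted write to a full
 * queue would be silently dropped, so the check must happen first. */
static bool dwq_submit(dwq_client_t *c, const dwq_desc_t *desc)
{
    if (c->in_flight >= c->wq_size)
        return false;  /* caller must wait for a completion */
    portal_write(c->portal, desc);
    c->in_flight++;
    return true;
}

/* Call once for each descriptor the client observes as completed. */
static void dwq_complete(dwq_client_t *c)
{
    assert(c->in_flight > 0);
    c->in_flight--;
}
```

How a client detects completions is outside the scope of this sketch; the only point is that a slot is reclaimed in software before another descriptor is written to the portal.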
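Clients using shared work queues submit work descriptors using either ENQCMDS (from supervisor mode) or ENQCMD (from user mode). These instructions indicate via the EFLAGS.ZF bit whether the request was accepted: ZF is cleared when the descriptor was accepted and set when it was not, in which case the client should retry.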
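Because acceptance is reported per submission, an SWQ client typically wraps the instruction in a bounded retry loop. The following C sketch shows one way to structure that; `swq_submit`, `enqueue_fn`, `sim_enqueue`, and the `USE_ENQCMD` build switch are hypothetical names introduced for illustration. With `USE_ENQCMD` defined (and `-menqcmd` passed to the compiler), `enqcmd_once` calls the `_enqcmd` intrinsic, which returns the resulting ZF value; `sim_enqueue` is a software stand-in for a busy queue so the retry logic can be run without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#ifdef USE_ENQCMD
#include <immintrin.h>
#endif

/* Opaque 64-byte descriptor; field layout is not modeled here. */
typedef struct { _Alignas(64) uint8_t bytes[64]; } swq_desc_t;

/* One submission attempt: returns true if the device accepted it. */
typedef bool (*enqueue_fn)(void *portal, const swq_desc_t *desc);

#ifdef USE_ENQCMD
static bool enqcmd_once(void *portal, const swq_desc_t *desc)
{
    /* _enqcmd returns ZF: 0 means accepted, nonzero means retry. */
    return _enqcmd(portal, desc) == 0;
}
#endif

/* Software stand-in for a busy SWQ: rejects a set number of
 * attempts, then accepts. Used only to exercise the retry loop. */
static unsigned sim_rejects_left;
static bool sim_enqueue(void *portal, const swq_desc_t *desc)
{
    (void)portal;
    (void)desc;
    if (sim_rejects_left > 0) {
        sim_rejects_left--;
        return false;
    }
    return true;
}

/* Try to submit up to max_attempts times; report whether it succeeded. */
static bool swq_submit(enqueue_fn enqueue, void *portal,
                       const swq_desc_t *desc, unsigned max_attempts)
{
    assert(enqueue != NULL);
    for (unsigned i = 0; i < max_attempts; i++) {
        if (enqueue(portal, desc))
            return true;
        /* A real client would pause or back off here before retrying. */
    }
    return false;
}
```

Unlike the DWQ case, the client does not need to count outstanding descriptors: the device itself rejects a submission when the queue is full, and the caller decides how long to keep retrying.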