This post collects notes on the Intel Data Streaming Accelerator (DSA). For the details, you still need to consult the spec.

Intel DSA is a high-performance data copy and transformation accelerator that will be integrated in future Intel® processors, targeted for optimizing streaming data movement and transformation operations common with applications for high-performance storage, networking, persistent memory, and various data processing applications.

The goal is to provide higher overall system performance for data mover and transformation operations, while freeing up CPU cycles for higher level functions. Intel DSA enables high performance data mover capability to/from volatile memory, persistent memory, memory-mapped I/O, and through a Non-Transparent Bridge (NTB) device to/from remote volatile and persistent memory on another node in a cluster. Enumeration and configuration is done with a PCI Express compatible programming interface to the Operating System (OS) and can be controlled through a device driver.

Besides the basic data mover operations, Intel DSA supports a set of transformation operations on memory. For example:

  • Generate and test CRC checksum, or Data Integrity Field (DIF) to support storage and networking applications.
  • Memory Compare and delta generate/merge to support VM migration, VM Fast check-pointing and software managed memory deduplication usages.
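
To make the delta generate/merge idea concrete, here is a minimal software sketch of what such an operation computes: compare two buffers block by block, record only the blocks that differ, and later replay those records onto a copy of the old buffer. The 8-byte block size, the `delta_entry` layout, and the function names are assumptions made for this illustration; they are not the device's record format, which is defined in the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DELTA_BLOCK 8  /* compare granularity in bytes (assumed for this sketch) */

/* One delta entry: which block differs and what its new contents are. */
struct delta_entry {
    uint32_t block_index;        /* index of the differing block */
    uint8_t  data[DELTA_BLOCK];  /* contents of that block in the new buffer */
};

/*
 * Compare old_buf and new_buf block by block and record every differing block.
 * len must be a multiple of DELTA_BLOCK. Returns the number of entries written,
 * or (size_t)-1 if more than max_entries blocks differ.
 */
size_t delta_generate(const uint8_t *old_buf, const uint8_t *new_buf, size_t len,
                      struct delta_entry *out, size_t max_entries)
{
    assert(len % DELTA_BLOCK == 0);
    size_t n = 0;
    for (size_t off = 0; off < len; off += DELTA_BLOCK) {
        if (memcmp(old_buf + off, new_buf + off, DELTA_BLOCK) == 0)
            continue;                 /* identical block: nothing to record */
        if (n == max_entries)
            return (size_t)-1;        /* delta does not fit in the output array */
        out[n].block_index = (uint32_t)(off / DELTA_BLOCK);
        memcpy(out[n].data, new_buf + off, DELTA_BLOCK);
        n++;
    }
    return n;
}

/* Apply a delta to buf (holding the old contents) so that it matches the new contents. */
void delta_merge(uint8_t *buf, const struct delta_entry *delta, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memcpy(buf + (size_t)delta[i].block_index * DELTA_BLOCK,
               delta[i].data, DELTA_BLOCK);
}
```

In a VM migration or checkpointing flow, only the entries would need to be sent or stored instead of the whole page, which is why the operation is worth offloading.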

Take memcpy as an example: what does DSA offer compared with doing the copy on the CPU?

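  1. It frees up CPU resources by offloading the memcpy work to DSA.
  2. DSA supports parallel, batched processing. When a large amount of data has to be copied, DSA can process it in parallel and so improve efficiency. For small copies, DSA is less efficient than the CPU.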
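
The small-versus-large trade-off can be sketched with a very rough cost model: an offloaded copy pays a fixed per-request cost (building the descriptor, submitting it, noticing the completion) before any byte moves, while a CPU copy starts immediately. All three parameters below are hypothetical inputs you would have to measure on your own system; nothing here is a number from the spec.

```c
#include <assert.h>

/* Estimated CPU copy time in ns: the core streams the bytes itself. */
double cpu_copy_ns(double bytes, double cpu_bytes_per_ns)
{
    assert(cpu_bytes_per_ns > 0.0);
    return bytes / cpu_bytes_per_ns;
}

/* Estimated offloaded copy time in ns: fixed per-request cost plus streaming time. */
double offload_copy_ns(double bytes, double fixed_overhead_ns, double dev_bytes_per_ns)
{
    assert(dev_bytes_per_ns > 0.0);
    return fixed_overhead_ns + bytes / dev_bytes_per_ns;
}

/* Returns 1 if the offloaded copy is estimated to finish sooner than the CPU copy. */
int offload_is_faster(double bytes, double fixed_overhead_ns,
                      double dev_bytes_per_ns, double cpu_bytes_per_ns)
{
    return offload_copy_ns(bytes, fixed_overhead_ns, dev_bytes_per_ns)
         < cpu_copy_ns(bytes, cpu_bytes_per_ns);
}
```

For small sizes the fixed overhead dominates and the CPU wins; as the size grows, the overhead is amortized. The model only looks at latency, so it ignores the other benefit from point 1: the CPU is free to do other work while the device copies.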

Figure 3-1 of the DSA spec illustrates the high-level blocks within the device at a conceptual level. The I/O fabric interface is used for receiving downstream work requests from clients and for upstream read, write, and address translation operations.

Each device contains the following basic components:

  • Work Queues (WQ) - On-device storage that queues descriptors for the device. Requests are added to a WQ by using new instructions to write to the memory-mapped “portal” associated with each WQ.
  • Groups - Abstract container that can include one or more engines and work queues.
  • Engines - Pull work submitted to the WQs and process it.

Two types of WQs are supported:

  • Dedicated WQ (DWQ) - A single client owns this exclusively and can submit work to it.
  • Shared WQ (SWQ) - Multiple clients can submit work to the SWQ.

A client using DWQ submits work descriptors using the MOVDIR64B instruction. This is a posted write, so the client must track the number of descriptors submitted to ensure that it does not exceed the configured work queue length as any additional descriptors would be dropped.
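
Because the write is posted and gives no feedback, the bookkeeping has to live in the client. Below is a minimal sketch of that tracking. The `dwq_client` type and its functions are hypothetical helpers written for these notes, and the portal is an ordinary buffer so the code runs anywhere; on real hardware the `memcpy` would be replaced by a MOVDIR64B store to the mapped portal (for example through the `_movdir64b` compiler intrinsic).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_SIZE 64  /* MOVDIR64B writes 64 bytes at a time */

/* Client-side bookkeeping for one dedicated WQ (hypothetical helper type). */
struct dwq_client {
    uint8_t *portal;    /* stand-in for the memory-mapped portal */
    unsigned wq_size;   /* configured WQ length, in descriptors */
    unsigned inflight;  /* submitted but not yet completed */
};

/* Try to submit one descriptor. Returns 0 on success, -1 if the WQ is full. */
int dwq_submit(struct dwq_client *c, const void *desc)
{
    if (c->inflight == c->wq_size)
        return -1;  /* one more write would exceed the WQ length and be dropped */
    memcpy(c->portal, desc, DESC_SIZE);  /* real code: MOVDIR64B to the portal */
    c->inflight++;
    return 0;
}

/* Call once for every completion the client observes, to free a slot. */
void dwq_complete(struct dwq_client *c)
{
    assert(c->inflight > 0);
    c->inflight--;
}
```

The caller is expected to invoke `dwq_complete` whenever it sees a descriptor finish, and to back off or poll for completions when `dwq_submit` returns -1.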

Clients using shared work queues submit work descriptors using either ENQCMDS (from supervisor mode) or ENQCMD (from user mode). These instructions indicate via the EFLAGS.ZF bit whether the request was accepted.
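
Since the instruction reports whether the descriptor was accepted, a shared-WQ client does not need its own occupancy counter; it just retries. The sketch below models that loop in plain C. `fake_swq` and `fake_enqcmd` are stand-ins invented for these notes so the code runs without the hardware; the return convention (0 for accepted, 1 for retry) follows my understanding that ZF is set when the device asks for a retry, so check the instruction reference before relying on it.

```c
#include <assert.h>

/* Software stand-in for a shared WQ (hypothetical; the real state is in the device). */
struct fake_swq {
    unsigned capacity;  /* WQ length, in descriptors */
    unsigned occupied;  /* descriptors currently queued */
};

/* Mimics the ENQCMD status: returns 0 if accepted, 1 if the caller should retry. */
static int fake_enqcmd(struct fake_swq *q, const void *desc)
{
    (void)desc;  /* contents do not matter for this model */
    if (q->occupied == q->capacity)
        return 1;
    q->occupied++;
    return 0;
}

/* Models an engine pulling one descriptor out of the queue. */
void fake_swq_drain_one(struct fake_swq *q)
{
    assert(q->occupied > 0);
    q->occupied--;
}

/*
 * Submit with a bounded number of attempts.
 * Returns the attempt number that succeeded (>= 1), or 0 after giving up.
 */
unsigned swq_submit(struct fake_swq *q, const void *desc, unsigned max_attempts)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (fake_enqcmd(q, desc) == 0)
            return attempt;
        /* real code would pause or back off here before retrying */
    }
    return 0;
}
```

Bounding the retries is a design choice of this sketch: with many clients sharing one queue, an unbounded spin could burn CPU for a long time, so the caller gets a chance to fall back to a CPU path instead.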


References:

  1. Introducing the Intel® Data Streaming Accelerator
  2. DSA spec
  3. A reading of the Intel DSA spec (original title: intel DSA spec 解读)