This post takes notes on Mellanox's SF (Scalable Function / Sub-Function) technology.

Introduction

Scalable functions (SFs), or sub-functions, are very similar to virtual functions (VFs) which are part of a Single Root I/O Virtualization (SR-IOV) solution. I/O virtualization is one of the key features used in data centers today. It improves the performance of enterprise servers by giving virtual machines direct access to hardware I/O devices. The SR-IOV specification allows one PCI Express (PCIe) device to present itself to the host as multiple distinct “virtual” devices. This is done with a new PCIe capability structure added to a traditional PCIe function (i.e., a physical function or PF).

The PF provides control over the creation and allocation of new VFs. VFs share the device’s underlying hardware and PCIe interface. A key feature of the SR-IOV specification is that VFs are very lightweight, so many of them can be implemented in a single device.

On BlueField, SFs are used to provide VF-like capabilities. SFs support a larger number of functions than VFs and, more importantly, allow multiple services to run concurrently on the DPU.

An SF is a lightweight function which has a parent PCIe function on which it is deployed. The SF, therefore, has access to the capabilities and resources of its parent PCIe function and has its own function capabilities and its own resources. This means that an SF would also have its own dedicated queues (i.e., txq, rxq).

SFs can coexist with PCIe SR-IOV virtual functions (on the host), but they do not require PCIe SR-IOV to be enabled.

SFs support E-Switch representation offload like existing PF and VF representors. An SF shares PCIe-level resources with other SFs and/or with its parent PCIe function.

Internals

  • When a scalable function is RDMA capable, it has its own QP1, GID table, and RDMA resources, which are neither shared with nor stolen from the parent PCI function.

  • A scalable function has a dedicated window in PCI BAR space that is not shared with other scalable functions or with the parent PCI function. This ensures that all class devices of the scalable function access only their assigned PCI BAR space.

  • A scalable function supports eswitch representation, through which it supports tc offloads. The user must configure the eswitch to send/receive packets from/to the scalable function port.

  • Scalable functions share PCI-level resources, such as PCI MSI-X IRQs, with other scalable functions and/or with their parent PCI function.
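
To make the MSI-X sharing point more concrete, here is a toy model in Python. It is not driver code and does not claim to match how mlx5 actually distributes IRQs: it simply assumes a small fixed pool of vectors owned by the parent function and hands them out round-robin to the queues of each SF. The names `VectorPool`, `assign`, and `load`, as well as the pool size, are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List, Tuple


class VectorPool:
    """Toy model: a fixed pool of MSI-X vectors shared by the SFs of one PF."""

    def __init__(self, num_vectors: int) -> None:
        if num_vectors <= 0:
            raise ValueError("pool needs at least one vector")
        self.num_vectors = num_vectors
        # Round-robin iterator over vector indices 0..num_vectors-1.
        self._next = cycle(range(num_vectors))
        # Maps (sf_name, queue_index) -> vector index.
        self.assignment: Dict[Tuple[str, int], int] = {}

    def assign(self, sf_name: str, num_queues: int) -> List[int]:
        """Give each queue of an SF a vector, reusing vectors when they run out."""
        vectors = []
        for queue in range(num_queues):
            vec = next(self._next)
            self.assignment[(sf_name, queue)] = vec
            vectors.append(vec)
        return vectors

    def load(self) -> Counter:
        """Count how many queues are attached to each vector."""
        return Counter(self.assignment.values())


pool = VectorPool(num_vectors=4)
sf_a = pool.assign("sf-a", num_queues=3)
sf_b = pool.assign("sf-b", num_queues=3)
print(sf_a, sf_b, dict(pool.load()))
```

With four vectors and two SFs of three queues each, some vectors end up serving queues from both SFs, which is the sharing the bullet above describes.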

SFs vs VFs

  1. SFs are deployed one at a time, unlike SR-IOV VFs, which are enabled all together. When a new container is spawned, the SF it needs can be created and deployed at that point.
  2. SFs do not have to implement a full PCI config space, reset, or registers. This makes the device lightweight.
  3. SFs share MSI-X vectors with the owner PCI PF and other peer SFs. This reduces the demand on the total number of vectors in the hardware and the platform interrupt controller.
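
The "one at a time" deployment in point 1 can be sketched as a sequence of devlink commands. The Python helper below only builds and prints the command lines; it does not run them. The command syntax follows the devlink usage shown in the Linux kernel mlx5 documentation as I understand it, so check it against your iproute2 version. The helper name `sf_lifecycle_commands` is hypothetical, and the PCI address, `sfnum`, port index, and MAC address are example values. On a real system the port index is reported by `devlink port add`; here it is passed in as an assumption.

```python
import shlex
from typing import List


def sf_lifecycle_commands(pci_dev: str, pfnum: int, sfnum: int,
                          port_index: int, hw_addr: str) -> List[str]:
    """Build the devlink command lines for one SF: create, activate, delete."""
    port = f"pci/{pci_dev}/{port_index}"
    steps = [
        # 1. Create a single SF port on the parent PF (no bulk enable as with VFs).
        ["devlink", "port", "add", f"pci/{pci_dev}", "flavour", "pcisf",
         "pfnum", str(pfnum), "sfnum", str(sfnum)],
        # 2. Assign a MAC address to the SF's port function.
        ["devlink", "port", "function", "set", port, "hw_addr", hw_addr],
        # 3. Activate the SF so its device shows up and can be used.
        ["devlink", "port", "function", "set", port, "state", "active"],
        # 4. Tear down when the container goes away: deactivate, then delete.
        ["devlink", "port", "function", "set", port, "state", "inactive"],
        ["devlink", "port", "del", port],
    ]
    return [shlex.join(argv) for argv in steps]


cmds = sf_lifecycle_commands("0000:06:00.0", pfnum=0, sfnum=88,
                             port_index=32768, hw_addr="00:00:00:00:88:88")
for line in cmds:
    print(line)
```

Keeping the commands as data rather than executing them makes the sketch safe to run anywhere; a container runtime hook could pass the same argument lists to `subprocess.run` on a host that actually has an SF-capable device.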

Summary

The following is only my personal understanding and speculation:

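  • SF and SR-IOV are orthogonal
    • A device can support SFs even if it does not support SR-IOV
    • If SR-IOV is supported, SFs can be supported inside a VF
  • SF is analogous to Intel's SIOV (Scalable I/O Virtualization) technology, but SF currently supports only containers, not VMs
  • SF probably does not support PASID at present. The DMA view of an SF is as follows:
    • For netdevice queues: different containers (processes) have different page tables, but there is only one parent PCI function and therefore only one BDF. If IOMMU page tables were used, there would be no way to guarantee that different processes use different IOMMU page tables, so the IOMMU is not used in the container scenario. As a result, for DMA memory addresses, the entries in the queue use HPA rather than HVA
    • For RDMA queues: thanks to MTT support, the entries in the queue use HVA, and the MTT performs the HVA-to-HPA translation
  • If PASID were supported, SFs could support VMs as well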
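
To illustrate the RDMA queue case above, here is a deliberately simplified model of an MTT-style lookup in Python. It is a conceptual sketch only, in line with the speculation in this section: the real MTT layout and registration flow are not modeled. The class name `ToyMTT`, its methods, the 4 KiB page size, and the addresses are all assumptions made for illustration.

```python
from typing import Dict, List

PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyMTT:
    """Toy translation table: host virtual page number -> host physical page address."""

    def __init__(self) -> None:
        self._entries: Dict[int, int] = {}

    def register(self, hva_base: int, hpa_pages: List[int]) -> None:
        """Register a virtually contiguous buffer backed by the given physical pages."""
        if hva_base % PAGE_SIZE != 0:
            raise ValueError("hva_base must be page aligned")
        first_vpn = hva_base // PAGE_SIZE
        for i, hpa in enumerate(hpa_pages):
            if hpa % PAGE_SIZE != 0:
                raise ValueError("physical pages must be page aligned")
            self._entries[first_vpn + i] = hpa

    def translate(self, hva: int) -> int:
        """Translate an HVA found in a queue entry into an HPA for DMA."""
        vpn, offset = divmod(hva, PAGE_SIZE)
        if vpn not in self._entries:
            raise LookupError(f"HVA {hva:#x} is not registered")
        return self._entries[vpn] + offset


mtt = ToyMTT()
# Two virtually contiguous pages backed by non-contiguous physical pages.
mtt.register(0x7F0000000000, [0x10000000, 0x23456000])
print(hex(mtt.translate(0x7F0000000010)))
print(hex(mtt.translate(0x7F0000001008)))
```

The point of the model is that the queue entry can carry the process's own virtual address, because the device resolves it through a per-registration table instead of relying on a per-process IOMMU page table.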

References:

  1. https://github.com/Mellanox/scalablefunctions/wiki
  2. https://github.com/Mellanox/scalablefunctions
  3. https://docs.nvidia.com/doca/sdk/scalable-functions/index.html
  4. net/mlx5: E-switch, Move devlink port close to eswitch port