This post collects notes on the RDMA ODP (On-Demand Paging) feature.

Introduction

On-Demand Paging (ODP) is a technique that alleviates many of the shortcomings of memory registration. Applications no longer need to pin the underlying physical pages of the address space or track the validity of the mappings. Instead, the HCA requests the latest translations from the OS when pages are not present, and the OS invalidates translations that are no longer valid because of either non-present pages or mapping changes.

Synchronizing between CPU and RNIC page tables

Faulting

When an RDMA request accesses data on virtual pages that have no valid mapping in the RNIC page table, (1a) the RNIC stalls the QP and raises an RNIC page fault interrupt. (1b) The driver asks the OS kernel for the virtual-to-physical mappings via hmm_range_fault. The OS kernel triggers CPU page faults on these virtual pages and fills the CPU page table if necessary. (1c) The driver updates the mappings in the RNIC page table and (1d) resumes the QP.
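The faulting steps can be sketched as a toy user-space model. This is not driver code: the `toy_` names, the fixed page count, and the frame numbers starting at 100 are all made up for illustration, and `toy_kernel_fault_in` merely stands in for what the driver obtains through hmm_range_fault.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPAGES  8           /* hypothetical size of the registered range */
#define TOY_INVALID UINT64_MAX  /* marks "no mapping" in either table */

/* Toy model: one physical frame number per virtual page, per table. */
struct toy_odp {
    uint64_t cpu_pt[TOY_NPAGES];   /* CPU page table */
    uint64_t rnic_pt[TOY_NPAGES];  /* RNIC page table */
    uint64_t next_pfn;             /* next free physical frame */
    bool     qp_stalled;
    unsigned faults;               /* RNIC page faults taken so far */
};

static void toy_init(struct toy_odp *s)
{
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        s->cpu_pt[i] = TOY_INVALID;
        s->rnic_pt[i] = TOY_INVALID;
    }
    s->next_pfn = 100;
    s->qp_stalled = false;
    s->faults = 0;
}

/* Stand-in for (1b): fault the page in on the CPU side if needed
 * and return its frame number. */
static uint64_t toy_kernel_fault_in(struct toy_odp *s, size_t vpage)
{
    if (s->cpu_pt[vpage] == TOY_INVALID)
        s->cpu_pt[vpage] = s->next_pfn++;
    return s->cpu_pt[vpage];
}

/* One RDMA access to a virtual page; returns the frame the RNIC uses,
 * or TOY_INVALID if vpage is outside the modelled range. */
static uint64_t toy_rdma_access(struct toy_odp *s, size_t vpage)
{
    if (vpage >= TOY_NPAGES)
        return TOY_INVALID;
    if (s->rnic_pt[vpage] == TOY_INVALID) {
        s->qp_stalled = true;                          /* (1a) stall QP, raise fault */
        s->faults++;
        uint64_t pfn = toy_kernel_fault_in(s, vpage);  /* (1b) ask the kernel */
        s->rnic_pt[vpage] = pfn;                       /* (1c) update RNIC table */
        s->qp_stalled = false;                         /* (1d) resume QP */
    }
    return s->rnic_pt[vpage];
}
```

In this model the first access to a page takes one fault and later accesses to the same page take none, which is the behaviour the four steps describe.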

Invalidation

When the OS kernel tries to unmap virtual pages, for example when swapping out or migrating pages, (2a) it notifies the RNIC driver via mmu_interval_notifier to invalidate those virtual pages. (2b) The RNIC driver erases the virtual-to-physical mappings from the RNIC page table. (2c) The driver notifies the kernel that the physical pages are no longer used by the RNIC. The OS kernel then modifies the CPU page table and reuses the physical pages.
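The ordering in the invalidation flow (the RNIC mapping is dropped before the CPU mapping) can be illustrated with a small callback-based model. Everything here is hypothetical: the `sim_` names, the callback type, and the frame numbers are assumptions for the sketch, and the callback only loosely mirrors the role that mmu_interval_notifier plays in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_NPAGES   8           /* hypothetical number of pages in the MR */
#define SIM_NO_FRAME UINT64_MAX  /* marks "no mapping" */

struct sim_mm;

/* Hypothetical invalidate hook the "driver" registers with the "kernel". */
typedef void (*sim_invalidate_fn)(struct sim_mm *mm, size_t first, size_t last);

struct sim_mm {
    uint64_t cpu_pt[SIM_NPAGES];   /* CPU page table */
    uint64_t rnic_pt[SIM_NPAGES];  /* RNIC page table */
    sim_invalidate_fn notifier;    /* driver callback */
};

/* Driver side: (2b) erase RNIC mappings for pages first..last (inclusive).
 * Returning from the callback plays the role of (2c). */
static void sim_driver_invalidate(struct sim_mm *mm, size_t first, size_t last)
{
    for (size_t p = first; p <= last && p < SIM_NPAGES; p++)
        mm->rnic_pt[p] = SIM_NO_FRAME;
}

/* Start with every page mapped on both sides (frames 100, 101, ...). */
static void sim_init(struct sim_mm *mm)
{
    for (size_t p = 0; p < SIM_NPAGES; p++) {
        mm->cpu_pt[p] = 100 + p;
        mm->rnic_pt[p] = 100 + p;
    }
    mm->notifier = sim_driver_invalidate;
}

/* Kernel side: unmap pages first..last (inclusive).
 * Returns how many physical frames became reusable. */
static size_t sim_kernel_unmap(struct sim_mm *mm, size_t first, size_t last)
{
    size_t freed = 0;

    if (mm->notifier)
        mm->notifier(mm, first, last);      /* (2a) notify the driver first */

    for (size_t p = first; p <= last && p < SIM_NPAGES; p++) {
        if (mm->cpu_pt[p] != SIM_NO_FRAME) {
            mm->cpu_pt[p] = SIM_NO_FRAME;   /* only now touch the CPU table */
            freed++;
        }
    }
    return freed;
}
```

Calling the notifier before clearing the CPU entries is the point of the sketch: at no time does the RNIC table hold a frame that the kernel has already reclaimed.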

An ODP MR (Memory Region) relies on the faulting and invalidation flows to keep the CPU and RNIC page tables synchronized.

Advising

An application can proactively ask the RNIC driver to populate a range in the RNIC page table. The RNIC driver completes an advise request in steps (3a) and (3b), which are identical to steps (1b) and (1c).
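To see why advising helps, here is a minimal model that counts RNIC page faults with and without a prior prefetch. It is only a sketch: the `adv_` names, the page count, and the frame numbering are invented for the example. In libibverbs, the corresponding user-facing call is `ibv_advise_mr()`, which this model does not use.

```c
#include <stddef.h>
#include <stdint.h>

#define ADV_NPAGES   16          /* hypothetical number of pages in the MR */
#define ADV_UNMAPPED UINT64_MAX  /* marks "no mapping" */

struct adv_state {
    uint64_t cpu_pt[ADV_NPAGES];   /* CPU page table */
    uint64_t rnic_pt[ADV_NPAGES];  /* RNIC page table */
    uint64_t next_pfn;             /* next free physical frame */
    unsigned rnic_faults;          /* faults taken on the RDMA path */
};

static void adv_init(struct adv_state *s)
{
    for (size_t p = 0; p < ADV_NPAGES; p++) {
        s->cpu_pt[p] = ADV_UNMAPPED;
        s->rnic_pt[p] = ADV_UNMAPPED;
    }
    s->next_pfn = 100;
    s->rnic_faults = 0;
}

/* (3a), same as (1b): get the mapping from the kernel, faulting the
 * page in on the CPU side if it is not present yet. */
static uint64_t adv_resolve(struct adv_state *s, size_t vpage)
{
    if (s->cpu_pt[vpage] == ADV_UNMAPPED)
        s->cpu_pt[vpage] = s->next_pfn++;
    return s->cpu_pt[vpage];
}

/* Advising: populate RNIC entries for pages [first, first + count)
 * before any RDMA access. Returns the number of entries populated. */
static size_t adv_prefetch(struct adv_state *s, size_t first, size_t count)
{
    size_t done = 0;

    for (size_t p = first; p < first + count && p < ADV_NPAGES; p++) {
        if (s->rnic_pt[p] != ADV_UNMAPPED)
            continue;                        /* already populated */
        s->rnic_pt[p] = adv_resolve(s, p);   /* (3a) then (3b) */
        done++;
    }
    return done;
}

/* RDMA access: a fault is counted only when the RNIC entry is missing. */
static uint64_t adv_access(struct adv_state *s, size_t vpage)
{
    if (vpage >= ADV_NPAGES)
        return ADV_UNMAPPED;
    if (s->rnic_pt[vpage] == ADV_UNMAPPED) {
        s->rnic_faults++;
        s->rnic_pt[vpage] = adv_resolve(s, vpage);
    }
    return s->rnic_pt[vpage];
}
```

After `adv_prefetch(&s, 4, 4)`, accesses to pages 4 through 7 leave `rnic_faults` at zero, while an access to a page outside the prefetched range still faults once.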

enum ib_odp_general_cap_bits {
	IB_ODP_SUPPORT		= 1 << 0,
	IB_ODP_SUPPORT_IMPLICIT	= 1 << 1,
};

enum ib_odp_transport_cap_bits {
	IB_ODP_SUPPORT_SEND	= 1 << 0,
	IB_ODP_SUPPORT_RECV	= 1 << 1,
	IB_ODP_SUPPORT_WRITE	= 1 << 2,
	IB_ODP_SUPPORT_READ	= 1 << 3,
	IB_ODP_SUPPORT_ATOMIC	= 1 << 4,
	IB_ODP_SUPPORT_SRQ_RECV	= 1 << 5,
};
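The transport capability bits are meant to be tested against a bitmask. The helper below is a hedged sketch of how such a mask could be decoded into readable operation names. `odp_caps_to_string` and the name table are hypothetical and not part of any kernel or library API; the enum is copied from the listing above so that the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copied from the listing above so the example is self-contained. */
enum ib_odp_transport_cap_bits {
    IB_ODP_SUPPORT_SEND     = 1 << 0,
    IB_ODP_SUPPORT_RECV     = 1 << 1,
    IB_ODP_SUPPORT_WRITE    = 1 << 2,
    IB_ODP_SUPPORT_READ     = 1 << 3,
    IB_ODP_SUPPORT_ATOMIC   = 1 << 4,
    IB_ODP_SUPPORT_SRQ_RECV = 1 << 5,
};

/* Hypothetical lookup table: one printable name per capability bit. */
static const struct {
    uint32_t    bit;
    const char *name;
} odp_cap_names[] = {
    { IB_ODP_SUPPORT_SEND,     "SEND" },
    { IB_ODP_SUPPORT_RECV,     "RECV" },
    { IB_ODP_SUPPORT_WRITE,    "WRITE" },
    { IB_ODP_SUPPORT_READ,     "READ" },
    { IB_ODP_SUPPORT_ATOMIC,   "ATOMIC" },
    { IB_ODP_SUPPORT_SRQ_RECV, "SRQ_RECV" },
};

/* Write the names of all bits set in caps into buf, separated by spaces.
 * Returns the number of names written; stops early if buf is too small. */
static int odp_caps_to_string(uint32_t caps, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(odp_cap_names) / sizeof(odp_cap_names[0]); i++) {
        if (!(caps & odp_cap_names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", odp_cap_names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

For a mask of `IB_ODP_SUPPORT_SEND | IB_ODP_SUPPORT_READ`, the helper returns 2 and fills the buffer with `SEND READ`.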
