For NVMe shared disks, shared access is coordinated through the NVMe reservation mechanism, which can be thought of as analogous to a read-write lock.

Overview

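The purpose of reservations is to provide a mechanism that prevents conflicts when multiple hosts access a shared namespace. A reservation applies to a namespace, not to the SSD as a whole.

Reservations manage the access rights of the hosts that share a namespace. Sharing does not require reservations: a namespace can be shared without them, so reservations are optional.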

Multi-Attach enabled io2 volumes support NVMe reservations, which is a set of industry-standard storage fencing protocols. These protocols enable you to create and manage reservations that control and coordinate access from multiple instances to a shared volume. Reservations are used by shared storage applications to ensure data consistency.

https://docs.aws.amazon.com/ebs/latest/userguide/nvme-reservations.html

How to identify reservation capability
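
In NVMe 1.4, reservation support is advertised in two places: bit 5 of the ONCS field in the Identify Controller data, and the RESCAP byte (byte 31) of the Identify Namespace data, which lists the supported reservation types. The following is a minimal sketch of decoding those two values once they have been read (for example from the output of `nvme id-ctrl` and `nvme id-ns`); the function names are illustrative and not part of any library.

```python
from typing import List

# RESCAP bit positions (Identify Namespace, byte 31) as laid out in NVMe 1.4.
RESCAP_BITS = {
    0: "Persist Through Power Loss",
    1: "Write Exclusive",
    2: "Exclusive Access",
    3: "Write Exclusive - Registrants Only",
    4: "Exclusive Access - Registrants Only",
    5: "Write Exclusive - All Registrants",
    6: "Exclusive Access - All Registrants",
    7: "Ignore Existing Key (1.3+ behavior)",
}

# Identify Controller ONCS: bit 5 set means the controller supports reservations.
ONCS_RESERVATIONS_BIT = 5


def controller_supports_reservations(oncs: int) -> bool:
    """Return True if the ONCS value has the reservations bit set."""
    return bool((oncs >> ONCS_RESERVATIONS_BIT) & 1)


def decode_rescap(rescap: int) -> List[str]:
    """Return the names of all capabilities whose bit is set in RESCAP."""
    return [name for bit, name in RESCAP_BITS.items() if (rescap >> bit) & 1]


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(controller_supports_reservations(0x5F))
    print(decode_rescap(0x7E))
```

A RESCAP value of 0 means the namespace does not support reservations at all, even if the controller does.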

Reservation Register

The Reservation Register command is used to register, unregister, or replace a reservation key.

Reservation Report

The Reservation Report command returns a Reservation Status data structure to memory that describes the registration and reservation status of a namespace.
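
As an illustration of what the returned structure contains, here is a minimal sketch that parses the non-extended Reservation Status data structure, assuming the NVMe 1.4 layout: a 24-byte header (GEN, RTYPE, REGCTL, PTPLS) followed by one 24-byte entry per registered controller (CNTLID, RCSTS, HOSTID, RKEY). The function name and the dictionary keys are hypothetical.

```python
import struct
from typing import Any, Dict

HEADER_LEN = 24  # size of the Reservation Status header
ENTRY_LEN = 24   # size of one Registered Controller data structure


def parse_reservation_status(buf: bytes) -> Dict[str, Any]:
    """Parse a non-extended Reservation Status buffer (little-endian)."""
    if len(buf) < HEADER_LEN:
        raise ValueError("buffer shorter than the 24-byte header")
    # Bytes 3:0 GEN, byte 4 RTYPE, bytes 6:5 REGCTL.
    gen, rtype, regctl = struct.unpack_from("<IBH", buf, 0)
    ptpls = buf[9]  # Persist Through Power Loss State
    registrants = []
    for i in range(regctl):
        off = HEADER_LEN + i * ENTRY_LEN
        if off + ENTRY_LEN > len(buf):
            break  # buffer was too small to hold every entry
        cntlid, rcsts = struct.unpack_from("<HB", buf, off)
        hostid, rkey = struct.unpack_from("<QQ", buf, off + 8)
        registrants.append({
            "cntlid": cntlid,
            "holds_reservation": bool(rcsts & 1),  # RCSTS bit 0
            "hostid": hostid,
            "rkey": rkey,
        })
    return {"gen": gen, "rtype": rtype, "regctl": regctl,
            "ptpls": ptpls, "registrants": registrants}
```

An RTYPE of 0 in the header means no reservation is currently held on the namespace.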

Reservation Acquire

The Reservation Acquire command is used to acquire a reservation on a namespace, preempt a reservation held on a namespace, and abort a reservation held on a namespace.

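To obtain full access rights to a namespace, a host issues a Reservation Acquire command.

A namespace can hold only one reservation at a time. If Host A already holds the reservation and Host B sends a Reservation Acquire command, the SSD aborts that command with a Reservation Conflict status.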

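NVMe defines three roles:

  • Reservation Holder: the host that currently holds the reservation on the namespace
  • Registrant: any host that has registered a reservation key with the namespace
  • Non-Registrant: every other host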

A reservation can be one of six types:

  • Write Exclusive: only the Reservation Holder can write to the namespace; all other hosts can still read it
  • Exclusive Access: only the Reservation Holder can read or write the namespace
  • Write Exclusive – Registrants Only: the Reservation Holder and all Registrants can write to the namespace; Non-Registrants can only read it
  • Exclusive Access – Registrants Only: the Reservation Holder and all Registrants can access the namespace; Non-Registrants cannot access it
  • Write Exclusive – All Registrants: every Registrant is a Reservation Holder and can write to the namespace; Non-Registrants can only read it
  • Exclusive Access – All Registrants: every Registrant is a Reservation Holder and can access the namespace; Non-Registrants cannot access it
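
The six types reduce to a small access rule, sketched below under a simplifying assumption: only plain reads and writes are modeled, whereas the specification defines the behavior per command. The enum values follow the RTYPE encoding (1 through 6); the names `RType`, `Role`, and `is_allowed` are illustrative.

```python
from enum import Enum


class RType(Enum):
    WRITE_EXCLUSIVE = 1
    EXCLUSIVE_ACCESS = 2
    WRITE_EXCLUSIVE_REGISTRANTS_ONLY = 3
    EXCLUSIVE_ACCESS_REGISTRANTS_ONLY = 4
    WRITE_EXCLUSIVE_ALL_REGISTRANTS = 5
    EXCLUSIVE_ACCESS_ALL_REGISTRANTS = 6


class Role(Enum):
    HOLDER = "holder"
    REGISTRANT = "registrant"
    NON_REGISTRANT = "non-registrant"


# Types under which every registrant gets the same access as the holder.
_REGISTRANTS_SHARE = {
    RType.WRITE_EXCLUSIVE_REGISTRANTS_ONLY,
    RType.EXCLUSIVE_ACCESS_REGISTRANTS_ONLY,
    RType.WRITE_EXCLUSIVE_ALL_REGISTRANTS,
    RType.EXCLUSIVE_ACCESS_ALL_REGISTRANTS,
}

# Types that restrict writes only; reads stay open to everyone.
_WRITE_EXCLUSIVE = {
    RType.WRITE_EXCLUSIVE,
    RType.WRITE_EXCLUSIVE_REGISTRANTS_ONLY,
    RType.WRITE_EXCLUSIVE_ALL_REGISTRANTS,
}


def is_allowed(rtype: RType, role: Role, op: str) -> bool:
    """Return True if a host in `role` may perform `op` ('read' or 'write')."""
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if role is Role.HOLDER:
        return True
    if role is Role.REGISTRANT and rtype in _REGISTRANTS_SHARE:
        return True
    # Everyone else: reads pass only under the Write Exclusive family.
    return op == "read" and rtype in _WRITE_EXCLUSIVE
```

For example, `is_allowed(RType.WRITE_EXCLUSIVE, Role.REGISTRANT, "write")` is false, while the same call with `RType.WRITE_EXCLUSIVE_REGISTRANTS_ONLY` is true.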

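Even while Host A is the Reservation Holder, Host B can take the namespace away from it. To do so, Host B issues a Reservation Acquire command with the Reservation Acquire Action field set to 001b (Preempt) and its own registered key in the Current Reservation Key field; the Preempt Reservation Key field carries the key of the holder to be preempted. As long as Host A's reservation type is not "Write Exclusive – All Registrants" or "Exclusive Access – All Registrants", the SSD unregisters Host A, releases its reservation, and makes Host B the new Reservation Holder.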
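
To make the fields of a preempt request concrete, the sketch below builds Command Dword 10 and the 16-byte data payload of a Reservation Acquire command, assuming the NVMe 1.4 layout (RACQA in bits 2:0, IEKEY in bit 3, RTYPE in bits 15:8; payload is CRKEY followed by PRKEY, both little-endian). The helper names and the example keys are hypothetical, and actually submitting the command to a device is out of scope here.

```python
import struct

# Reservation Acquire Action (RACQA) values.
RACQA_ACQUIRE = 0b000
RACQA_PREEMPT = 0b001
RACQA_PREEMPT_AND_ABORT = 0b010


def acquire_cdw10(racqa: int, rtype: int, iekey: bool = False) -> int:
    """Pack RACQA (bits 2:0), IEKEY (bit 3) and RTYPE (bits 15:8)."""
    return ((rtype & 0xFF) << 8) | (int(iekey) << 3) | (racqa & 0x7)


def acquire_payload(crkey: int, prkey: int = 0) -> bytes:
    """Build the data buffer: CRKEY (bytes 7:0) then PRKEY (bytes 15:8)."""
    return struct.pack("<QQ", crkey, prkey)


if __name__ == "__main__":
    host_b_key = 0xB  # hypothetical key registered by Host B
    host_a_key = 0xA  # hypothetical key of the current holder, Host A
    cdw10 = acquire_cdw10(RACQA_PREEMPT, rtype=1)  # 1 = Write Exclusive
    data = acquire_payload(crkey=host_b_key, prkey=host_a_key)
    print(hex(cdw10), data.hex())
```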

Reservation Release

The Reservation Release command is used to release or clear a reservation held on a namespace.

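Even after Host B has taken the namespace away from it, Host A can still issue a Reservation Release command as long as it remains a Registrant. With the Reservation Release Action field set to 001b (Clear) and the correct key in the Current Reservation Key field, the command unregisters every host that is registered on the namespace.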
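
The difference between releasing and clearing can be shown with a toy in-memory model. This is only a sketch of the bookkeeping, not of a real controller: the `Namespace` class, its attributes, and the use of `PermissionError` to stand in for a Reservation Conflict status are all assumptions made for illustration.

```python
from typing import Dict, Optional


class Namespace:
    """Toy model of registration and reservation state on one namespace."""

    def __init__(self) -> None:
        self.registrants: Dict[str, int] = {}  # host name -> reservation key
        self.holder: Optional[str] = None      # host holding the reservation

    def register(self, host: str, key: int) -> None:
        self.registrants[host] = key

    def release(self, host: str, crkey: int, clear: bool = False) -> None:
        """Model Reservation Release: action 000b (Release) or 001b (Clear)."""
        # Only a registrant presenting its current key may release or clear.
        if self.registrants.get(host) != crkey:
            raise PermissionError("Reservation Conflict")
        if clear:
            # Clear: drop the reservation and unregister every registrant.
            self.registrants.clear()
            self.holder = None
        elif self.holder == host:
            # Release: only the holder's own reservation is dropped.
            self.holder = None


if __name__ == "__main__":
    ns = Namespace()
    ns.register("A", 0xA)
    ns.register("B", 0xB)
    ns.holder = "B"                       # Host B holds the reservation
    ns.release("A", 0xA, clear=True)      # Host A, still registered, clears
    print(ns.registrants, ns.holder)
```

In this model a plain release issued by a registrant that is not the holder leaves the state unchanged, whereas a clear from any registrant wipes both the reservation and all registrations.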

Reservation Notification

In three situations (registration preempted, reservation released, and reservation preempted), the device, unless Reservation Notifications have been disabled, sends a Reservation Notification by completing an Asynchronous Event Request (AER) on the Admin queue, telling the driver that a registration preempted, reservation released, or reservation preempted event has occurred.
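
After such an event, the host reads the Reservation Notification log page (log identifier 80h) to find out what happened. The sketch below decodes that 64-byte page, assuming the NVMe 1.4 layout: Log Page Count in bytes 7:0, Log Page Type in byte 8, Number of Available Log Pages in byte 9, and Namespace ID in bytes 15:12. The function name and dictionary keys are illustrative.

```python
import struct
from typing import Any, Dict

# Reservation Notification Log Page Type values.
LOG_PAGE_TYPES = {
    0: "empty log page",
    1: "registration preempted",
    2: "reservation released",
    3: "reservation preempted",
}


def parse_reservation_notification(log: bytes) -> Dict[str, Any]:
    """Decode the fixed fields of a Reservation Notification log page."""
    if len(log) < 16:
        raise ValueError("log page buffer is too short")
    # Bytes 7:0 count, byte 8 type, byte 9 number of available log pages.
    count, page_type, available = struct.unpack_from("<QBB", log, 0)
    (nsid,) = struct.unpack_from("<I", log, 12)
    return {
        "log_page_count": count,
        "event": LOG_PAGE_TYPES.get(page_type, "reserved"),
        "available_log_pages": available,
        "nsid": nsid,
    }
```

The `nsid` field identifies which namespace the event applies to, so the driver can map the notification back to the affected shared volume.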

Description of the Reservation Notification AER:


References:

  1. NVMe spec 1.4
  2. NVM Feature— Reservation(NVME 学习笔记五)
  3. 蛋蛋读剩的NVMe之一:NVMe Reservation
  4. 通过多重挂载功能将单块云盘挂载至多台ECS实例