This article introduces Page-Modification Logging (PML), a feature of VT-x.

1. Motivation

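Before PML, a VMM that wanted to track modifications to guest-physical addresses (GPAs) had to mark the EPT paging-structure entries as not-present or read-only. Doing so triggers a large number of EPT violations, which is very expensive.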

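PML builds on the CPU's support for the accessed and dirty flags in EPT.
When PML is enabled, every write that sets a dirty flag in EPT generates an in-memory log entry reporting the GPA of the write. When the log is full, a VM exit is triggered, and the VMM can then process the modified pages.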

2. Introduction

PML is a new feature on Intel's Broadwell server platform, aimed at reducing the overhead of the dirty logging mechanism.

The specification can be found at: Page Modification Logging for Virtual Machine Monitor White Paper

Currently, dirty logging is done by write protection: KVM write-protects guest memory and marks the dirty GFN in dirty_bitmap on the subsequent write fault. This works, but it costs one additional write fault for every dirty GFN that is logged. The overhead can be large when the guest writes intensively.

PML is a hardware-assisted, efficient way to do dirty logging. PML automatically logs the dirty GPA to a 4K PML memory buffer when the CPU changes an EPT entry's D-bit from 0 to 1. To do this, a new 4K PML buffer base address and a PML index were added to the VMCS. The buffer holds 512 entries (8 bytes for each GPA). The PML index initially points at the last entry (511), the CPU decrements it after logging each GPA, and eventually a PML buffer full VMEXIT happens when the PML buffer is fully logged.
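The logging step described above can be modelled in user space. The following is a minimal sketch, not hardware or KVM code: `struct pml_state`, `pml_reset` and `pml_guest_write` are hypothetical names, and the sketch assumes the index starts at the last slot (511) and that the EPT dirty flag is bit 9 of a leaf entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PML_ENTRIES 512            /* 4 KiB buffer / 8 bytes per GPA */
#define EPT_DIRTY   (1ULL << 9)    /* dirty flag of an EPT leaf entry */

/* Hypothetical model of the per-VMCS PML state. */
struct pml_state {
    uint64_t buf[PML_ENTRIES];     /* stands in for the 4 KiB PML page */
    uint16_t index;                /* stands in for the VMCS PML index */
};

static void pml_reset(struct pml_state *s)
{
    assert(s != NULL);
    s->index = PML_ENTRIES - 1;    /* logging starts at the last slot */
}

/*
 * Model one guest write to `gpa`, mapped by `*ept_entry`.
 * Returns false when the log is full, i.e. when hardware would raise a
 * PML buffer full VM exit instead of completing the write.
 */
static bool pml_guest_write(struct pml_state *s, uint64_t *ept_entry,
                            uint64_t gpa)
{
    assert(s != NULL && ept_entry != NULL);

    if (*ept_entry & EPT_DIRTY)
        return true;               /* no 0 -> 1 transition, nothing logged */

    if (s->index >= PML_ENTRIES)
        return false;              /* index out of range: log is full */

    s->buf[s->index] = gpa & ~0xFFFULL;  /* entries are 4 KiB aligned */
    s->index--;                    /* wraps to 0xFFFF after slot 0 is used */
    *ept_entry |= EPT_DIRTY;
    return true;
}
```

In this model the first 512 writes to clean pages are logged silently, and the 513th reports a full log. A second write to an already dirty page is never logged.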

With PML, we don't have to use write protection, so the flood of write-fault EPT violations is avoided, at the cost of one additional PML buffer full VMEXIT per 512 dirty GPAs. In theory, this reduces hypervisor overhead while the guest is in dirty logging mode, so more CPU cycles go to the guest, and benchmarks in the guest are expected to perform better than without PML.
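To put rough numbers on this trade-off, the sketch below counts exits under a deliberately simplified model: write protection costs one fault per page that becomes dirty, and the PML buffer is drained only on buffer-full exits. Both helper names are hypothetical, and the model ignores the per-exit cost difference and any entries drained on unrelated exits.

```c
#include <assert.h>
#include <stdint.h>

/* Write protection: one write fault for each page that becomes dirty. */
static uint64_t write_protect_exits(uint64_t dirty_pages)
{
    return dirty_pages;
}

/*
 * PML: a buffer-full exit is raised only when a GPA must be logged and
 * all `entries` slots are already used, so the first `entries` pages
 * cause no exit at all.
 */
static uint64_t pml_full_exits(uint64_t dirty_pages, uint64_t entries)
{
    assert(entries > 0);
    return dirty_pages == 0 ? 0 : (dirty_pages - 1) / entries;
}
```

For a guest that dirties 1 GiB once (262144 pages of 4K), this model gives 262144 write faults with write protection versus 511 buffer-full exits with a 512-entry PML buffer.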

3. KVM design

3.1 Enable/Disable PML

PML is per-vcpu (per-VMCS), while the EPT table can be shared by vcpus, so we need to enable/disable PML for all vcpus of a guest. A dedicated 4K page is allocated for each vcpu when PML is enabled for that vcpu.

Currently, we choose to always enable PML for the guest: we enable PML when creating the vcpu and never disable it during the guest's lifetime. This avoids the complicated logic of enabling PML on demand while the guest is running. To eliminate unnecessary GPA logging in non-dirty-logging mode, we set the D-bit manually for slots with dirty logging disabled (KVM: MMU: Explicitly set D-bit for writable spte.).
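The effect of setting the D-bit up front can be illustrated with a small sketch. This is not KVM's spte construction code: `make_leaf_entry` is a hypothetical helper, and the bit positions assume the EPT leaf-entry layout (read at bit 0, write at bit 1, dirty at bit 9). Because PML only logs on a 0 to 1 transition of the D-bit, an entry created with D already set never produces a log entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EPT_READ   (1ULL << 0)
#define EPT_WRITE  (1ULL << 1)
#define EPT_DIRTY  (1ULL << 9)

/*
 * Hypothetical helper: build an EPT leaf entry for host-physical page
 * `hpa`. For writable pages in a slot without dirty logging, the D-bit
 * is set in advance so guest writes never trigger PML logging.
 */
static uint64_t make_leaf_entry(uint64_t hpa, bool writable,
                                bool slot_dirty_logging)
{
    assert((hpa & 0xFFFULL) == 0);     /* must be 4 KiB aligned */

    uint64_t entry = hpa | EPT_READ;

    if (writable) {
        entry |= EPT_WRITE;
        if (!slot_dirty_logging)
            entry |= EPT_DIRTY;        /* suppress logging for this page */
    }
    return entry;
}
```

Slots with dirty logging enabled keep D clear, so the first write to each of their pages is still logged.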

3.2 Flush PML buffer

When userspace queries dirty_bitmap, there may be GPAs logged in a vcpu's PML buffer that have not been reported yet, because the buffer is not full and so no VMEXIT has happened. In this case, we need to manually flush the PML buffer of every vcpu and update dirty_bitmap with the dirty GPAs.

We flush the PML buffer at the beginning of each VMEXIT. This keeps dirty_bitmap more up to date and also simplifies flushing the PML buffers of all vcpus: we only need to kick all vcpus out of the guest, and the PML buffer of each vcpu is flushed automatically.

/*
 * Flush all vcpus' PML buffer and update logged GPAs to dirty_bitmap.
 * Called before reporting dirty_bitmap to userspace.
 */
static void kvm_flush_pml_buffers(struct kvm *kvm)
{
	int i;
	struct kvm_vcpu *vcpu;
	/*
	 * We only need to kick vcpu out of guest mode here, as PML buffer
	 * is flushed at beginning of all VMEXITs, and it's obvious that only
	 * vcpus running in guest are possible to have unflushed GPAs in PML
	 * buffer.
	 */
	kvm_for_each_vcpu(i, vcpu, kvm)
		kvm_vcpu_kick(vcpu);
}
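The per-vcpu flush that runs at the start of each VMEXIT is not shown above. A user-space sketch of what such a flush might do follows. `flush_pml_buffer` and its parameters are hypothetical stand-ins for the PML page, the VMCS index field and dirty_bitmap, and the sketch assumes the index counts down from 511, so the valid entries are the slots above the current index.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define PML_ENTRIES 512
#define PAGE_SHIFT  12

/*
 * Hypothetical model: drain one vcpu's PML buffer into a dirty bitmap
 * (one bit per guest page frame) and reset the index for reuse.
 */
static void flush_pml_buffer(const uint64_t *pml_buf, uint16_t *pml_index,
                             unsigned long *dirty_bitmap, uint64_t nr_pages)
{
    const unsigned int bits = sizeof(unsigned long) * CHAR_BIT;
    uint16_t idx = *pml_index;

    if (idx == PML_ENTRIES - 1)
        return;                        /* nothing logged since last flush */

    /* An out-of-range index means the buffer is completely full. */
    idx = (idx >= PML_ENTRIES) ? 0 : (uint16_t)(idx + 1);

    for (; idx < PML_ENTRIES; idx++) {
        uint64_t gfn = pml_buf[idx] >> PAGE_SHIFT;

        assert(gfn < nr_pages);        /* bitmap must cover this frame */
        dirty_bitmap[gfn / bits] |= 1UL << (gfn % bits);
    }

    *pml_index = PML_ENTRIES - 1;      /* buffer is empty again */
}
```

Because the flush leaves the index at 511, a vcpu that has just exited has nothing left to report, which is why kicking every vcpu is sufficient before dirty_bitmap is read.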

References:

  1. Page Modification Logging for Virtual Machine Monitor White Paper
  2. KVM: VMX: Page Modification Logging (PML) support
  3. Intel VT 页面修改记录(PML)
  4. Intel Page Modification Logging, a hardware virtualization feature: study and improvement for virtual machine working set estimation