KVM MMIO Emulation
Prerequisite
Overview
In summary, MMIO emulation works as follows:
- QEMU declares a memory region (but does not allocate RAM for it or commit it to KVM).
- The guest's first access to the MMIO address causes an EPT violation VM exit.
- KVM builds the EPT page table and tags the page table entry with a special mark (110b).
- Later guest accesses to this MMIO region are handled by the EPT misconfig VM-exit handler.
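The steps above can be mimicked with a toy model. This is a sketch, not KVM code: toy_guest_access, the enum, and the single 64-bit word standing in for an EPT entry are hypothetical names invented for illustration. It only shows why the first access and later accesses take different exits.

```c
#include <stdint.h>

/* Toy model only: none of these names exist in KVM. */
enum toy_exit { TOY_EPT_VIOLATION, TOY_EPT_MISCONFIG, TOY_NO_EXIT };

#define TOY_EPT_R 0x1ull
#define TOY_EPT_W 0x2ull
#define TOY_EPT_X 0x4ull
#define TOY_MMIO_MARK (TOY_EPT_W | TOY_EPT_X)   /* 110b */

/* Simulate one guest access to the page described by *entry. */
static enum toy_exit toy_guest_access(uint64_t *entry)
{
    uint64_t low = *entry & (TOY_EPT_R | TOY_EPT_W | TOY_EPT_X);

    if (low == 0) {
        /* Not present: EPT violation. The "hypervisor" installs the mark. */
        *entry |= TOY_MMIO_MARK;
        return TOY_EPT_VIOLATION;
    }
    if (low == TOY_MMIO_MARK)
        return TOY_EPT_MISCONFIG;   /* writable but not readable */
    return TOY_NO_EXIT;             /* ordinary mapped page */
}
```

Calling toy_guest_access twice on an entry that starts at zero yields a violation first and a misconfig afterwards, which is the split between the slow first fault and the fast MMIO path.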
QEMU part
Taking e1000 NIC emulation as an example: the MemoryRegion that the device registers when it sets up its MMIO is of I/O type, not RAM type.
static void
e1000_mmio_setup(E1000State *d)
{
    int i;
    const uint32_t excluded_regs[] = {
        E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
        E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
    };

    // Register the MMIO region via memory_region_init_io; mr->ram stays false.
    memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
                          "e1000-mmio", PNPMMIO_SIZE);
    memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
    for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
        memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
                                     excluded_regs[i+1] - excluded_regs[i] - 4);
    memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d, "e1000-io", IOPORT_SIZE);
}
QEMU uses memory_region_init_io to declare an MMIO region. Because mr->ram is false for such a region, no RAM is actually allocated for it.
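When QEMU calls kvm_set_phys_mem to register the guest's physical memory with KVM's data structures, it calls memory_region_is_ram to check whether the physical address range is backed by RAM. If it is not, the function simply returns.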
static void kvm_set_phys_mem(KVMMemoryListener *kml,
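The effect of that early return can be sketched with a simplified model. The struct and function below are hypothetical stand-ins, not QEMU's MemoryRegion or its listener code. They only illustrate that a region whose ram flag is false never becomes a KVM memory slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for QEMU's MemoryRegion. */
struct toy_region {
    const char *name;
    uint64_t    base;
    uint64_t    size;
    bool        ram;   /* false for regions created as I/O regions */
};

/* Count the regions that would be handed to the hypervisor as memory
 * slots. Non-RAM regions are skipped, mirroring the early return. */
static size_t toy_count_registered(const struct toy_region *r, size_t n)
{
    size_t registered = 0;

    for (size_t i = 0; i < n; i++) {
        if (!r[i].ram)
            continue;   /* MMIO: no slot, so guest accesses must trap */
        registered++;
    }
    return registered;
}
```

With one RAM region and one I/O region in the array, only the RAM region is counted, so the I/O range is left without a slot on the KVM side.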
KVM part
In vmx_init, when EPT is enabled, KVM calls ept_set_mmio_spte_mask.
static void ept_set_mmio_spte_mask(void)
{
    /*
     * EPT Misconfigurations can be generated if the value of bits 2:0
     * of an EPT paging-structure entry is 110b (write/execute).
     */
    kvm_mmu_set_mmio_spte_mask(VMX_EPT_RWX_MASK,
                               VMX_EPT_MISCONFIG_WX_VALUE, 0);
}

void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value, u64 access_mask)
{
    ...
    shadow_mmio_mask = mmio_mask | SPTE_SPECIAL_MASK;
    ...
}
This sets the global shadow_mmio_mask.
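To make the bit pattern concrete, here is a minimal sketch of how such a mask can later be used to recognise a marked entry. The constants are assumptions based on kernel headers of that era (read/write/execute in bits 0 to 2, a special bit at position 62), and the matching value variable is also an assumption, since the excerpt elides how the value is stored. toy_is_mmio_spte is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check them against the kernel tree you are reading. */
#define EPT_READABLE    0x1ull
#define EPT_WRITABLE    0x2ull
#define EPT_EXECUTABLE  0x4ull
#define EPT_RWX_MASK    (EPT_READABLE | EPT_WRITABLE | EPT_EXECUTABLE)
#define EPT_MISCONFIG_WX_VALUE (EPT_WRITABLE | EPT_EXECUTABLE)  /* 110b */
#define SPECIAL_MASK    (1ull << 62)

/* Mask selects the bits to compare; value is what they must equal. */
static const uint64_t mmio_mask  = EPT_RWX_MASK | SPECIAL_MASK;
static const uint64_t mmio_value = EPT_MISCONFIG_WX_VALUE | SPECIAL_MASK;

/* An entry counts as an MMIO marker when its masked bits match the value. */
static bool toy_is_mmio_spte(uint64_t spte)
{
    return (spte & mmio_mask) == mmio_value;
}
```

Bits outside the mask (for example, an encoded gfn) do not affect the check, while a normal RWX entry or a bare 110b entry without the special bit is rejected.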
When the guest first accesses the MMIO address, the VM exits with an EPT violation and tdp_page_fault is called, which in turn calls __direct_map to build the EPT page table. At the end of a long call chain, mark_mmio_spte is called to tag the SPTE with shadow_mmio_mask, which, as shown above, was set during VMX initialization.
__direct_map
  mmu_set_spte
    set_spte
      set_mmio_spte
        mark_mmio_spte
mark_mmio_spte is called only when is_noslot_pfn returns true for the pfn.
static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
                          pfn_t pfn, unsigned access)
{
    if (unlikely(is_noslot_pfn(pfn))) {
        mark_mmio_spte(kvm, sptep, gfn, access);
        return true;
    }

    return false;
}

static inline bool is_noslot_pfn(pfn_t pfn)
{
    return pfn == KVM_PFN_NOSLOT;
}
Because QEMU never commits the MMIO memory region to KVM, no memory slot covers the faulting gfn, so pfn is KVM_PFN_NOSLOT and the SPTE is marked with shadow_mmio_mask.
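Why an unregistered range ends up as "no slot" can be sketched with a toy lookup. The slot layout, the function name, and the sentinel value below are illustrative assumptions, not KVM's actual memslot code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PFN_NOSLOT (1ull << 63)   /* assumed sentinel value */

/* Hypothetical memory slot: a run of guest frames with host backing. */
struct toy_memslot {
    uint64_t base_gfn;
    uint64_t npages;
    uint64_t base_pfn;   /* toy simplification: contiguous host frames */
};

/* Translate a guest frame number; return the sentinel if no slot covers it. */
static uint64_t toy_gfn_to_pfn(const struct toy_memslot *slots, size_t n,
                               uint64_t gfn)
{
    for (size_t i = 0; i < n; i++) {
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return slots[i].base_pfn + (gfn - slots[i].base_gfn);
    }
    return TOY_PFN_NOSLOT;   /* e.g. an MMIO range never registered */
}
```

A gfn inside a registered RAM slot translates normally, while a gfn in the device's MMIO range falls through to the sentinel, which is the condition set_mmio_spte tests.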
When the guest later accesses this MMIO page, the low bits of its EPT page table entry are 110b, so the access causes an EPT misconfig VM exit: an entry cannot legitimately be writable and executable without also being readable. The handler, handle_ept_misconfig, checks for the MMIO case first, and the access is eventually dispatched to the QEMU side.
vcpu_run
x86_emulate_instruction
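The permission rule that makes 110b trap can be written down as a small predicate. This sketch follows the permission-bit conditions described in the Intel SDM for EPT misconfigurations (write allowed without read, or execute-only where the CPU does not support it). It ignores the other misconfiguration causes such as reserved bits, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if bits 2:0 of an EPT entry are an invalid permission
 * combination. Only the permission-bit rules are modelled here. */
static bool toy_ept_perm_misconfig(uint64_t entry, bool exec_only_supported)
{
    bool r = (entry & 0x1) != 0;
    bool w = (entry & 0x2) != 0;
    bool x = (entry & 0x4) != 0;

    if (w && !r)
        return true;    /* 010b and 110b: writable but not readable */
    if (x && !r && !exec_only_supported)
        return true;    /* 100b without execute-only support */
    return false;
}
```

The value 110b is misconfigured regardless of execute-only support, which makes it a dependable marker for MMIO entries.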
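Finally, when an ioeventfd is registered for the address, ioeventfd_write is called; it signals the eventfd to send a notification event to QEMU.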
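The notification primitive itself can be tried from userspace. The sketch below is only an analogue: inside the kernel the eventfd is signalled directly rather than through write(2), and toy_eventfd_roundtrip is a hypothetical helper name. It uses the Linux eventfd(2) interface, so it is Linux-only.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal an eventfd once and read the counter back.
 * Returns the counter value, or 0 on any failure. */
static uint64_t toy_eventfd_roundtrip(void)
{
    uint64_t one = 1, seen = 0;
    int fd = eventfd(0, EFD_CLOEXEC);

    if (fd < 0)
        return 0;
    /* Producer side: add 1 to the counter, as a notifier would. */
    if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        close(fd);
        return 0;
    }
    /* Consumer side: read returns the accumulated count and resets it. */
    if (read(fd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
        seen = 0;
    close(fd);
    return seen;
}
```

In a real setup the consumer is QEMU's event loop polling the descriptor, so the vCPU thread does not have to return to userspace just to deliver the notification.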
References: