This post records some notes on KVM coalesced MMIO/PIO.

Background

https://elixir.bootlin.com/linux/v6.0/source/virt/kvm

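Coalesced MMIO shows up frequently in the KVM source code, and this post takes a closer look at it. It stays at a high level and does not go into the details of the source.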

Motivation

When the kernel has to forward MMIO writes to userspace, it stores them in memory until it has to hand control to userspace for another reason. This avoids excessive context switches for operations that can wait.

Coalesced I/O is used if one or more write accesses to a hardware register can be deferred until a read or a write to another hardware register on the same device. This last access will cause a vmexit and userspace will process accesses from the ring buffer before emulating it. That will avoid exiting to userspace on repeated writes.

  • Without KVM coalesced MMIO/PIO

    • A deferrable MMIO/PIO write triggers a VM exit from non-root mode -> KVM -> QEMU (handles the deferrable MMIO/PIO write) -> KVM -> back to non-root mode
  • With KVM coalesced MMIO/PIO

    • A deferrable MMIO/PIO write triggers a VM exit from non-root mode -> KVM (records the deferrable MMIO/PIO write in the ring buffer) -> back to non-root mode
    • A non-deferrable MMIO/PIO access triggers a VM exit from non-root mode -> KVM -> QEMU (first handles the deferrable MMIO/PIO writes recorded in the ring buffer, then handles this non-deferrable access) -> KVM -> back to non-root mode

As the comparison shows, with coalesced MMIO/PIO KVM records deferrable MMIO/PIO writes in the ring buffer without exiting to QEMU, and delays their handling until the next MMIO/PIO access that cannot be deferred. This avoids exiting to userspace on repeated writes.
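The two flows above can be sketched as a toy model that counts exits to userspace. This is not KVM code: the struct names, the RING_SLOTS capacity, and the rule that a full ring falls back to a normal exit are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8   /* hypothetical capacity, chosen for the example */

struct access {
    uint64_t addr;
    uint32_t value;
    int deferrable;    /* 1 = write that may be coalesced, 0 = must be handled now */
};

struct ring {
    uint32_t first, last;            /* consumer index, producer index */
    struct access slot[RING_SLOTS];
};

struct stats {
    unsigned userspace_exits;        /* KVM -> QEMU transitions */
    unsigned emulated;               /* accesses emulated by "userspace" */
};

static int ring_full(const struct ring *r)
{
    return (r->last + 1) % RING_SLOTS == r->first;
}

/* "Userspace" side: emulate everything recorded so far, oldest first. */
static void drain(struct ring *r, struct stats *s)
{
    while (r->first != r->last) {
        s->emulated++;
        r->first = (r->first + 1) % RING_SLOTS;
    }
}

/* Replay a trace of guest accesses and count the exits to userspace. */
static struct stats replay(const struct access *trace, size_t n, int coalescing)
{
    struct ring r = { 0 };
    struct stats s = { 0, 0 };

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (coalescing && trace[i].deferrable && !ring_full(&r)) {
            /* "Kernel" side: record the write and resume the guest. */
            r.slot[r.last] = trace[i];
            r.last = (r.last + 1) % RING_SLOTS;
            continue;
        }
        /* Exit to userspace: flush pending writes, then emulate this access. */
        s.userspace_exits++;
        drain(&r, &s);
        s.emulated++;
    }
    return s;
}
```

Replaying five deferrable writes followed by one non-deferrable access gives six userspace exits with coalescing off and one with it on, while the same six accesses are emulated in both cases. Writes still sitting in the ring at the end of a trace stay pending in this model until a later exit drains them.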

API

4.116 KVM_(UN)REGISTER_COALESCED_MMIO

Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio)
            KVM_CAP_COALESCED_PIO (for coalesced pio)
Architectures: all
Type: vm ioctl
Parameters: struct kvm_coalesced_mmio_zone
Returns: 0 on success, < 0 on error

Coalesced I/O is a performance optimization that defers hardware
register write emulation so that userspace exits are avoided. It is
typically used to reduce the overhead of emulating frequently accessed
hardware registers.

When a hardware register is configured for coalesced I/O, write accesses
do not exit to userspace and their value is recorded in a ring buffer
that is shared between kernel and userspace.

Coalesced I/O is used if one or more write accesses to a hardware
register can be deferred until a read or a write to another hardware
register on the same device. This last access will cause a vmexit and
userspace will process accesses from the ring buffer before emulating
it. That will avoid exiting to userspace on repeated writes.

Coalesced pio is based on coalesced mmio. There is little difference
between coalesced mmio and pio except that coalesced pio records accesses
to I/O ports.
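As a rough illustration of the ioctl described above, userspace could register and unregister a zone along these lines. This is a minimal sketch and not how QEMU structures it: the helper names are hypothetical, vm_fd is assumed to be a VM file descriptor obtained from KVM_CREATE_VM, and the code assumes kernel headers recent enough for struct kvm_coalesced_mmio_zone to carry the pio field.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Shared helper (hypothetical name): fill in the zone and issue the vm ioctl. */
static int coalesced_zone_ioctl(int vm_fd, unsigned long request,
                                uint64_t addr, uint32_t size, int is_pio)
{
    struct kvm_coalesced_mmio_zone zone;

    memset(&zone, 0, sizeof(zone));
    zone.addr = addr;             /* guest physical address, or port number for PIO */
    zone.size = size;             /* length of the zone in bytes */
    zone.pio = is_pio ? 1 : 0;    /* 1 selects I/O ports, 0 selects MMIO */

    if (ioctl(vm_fd, request, &zone) < 0) {
        int err = errno;
        assert(err > 0);          /* a failed ioctl sets a positive errno */
        return -err;
    }
    return 0;
}

/* Returns 0 on success, or a negative errno value on failure. */
static int register_coalesced_zone(int vm_fd, uint64_t addr, uint32_t size, int is_pio)
{
    return coalesced_zone_ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, addr, size, is_pio);
}

static int unregister_coalesced_zone(int vm_fd, uint64_t addr, uint32_t size, int is_pio)
{
    return coalesced_zone_ioctl(vm_fd, KVM_UNREGISTER_COALESCED_MMIO, addr, size, is_pio);
}
```

A caller would typically check KVM_CAP_COALESCED_MMIO (and KVM_CAP_COALESCED_PIO for ports) with KVM_CHECK_EXTENSION before calling these helpers. Called with an invalid descriptor such as -1, both return -EBADF.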

QEMU example

// https://gitlab.com/qemu-project/qemu/-/blob/stable-6.0/hw/net/e1000.c#L1631-L1647
static void
e1000_mmio_setup(E1000State *d)
{
    int i;
    /* Registers excluded from coalescing; PNPMMIO_SIZE marks the end of the list. */
    const uint32_t excluded_regs[] = {
        E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
        E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
    };

    memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
                          "e1000-mmio", PNPMMIO_SIZE);
    /* Coalesce everything below the first excluded register. */
    memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
    /* Coalesce each gap between consecutive excluded registers (4 bytes each). */
    for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
        memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
                                     excluded_regs[i+1] - excluded_regs[i] - 4);
    memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d, "e1000-io", IOPORT_SIZE);
}

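In this case, memory_region_add_coalescing registers the coalesced MMIO regions. Understanding the details requires reading the e1000 spec together with the relevant QEMU and KVM code, which is beyond the scope of this post.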
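To make the range arithmetic in e1000_mmio_setup easier to follow, here is a standalone sketch that computes the same kind of gaps without calling into QEMU. The names coalesced_ranges, struct range, and REG_WIDTH are hypothetical, and the sketch assumes, as the loop above does, that each excluded register is 4 bytes wide and that the list is sorted and ends with the region size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_WIDTH 4u   /* assumed width of each excluded register, in bytes */

struct range {
    uint32_t offset;   /* start of the coalesced range within the region */
    uint32_t size;     /* length of the range in bytes */
};

/*
 * excluded[] lists register offsets in ascending order and ends with
 * region_size as a sentinel. The gaps between excluded registers are
 * written to out[] (at most max entries) and their count is returned.
 */
static size_t coalesced_ranges(const uint32_t *excluded, uint32_t region_size,
                               struct range *out, size_t max)
{
    size_t n = 0;

    assert(max >= 1);
    /* Everything below the first excluded register. */
    out[n++] = (struct range){ 0, excluded[0] };

    /* One range per gap, starting just past each excluded register. */
    for (size_t i = 0; excluded[i] != region_size; i++) {
        assert(n < max);
        out[n++] = (struct range){
            excluded[i] + REG_WIDTH,
            excluded[i + 1] - excluded[i] - REG_WIDTH
        };
    }
    return n;
}
```

With made-up offsets 0x20 and 0xC0 in a region of size 0x100, this yields three ranges: offset 0 with size 0x20, offset 0x24 with size 0x9C, and offset 0xC4 with size 0x3C. Writes to the two excluded registers themselves are never covered, so they would still exit to userspace immediately.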


References:

  1. kvm: Batch writes to MMIO
  2. Documentation/virtual/kvm/api.txt