This article walks through the QEMU and KVM source code that runs when a guest executes an in/out instruction.

1. PIO background

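Intel's I/O instructions let the processor access I/O ports, either to read data from a peripheral or to send data to it. Each of these instructions has an operand that specifies a port address in the I/O address space. There are two kinds of I/O instructions: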

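  1. Instructions that transfer a single data item (byte, word, or doubleword) between the accumulator (AL, AX, or EAX) and an I/O port.
  2. Instructions that transfer a string of data items (bytes, words, or doublewords) between memory and an I/O port. These are called "string I/O instructions" or "block I/O instructions".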

The first kind consists of the IN/OUT instructions. The second kind consists of the INS/OUTS instructions.
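
The sketch below makes the two categories concrete by classifying the one-byte opcodes of the port I/O instructions. It ignores prefixes, such as 0x66 for operand size and REP for string instructions. The names `pio_kind`, `pio_insn` and `classify_pio_opcode` are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which of the two I/O instruction categories an opcode belongs to. */
enum pio_kind { PIO_NONE, PIO_SINGLE, PIO_STRING };

struct pio_insn {
    enum pio_kind kind;
    bool in;          /* true for IN/INS, false for OUT/OUTS */
    bool port_in_dx;  /* true if the port number comes from DX, false for imm8 */
};

/* Classify a one-byte x86 opcode; prefixes (0x66, REP, ...) are not handled. */
static struct pio_insn classify_pio_opcode(uint8_t op)
{
    struct pio_insn r = { PIO_NONE, false, false };

    switch (op) {
    case 0xE4: case 0xE5:   /* IN AL/eAX, imm8 */
        r.kind = PIO_SINGLE; r.in = true; break;
    case 0xE6: case 0xE7:   /* OUT imm8, AL/eAX */
        r.kind = PIO_SINGLE; r.in = false; break;
    case 0xEC: case 0xED:   /* IN AL/eAX, DX */
        r.kind = PIO_SINGLE; r.in = true; r.port_in_dx = true; break;
    case 0xEE: case 0xEF:   /* OUT DX, AL/eAX */
        r.kind = PIO_SINGLE; r.in = false; r.port_in_dx = true; break;
    case 0x6C: case 0x6D:   /* INSB / INSW / INSD, port in DX */
        r.kind = PIO_STRING; r.in = true; r.port_in_dx = true; break;
    case 0x6E: case 0x6F:   /* OUTSB / OUTSW / OUTSD, port in DX */
        r.kind = PIO_STRING; r.in = false; r.port_in_dx = true; break;
    default:
        break;              /* not a port I/O opcode */
    }
    return r;
}
```

For example, `classify_pio_opcode(0xEE)` (OUT DX, AL) reports a single-item OUT whose port comes from DX. This article covers only that PIO_SINGLE category.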

2. PIO configuration in VMCS

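The SDM describes the two relevant controls as follows:

  • "Unconditional I/O exiting" determines whether executions of I/O instructions (IN, INS/INSB/INSW/INSD, OUT, and OUTS/OUTSB/OUTSW/OUTSD) cause VM exits.
  • "Use I/O bitmaps" determines whether I/O bitmaps are used to restrict executions of I/O instructions. If the I/O bitmaps are used, the setting of "unconditional I/O exiting" is ignored.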

KVM sets the "unconditional I/O exiting" bit in the primary processor-based VM-execution controls and leaves the "use I/O bitmaps" bit clear. As a result, every PIO instruction the guest executes causes a VM exit.

For details, see the patch "KVM: VMX: drop I/O permission bitmaps".
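
The exit decision described above can be modeled roughly as follows. This is a simplified sketch of the SDM rule, not KVM code, and rests on these assumptions:

  • The control bit positions follow the SDM: bit 24 is unconditional I/O exiting and bit 25 is use I/O bitmaps.
  • The macro names mirror Linux's, but they are defined locally here.
  • `io_causes_vm_exit` is a hypothetical helper that treats bitmaps A (ports 0x0000-0x7FFF) and B (ports 0x8000-0xFFFF) as one contiguous 8 KiB array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bits in the primary processor-based VM-execution controls (per the SDM). */
#define CPU_BASED_UNCOND_IO_EXITING (1u << 24)
#define CPU_BASED_USE_IO_BITMAPS    (1u << 25)

/*
 * io_bitmap: 65536 bits, bitmap A followed by bitmap B, one bit per port.
 * It is only consulted when "use I/O bitmaps" is set, so it may be NULL otherwise.
 */
static bool io_causes_vm_exit(uint32_t primary_ctls, const uint8_t *io_bitmap,
                              uint16_t port, unsigned size)
{
    /* Without bitmaps, "unconditional I/O exiting" alone decides. */
    if (!(primary_ctls & CPU_BASED_USE_IO_BITMAPS))
        return (primary_ctls & CPU_BASED_UNCOND_IO_EXITING) != 0;

    /* With bitmaps, any accessed port whose bit is set causes an exit. */
    for (unsigned i = 0; i < size; i++) {
        uint32_t p = (uint32_t)port + i;
        if (p > 0xFFFF)                      /* access wraps past the top of I/O space */
            return true;
        if (io_bitmap[p / 8] & (1u << (p % 8)))
            return true;
    }
    return false;
}
```

With KVM's configuration (`CPU_BASED_UNCOND_IO_EXITING` set, bitmaps off), the function returns true for every port, which matches the "always exits" conclusion above.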

3. Warm-up

3.1 VM Exit Qualification for I/O Instructions

When the guest executes a PIO instruction, the resulting VM exit is handled by vmx_handle_exit. For the exit reason EXIT_REASON_IO_INSTRUCTION, it calls handle_io.

handle_io parses the exit qualification as follows:

static int handle_io(struct kvm_vcpu *vcpu)
{
	unsigned long exit_qualification;
	int size, in, string;
	unsigned port;

	exit_qualification = vmx_get_exit_qual(vcpu);
	string = (exit_qualification & 16) != 0;	/* bit 4: string instruction (INS/OUTS) */
	...
	port = exit_qualification >> 16;		/* bits 31:16: port number */
	size = (exit_qualification & 7) + 1;		/* bits 2:0: access size minus 1 */
	in = (exit_qualification & 8) != 0;		/* bit 3: direction, 1 = IN, 0 = OUT */
	...
	return kvm_fast_pio(vcpu, size, port, in);
}
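
The same field layout can be decoded in a standalone sketch. Besides the bits handle_io reads, it also extracts two more bits defined in the SDM's exit-qualification table for I/O instructions:

  • bit 5: the instruction has a REP prefix;
  • bit 6: operand encoding, where 1 means the port is an immediate.

The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct io_exit_info {
    unsigned size;   /* bytes accessed: 1, 2 or 4 */
    bool in;         /* bit 3: 1 = IN, 0 = OUT */
    bool string;     /* bit 4: INS/OUTS */
    bool rep;        /* bit 5: REP prefix present */
    bool imm_port;   /* bit 6: 1 = port from imm8, 0 = port from DX */
    uint16_t port;   /* bits 31:16 */
};

static struct io_exit_info decode_io_exit_qual(uint64_t qual)
{
    struct io_exit_info info;

    info.size     = (unsigned)(qual & 7) + 1;  /* bits 2:0 hold size - 1 */
    info.in       = (qual & 8) != 0;
    info.string   = (qual & 16) != 0;
    info.rep      = (qual & 32) != 0;
    info.imm_port = (qual & 64) != 0;
    info.port     = (uint16_t)(qual >> 16);    /* keep bits 31:16 only */
    return info;
}
```

For example, an `OUT DX, AL` to port 0x3F8 produces a qualification of `0x3F8 << 16`, which decodes to a 1-byte OUT with the port taken from DX.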

3.2 Miscellaneous

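  • This article covers only the case where the guest executes an in/out instruction. String I/O instructions are not covered.
  • This article does not consider I/O that KVM emulates in the kernel, i.e. ports served by an in-kernel device. In other words, it assumes kernel_pio returns a nonzero value, so the access has to be completed by QEMU.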

4. How PIO out is handled

The KVM call chain is:

kvm_fast_pio
  kvm_fast_pio_out
    emulator_pio_out_emulated
      emulator_pio_in_out
int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size,
		     unsigned short port)
{
	...
	unsigned long val = kvm_rax_read(vcpu);	/* read the guest's RAX */
	int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt,
					    size, port, &val, 1);
	...
	/* record the guest's linear RIP (CS base + RIP) */
	vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
	vcpu->arch.complete_userspace_io = complete_fast_pio_out;
	return 0;
}
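
kvm_fast_pio_out passes a pointer to the whole RAX value, but only the low `size` bytes (AL, AX or EAX) are the OUT operand. On a little-endian x86 host, those bytes come first in the `unsigned long`, so copying `size` bytes from its start picks out exactly the operand. A minimal sketch of that copy, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the OUT operand into the PIO data buffer. Assumes a little-endian
 * host, so the low `size` bytes of rax are the first bytes in memory.
 */
static void copy_out_operand(uint8_t *pio_data, unsigned long rax, int size)
{
    memcpy(pio_data, &rax, (size_t)size);
}
```

For `OUT DX, AX` with RAX = 0xAABBCCDD, the buffer receives the two bytes 0xDD, 0xCC, i.e. the value of AX in little-endian order.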

complete_userspace_io is described in detail later.

int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt,
			      int size, unsigned short port,
			      const void *val, unsigned int count)
{
	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);

	memcpy(vcpu->arch.pio_data, val, size * count);
	return emulator_pio_in_out(vcpu, size, port, (void *)val, count, false);
}

int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
			unsigned short port, void *val,
			unsigned int count, bool in)
{
	vcpu->arch.pio.port = port;
	vcpu->arch.pio.in = in;
	vcpu->arch.pio.count = count;
	vcpu->arch.pio.size = size;

	...

	vcpu->run->exit_reason = KVM_EXIT_IO;
	vcpu->run->io.direction = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
	vcpu->run->io.size = size;
	vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;
	vcpu->run->io.count = count;
	vcpu->run->io.port = port;

	return 0;
}
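
The effect of emulator_pio_in_out on kvm_run can be reproduced in a small userspace model. It rests on these assumptions:

  • `fake_kvm_run` is a simplified stand-in that holds only the fields set above. The field layout of its `io` member follows `struct kvm_run` in `<linux/kvm.h>`.
  • The constants are copied locally, so the sketch compiles without kernel headers.
  • The in-kernel device check from section 3.2 is modeled by a hypothetical `kernel_pio` callback that returns 0 when a device handled the port. This stands in for, and does not reproduce, the elided code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE           4096u  /* x86 page size */
#define KVM_PIO_PAGE_OFFSET 1u
#define KVM_EXIT_IO         2u
#define KVM_EXIT_IO_IN      0u
#define KVM_EXIT_IO_OUT     1u

/* Simplified stand-in for struct kvm_run: only the fields used here. */
struct fake_kvm_run {
    uint32_t exit_reason;
    struct {
        uint8_t  direction;
        uint8_t  size;
        uint16_t port;
        uint32_t count;
        uint64_t data_offset;
    } io;
};

/* Hypothetical in-kernel bus hook: returns 0 if a device handled the port. */
typedef int (*kernel_pio_fn)(uint16_t port, bool in);

/* Returns 1 if handled in the kernel, 0 if userspace must finish the access. */
static int pio_in_out_model(struct fake_kvm_run *run, kernel_pio_fn kernel_pio,
                            int size, uint16_t port, unsigned count, bool in)
{
    if (kernel_pio && kernel_pio(port, in) == 0)
        return 1;

    run->exit_reason    = KVM_EXIT_IO;
    run->io.direction   = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
    run->io.size        = (uint8_t)size;
    run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;  /* 4096 */
    run->io.count       = count;
    run->io.port        = port;
    return 0;
}
```

Passing `NULL` for `kernel_pio` models the case this article assumes: no in-kernel device, so the function always prepares a KVM_EXIT_IO for QEMU.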

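As shown, vcpu->run->io.data_offset is set to KVM_PIO_PAGE_OFFSET * PAGE_SIZE. On x86 this is 4096, because KVM_PIO_PAGE_OFFSET is 1 and pages are 4 KiB.

emulator_pio_out_emulated has already copied the value the guest wrote to the port into vcpu->arch.pio_data. From QEMU's point of view, vcpu->arch.pio_data sits one page after kvm_run.

In the kernel, however, the two are separate pages, allocated in kvm_vcpu_init and kvm_arch_vcpu_init. They become adjacent only in the mapping QEMU creates by calling mmap on the vCPU fd: KVM's fault handler (kvm_vcpu_fault) maps page offset 0 to vcpu->run and page offset KVM_PIO_PAGE_OFFSET to vcpu->arch.pio_data.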

int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
{
	...
	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	vcpu->run = page_address(page);
	...
	kvm_arch_vcpu_init(vcpu);
	...
}

int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
	...
	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	vcpu->arch.pio_data = page_address(page);
	...
}
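
The layout QEMU sees can be simulated without /dev/kvm. In this sketch, `vcpu_mapping` is a hypothetical two-page buffer standing in for the vCPU mmap:

  • page 0 plays the role of kvm_run;
  • page KVM_PIO_PAGE_OFFSET plays the role of vcpu->arch.pio_data.

The "kernel" side writes through a pointer to the second page, and the "userspace" side finds the same bytes at `run + io.data_offset`. In the real system, the two pages are distinct kernel allocations joined only by the mapping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE           4096u
#define KVM_PIO_PAGE_OFFSET 1u

/* Hypothetical stand-in for the vCPU mmap: page 0 = kvm_run, page 1 = pio_data. */
struct vcpu_mapping {
    uint8_t pages[2 * PAGE_SIZE];
};

/* "Kernel" side: like memcpy(vcpu->arch.pio_data, val, n) on the OUT path. */
static void kernel_fill_pio_page(struct vcpu_mapping *m, const void *val, size_t n)
{
    assert(n <= PAGE_SIZE);
    memcpy(m->pages + KVM_PIO_PAGE_OFFSET * PAGE_SIZE, val, n);
}

/* "Userspace" side: QEMU's (uint8_t *)run + run->io.data_offset. */
static uint8_t *user_pio_data(struct vcpu_mapping *m, uint64_t data_offset)
{
    uint8_t *run = m->pages;  /* the mapping starts with kvm_run */
    return run + data_offset;
}
```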

Once KVM is done, the KVM_RUN ioctl returns to QEMU, which then executes:

int kvm_cpu_exec(CPUState *cpu)
{
    ...
    switch (run->exit_reason) {
    case KVM_EXIT_IO:
        DPRINTF("handle_io\n");
        /* Called outside BQL */
        kvm_handle_io(run->io.port, attrs,
                      (uint8_t *)run + run->io.data_offset,
                      run->io.direction,
                      run->io.size,
                      run->io.count);
        ret = 0;
        break;
    }
    ...
}
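
kvm_handle_io performs `count` accesses of `size` bytes each, advancing the data pointer by `size` after every access. The sketch below mimics that loop but replaces QEMU's I/O address space with a single hypothetical device, `fake_port_dev`, which has a 4-byte register. Unclaimed ports are simply skipped here, which is not necessarily how QEMU treats them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_EXIT_IO_IN  0
#define KVM_EXIT_IO_OUT 1

/* Hypothetical device standing in for QEMU's I/O address space. */
struct fake_port_dev {
    uint16_t port;
    uint8_t  reg[4];   /* backing storage for the port */
};

/* Same loop shape as kvm_handle_io: one access per element, step by size. */
static void handle_io_model(struct fake_port_dev *dev, uint16_t port,
                            uint8_t *data, int direction, int size,
                            uint32_t count)
{
    assert(size >= 1 && size <= 4);
    for (uint32_t i = 0; i < count; i++) {
        if (port == dev->port) {
            if (direction == KVM_EXIT_IO_OUT)
                memcpy(dev->reg, data, (size_t)size);  /* guest -> device */
            else
                memcpy(data, dev->reg, (size_t)size);  /* device -> guest */
        }
        data += size;
    }
}
```

Here `data` plays the role of `(uint8_t *)run + run->io.data_offset`. For OUT, the device consumes what KVM placed there. For IN, the device fills it and KVM later reads it back.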

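After QEMU has handled the access, it re-enters KVM with the next KVM_RUN ioctl. There, kvm_arch_vcpu_ioctl_run first invokes any pending complete_userspace_io callback: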

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
	...
	if (unlikely(vcpu->arch.complete_userspace_io)) {
		int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
		vcpu->arch.complete_userspace_io = NULL;
		r = cui(vcpu);
		...
	}
	...
	vcpu_run(vcpu);
	...
}
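
The take-clear-call pattern used for complete_userspace_io can be isolated in a toy model. `fake_vcpu`, `run_pending_completion` and `demo_complete` are hypothetical names.

The point is that the pointer is cleared before the callback runs, so each installed callback executes at most once. Presumably this also lets a callback that needs another userspace round trip install a new one without having it wiped afterwards.

```c
#include <assert.h>
#include <stddef.h>

struct fake_vcpu {
    int (*complete_userspace_io)(struct fake_vcpu *);
    int completions;   /* how many times a completion callback has run */
};

/* Same shape as the code above: take the pointer, clear it, then call it. */
static int run_pending_completion(struct fake_vcpu *vcpu)
{
    int (*cui)(struct fake_vcpu *) = vcpu->complete_userspace_io;

    if (!cui)
        return 1;                        /* nothing pending in this model */
    vcpu->complete_userspace_io = NULL;  /* clear first: runs at most once */
    return cui(vcpu);
}

static int demo_complete(struct fake_vcpu *vcpu)
{
    vcpu->completions++;
    return 1;
}
```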

kvm_fast_pio_out had set complete_userspace_io to complete_fast_pio_out:

int complete_fast_pio_out(struct kvm_vcpu *vcpu)
{
	vcpu->arch.pio.count = 0;
	...
	/* mainly advances the guest's RIP past the OUT instruction */
	return kvm_skip_emulated_instruction(vcpu);
}
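
On VMX, skipping the instruction roughly amounts to adding to RIP the length of the exiting instruction, which the CPU reports in the VMCS "VM-exit instruction length" field. For illustration only, the sketch below instead derives that length from the encoding of a single non-string IN/OUT. `pio_insn_len` and `skip_pio_insn` are hypothetical helpers, and they handle only the 0x66 prefix, not other prefixes such as REX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Length of a non-string IN/OUT: an optional 0x66 operand-size prefix,
 * one opcode byte, and an imm8 port byte for the E4-E7 forms.
 */
static unsigned pio_insn_len(bool has_66_prefix, bool imm_port)
{
    return (has_66_prefix ? 1u : 0u) + 1u + (imm_port ? 1u : 0u);
}

/* "Skip" the instruction by moving RIP past it. */
static uint64_t skip_pio_insn(uint64_t rip, unsigned insn_len)
{
    return rip + insn_len;
}
```

So `out dx, al` (EE) advances RIP by 1, while `out 0x80, al` (E6 80) advances it by 2. This is why the skip uses the reported length rather than a fixed step.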

5. How PIO in is handled

The KVM call chain is:

kvm_fast_pio
  kvm_fast_pio_in
    emulator_pio_in_emulated
      emulator_pio_in_out
int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port)
{
	unsigned long val;
	...
	emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port,
				 &val, 1);
	...
	vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
	vcpu->arch.complete_userspace_io = complete_fast_pio_in;
	return 0;
}
int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt,
			     int size, unsigned short port, void *val,
			     unsigned int count)
{
	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
	int ret;

	if (vcpu->arch.pio.count)
		goto data_avail;

	memset(vcpu->arch.pio_data, 0, size * count);

	ret = emulator_pio_in_out(vcpu, size, port, val, count, true);
	if (ret) {
data_avail:
		memcpy(val, vcpu->arch.pio_data, size * count);
		vcpu->arch.pio.count = 0;
		return 1;
	}

	return 0;
}

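On this first call to emulator_pio_in_emulated, vcpu->arch.pio.count is still 0 because no data is available yet; QEMU has to supply it. The function therefore calls emulator_pio_in_out, which we have already seen. It records the request in vcpu->arch.pio (including pio.count), fills in the kvm_run fields, and returns 0, so KVM exits to QEMU to get the data.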
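
The two calls to emulator_pio_in_emulated behave like a small state machine keyed on pio.count. The model below makes two simplifications:

  • It folds the bookkeeping of emulator_pio_in_out into the first call.
  • It assumes no in-kernel device handles the port, as in section 3.2.

`fake_pio_state` and `pio_in_model` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

struct fake_pio_state {
    unsigned count;            /* nonzero once a request is pending */
    unsigned char data[4096];  /* stands in for vcpu->arch.pio_data */
};

/* Returns 0 when an exit to userspace is needed, 1 when val has been filled. */
static int pio_in_model(struct fake_pio_state *s, int size, void *val,
                        unsigned count)
{
    if (s->count)
        goto data_avail;

    memset(s->data, 0, (size_t)size * count);
    s->count = count;  /* what emulator_pio_in_out records before exiting */
    return 0;          /* exit to userspace so it can supply the data */

data_avail:
    memcpy(val, s->data, (size_t)size * count);
    s->count = 0;      /* request consumed */
    return 1;
}
```

The first call returns 0 and leaves `count == 1`. Userspace then writes into `data`, and the second call copies it out and resets `count`.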

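Back in QEMU, kvm_handle_io writes the value read from the port into the data area at run->io.data_offset, which is the vcpu->arch.pio_data page.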

Back in KVM, kvm_arch_vcpu_ioctl_run invokes the complete_fast_pio_in callback.

int complete_fast_pio_in(struct kvm_vcpu *vcpu)
{
	unsigned long val;

	/* We should only ever be called with arch.pio.count equal to 1 */
	BUG_ON(vcpu->arch.pio.count != 1);

	...

	/*
	 * Since vcpu->arch.pio.count == 1 let emulator_pio_in_emulated perform
	 * the copy and tracing
	 */
	emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
				 vcpu->arch.pio.port, &val, 1);
	kvm_rax_write(vcpu, val);	/* write the value into the vCPU's RAX */

	return kvm_skip_emulated_instruction(vcpu);
}

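In this second call to emulator_pio_in_emulated, vcpu->arch.pio.count is already nonzero; emulator_pio_in_out set it to 1 before the exit. This means the data is available, so execution jumps straight to the data_avail label.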

The code that runs in emulator_pio_in_emulated is therefore:

int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt,
			     int size, unsigned short port, void *val,
			     unsigned int count)
{
	...
	memcpy(val, vcpu->arch.pio_data, size * count);	/* copy the value QEMU filled in */
	vcpu->arch.pio.count = 0;
	return 1;
}
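
After the copy, complete_fast_pio_in stores the result with kvm_rax_write. Architecturally, IN AL or IN AX must leave the remaining bits of RAX unchanged, while IN EAX zero-extends into RAX. The excerpt elides how KVM prepares `val` for this, so the sketch below models only the architectural result. `merge_in_result` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * x86-64 result of an IN into the accumulator:
 * AL/AX writes keep the other bits of RAX, an EAX write zero-extends.
 */
static uint64_t merge_in_result(uint64_t old_rax, uint32_t data, int size)
{
    switch (size) {
    case 1:
        return (old_rax & ~0xFFull) | (data & 0xFFu);
    case 2:
        return (old_rax & ~0xFFFFull) | (data & 0xFFFFu);
    default:
        assert(size == 4);
        return (uint64_t)data;  /* 32-bit destination clears bits 63:32 */
    }
}
```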
