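This article walks through CPUID management based on the QEMU v5.2.0 and Linux kernel v5.14 source code. It does not cover every detail, but it gives the function call chains so that readers can use them as a starting point for digging deeper.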

1. Overview

When the guest executes the cpuid instruction, it always triggers a VM exit, and KVM then emulates the instruction.

KVM handles the exit in kvm_emulate_cpuid:

int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
{
    u32 eax, ebx, ecx, edx;

    if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0))
        return 1;

    eax = kvm_rax_read(vcpu); /* read the vcpu's RAX (CPUID leaf) */
    ecx = kvm_rcx_read(vcpu); /* read the vcpu's RCX (CPUID subleaf) */
    kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, true);
    kvm_rax_write(vcpu, eax);
    kvm_rbx_write(vcpu, ebx);
    kvm_rcx_write(vcpu, ecx);
    kvm_rdx_write(vcpu, edx);
    return kvm_skip_emulated_instruction(vcpu);
}

bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
               u32 *ecx, u32 *edx, bool check_limit)
{
    u32 function = *eax, index = *ecx;
    struct kvm_cpuid_entry2 *best;
    bool entry_found = true;

    best = kvm_find_cpuid_entry(vcpu, function, index);

    if (!best) {
        entry_found = false;
        if (!check_limit)
            goto out;

        best = check_cpuid_limit(vcpu, function, index);
    }

out:
    if (best) {
        *eax = best->eax;
        *ebx = best->ebx;
        *ecx = best->ecx;
        *edx = best->edx;
    } else
        *eax = *ebx = *ecx = *edx = 0;
    trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, entry_found);
    return entry_found;
}

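The key function is kvm_find_cpuid_entry, which looks up the CPUID entry that QEMU wrote into KVM (see the source for the details).

So what really matters is this "entry", and it is QEMU that writes it.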
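A rough user-space model of this lookup may help. The sketch below is not kernel code: struct cpuid_entry, find_entry and emulate_cpuid are hypothetical names of my own, the index_significant field stands in for the way KVM marks leaves whose subleaf matters, and the check_cpuid_limit fallback is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
    uint32_t function;      /* CPUID leaf (input EAX) */
    uint32_t index;         /* CPUID subleaf (input ECX) */
    int index_significant;  /* nonzero if the leaf is selected by ECX too */
    uint32_t eax, ebx, ecx, edx;
};

/* Return the entry matching (function, index), or NULL if none exists. */
const struct cpuid_entry *find_entry(const struct cpuid_entry *table, size_t n,
                                     uint32_t function, uint32_t index)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct cpuid_entry *e = &table[i];
        if (e->function != function)
            continue;
        if (!e->index_significant || e->index == index)
            return e;
    }
    return NULL;
}

/*
 * Model of the guest-visible effect: EAX/ECX select the entry, and all
 * four registers are overwritten with its values (or zeros if not found).
 * Returns 1 if an entry was found, 0 otherwise.
 */
int emulate_cpuid(const struct cpuid_entry *table, size_t n,
                  uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
    const struct cpuid_entry *best = find_entry(table, n, *eax, *ecx);

    if (best != NULL) {
        *eax = best->eax;
        *ebx = best->ebx;
        *ecx = best->ecx;
        *edx = best->edx;
    } else {
        *eax = *ebx = *ecx = *edx = 0;
    }
    return best != NULL;
}
```

The point of the model is that nothing here queries the hardware: whatever table was installed beforehand is what the guest sees.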

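The rough process is:

  1. QEMU reads the list of CPUID entries the host supports via ioctl(KVM_GET_SUPPORTED_CPUID)
  2. QEMU uses a bitwise AND to drop the CPUID features that the QEMU CPU model (chosen by the user with the -cpu option) does not support
  3. QEMU writes the resulting CPUID data into KVM via ioctl(KVM_SET_CPUID2) for the guest to use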

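Put simply, QEMU and KVM cooperate to build the CPUID "entries", and QEMU finally writes their values into KVM. From then on, whenever the guest executes cpuid and triggers a VM exit, KVM can handle it by itself without involving QEMU.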
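The AND in step 2 can be sketched for a single 32-bit feature word as follows. This is only an illustration: struct filter_result and filter_feature_word are hypothetical names, and real QEMU code works on an array of feature words rather than one value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: outcome of filtering one feature word. */
struct filter_result {
    uint32_t enabled;   /* bits that will be exposed to the guest */
    uint32_t filtered;  /* bits requested by the model but missing on the host */
};

/*
 * requested:      bits asked for by the CPU model plus -cpu options
 * host_supported: bits reported by KVM_GET_SUPPORTED_CPUID
 */
struct filter_result filter_feature_word(uint32_t requested,
                                         uint32_t host_supported)
{
    struct filter_result r;

    r.enabled = requested & host_supported;
    r.filtered = requested & ~host_supported;
    /* Every requested bit ends up in exactly one of the two sets. */
    assert((r.enabled | r.filtered) == requested);
    assert((r.enabled & r.filtered) == 0);
    return r;
}
```

For example, with requested = 0xB and host_supported = 0x6, only bit 1 survives (enabled = 0x2) and bits 0 and 3 are reported as filtered (0x9).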

2. Call chains in QEMU

x86_cpu_realizefn
    x86_cpu_expand_features
        x86_cpu_get_supported_feature_word
            kvm_arch_get_supported_cpuid
                get_supported_cpuid
                    try_get_cpuid
                        KVM_GET_SUPPORTED_CPUID
    x86_cpu_filter_features
        x86_cpu_get_supported_feature_word
            kvm_arch_get_supported_cpuid
                get_supported_cpuid
                    try_get_cpuid
                        KVM_GET_SUPPORTED_CPUID
    qemu_init_vcpu
        cpus_accel->create_vcpu_thread[kvm_start_vcpu_thread]
            kvm_vcpu_thread_fn
                kvm_init_vcpu
                    kvm_arch_init_vcpu
                        cpu_x86_cpuid
                        KVM_SET_CPUID2

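For more on KVM_GET_SUPPORTED_CPUID and KVM_SET_CPUID2, see the KVM API documentation (kvm/api.txt; in recent kernels it lives at Documentation/virt/kvm/api.rst).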
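To make the KVM_GET_SUPPORTED_CPUID end of the chain concrete, here is a minimal sketch of issuing that ioctl from user space. It assumes an x86 Linux host with the kernel UAPI headers installed. The function name query_supported_cpuid and the size limits are my own choices, and the grow-on-E2BIG loop is only loosely modeled on get_supported_cpuid/try_get_cpuid, not copied from QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Ask KVM which CPUID entries it can expose to a guest.
 * kvm_path is normally "/dev/kvm". Returns a malloc'ed table that the
 * caller must free(), or NULL on failure.
 */
struct kvm_cpuid2 *query_supported_cpuid(const char *kvm_path)
{
    assert(kvm_path != NULL);

    int fd = open(kvm_path, O_RDWR);
    if (fd < 0)
        return NULL;

    struct kvm_cpuid2 *cpuid = NULL;

    /* Start small and grow: KVM fails with E2BIG if nent is too small. */
    for (unsigned int nent = 64; nent <= 4096; nent *= 2) {
        cpuid = calloc(1, sizeof(*cpuid) +
                          (size_t)nent * sizeof(cpuid->entries[0]));
        if (cpuid == NULL)
            break;
        cpuid->nent = nent;

        if (ioctl(fd, KVM_GET_SUPPORTED_CPUID, cpuid) == 0)
            break;          /* success: cpuid->nent now holds the real count */

        int err = errno;    /* save errno before free() can touch it */
        free(cpuid);
        cpuid = NULL;
        if (err != E2BIG)
            break;          /* a real error, not "table too small" */
    }

    close(fd);
    return cpuid;
}
```

A caller would pass "/dev/kvm", walk entries[0] through entries[nent - 1] reading the function, index, eax, ebx, ecx and edx fields, and free the table afterwards. KVM_SET_CPUID2 takes the same struct kvm_cpuid2 layout, but is issued on a vCPU file descriptor rather than on /dev/kvm.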

A comment in QEMU's target/i386/cpu.c summarizes the steps:

/***** Steps involved on loading and filtering CPUID data
 *
 * When initializing and realizing a CPU object, the steps
 * involved in setting up CPUID data are:
 *
 * 1) Loading CPU model definition (X86CPUDefinition). This is
 *    implemented by x86_cpu_load_model() and should be completely
 *    transparent, as it is done automatically by instance_init.
 *    No code should need to look at X86CPUDefinition structs
 *    outside instance_init.
 *
 * 2) CPU expansion. This is done by realize before CPUID
 *    filtering, and will make sure host/accelerator data is
 *    loaded for CPU models that depend on host capabilities
 *    (e.g. "host"). Done by x86_cpu_expand_features().
 *
 * 3) CPUID filtering. This initializes extra data related to
 *    CPUID, and checks if the host supports all capabilities
 *    required by the CPU. Runnability of a CPU model is
 *    determined at this step. Done by x86_cpu_filter_features().
 */

3. How to use

  • List the available CPU models and feature flags

    qemu-system-x86_64 -cpu help

  • Add the pdpe1gb feature (Nehalem is the CPU model I picked; any other model works as well)

    qemu-system-x86_64 -cpu Nehalem,+pdpe1gb

  • Add the pdpe1gb feature and remove the sse feature

    qemu-system-x86_64 -cpu Nehalem,+pdpe1gb,-sse

  • Add the x2apic feature

    qemu-system-x86_64 -cpu host,x2apic=on

The function QEMU uses to parse the CPU feature options is x86_cpu_parse_featurestr.

4. MISC

/* Compatibily hack to maintain legacy +-feat semantic,
 * where +-feat overwrites any feature set by
 * feat=on|feat even if the later is parsed after +-feat
 * (i.e. "-x2apic,x2apic=on" will result in x2apic disabled)
 */
static GList *plus_features, *minus_features;

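The legacy +-feat semantic described in that comment can be modeled with a small stand-alone parser. This is a sketch under my own assumptions, not QEMU's x86_cpu_parse_featurestr: all names below are hypothetical, and the rule that -feat also wins over +feat reflects my reading that the minus list is applied last.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FEATURES 16
#define MAX_NAME 64

/* Hypothetical per-feature record built while parsing. */
struct feature_state {
    char name[MAX_NAME];
    int prop;   /* -1 = not set, 0 = feat=off, 1 = feat=on or bare feat */
    int plus;   /* nonzero if +feat was seen */
    int minus;  /* nonzero if -feat was seen */
};

struct parse_result {
    char model[MAX_NAME];
    struct feature_state features[MAX_FEATURES];
    int count;
};

/* Find the slot for a feature name, creating it on first use. */
static struct feature_state *slot_for(struct parse_result *r, const char *name)
{
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->features[i].name, name) == 0)
            return &r->features[i];
    if (r->count == MAX_FEATURES)
        return NULL;
    struct feature_state *f = &r->features[r->count++];
    snprintf(f->name, sizeof(f->name), "%s", name);
    f->prop = -1;
    f->plus = f->minus = 0;
    return f;
}

/* Parse "model,+feat,-feat,feat=on|off,feat". Returns 0 on success, -1 on error. */
int parse_featurestr(const char *str, struct parse_result *r)
{
    assert(str != NULL && r != NULL);
    memset(r, 0, sizeof(*r));

    int first = 1;
    for (const char *p = str; *p != '\0'; ) {
        char tok[MAX_NAME];
        size_t len = strcspn(p, ",");
        if (len == 0 || len >= sizeof(tok))
            return -1;
        memcpy(tok, p, len);
        tok[len] = '\0';
        p += len;
        if (*p == ',')
            p++;

        if (first) {                /* the first token is the CPU model */
            snprintf(r->model, sizeof(r->model), "%s", tok);
            first = 0;
            continue;
        }

        char *value = strchr(tok, '=');
        if (value != NULL)
            *value++ = '\0';
        const char *name = (tok[0] == '+' || tok[0] == '-') ? tok + 1 : tok;
        if (name[0] == '\0')
            return -1;
        struct feature_state *f = slot_for(r, name);
        if (f == NULL)
            return -1;

        if (tok[0] == '+')
            f->plus = 1;
        else if (tok[0] == '-')
            f->minus = 1;
        else if (value == NULL || strcmp(value, "on") == 0)
            f->prop = 1;
        else if (strcmp(value, "off") == 0)
            f->prop = 0;
        else
            return -1;
    }
    return first ? -1 : 0;
}

/* Final state: 1 = on, 0 = off, -1 = left at the model default. */
int feature_enabled(const struct parse_result *r, const char *name)
{
    for (int i = 0; i < r->count; i++) {
        const struct feature_state *f = &r->features[i];
        if (strcmp(f->name, name) != 0)
            continue;
        if (f->minus)
            return 0;   /* assumption: -feat is applied last and wins */
        if (f->plus)
            return 1;   /* +feat overrides any feat=on|off */
        return f->prop;
    }
    return -1;
}
```

Parsing "host,-x2apic,x2apic=on" with this model and then asking for x2apic yields 0, which matches the example in the comment: the later x2apic=on does not undo the earlier -x2apic.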
// CPUID usage for interaction between Hypervisors and Linux
// https://lore.kernel.org/kvm/1222881242.9381.17.camel@alok-dev1/

#define CPUID_EXT_HYPERVISOR (1U << 31)

/* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx. It
 * should be used to determine that a VM is running under KVM.
 */
#define KVM_CPUID_SIGNATURE 0x40000000
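As a small illustration of how a guest could use these two definitions, the sketch below checks the hypervisor bit and assembles the signature string from register values. The helper names are hypothetical, and the functions take the register values as plain arguments so that the example does not depend on actually executing cpuid; in a real guest they would come from leaf 1 (ECX) and leaf 0x40000000 (EBX, ECX, EDX).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CPUID_EXT_HYPERVISOR (1U << 31)

/* Leaf 1, ECX bit 31: set when the CPU is running under a hypervisor. */
int hypervisor_present(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & CPUID_EXT_HYPERVISOR) != 0;
}

/*
 * Assemble the 12-byte signature from EBX, ECX and EDX, in that order,
 * lowest byte of each register first. out must hold at least 13 bytes.
 */
void hypervisor_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                          char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    assert(out != NULL);
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}
```

With EBX = 0x4b4d564b, ECX = 0x564b4d56 and EDX = 0x0000004d, the assembled bytes spell "KVMKVMKVM" followed by zero padding, so a strcmp against "KVMKVMKVM" succeeds.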

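kvm_cpu_cap_clear clears a feature bit in KVM's own capability table (kvm_cpu_caps). Since that table feeds what KVM_GET_SUPPORTED_CPUID reports, clearing a bit there can change the CPUID the guest ends up seeing.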

