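This article introduces MSR management based on the source code of QEMU v5.2.0 and kernel v5.14, together with the descriptions in the Intel SDM. It does not walk through every detail, but it points out the key items so that readers can use them as leads and dig deeper.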

1. Theoretical background

1.1 RDMSR and WRMSR instruction

  • RDMSR—Read from Model Specific Register
    EDX:EAX ← MSR[ECX];
  • WRMSR—Write to Model Specific Register
    MSR[ECX] ← EDX:EAX;

WRMSR is handled much like RDMSR, so for brevity the rest of this article focuses on RDMSR.
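The register convention above can be written out in plain C. This is only a sketch of how the 64-bit MSR value is split across EDX:EAX; the struct and function names are made up for illustration, and it does not execute the privileged instructions themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers involved in RDMSR/WRMSR, modeled as a plain struct (hypothetical). */
struct msr_regs {
    uint32_t ecx;  /* MSR index */
    uint32_t eax;  /* low 32 bits of the value */
    uint32_t edx;  /* high 32 bits of the value */
};

/* The value WRMSR stores: MSR[ECX] <- EDX:EAX. */
uint64_t msr_value_from_regs(const struct msr_regs *r)
{
    assert(r != NULL);
    return ((uint64_t)r->edx << 32) | r->eax;
}

/* The register state RDMSR produces: EDX:EAX <- MSR[ECX]. */
void msr_value_to_regs(uint64_t value, struct msr_regs *r)
{
    assert(r != NULL);
    r->eax = (uint32_t)value;          /* low half */
    r->edx = (uint32_t)(value >> 32);  /* high half */
}
```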

1.2 VM Exit

The RDMSR instruction causes a VM exit if any of the following are true:

  • The “use MSR bitmaps” VM-execution control is 0.
  • The value of ECX is not in the ranges 00000000H – 00001FFFH and C0000000H – C0001FFFH.
  • The value of ECX is in the range 00000000H – 00001FFFH and bit n in read bitmap for low MSRs is 1, where n is the value of ECX.
  • The value of ECX is in the range C0000000H – C0001FFFH and bit n in read bitmap for high MSRs is 1, where n is the value of ECX & 00001FFFH.

1.3 MSR bitmap

On processors that support the 1-setting of the “use MSR bitmaps” VM-execution control, the VM-execution control fields include the 64-bit physical address of four contiguous MSR bitmaps, which are each 1-KByte in size. This field does not exist on processors that do not support the 1-setting of that control. The four bitmaps are:

  • Read bitmap for low MSRs (located at the MSR-bitmap address). This contains one bit for each MSR address in the range 00000000H to 00001FFFH. The bit determines whether an execution of RDMSR applied to that MSR causes a VM exit.
  • Read bitmap for high MSRs (located at the MSR-bitmap address plus 1024). This contains one bit for each MSR address in the range C0000000H to C0001FFFH. The bit determines whether an execution of RDMSR applied to that MSR causes a VM exit.
  • Write bitmap for low MSRs (located at the MSR-bitmap address plus 2048). This contains one bit for each MSR address in the range 00000000H to 00001FFFH. The bit determines whether an execution of WRMSR applied to that MSR causes a VM exit.
  • Write bitmap for high MSRs (located at the MSR-bitmap address plus 3072). This contains one bit for each MSR address in the range C0000000H to C0001FFFH. The bit determines whether an execution of WRMSR applied to that MSR causes a VM exit.

A logical processor uses these bitmaps if and only if the “use MSR bitmaps” control is 1. If the bitmaps are used, an execution of RDMSR or WRMSR causes a VM exit if the value of RCX is in neither of the ranges covered by the bitmaps or if the appropriate bit in the MSR bitmaps (corresponding to the instruction and the RCX value) is 1.
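The layout and the exit rule described in 1.2 and 1.3 can be sketched in user-space C. The function and macro names below are hypothetical and chosen for this sketch; only the offsets and ranges come from the SDM text quoted above. Bit n is taken to live in byte n / 8 at bit position n % 8, which is the usual x86 bit numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_BITMAP_SIZE    4096u  /* four contiguous 1-KByte bitmaps */
#define MSR_READ_LOW_OFF      0u
#define MSR_READ_HIGH_OFF  1024u
#define MSR_WRITE_LOW_OFF  2048u
#define MSR_WRITE_HIGH_OFF 3072u

/*
 * Find the bit that controls `msr` inside the 4-KByte bitmap page.
 * Returns false when the MSR is outside both covered ranges.
 */
bool msr_bitmap_locate(uint32_t msr, bool is_write, size_t *byte, unsigned *bit)
{
    size_t base;

    assert(byte != NULL && bit != NULL);
    if (msr <= 0x1fffu)
        base = is_write ? MSR_WRITE_LOW_OFF : MSR_READ_LOW_OFF;
    else if (msr >= 0xc0000000u && msr <= 0xc0001fffu)
        base = is_write ? MSR_WRITE_HIGH_OFF : MSR_READ_HIGH_OFF;
    else
        return false;

    msr &= 0x1fffu;          /* n = ECX & 00001FFFH */
    *byte = base + msr / 8;
    *bit = msr % 8;
    return true;
}

/* Decide whether RDMSR (is_write == false) or WRMSR on `msr` causes a VM exit. */
bool msr_access_exits(bool use_msr_bitmaps, const uint8_t *bitmap,
                      uint32_t msr, bool is_write)
{
    size_t byte;
    unsigned bit;

    if (!use_msr_bitmaps)
        return true;   /* control is 0: every access exits */
    if (!msr_bitmap_locate(msr, is_write, &byte, &bit))
        return true;   /* outside both ranges: always exits */
    return (bitmap[byte] >> bit) & 1u;
}
```

For example, a read of MSR C0000080H is governed by bit 0 of byte 1024 + 16 in the page, while a write to MSR 174H is governed by bit 4 of byte 2048 + 46.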

1.4 VM-Exit Controls for MSRs

1.5 VM-Entry Controls for MSRs

2.1 MSR bitmap

2.1.1 Allocation and initialization

kvm_vm_ioctl(KVM_CREATE_VCPU)
    kvm_vm_ioctl_create_vcpu
        kvm_arch_vcpu_create
            vmx_create_vcpu[static_call(kvm_x86_vcpu_create)(vcpu)]
                alloc_loaded_vmcs

// Allocate one page (4 KB) for the MSR bitmap and initialize it to all 1s,
// so that every RDMSR/WRMSR is intercepted by default.
int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
{
	...
	if (cpu_has_vmx_msr_bitmap()) {
		loaded_vmcs->msr_bitmap = (unsigned long *)
				__get_free_page(GFP_KERNEL_ACCOUNT);
		if (!loaded_vmcs->msr_bitmap)
			goto out_vmcs;
		memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE);
		...
	}
	...
}

2.1.2 VMCS field

vmx_create_vcpu
    alloc_loaded_vmcs
    init_vmcs
        vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap))

2.2 passthrough MSR

vmx_disable_intercept_for_msr

void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
	...

	/*
	 * Mark the desired intercept state in shadow bitmap, this is needed
	 * for resync when the MSR filters change.
	 */
	if (is_valid_passthrough_msr(msr)) {
		int idx = possible_passthrough_msr_slot(msr);

		if (idx != -ENOENT) {
			if (type & MSR_TYPE_R)
				clear_bit(idx, vmx->shadow_msr_intercept.read);
			if (type & MSR_TYPE_W)
				clear_bit(idx, vmx->shadow_msr_intercept.write);
		}
	}
	...
}
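To make the idea of pass-through concrete, here is a small user-space model (not the kernel's code): the bitmap page starts as all 1s, and passing an MSR through means clearing its read and/or write bit, following the layout from 1.3. The `model_` names and the numeric values given to MSR_TYPE_R and MSR_TYPE_W are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MSR_TYPE_R 1  /* value assumed for this sketch */
#define MSR_TYPE_W 2  /* value assumed for this sketch */

/* Allocate a bitmap page with every bit set: intercept everything. */
uint8_t *model_alloc_msr_bitmap(void)
{
    uint8_t *bitmap = malloc(MODEL_PAGE_SIZE);

    if (bitmap)
        memset(bitmap, 0xff, MODEL_PAGE_SIZE);
    return bitmap;
}

/*
 * Clear the intercept bit(s) for `msr`. Returns 0 on success, or -1 if
 * the MSR lies outside the two ranges the bitmaps can describe.
 */
int model_disable_intercept(uint8_t *bitmap, uint32_t msr, int type)
{
    size_t read_base;  /* offset of the read bitmap for this range */

    assert(bitmap != NULL);
    if (msr <= 0x1fffu)
        read_base = 0;
    else if (msr >= 0xc0000000u && msr <= 0xc0001fffu)
        read_base = 1024;
    else
        return -1;

    msr &= 0x1fffu;
    if (type & MSR_TYPE_R)
        bitmap[read_base + msr / 8] &= (uint8_t)~(1u << (msr % 8));
    if (type & MSR_TYPE_W)  /* write bitmaps sit 2048 bytes after the read ones */
        bitmap[read_base + 2048 + msr / 8] &= (uint8_t)~(1u << (msr % 8));
    return 0;
}
```

An MSR outside both ranges cannot be passed through with the bitmaps at all, which is why the model reports an error for it instead of touching the page.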

2.3 MSR area

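It is worth first reading the motivation for the MSR area in the article "Virtualization study notes: three context".

The identifiers below are keywords; readers can search for them in the KVM source to study the implementation.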

VM_ENTRY_MSR_LOAD_COUNT
VM_ENTRY_MSR_LOAD_ADDR

VM_EXIT_MSR_STORE_COUNT
VM_EXIT_MSR_LOAD_COUNT
VM_EXIT_MSR_STORE_ADDR
VM_EXIT_MSR_LOAD_ADDR

struct vcpu_vmx {
	...
	struct msr_autoload {
		struct vmx_msrs guest;
		struct vmx_msrs host;
	} msr_autoload;

	struct msr_autostore {
		struct vmx_msrs guest;
	} msr_autostore;
	...
};
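Each of the COUNT/ADDR pairs above describes an array of MSR entries in memory. As far as the SDM describes it, an entry is 16 bytes: a 32-bit MSR index, 32 reserved bits, and 64 bits of MSR data. The following is a simplified model of such a list with a find and a set operation; the `model_` names and the capacity of 8 are assumptions for this sketch and are not taken from the KVM source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_MSRS 8  /* capacity chosen for this sketch */

/* One entry of an MSR load/store area: 16 bytes. */
struct model_msr_entry {
    uint32_t index;     /* MSR index */
    uint32_t reserved;  /* kept at 0 */
    uint64_t value;     /* MSR data */
};

/* An MSR area plus the count that would go into the *_COUNT field. */
struct model_msr_list {
    unsigned int nr;
    struct model_msr_entry val[MODEL_MAX_MSRS];
};

/* Return the slot holding `msr`, or -1 if it is not in the list. */
int model_msr_list_find(const struct model_msr_list *m, uint32_t msr)
{
    assert(m != NULL);
    for (unsigned int i = 0; i < m->nr; i++)
        if (m->val[i].index == msr)
            return (int)i;
    return -1;
}

/* Add `msr` or update its value. Returns the slot, or -1 if the list is full. */
int model_msr_list_set(struct model_msr_list *m, uint32_t msr, uint64_t value)
{
    int i = model_msr_list_find(m, msr);

    if (i < 0) {
        if (m->nr == MODEL_MAX_MSRS)
            return -1;
        i = (int)m->nr++;
        m->val[i].index = msr;
        m->val[i].reserved = 0;
    }
    m->val[i].value = value;
    return i;
}
```

In this model, a guest list and a host list would each be one `struct model_msr_list`: the guest list corresponds to what is loaded on VM entry, the host list to what is loaded on VM exit.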

3. How KVM handles MSR reads

struct msr_data {
	bool host_initiated;
	u32 index;
	u64 data;
};

host_initiated:

  • true: the access was initiated by the host, e.g. QEMU operating on an MSR through an ioctl
  • false: the access was initiated by the guest, e.g. by executing RDMSR or WRMSR

3.1 VM exit when the guest executes RDMSR

kvm_emulate_rdmsr
    kvm_get_msr
        kvm_get_msr_ignored_check
            __kvm_get_msr
                vmx_get_msr[kvm_x86_get_msr]
                    kvm_get_msr_common

vmx_get_msr handles read requests for a set of special MSRs; kvm_get_msr_common handles read requests for the ordinary MSRs.
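The two-level split can be modeled as a pair of switch statements, where the vendor handler falls back to the common handler for anything it does not recognize. This is a toy sketch: the `model_` names, the vCPU struct, the choice of which MSR sits in which handler, and the non-zero return for an unknown MSR are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_TSC         0x00000010u
#define MSR_IA32_SYSENTER_CS 0x00000174u

/* Hypothetical vCPU state holding just two MSR values. */
struct model_vcpu {
    uint64_t tsc;
    uint64_t sysenter_cs;
};

/* Common handler: MSRs that need no vendor-specific handling. */
int model_get_msr_common(struct model_vcpu *v, uint32_t index, uint64_t *data)
{
    switch (index) {
    case MSR_IA32_TSC:
        *data = v->tsc;
        return 0;
    default:
        return 1;  /* unknown MSR: report failure to the caller */
    }
}

/* Vendor handler: special MSRs first, everything else goes to the common code. */
int model_vendor_get_msr(struct model_vcpu *v, uint32_t index, uint64_t *data)
{
    assert(v != NULL && data != NULL);
    switch (index) {
    case MSR_IA32_SYSENTER_CS:
        *data = v->sysenter_cs;
        return 0;
    default:
        return model_get_msr_common(v, index, data);
    }
}
```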

3.2 QEMU get MSRs

kvm_arch_dev_ioctl
    msr_io
        __msr_io(...,do_get_msr_feature)

kvm_arch_vcpu_ioctl
    msr_io
        __msr_io(...,do_get_msr)

do_get_msr
    kvm_get_msr_ignored_check
        __kvm_get_msr
            vmx_get_msr

4. IOCTL

/*
* List of msr numbers which we expose to userspace through KVM_GET_MSRS
* and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
*
* The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features)
* extract the supported MSRs from the related const lists.
* msrs_to_save is selected from the msrs_to_save_all to reflect the
* capabilities of the host cpu. This capabilities test skips MSRs that are
* kvm-specific. Those are put in emulated_msrs_all; filtering of emulated_msrs
* may depend on host virtualization features rather than host cpu features.
*/

4.1 KVM_GET_MSR_INDEX_LIST

KVM_GET_MSR_INDEX_LIST returns the guest MSRs that are supported. The list varies by kvm version and host processor, but does not change otherwise.

// QEMU
kvm_get_supported_msrs
    kvm_ioctl(KVM_GET_MSR_INDEX_LIST)

// KVM
kvm_arch_dev_ioctl(KVM_GET_MSR_INDEX_LIST)
    msrs_to_save
    emulated_msrs

4.2 KVM_GET_MSR_FEATURE_INDEX_LIST

KVM_GET_MSR_FEATURE_INDEX_LIST returns the list of MSRs that can be passed to the KVM_GET_MSRS system ioctl. This lets userspace probe host capabilities and processor features that are exposed via MSRs (e.g., VMX capabilities).
This list also varies by kvm version and host processor, but does not change otherwise.

// QEMU
kvm_get_supported_feature_msrs
    kvm_ioctl(KVM_GET_MSR_FEATURE_INDEX_LIST)

// KVM
kvm_arch_dev_ioctl(KVM_GET_MSR_FEATURE_INDEX_LIST)
    msr_based_features

4.3 KVM_GET_MSRS

When used as a system ioctl:
Reads the values of MSR-based features that are available for the VM.
The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST in a system ioctl.

When used as a vcpu ioctl:
Reads model-specific registers from the vcpu. Supported msr indices can be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl.

struct kvm_msrs {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 pad;

	struct kvm_msr_entry entries[0];
};

struct kvm_msr_entry {
	__u32 index;
	__u32 reserved;
	__u64 data;
};

Application code should set the nmsrs member (which indicates the size of the entries array) and the index member of each array entry. kvm will fill in the data member.
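Because entries is a variable-length trailing array, the caller has to size the buffer itself. The sketch below shows one way to build such a request; it uses local mirror types (`demo_msrs`, `demo_msr_entry`, both hypothetical names) so that it compiles without the kernel headers, and it stops short of issuing the ioctl.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the structures shown above (names are hypothetical). */
struct demo_msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
};

struct demo_msrs {
    uint32_t nmsrs;  /* number of msrs in entries */
    uint32_t pad;
    struct demo_msr_entry entries[];
};

/*
 * Allocate a request for `n` MSRs and fill in nmsrs and each index.
 * The data members stay zero; on a read they are what kvm fills in.
 * Returns NULL on allocation failure; the caller frees the result.
 */
struct demo_msrs *demo_msrs_alloc(const uint32_t *indices, uint32_t n)
{
    struct demo_msrs *m;

    assert(indices != NULL || n == 0);
    m = calloc(1, sizeof(*m) + (size_t)n * sizeof(m->entries[0]));
    if (!m)
        return NULL;

    m->nmsrs = n;
    for (uint32_t i = 0; i < n; i++)
        m->entries[i].index = indices[i];
    return m;
}
```

With the real struct kvm_msrs from the kernel headers, a buffer built this way would be the argument passed to the KVM_GET_MSRS ioctl. For KVM_SET_MSRS the caller would also fill in each data member before the call.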

4.4 KVM_SET_MSRS

Writes model-specific registers to the vcpu.

Application code should set the nmsrs member (which indicates the size of the entries array), and the index and data members of each array entry.


References:

  1. A study of the code execution paths of the READMSR and CPUID instructions in the guest
  2. kvm/api.txt