Notes about Paravirtualized ticket spinlocks
文章目录
本文将mark下kvm中的Paravirtualized ticket spinlocks相关notes。
参考内核版本为v5.0,主要内容转载自kvm performance optimization technologies, part two。
在kvm中,也会将Paravirtualized ticket spinlocks称为PV unhalt。
Background
Now, suppose that vCPU A grabs a spinlock, and before it finishes, is preempted by the scheduler. And then suppose vCPU B tries to grab the spinlock. Now B, instead of spinning for the short period of time that A needed the spinlock for, will be spinning until vCPU is scheduled again — which may be anywhere from several milliseconds to hundreds of milliseconds, depending on how busy the system is. B is now using the cpu but accomplishing nothing. It’s burning up its VM’s share of CPU time, and keeping other VMs with useful work to do from running. Worse yet, it may even be that the reason A is not running is that the hypervisor’s scheduler is trying to give priority to B — so B is actively keeping A from finishing the work that it needed to do with the spinlock.
The situation gets even worse with a more advanced form of spinlock called a ticketlock. Ticketlocks have a lot of advantages for large systems, including reduced cacheline bouncing and more predictable wait time. (See this LWN article for a complete description.) The key attribute for this discussion is that ticketlocks essentially make a first-come first-served queue: if A has the lock, then B tries to grab it, and then C, B is guaranteed to get the lock before C. So now, if C is spinning waiting for the lock, it must wait for both A and B to finish before it can get the lock.
The result is that on a moderately loaded system, the vast majority of the cpu cycles are actually spent waiting for ticketlocks rather than doing any useful work. This is called a “ticketlock storm”.
Overview
The PV unhalt feature is used to set the pv_lock_ops
to rewrite the native spinlock’s function so it can be more optimizated.
Though the total implementation of pv spinlock is related with the spinlock implementation such as ticketlock and queued spinlock, the basic idea behind the pv spinlock is the same. That is instead of spining while the vCPU can’t get the spinlock, it will execute halt instruction and let the other vCPU got scheduled. When the vCPU can get the spinlock, allows a vCPU to kick the target vCPU out of halt state.
When the time spent waiting for a lock reaches a given threshold, kvm is notified via a hypercall that a vCPU is currently blocked by a held lock. KVM then halts the waiting vCPU until the lock is detected to be available.
初始化
kvm side
kvm should expose the KVM_FEATURE_PV_UNHALT
to the guest.1
2
3
4
5
6
7
8
9static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
u32 index, int *nent, int maxnent)
{
...
case KVM_CPUID_FEATURES:
entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
...
(1 << KVM_FEATURE_PV_UNHALT) |
...
guest side
When the guest startup, kvm_spinlock_init
is used to initialize the pv spinlock.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27void __init kvm_spinlock_init(void)
{
if (!kvm_para_available())
return;
/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
if (kvm_para_has_hint(KVM_HINTS_REALTIME))
return;
/* Don't use the pvqspinlock code if there is only 1 vCPU. */
if (num_possible_cpus() == 1)
return;
__pv_init_lock_hash();
pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
pv_ops.lock.queued_spin_unlock =
PV_CALLEE_SAVE(__pv_queued_spin_unlock);
pv_ops.lock.wait = kvm_wait;
pv_ops.lock.kick = kvm_kick_cpu;
if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
pv_ops.lock.vcpu_is_preempted =
PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
}
}
最值得关注的函数是kvm_wait
和kvm_kick_cpu
,会在下面进行详细的介绍。
halt when vCPU is blocked by a held lock
guest side
When the vCPU can’t get the spinlock, it will call wait callback.
1 | static __always_inline void pv_wait(u8 *ptr, u8 val) |
Then it will execute the halt instruction in kvm_wait
.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25static void kvm_wait(u8 *ptr, u8 val)
{
unsigned long flags;
if (in_nmi())
return;
local_irq_save(flags);
if (READ_ONCE(*ptr) != val)
goto out;
/*
* halt until it's our turn and kicked. Note that we do safe halt
* for irq enabled case to avoid hang when lock info is overwritten
* in irq spinlock slowpath and no spurious interrupt occur to save us.
*/
if (arch_irqs_disabled_flags(flags))
halt();
else
safe_halt();
out:
local_irq_restore(flags);
}
kvm side
1 | handle_halt |
When the guest execute halt instruction, the kvm_emulate_halt
->kvm_vcpu_halt
will be called. This will set the vcpu->arch.mp_state
to KVM_MP_STATE_HALTED
. Then vcpu_block
will be called to block this vCPU.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15static int vcpu_run(struct kvm_vcpu *vcpu)
{
int r;
struct kvm *kvm = vcpu->kvm;
vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
vcpu->arch.l1tf_flush_l1d = true;
for (;;) {
if (kvm_vcpu_running(vcpu)) {
r = vcpu_enter_guest(vcpu);
} else {
r = vcpu_block(kvm, vcpu);
}
...
1 | static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu) |
kick the halted vCPU until the lock is detected to be available
guest side
When the halted vCPU can get the spinlock, the kick
callback will be called by pv_kick
by a running vCPU. The kvm_kick_cpu
will be called and this trigger a KVM_HC_KICK_CPU
hypercall.1
2
3
4
5
6
7
8
9
10
11
12
13
14static __always_inline void pv_kick(int cpu)
{
PVOP_VCALL1(lock.kick, cpu);
}
/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
static void kvm_kick_cpu(int cpu)
{
int apicid;
unsigned long flags = 0;
apicid = per_cpu(x86_cpu_to_apicid, cpu);
kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}
kvm side
When the guest trigger KVM_HC_KICK_CPU
hypercall, kvm_pv_kick_cpu_op
will be called.1
2
3
4
5
6
7
8
9int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
{
...
case KVM_HC_KICK_CPU:
kvm_pv_kick_cpu_op(vcpu->kvm, a0, a1);
ret = 0;
break;
...
}
The kvm_pv_kick_cpu_op
will send an interrupt to the lapic.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18/*
* kvm_pv_kick_cpu_op: Kick a vcpu.
*
* @apicid - apicid of vcpu to be kicked.
*/
static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
{
struct kvm_lapic_irq lapic_irq;
lapic_irq.shorthand = 0;
lapic_irq.dest_mode = 0;
lapic_irq.level = 0;
lapic_irq.dest_id = apicid;
lapic_irq.msi_redir_hint = false;
lapic_irq.delivery_mode = APIC_DM_REMRD;
kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
}
1 | kvm_irq_delivery_to_apic |
Then in __apic_accept_irq
it will kick the blocked vCPU.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17/*
* Add a pending IRQ into lapic.
* Return 1 if successfully added and 0 if discarded.
*/
static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
int vector, int level, int trig_mode,
struct dest_map *dest_map)
{
...
case APIC_DM_REMRD:
result = 1;
vcpu->arch.pv.pv_unhalted = 1;
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_vcpu_kick(vcpu);
break;
...
}
For the halted vCPU, then kvm_vcpu_block
returns, it will set vcpu->arch.mp_state
to KVM_MP_STATE_RUNNABLE
and let the halted vCPU enter Non-root mode to get the spinlock.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
{
if (!kvm_arch_vcpu_runnable(vcpu) &&
(!kvm_x86_ops->pre_block || kvm_x86_ops->pre_block(vcpu) == 0)) {
srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
kvm_vcpu_block(vcpu);
vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
if (kvm_x86_ops->post_block)
kvm_x86_ops->post_block(vcpu);
// 此时kvm_check_request(KVM_REQ_UNHALT, vcpu)为true
// 因为__apic_accept_irq调用了kvm_make_request(KVM_REQ_EVENT, vcpu)
if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
return 1;
}
kvm_apic_accept_events(vcpu);
switch(vcpu->arch.mp_state) {
case KVM_MP_STATE_HALTED:
vcpu->arch.pv.pv_unhalted = false;
vcpu->arch.mp_state =
KVM_MP_STATE_RUNNABLE;
...
}
参考资料:
- kvm performance optimization technologies, part two
- kvm : Paravirt-spinlock support for KVM guests
- Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud
- Benchmarking the new PV ticketlock implementation
- How to deal with lock-holder preemption
- Towards scalable multiprocessor virtual machines. In Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3