Notes on split lock detection

1. Introduction

A split lock is an atomic operation whose operand spans two cache lines. Because the operand is split across two cache lines and the operation must still be atomic, the processor locks the bus while it accesses both lines.

While the bus is locked, requests from other CPUs or bus agents for control of the bus are blocked. Blocking other CPUs' bus access, plus the overhead of the bus locking protocol itself, degrades not only the performance of the CPU executing the split lock but also overall system performance.

If the operand is cacheable and fully contained in one cache line, Intel P6 family and later processors perform the atomic operation with the much cheaper cache locking instead. So when a split lock is detected and the developer fixes the code so that the operand fits within a single cache line, the atomic operation uses cache locking rather than the more expensive bus locking. Removing split locks can therefore improve overall performance.

Intel 64 and IA-32 multiprocessor systems support locked atomic operations on locations in system memory. For example, the LOCK prefix can be prepended to the following instructions when they use a memory destination operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. (XCHG with a memory operand is locked automatically, even without the prefix.)

More information about split locks, bus locking, and cache locking can be found in the latest Intel 64 and IA-32 Architectures Software Developer’s Manual.

2. Split lock detection

Starting with Tremont, and in future processors, Intel provides a mechanism to detect split locks by raising an Alignment Check (#AC) exception, so that badly aligned atomic instructions can be caught before they hurt overall system performance.

This capability is critical for designers of consolidated real-time systems, which run hard real-time code on some cores and “untrusted” user processes on others. The hard real-time code cannot tolerate bus locks from the untrusted processes degrading its performance. Until now, designers have been unable to deploy such systems because they had no way to stop untrusted user code from generating split locks, and therefore bus locks, that block the hard real-time code from accessing memory.

This capability may also be useful in the cloud. A user process in one guest that performs split-locked accesses can block other cores from accessing shared memory for the duration of each access, degrading overall system performance.

Split locks can also be abused: malicious user code can slow down the whole system simply by executing split-locked instructions repeatedly.

3. Feature Enumeration and Control

The #AC for Split-locked Access feature is enumerated and controlled through CPUID and two MSRs:

  • CPUID.(EAX=0x7, ECX=0):EDX[30]: bit 30 of EDX indicates whether the platform has the IA32_CORE_CAPABILITIES MSR.
  • Bit 5 of the IA32_CORE_CAPABILITIES MSR (0xcf) enumerates whether the CPU supports #AC for Split-locked Access (and has the TEST_CTRL MSR).
  • Bit 29 of the TEST_CTRL MSR (0x33) enables or disables #AC for Split-locked Access.

4. Handling split locks

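Because #AC is a fault, it is delivered before the instruction has executed, giving the #AC handler an opportunity to decide how to handle the instruction:

  • It can let the instruction run with the LOCK# bus signal asserted, potentially hurting the performance of other CPUs. To do so, the handler must first disable split lock detection on that CPU, otherwise the instruction would simply fault again.
  • It can terminate the software at this instruction.
  • Other policies are possible as well.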

5. Interface

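The split_lock_detect kernel parameter selects the mode on hardware that supports the feature:

  • off: split lock detection is disabled.
  • warn: the kernel emits rate-limited warnings about applications that trigger the #AC exception. This is the default on CPUs that support split lock detection.
  • fatal: the kernel sends SIGBUS to applications that trigger the #AC exception.

If the #AC exception is hit in the kernel or in firmware (i.e. not while executing in user mode), the kernel oopses in either warn or fatal mode.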

6. Example

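#include <stdio.h>
#include <sys/mman.h>

// https://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/Structure_002dPacking-Pragmas.html
// pack(2): members are aligned to at most 2 bytes, so c lands at offset 62.
#pragma pack(push, 2)
struct counter
{
    char buf[62];
    long long c;
};
#pragma pack(pop)

int main(void)
{
    struct counter *p;
    size_t size = sizeof(struct counter);
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    // mmap() returns page-aligned memory, so p starts on a cache-line boundary
    // and p->c occupies bytes 62..69, i.e. two cache lines.
    p = (struct counter *) mmap(NULL, size, prot, flags, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Every iteration is a LOCK-prefixed atomic add on a split operand.
    while (1) {
        __sync_fetch_and_add(&p->c, 1);
    }

    return 0;
}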

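On Intel CPUs a cache line is only 64 bytes. In struct counter, buf fills the first 62 bytes and member c occupies 8 bytes, so c sits at offset 62. Because mmap() returns page-aligned memory, the structure starts on a cache-line boundary, and c covers bytes 62 to 69: the last 2 bytes of the first cache line and the first 6 bytes of the second. Every access to c therefore combines data from two cache lines, and the atomic operation __sync_fetch_and_add() triggers a split lock.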

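Running the program on a kernel with split lock detection in warn mode produces a log message like this: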
[124994.391805] x86/split lock detection: #AC: a.out/91418 took a split_lock trap at address: 0x556c2928819a

Another example: the patch “x86: Align incw instruction to avoid split lock”.

7. Implementation in the kernel

#define X86_FEATURE_SPLIT_LOCK_DETECT   (11*32+ 6) /* #AC for split lock */
#define X86_FEATURE_BUS_LOCK_DETECT (16*32+24) /* Bus Lock detect */

/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
#define MSR_IA32_CORE_CAPS 0x000000cf
#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)

#define MSR_TEST_CTRL 0x00000033
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)

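If bit 5 is set in MSR_IA32_CORE_CAPS (or the CPU model is known to support the feature without enumerating it), the kernel sets the X86_FEATURE_SPLIT_LOCK_DETECT capability, and “split_lock_detect” appears in the flags line of /proc/cpuinfo. Because the bits of IA32_CORE_CAPABILITIES are not architectural, the kernel only trusts bit 5 on CPU models listed in split_lock_cpu_ids, as split_lock_setup() below shows.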

Boot-time setup on the boot CPU:

early_cpu_init
  early_identify_cpu
    sld_setup
      split_lock_setup
        __split_lock_setup
          setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT)
      sld_state_setup
      sld_state_show
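static void __init split_lock_setup(struct cpuinfo_x86 *c)
{
	const struct x86_cpu_id *m;
	u64 ia32_core_caps;

	/* Split lock detection is not set up when running as a guest. */
	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
		return;

	/* Only CPU models listed in split_lock_cpu_ids are considered. */
	m = x86_match_cpu(split_lock_cpu_ids);
	if (!m)
		return;

	switch (m->driver_data) {
	case 0:
		/* Model has the feature but does not enumerate it. */
		break;
	case 1:
		/* Model may enumerate the feature via IA32_CORE_CAPABILITIES bit 5. */
		if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITIES))
			return;
		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
		if (!(ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT))
			return;
		break;
	default:
		return;
	}

	cpu_model_supports_sld = true;
	/* Verifies MSR_TEST_CTRL works, then force-sets X86_FEATURE_SPLIT_LOCK_DETECT. */
	__split_lock_setup();
}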
Per-CPU initialization (boot and secondary CPUs):

identify_cpu
  this_cpu->c_init (init_intel)
    split_lock_init
      split_lock_verify_msr
