Notes about split lock detect
1. Introduction
A split lock is any atomic operation whose operand crosses two cache lines. Since the operand spans two cache lines and the operation must be atomic, the system locks the bus while the CPU accesses the two cache lines.
During bus locking, requests from other CPUs or bus agents for control of the bus are blocked. Blocking bus access from other CPUs, plus the overhead of configuring the bus locking protocol, degrades not only the performance of one CPU but also overall system performance.
If the operand is cacheable and completely contained in one cache line, the atomic operation is optimized by less expensive cache locking on Intel P6 and more recent processors. If a split lock operation is detected and a developer fixes the issue so that the operand fits in one cache line, cache locking is used for the atomic operation instead of the more expensive bus locking. Removing the split lock can improve overall performance.
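As an illustration of such a fix, the following is a minimal sketch, assuming 64-byte cache lines and a GCC or Clang toolchain with C11 support. The names spans_two_cache_lines, struct aligned_counter, fixed and bump_fixed are hypothetical and not taken from any real code base. The counter is placed at the start of a cache-line-aligned struct, so the locked operation never touches a second line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* True if an operand of `size` bytes (size >= 1) starting at `addr`
 * touches two cache lines. */
int spans_two_cache_lines(uintptr_t addr, size_t size)
{
    return addr / CACHE_LINE_SIZE != (addr + size - 1) / CACHE_LINE_SIZE;
}

/* Hypothetical fixed layout: the counter comes first and is aligned to a
 * cache line, so it always lies entirely within one line. */
struct aligned_counter {
    alignas(CACHE_LINE_SIZE) long long c;
    char buf[62];
};

static_assert(offsetof(struct aligned_counter, c) == 0,
              "counter starts the cache line");
static_assert(alignof(struct aligned_counter) == CACHE_LINE_SIZE,
              "struct is cache-line aligned");

struct aligned_counter fixed;

/* Atomic increment on an operand contained in one cache line.
 * Returns the value held before the increment. */
long long bump_fixed(void)
{
    return __sync_fetch_and_add(&fixed.c, 1);
}
```

The same address check can be applied to any operand a developer suspects of being split, for example by passing the faulting data address and the operand size.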
Intel 64 and IA-32 multiple-processor systems support locked atomic operations on locations in system memory. For example, the LOCK instruction prefix can be prepended to the following instructions when they use memory destination operand forms: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.
More information about split lock, bus locking, and cache locking can be found in the latest Intel 64 and IA-32 Architectures Software Developer’s Manual.
2. Split lock detection
Intel introduces a mechanism in Tremont and other future processors to detect split locks via the Alignment Check (#AC) exception, so that badly aligned atomic instructions can be caught before they impact whole-system performance.
This capability is critical for real-time system designers who build consolidated real-time systems. These systems run hard real-time code on some cores and “untrusted” user processes on others. The hard real-time code cannot afford to have bus locks from the untrusted processes hurt its performance. To date, designers have been unable to deploy these solutions because they had no way to prevent “untrusted” user code from generating split locks and bus locks that block the hard real-time code from accessing memory.
This capability may also find usage in the cloud. A user process with a split lock running in one guest can block other cores from accessing shared memory during its split-locked memory access. That may cause overall system performance degradation.
Split locks may also open a security hole: malicious user code can slow down the overall system by executing instructions with split locks.
3. Feature Enumeration and Control
The #AC for Split-locked Access feature is enumerated and controlled via CPUID and MSRs.
- CPUID.(EAX=0x7, ECX=0):EDX[30]: bit 30 of the EDX output indicates whether the platform has the IA32_CORE_CAPABILITIES MSR.
- Bit 5 of the IA32_CORE_CAPABILITIES MSR (0xcf) enumerates whether the CPU supports #AC for Split-locked Access (and has the TEST_CTRL MSR).
- Bit 29 of the TEST_CTRL MSR (0x33) enables or disables #AC for Split-locked Access.
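The three checks above can be sketched as plain bit tests. This is only an illustration: the macro and function names are hypothetical, and the register values are passed in as arguments because reading CPUID and MSRs is platform-specific and reading or writing MSRs requires privileged access, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions and MSR addresses as listed in the text above. */
#define CORE_CAPS_MSR_ADDR              0xcf
#define TEST_CTRL_MSR_ADDR              0x33
#define CPUID7_EDX_CORE_CAPS_BIT        30
#define CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
#define TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29

static_assert(CPUID7_EDX_CORE_CAPS_BIT < 32, "EDX is a 32-bit register");

/* cpuid7_edx: EDX output of CPUID with EAX=0x7, ECX=0. */
bool has_core_caps_msr(uint32_t cpuid7_edx)
{
    return (cpuid7_edx >> CPUID7_EDX_CORE_CAPS_BIT) & 1u;
}

/* core_caps: value read from the MSR at CORE_CAPS_MSR_ADDR. */
bool supports_split_lock_ac(uint64_t core_caps)
{
    return (core_caps >> CORE_CAPS_SPLIT_LOCK_DETECT_BIT) & 1u;
}

/* test_ctrl: value read from the MSR at TEST_CTRL_MSR_ADDR. */
bool split_lock_ac_enabled(uint64_t test_ctrl)
{
    return (test_ctrl >> TEST_CTRL_SPLIT_LOCK_DETECT_BIT) & 1u;
}

/* Returns the value to write back to the control MSR with the
 * split lock #AC bit set or cleared; all other bits are preserved. */
uint64_t set_split_lock_ac(uint64_t test_ctrl, bool on)
{
    uint64_t mask = UINT64_C(1) << TEST_CTRL_SPLIT_LOCK_DETECT_BIT;
    return on ? (test_ctrl | mask) : (test_ctrl & ~mask);
}
```

The checks are meant to be applied in order: the control MSR should only be touched when both enumeration bits are set.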
4. Handle split lock
Because #AC is a fault, the instruction is not executed, giving the #AC handler an opportunity to decide how to handle this instruction:
- It can allow the instruction to run with the LOCK# bus signal asserted, potentially impacting the performance of other CPUs.
- It can terminate the software at this instruction.
- and so on.
5. Interface
Kernel parameter: split_lock_detect
6. Example
On Intel CPUs a cache line is only 64 bytes. In struct counter, member c occupies 8 bytes and buf pads 62 bytes. Accessing member c therefore touches two cache lines, and executing the atomic operation __sync_fetch_and_add() on it triggers a split lock.
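The original listing is not preserved here. The following is a minimal sketch consistent with the description, assuming GCC or Clang on x86-64 and 64-byte cache lines. The packed and aligned attributes, the variable shared and the helpers crosses_cache_line and bump_counter are assumptions made for illustration. Accessing a misaligned long long is not portable C, and on a kernel booted with fatal split lock handling, calling bump_counter() may get the process killed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* line size stated in the text */

/* Packed so that c sits directly after buf, at offset 62. */
struct counter {
    char buf[62];
    long long c;
} __attribute__((packed));

/* Aligning the object to a cache line puts c at bytes 62..69:
 * the last 2 bytes of one line and the first 6 of the next. */
static struct counter shared __attribute__((aligned(CACHE_LINE_SIZE)));

/* True if [addr, addr + size) touches more than one cache line (size >= 1). */
int crosses_cache_line(uintptr_t addr, size_t size)
{
    return addr / CACHE_LINE_SIZE != (addr + size - 1) / CACHE_LINE_SIZE;
}

/* Locked read-modify-write on the misaligned member: a split lock on x86.
 * Returns the value held before the increment. */
long long bump_counter(void)
{
    /* Computed from the offset to avoid taking the address of a packed member. */
    long long *p = (long long *)((char *)&shared + offsetof(struct counter, c));

    assert(crosses_cache_line((uintptr_t)p, sizeof *p));
    return __sync_fetch_and_add(p, 1);
}
```

Calling bump_counter() from main() on a machine with split lock detection enabled in warning mode should produce a kernel log line of the kind shown next.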
[124994.391805] x86/split lock detection: #AC: a.out/91418 took a split_lock trap at address: 0x556c2928819a
Another example:
x86: Align incw instruction to avoid split lock
7. Implementation in Kernel
If bit 5 is set in MSR_IA32_CORE_CAPS, the feature X86_FEATURE_SPLIT_LOCK_DETECT is enabled and “split_lock_detect” is displayed in the flags of /proc/cpuinfo.
early_cpu_init
static void __init split_lock_setup(struct cpuinfo_x86 *c)
identify_cpu