Notes about XSAVE feature set
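These notes record my own understanding of the XSAVE feature set. Most of the content comes from a knowledge-sharing session given by a teammate and is not my original work.
Readers who want to study the XSAVE feature set in depth should read the chapter "MANAGING STATE USING THE XSAVE FEATURE SET" in Intel SDM Vol. 1, which is excellent material.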
1. Why?
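In one sentence: support the saving and restoring of processor state by hardware instead of software.
A thread may use features such as x87, SSE, and AVX-512, each of which comes with many registers. If system software had to save and restore every one of these registers by hand on each thread switch, the overhead would be high. The XSAVE feature set addresses this: system software only needs to execute the instructions it provides, and the hardware saves and restores the x87, SSE, AVX-512, and other register state.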
2. Overview
The XSAVE feature set supports the saving and restoring of state components, each of which is a discrete set of processor registers (or parts of registers). Each such state component corresponds to an XSAVE-supported feature. The XSAVE feature set organizes the state components of the XSAVE-supported features using state-component bitmaps. A state-component bitmap comprises 64 bits; each bit in such a bitmap corresponds to a single state component. Some state components are supervisor state components. The XSAVE feature set supports supervisor state components with only the XSAVES and XRSTORS instructions.
- User state components: the enabled set is specified by XCR0.
- Supervisor state components: the enabled set is specified by the IA32_XSS MSR.
Some XSAVE-supported features are also XSAVE-enabled features, that is, features that require use of the XSAVE feature set for their enabling.
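To make the bitmap idea concrete, here is a minimal sketch that decodes a state-component bitmap into component names. The bit positions follow the SDM's numbering, but the tables are deliberately not exhaustive, and the helper name `decode_bitmap` and the sample XCR0 value are assumptions made up for this illustration.

```python
# Bit positions of some state components (a non-exhaustive subset).
USER_COMPONENTS = {
    0: "x87",
    1: "SSE",
    2: "AVX",
    3: "MPX BNDREGS",
    4: "MPX BNDCSR",
    5: "AVX-512 opmask",
    6: "AVX-512 ZMM_Hi256",
    7: "AVX-512 Hi16_ZMM",
    9: "PKRU",
}
SUPERVISOR_COMPONENTS = {
    8: "PT",
    11: "CET_U",
    12: "CET_S",
}

def decode_bitmap(bitmap, names):
    """Return the names of the components whose bits are set in a 64-bit bitmap."""
    return [names.get(i, f"component {i}") for i in range(64) if (bitmap >> i) & 1]

# Hypothetical XCR0 value: x87, SSE, AVX, and the three AVX-512 components.
xcr0 = 0xE7
print(decode_bitmap(xcr0, USER_COMPONENTS))
```

The same helper works for an IA32_XSS value by passing `SUPERVISOR_COMPONENTS` instead.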
3. XSAVE Area
The XSAVE feature set allows saving and loading processor state from a region of memory called an XSAVE area.
3.1 Legacy Region
3.2 XSAVE Header
XCOMP_BV[63] indicates the format of the extended region of the XSAVE area (see Section 13.4.3 of SDM Vol. 1).
- If it is clear, the standard format is used.
- If it is set, the compacted format is used; XCOMP_BV[62:0] provide format specifics as specified in Section 13.4.3.
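Why distinguish between the standard format and the compacted format? The point is to save memory. For example, if a thread does not use AVX-512 and the AVX-512 state components are therefore left out of XCOMP_BV, the compacted format reserves no space for the AVX-512 register state in the extended region.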
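As an illustration of how the header is read, the sketch below parses a raw XSAVE area image and reports which format it uses. It assumes the layout given in the SDM: a 512-byte legacy region, followed by a 64-byte XSAVE header whose bytes 7:0 hold XSTATE_BV and bytes 15:8 hold XCOMP_BV. The function name `parse_xsave_header` and the synthetic buffer are hypothetical.

```python
import struct

LEGACY_REGION_SIZE = 512  # bytes 0..511: legacy (x87/SSE) region
XSAVE_HEADER_SIZE = 64    # bytes 512..575: XSAVE header

def parse_xsave_header(area):
    """Return (xstate_bv, xcomp_bv, is_compacted) from a raw XSAVE area image."""
    if len(area) < LEGACY_REGION_SIZE + XSAVE_HEADER_SIZE:
        raise ValueError("buffer too small to contain an XSAVE header")
    # Both fields are 64-bit little-endian values at the start of the header.
    xstate_bv, xcomp_bv = struct.unpack_from("<QQ", area, LEGACY_REGION_SIZE)
    is_compacted = bool(xcomp_bv >> 63)  # XCOMP_BV[63] selects the format
    return xstate_bv, xcomp_bv, is_compacted

# Synthetic example: a header claiming x87, SSE, and AVX in compacted format.
area = bytearray(LEGACY_REGION_SIZE + XSAVE_HEADER_SIZE)
struct.pack_into("<QQ", area, LEGACY_REGION_SIZE, 0x7, (1 << 63) | 0x7)
print(parse_xsave_header(area))
```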
3.3 Extended Region
The XSAVE feature set uses the extended region for each state component i, where i ≥ 2.
All state components other than x87 and SSE use the extended region.
Format of extended region:
- Standard format
- Compacted format
3.3.1 Standard Format
Supported by all processors that support the XSAVE feature set.
Location of each state component i (i ≥ 2) is determined by CPUID.
- Offset: CPUID.(EAX=0DH,ECX=i):EBX
- Size: CPUID.(EAX=0DH,ECX=i):EAX
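A small sketch of the standard-format lookup follows. The (offset, size) table stands in for what CPUID leaf 0DH would report; the numbers are illustrative values assumed for this example rather than read from a processor, and real code must query CPUID. The function names are hypothetical.

```python
# {i: (offset, size)} as CPUID.(EAX=0DH,ECX=i) would report in EBX and EAX.
# Illustrative values only (an assumption for this sketch).
CPUID_LEAF_0D = {
    2: (576, 256),    # AVX
    5: (1088, 64),    # AVX-512 opmask
    6: (1152, 512),   # AVX-512 ZMM_Hi256
    7: (1664, 1024),  # AVX-512 Hi16_ZMM
}

def standard_layout(bitmap, cpuid_leaf_0d):
    """Map each selected component i >= 2 to its (start, end) byte range."""
    layout = {}
    for i, (offset, size) in sorted(cpuid_leaf_0d.items()):
        if i >= 2 and (bitmap >> i) & 1:
            layout[i] = (offset, offset + size)
    return layout

def standard_area_size(layout):
    """Bytes needed to cover every selected component (at least 576)."""
    return max((end for _, end in layout.values()), default=576)

layout = standard_layout(0xE7, CPUID_LEAF_0D)
print(layout, standard_area_size(layout))
```

Note that in this format a component's offset does not depend on which other components are selected: component 5 sits at the same offset whether or not component 2 is present.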
3.3.2 Compacted Format
Supported by those processors that support the compaction extensions, which is enumerated by CPUID.(EAX=0DH,ECX=1):EAX[1] = 1.
Location of each state component i (i ≥ 2) is determined by CPUID and XCOMP_BV field in the XSAVE header.
- Offset: computed from XCOMP_BV as described in Section 13.4.3 of SDM Vol. 1
- Size: CPUID.(EAX=0DH,ECX=i):EAX
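The offset rule from Section 13.4.3 can be sketched roughly as follows, under these assumptions: the first component present in XCOMP_BV starts at byte 576 (right after the legacy region and the XSAVE header), each later component follows the previous one immediately, and a component is first rounded up to a 64-byte boundary when CPUID.(EAX=0DH,ECX=i):ECX[1] is set. The function name, parameter names, and sample sizes are hypothetical.

```python
EXTENDED_REGION_START = 576  # 512-byte legacy region + 64-byte XSAVE header

def compacted_layout(xcomp_bv, sizes, align64=frozenset()):
    """Compute offsets of components i >= 2 in the compacted format.

    sizes:   {i: size}, as reported by CPUID.(EAX=0DH,ECX=i):EAX
    align64: set of i for which CPUID.(EAX=0DH,ECX=i):ECX[1] is 1
    Returns ({i: offset}, end_of_area).
    """
    offsets = {}
    next_free = EXTENDED_REGION_START
    for i in range(2, 63):  # bit 63 is the format flag, not a component
        if not (xcomp_bv >> i) & 1:
            continue  # component absent: it takes no space at all
        if i in align64:
            next_free = (next_free + 63) & ~63  # round up to 64 bytes
        offsets[i] = next_free
        next_free += sizes[i]
    return offsets, next_free

# Illustrative sizes (assumed, not read from CPUID).
sizes = {2: 256, 5: 64, 6: 512, 7: 1024}
# AVX-512 components requested without AVX: they pack from byte 576.
print(compacted_layout((1 << 63) | 0xE3, sizes))
```

With these sample sizes the area ends at byte 2176, whereas a standard-format area holding the same AVX-512 components would still need the space up to their fixed offsets.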
4. Optimization
The XSAVEOPT, XSAVEC, and XSAVES instructions use two optimizations to reduce the amount of data that they write to memory.
4.1 The init optimization
The instruction may avoid writing data for any state component known to be in its initial configuration.
4.2 The modified optimization
If either XSAVEOPT or XSAVES is using the same XSAVE area as that used by the most recent execution of XRSTOR or XRSTORS, it may avoid writing data for any state component whose configuration is known not to have been modified since then.
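The two optimizations can be modeled as bitmap arithmetic. The sketch below is a simplified model, not a description of what any particular processor does: it ignores special cases (such as the handling of MXCSR), and since the hardware only "may" skip writes, the result is the smallest set of components that would be written. RFBM, XINUSE, and XMODIFIED are the SDM's terms; the function name is hypothetical.

```python
def components_written(rfbm, xinuse, xmodified=None):
    """Model which state components a save instruction writes to memory.

    rfbm:      requested-feature bitmap
    xinuse:    bit i set if component i is NOT in its initial configuration
    xmodified: bit i set if component i changed since the last XRSTOR/XRSTORS
               from the same XSAVE area; None means the modified optimization
               does not apply (for example, a different XSAVE area is used)
    """
    written = rfbm & xinuse       # init optimization
    if xmodified is not None:
        written &= xmodified      # modified optimization
    return written

# Requested: x87, SSE, AVX, AVX-512; only x87, SSE, and AVX are in use.
print(hex(components_written(0xE7, 0x07)))
# Same request, but only AVX changed since the last restore from this area.
print(hex(components_written(0xE7, 0x07, xmodified=0x04)))
```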