Notes about Intel Processor Trace (IPT)
This post collects notes on Intel Processor Trace (IPT).
Overview
Motivation
Currently, there are several hardware control flow tracing mechanisms, each representing a different set of tradeoffs (as shown in Table 1). Branch Trace Store (BTS) can capture all control transfer events (e.g., calls, returns, and all types of jumps) into a memory-resident BTS buffer. Each record contains the source and target addresses of the branch instruction, so no decoding is needed. However, BTS introduces very high overhead during tracing and is inflexible because it lacks event filtering mechanisms. Last Branch Record (LBR) has some support for event filtering (e.g., filtering out conditional branches), but it can only record the 16 or 32 most recent branch pairs (source and target) in a register stack. Although it incurs very low tracing overhead, its short history can hardly provide precise protection.
Because they can trace control flow dynamically, BTS and LBR have been used to defend against ROP-like attacks. These defenses, however, either incur high overhead (those using BTS) or sacrifice security due to imprecise tracing (those using LBR).
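To make the tradeoff concrete, here is a minimal toy model in Python, not real hardware behavior: the BTS-like tracer appends every (source, target) pair to a buffer, while the LBR-like tracer keeps only the most recent entries in a fixed-depth stack. The function names and the branch addresses are made up for illustration.

```python
from collections import deque

def trace_bts(branches):
    """BTS-like model: every (source, target) pair is appended to a memory buffer."""
    return list(branches)

def trace_lbr(branches, depth=16):
    """LBR-like model: a fixed-depth stack keeps only the most recent pairs."""
    stack = deque(maxlen=depth)  # oldest entries are silently overwritten
    for pair in branches:
        stack.append(pair)
    return list(stack)

# Hypothetical branch stream: 100 branches with made-up addresses.
branches = [(0x1000 + 4 * i, 0x2000 + 4 * i) for i in range(100)]

bts_log = trace_bts(branches)
lbr_log = trace_lbr(branches, depth=16)

print(len(bts_log))  # 100: the full history survives
print(len(lbr_log))  # 16: only the tail of the history remains
```

Anything that happened more than 16 branches ago is invisible in the LBR-like log, which is why a defense built on it can be evaded by an attacker who pads the history with benign-looking branches.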
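IPT can record a program's control flow continuously and almost without loss, at a much lower performance overhead than BTS (typically around 5% or less). It is replacing BTS in many scenarios.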
Introduction
IPT was introduced in Intel Core M and 5th generation Intel Core processors. Each CPU core has its own IPT hardware that generates trace information about running programs in the form of packets. IPT can only be configured by privileged agents (e.g., the OS) through certain model-specific registers (MSRs). The trace packets are written to a pre-configured memory buffer in compressed form to minimize output bandwidth and reduce tracing overhead. A software decoder decodes the packets according to the pre-defined format, using extra information such as the program binaries and some runtime data provided by the control agent, to precisely reconstruct the program flow. Thanks to the aggressive compression of traces, IPT can collect more information than BTS, including control flow, execution modes, and timing, while incurring much less tracing overhead. The cost of this compression is that decoding is orders of magnitude slower than tracing.
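The following toy sketch illustrates why the trace can be so compact and why the decoder needs the program binary. It assumes a heavily simplified packet model: one taken/not-taken bit per conditional branch (loosely modeled on IPT's TNT packets), an explicit target for each indirect branch (loosely modeled on TIP packets), and nothing at all for direct jumps. The `CFG` dictionary, block names, and function names are hypothetical, and the real packet encoding is considerably more involved.

```python
# Toy control-flow graph standing in for the program binary (all names hypothetical).
# kind: "cond" = conditional branch, "jmp" = direct jump,
#       "ind"  = indirect jump,      "end" = stop.
CFG = {
    "A": ("cond", "B", "C"),   # taken -> B, not taken -> C
    "B": ("jmp", "D"),
    "C": ("ind",),             # target is only known at run time
    "D": ("end",),
    "E": ("jmp", "D"),
}

def trace(entry, cond_outcomes, indirect_targets):
    """Run the toy program; return the emitted packets and the true path."""
    conds, inds = iter(cond_outcomes), iter(indirect_targets)
    packets, path, block = [], [entry], entry
    while CFG[block][0] != "end":
        node = CFG[block]
        if node[0] == "cond":
            taken = next(conds)
            packets.append(("TNT", int(taken)))  # one bit per conditional branch
            block = node[1] if taken else node[2]
        elif node[0] == "jmp":
            block = node[1]                      # direct jump: nothing is logged
        else:
            block = next(inds)
            packets.append(("TIP", block))       # indirect target must be logged
        path.append(block)
    return packets, path

def decode(entry, packets):
    """Rebuild the path by walking the same CFG and consuming packets."""
    stream, path, block = iter(packets), [entry], entry
    while CFG[block][0] != "end":
        node = CFG[block]
        if node[0] == "cond":
            kind, bit = next(stream)
            assert kind == "TNT"
            block = node[1] if bit else node[2]
        elif node[0] == "jmp":
            block = node[1]                      # recovered from the binary alone
        else:
            kind, block = next(stream)
            assert kind == "TIP"
        path.append(block)
    return path

packets, true_path = trace("A", cond_outcomes=[False], indirect_targets=["E"])
print(packets)                            # [('TNT', 0), ('TIP', 'E')]
print(decode("A", packets) == true_path)  # True
```

Two packets are enough to recover a four-block path here, but only because the decoder walks the same graph the tracer did. Without the binary, the packets alone are meaningless, and that walk is also what makes decoding much slower than tracing.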
References: