This post collects notes on eMCA (Enhanced Machine Check Architecture).

Overview

In short, eMCA can convert an MCE or a CMCI into an SMI, so that firmware (BIOS) handles the error first and then hands it to the OS.

The figure below uses CMCI as an example; the MCE flow is similar:

Description in the SDM

  • MCG_EMC_P (Enhanced Machine Check Capability) flag, bit 25 — Indicates (when set) that the processor supports enhanced machine check capabilities for firmware first signaling.
  • MCG_ELOG_P (extended error logging) flag, bit 26 — Indicates (when set) that the processor allows platform firmware to be invoked when an error is detected so that it may provide additional platform specific information in an ACPI format “Generic Error Data Entry” that augments the data included in machine check bank registers.
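
The two flags above live in the IA32_MCG_CAP MSR (address 0x179), so checking for eMCA support comes down to testing bits 25 and 26. Below is a minimal sketch of that check. The function names `read_mcg_cap` and `decode_mcg_cap` are made up for this note, and the read path assumes Linux with the `msr` kernel module loaded and root privileges.

```python
import os
import struct

IA32_MCG_CAP = 0x179       # MSR address of IA32_MCG_CAP
MCG_EMC_P = 1 << 25        # Enhanced Machine Check Capability
MCG_ELOG_P = 1 << 26       # extended error logging


def read_mcg_cap(cpu: int = 0) -> int:
    """Read IA32_MCG_CAP via /dev/cpu/N/msr (assumes Linux, msr module, root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device returns 8 bytes at the offset equal to the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, IA32_MCG_CAP))[0]
    finally:
        os.close(fd)


def decode_mcg_cap(value: int) -> dict:
    """Pick out the bank count (bits 7:0) and the two eMCA-related flags."""
    return {
        "bank_count": value & 0xFF,
        "emc_p": bool(value & MCG_EMC_P),
        "elog_p": bool(value & MCG_ELOG_P),
    }
```

For example, `decode_mcg_cap(read_mcg_cap())` on a machine where both `emc_p` and `elog_p` come back true suggests the processor supports firmware-first signaling and extended error logging. Whether the BIOS actually enables them is a separate question.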

FFM (Firmware First Mode)

FFM allows firmware to provide additional error information to the OS, synchronously with the MCE or CMCI.

The hardware generates an SMI when an error occurs. The SMI handler pre-processes the error and constructs an error log in memory before the MCE or CMCI is delivered.
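
The ordering described above can be illustrated with a toy model. This is only a sketch of the sequence (SMI first, then MCE or CMCI), not how firmware or an OS is actually implemented. The `Platform` class, its methods, and the log entry layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Toy model of the firmware-first ordering; every name here is hypothetical."""
    error_log: list = field(default_factory=list)  # stands in for the in-memory error log
    trace: list = field(default_factory=list)      # order in which handlers ran

    def smi_handler(self, bank: int, signal: str) -> None:
        # Firmware runs first: pre-process the error and record a log entry.
        self.trace.append("SMI")
        self.error_log.append({"bank": bank, "signal": signal})

    def os_handler(self, bank: int, signal: str) -> list:
        # The OS handler runs afterwards and can find the firmware's entries.
        self.trace.append(signal)
        return [e for e in self.error_log if e["bank"] == bank]

    def raise_error(self, bank: int, signal: str, ffm: bool = True) -> list:
        # With FFM the SMI is taken before the MCE or CMCI reaches the OS.
        if ffm:
            self.smi_handler(bank, signal)
        return self.os_handler(bank, signal)
```

Calling `Platform().raise_error(3, "CMCI")` runs the SMI handler before the CMCI handler. With `ffm=False`, the OS handler runs alone and finds no firmware-provided entries.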


References:

  1. peterhu: RAS简介 (Introduction to RAS)
  2. 4th Gen Intel® Xeon® Scalable Processors: Reliability, Availability, and Serviceability (RAS) Technical Paper
  3. Server RAS and UEFI CPER
  4. Martin’s Coding Note: MCA
  5. MCA Enhancements in Intel® Xeon® Processors