Golden rule of system design: the beauty of simplicity

Lessons shared by Prof. Bao Yungang.

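Prof. Kai Li's KISS (keep it simple) principle is really about "move fast, then optimize step by step." When we design an algorithm, we can often think it through in our heads first and then write the code directly. But when we design and implement a system whose complexity exceeds the capacity of our working memory, we can no longer "simulate" every detail in our heads. At that point we should build the system as quickly as possible, and then optimize each of its parts.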

This agrees with Prof. Shi Yigong's article "Time-consuming perfectionism hinders innovation and progress."

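Simplicity is not the goal; the goal is a simple design that satisfies complex requirements. A design should give the system the ability to evolve from simple to complex.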

What are ODM and OEM?

What exactly are ODM and OEM, and what is the essential difference between them?

  • ODM(Original Design Manufacturer)

  • OEM(Original Equipment Manufacturer)

  • OBM(Original Brand Manufacturer)

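OBM: A designs, A manufactures, A's brand, A sells == the factory designs, produces, and sells its own products
ODM: B designs, B manufactures, A's brand, A sells == commonly called "white labeling": the factory's product sold under someone else's brand
OEM: A designs, B manufactures, A's brand, A sells == contract manufacturing: the technology and the brand belong to someone else, and the factory only manufactures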

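If company A values company B's manufacturing and design capabilities and has B design and build the products it needs, B is an ODM;
If company A values only B's manufacturing capability and has B do contract production only (no design), B is an OEM;
If company A designs, manufactures, and sells its products entirely by itself, A is an OBM.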

Princeton Advanced Computer Systems

A good course, worth studying.

Jailhouse hypervisor

First of all, it is a partitioning hypervisor that is more concerned with isolation than virtualization. Jailhouse is lightweight and doesn’t provide many features one traditionally expects from virtualization systems. For example, there is no support for overcommitment of resources, guests can’t share a CPU because there is no scheduler, and Jailhouse can’t emulate devices you don’t have.

Instead, Jailhouse enables asymmetric multiprocessing (AMP) on top of an existing Linux setup and splits the system into isolated partitions called “cells.” Each cell runs one guest and has a set of assigned resources (CPUs, memory regions, PCI devices) that it fully controls. The hypervisor’s job is to manage cells and maintain their isolation from each other. This approach is most useful for virtualizing tasks that require full control over the CPU; examples include realtime control tasks and long-running number crunchers (high-performance computing). Besides these, it can be used for security applications: to create sandboxes, for example.

microcode

What exactly is microcode?

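For example, x86 is a CISC instruction set, yet the cores of later x86 processors are essentially RISC-like: the decoders, with microcode handling the most complex cases, translate complex x86 instructions into sequences of simple RISC-like operations that the hardware can execute directly.
More generally, many instructions in an instruction set cannot be implemented directly in hardware circuits, so microcode translates these complex instructions into sequences of simple operations that the circuits can execute directly.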

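CPU microcode takes effect in the CPU pipeline, for example at the decoder. A CISC instruction arriving at the decoder may be decoded into a single micro-op (roughly equivalent to a RISC instruction) or into several; how many, and which ones, can be controlled by microcode. If a bug is discovered after tape-out, for instance some CISC instruction behaves incorrectly, the microcode can sometimes be modified to work around it. This is not guaranteed to work every time, but leaving some room for fixes is always good; how much it actually helps depends on the designers' skill.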
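To make the decoder behavior described above concrete, here is a toy model: a table (standing in for microcode) maps each CISC instruction to one or more micro-ops, and a loadable patch can override an entry, mirroring the idea of fixing a misbehaving instruction after tape-out. The instruction strings, micro-op names, the `fence` workaround, and the `Decoder` class are all hypothetical; real microcode formats and update mechanisms are vendor-specific and far more involved.

```python
# Hypothetical micro-op sequences; NOT real Intel/AMD microcode.
DEFAULT_UCODE = {
    "ADD r, r":   ["alu_add r, r"],                                      # 1 micro-op
    "ADD r, [m]": ["load tmp, [m]", "alu_add r, tmp"],                   # 2 micro-ops
    "ADD [m], r": ["load tmp, [m]", "alu_add tmp, r", "store [m], tmp"], # 3 micro-ops
}


class Decoder:
    """Toy decoder: built-in table plus an overriding patch table."""

    def __init__(self, rom):
        self.rom = dict(rom)  # copy of the table "shipped with the chip"
        self.patches = {}     # entries loaded later as a hypothetical microcode update

    def load_patch(self, patch):
        self.patches.update(patch)

    def decode(self, insn):
        # A patched entry takes priority over the built-in one.
        # An unknown instruction raises KeyError, loosely like an invalid-opcode fault.
        if insn in self.patches:
            return list(self.patches[insn])
        return list(self.rom[insn])


dec = Decoder(DEFAULT_UCODE)
print(dec.decode("ADD r, [m]"))  # ['load tmp, [m]', 'alu_add r, tmp']

# Suppose "ADD [m], r" misbehaves on silicon: a hypothetical patch
# inserts an extra ordering micro-op as a workaround.
dec.load_patch({"ADD [m], r": ["load tmp, [m]", "alu_add tmp, r", "fence", "store [m], tmp"]})
print(len(dec.decode("ADD [m], r")))  # 4
```

The point of the design is that the fix changes only data (the table), not the decoder logic, which is why a microcode update can rescue some bugs but not ones located in fixed-function circuitry.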

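Tape-out

In integrated circuit design, "tape-out" (流片) refers to trial production: after the circuit has been designed, a few or a few dozen chips are fabricated first for testing. If they pass testing, mass production proceeds based on that design.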

CPU vs MPU vs MCU

CPU vs MPU

The central processing unit (CPU) is the part of a chip that executes instructions and functions as the brain of the computer. A microprocessor may integrate other logic on the same chip alongside the CPU, such as caches or a memory controller. So a CPU is part of a microprocessor, but a microprocessor can be more than the CPU.

MPU VS MCU

A microprocessor generally does not have on-chip RAM, ROM, or I/O. It usually uses its pins as a bus to interface with external peripherals such as RAM, ROM, serial ports, and digital and analog I/O. Because of this, it is expandable at the board level.

A microcontroller is "all in one": the processor, RAM, and I/O are all on one chip, so you cannot (say) increase the amount of RAM available or the number of I/O ports. The controlling bus is internal and not available to the board designer.

This means that a microprocessor is generally capable of being built into bigger general purpose applications than a microcontroller. The microcontroller is usually used for more dedicated applications.

Virtual appliance

A virtual appliance is a pre-configured virtual machine image, ready to run on a hypervisor. Installation of a software appliance on a virtual machine and packaging that into an image creates a virtual appliance. Like software appliances, virtual appliances are intended to eliminate the installation, configuration and maintenance costs associated with running complex stacks of software.

Virtual Appliances: A New Paradigm for Software Delivery. A forerunner of the ideas behind Docker.

An Analysis of Performance Evolution of Linux Core Operations

SOSP 2019: collective notes from SJTU-IPADS

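The main contribution of this work is an exhaustive performance study of Linux's core kernel operations, which found that their performance unexpectedly got worse over time. The main causes of the slowdown of basic OS operations in the Linux kernel fall into three categories: security patches, new features, and changes in kernel configuration. To study this, the authors built a new kernel testing tool, LEBench, which contains a set of representative tests and a regression testing framework that compares the performance of core kernel operations across Linux versions. Finally, the authors identify 11 changes that significantly slow down core kernel operations and discuss how to avoid or mitigate their performance impact.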

This paper is worth studying in detail; it is useful for research on several kernel subsystems.
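LEBench's regression framework compares the latency of the same core kernel operations across kernel versions. The sketch below shows that idea in miniature, assuming latencies have already been collected once per kernel version: `median_latency_ns` times a callable, and `find_slowdowns` flags operations whose latency grew by more than a chosen threshold. Both functions, the 5% default threshold, and timing `os.getpid` from Python are our own assumptions, not LEBench's code or methodology; Python-level timing includes interpreter overhead, so only comparisons on the same machine and interpreter are meaningful.

```python
import os
import statistics
import time


def median_latency_ns(op, iterations=10_000, repeats=5):
    """Median per-call latency of `op` in nanoseconds across several timed runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            op()
        samples.append((time.perf_counter_ns() - start) / iterations)
    # The median damps outliers such as a run disturbed by an interrupt.
    return statistics.median(samples)


def find_slowdowns(baseline, candidate, threshold=0.05):
    """Return {operation: relative_change} for operations slower by more than `threshold`.

    `baseline` and `candidate` map operation names to latencies (assumed > 0)
    measured on the old and new kernel versions respectively.
    """
    slowdowns = {}
    for op, base in baseline.items():
        if op in candidate:
            change = (candidate[op] - base) / base
            if change > threshold:
                slowdowns[op] = change
    return slowdowns


if __name__ == "__main__":
    # Measure on the running kernel; save the result and compare it
    # with a baseline recorded on another kernel version.
    print(f"getpid: {median_latency_ns(os.getpid):.1f} ns/call")
```

Reporting a relative change rather than an absolute one lets a single threshold apply to both very cheap operations (a trivial system call) and expensive ones.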

Regression testing

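Suppose that in version 3.1.5, test case 125 of module A passes, but testers find that the same test case fails in the new version 3.1.6: this is a "regression". Running all previously passing test cases on the new version to check whether any such regression has occurred is a "regression test". If such a "regression" is caused by an intended change in the module's functionality (because of a design change), it is not a real defect, and the test case itself should be updated.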

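For a bug fix, we run a regression test to:

(1) verify that the new code really fixes the defect;

(2) also verify that the new code does not break any of the module's existing functionality, i.e. there is no regression.

So the "regression" in "regression test" can be understood as "regressing to an earlier, broken state".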
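The two checks for a bug fix listed above amount to comparing test results between two versions. A minimal sketch, assuming results are available as `{test_id: passed}` dictionaries; the function names and test IDs are hypothetical, and a test missing from the new run is treated as not passing.

```python
def find_regressions(old_results, new_results):
    """Test IDs that passed on the old version but do not pass on the new one."""
    return sorted(
        test for test, passed in old_results.items()
        # A test absent from the new run counts as not passing.
        if passed and not new_results.get(test, False)
    )


def verify_bug_fix(bug_test, old_results, new_results):
    """Check both goals of regression testing a bug fix.

    Returns (fixed, regressions): (1) whether the test reproducing the bug now
    passes, and (2) the previously passing tests that now fail.
    """
    fixed = new_results.get(bug_test, False)
    regressions = find_regressions(old_results, new_results)
    return fixed, regressions


old = {"A-124": True, "A-125": True, "B-7": False}  # e.g. results on 3.1.5
new = {"A-124": True, "A-125": False, "B-7": True}  # e.g. 3.1.6, which includes the fix for B-7
print(verify_bug_fix("B-7", old, new))  # (True, ['A-125'])
```

Here the bug behind `B-7` is fixed, but `A-125` has regressed, so the fix is not yet acceptable unless that failure turns out to be an intended design change.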

It's time to start writing

A curated list of Computer Engineering and Computer Architecture resources