Notes about gVisor
What
gVisor provides a strong layer of isolation between running applications and the host operating system. It is an application kernel that implements a Linux-like interface. Unlike Linux, it is written in a memory-safe language (Go) and runs in userspace.
How is this different?
Machine-level virtualization
Machine-level virtualization, such as KVM and Xen, exposes virtualized hardware to a guest kernel via a Virtual Machine Monitor (VMM). This virtualized hardware is generally enlightened (paravirtualized), and additional mechanisms can be used to improve visibility between the guest and host (e.g. balloon drivers, paravirtualized spinlocks). Running containers in distinct virtual machines can provide great isolation, compatibility and performance, but for containers it often requires additional proxies and agents, and may come with a larger resource footprint and slower start-up times.

Rule-based execution
Rule-based execution, such as seccomp, SELinux and AppArmor, allows the specification of a fine-grained security policy for an application or container. These schemes typically rely on hooks implemented inside the host kernel to enforce the rules. If the exposed surface can be made small enough, this is an excellent way to sandbox applications while maintaining native performance. In practice, however, it can be extremely difficult (if not impossible) to reliably define a policy for arbitrary, previously unknown applications, which makes this approach hard to apply universally.
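To make the rule-based idea concrete, here is a toy Python model of an allowlist policy check. It is only a sketch: real seccomp, SELinux and AppArmor policies are enforced by hooks in the host kernel, not in application code. The `Policy` class, its `check` method and the syscall names chosen are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical allowlist: syscall names an application may invoke."""
    allowed: frozenset = field(default_factory=frozenset)

    def check(self, syscall: str) -> bool:
        # Default-deny: anything not explicitly listed is rejected.
        return syscall in self.allowed


# A policy tailored to one well-understood application.
web_server = Policy(frozenset({"read", "write", "accept", "close"}))

# A previously unknown application may need calls the policy author
# never anticipated, so it is blocked even though it is benign.
trace = ["read", "write", "mmap"]
denied = [name for name in trace if not web_server.check(name)]
print(denied)
```

The policy is only as good as the author's knowledge of the application, which is the difficulty the paragraph above describes for arbitrary workloads.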

gVisor
gVisor provides a third isolation mechanism, distinct from those above.
gVisor intercepts application system calls and acts as the guest kernel, without the need for translation through virtualized hardware. It may be thought of either as a merged guest kernel and VMM, or as seccomp on steroids (that is, a greatly strengthened seccomp). This architecture allows it to provide a flexible resource footprint (i.e. one based on threads and memory mappings, not fixed guest physical resources) while also lowering the fixed costs of virtualization. However, this comes at the price of reduced application compatibility and higher per-system-call overhead.

Architecture
Google gVisor is a sandboxed container runtime that uses para-virtualization to isolate containerized applications from the host system without the heavyweight resource allocation that comes with full virtual machines. It implements a user-space kernel, the Sentry, which is written in Go and runs in a restricted seccomp container.

Figure 2 shows gVisor’s architecture. All syscalls made by the application are redirected to the Sentry, which itself implements most of the functionality of the 237 syscalls it supports. To support its own operation, the Sentry uses 53 host syscalls. As a result, the application never interacts with the host directly through syscalls. gVisor supports two methods of redirecting syscalls. In ptrace mode, ptrace in the Linux kernel is used to forward syscalls to the Sentry. In KVM mode, KVM is used to trap syscalls before they reach the Linux kernel so that they can be forwarded to the Sentry. According to the gVisor documentation, KVM mode performs better than ptrace mode for many workloads and has several other benefits over the ptrace platform.
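The redirection described above can be sketched as a small Python model. This is not gVisor code: the `HostKernel` and `Sentry` classes, the two handlers, and the tiny host allowlist are hypothetical stand-ins chosen only to show the shape of the design, where guest syscalls are answered in user space and only a restricted set of host syscalls is ever issued.

```python
import errno


class HostKernel:
    """Stand-in for the host; records which host syscalls were issued."""
    # Illustrative subset, standing in for the Sentry's seccomp filter.
    ALLOWED = frozenset({"futex", "mmap", "read", "write"})

    def __init__(self):
        self.calls = []

    def syscall(self, name):
        if name not in self.ALLOWED:
            raise PermissionError(f"host syscall {name!r} blocked by filter")
        self.calls.append(name)


class Sentry:
    """Toy application kernel: handles guest syscalls in user space."""

    def __init__(self, host):
        self.host = host
        self.pid = 1  # PID as seen from inside the sandbox
        self.handlers = {"getpid": self._getpid, "brk": self._brk}

    def _getpid(self):
        # Answered entirely from Sentry state; the host is never asked.
        return self.pid

    def _brk(self):
        # Needs memory from the host, requested via a permitted host call.
        self.host.syscall("mmap")
        return 0

    def handle(self, name):
        handler = self.handlers.get(name)
        if handler is None:
            # Unsupported guest syscall: report "not implemented".
            return -errno.ENOSYS
        return handler()


host = HostKernel()
sentry = Sentry(host)
print(sentry.handle("getpid"), host.calls)
print(sentry.handle("brk"), host.calls)
```

In this model the application only ever talks to `Sentry.handle`, so the host sees just the calls the Sentry chooses to make on its own behalf.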
gVisor starts a Gofer process with each container; the Gofer provides the Sentry with access to file system resources. Because all file access is mediated by the Gofer, a compromised Sentry cannot directly read or write any files. A writable tmpfs can be overlaid on the entire file system to provide complete isolation from the host file system. To enable sharing between running containers and with the host, a shared file access mode may be used.
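As a rough illustration of the Gofer and overlay ideas, the following Python sketch models a file proxy with a writable in-memory layer on top. The `Gofer` and `OverlayFS` classes and the dictionary standing in for the host file system are assumptions made for this example; they do not reflect gVisor's actual interfaces or protocol.

```python
class Gofer:
    """Toy file proxy: the only component that touches host files."""

    def __init__(self, host_files):
        # path -> bytes; this dict stands in for the host file system.
        self._host_files = host_files

    def read(self, path):
        return self._host_files[path]


class OverlayFS:
    """Writable in-memory upper layer over a read-only lower layer."""

    def __init__(self, gofer):
        self._gofer = gofer
        self._upper = {}  # plays the role of the writable tmpfs

    def read(self, path):
        # Prefer the sandbox's own copy, fall back to the Gofer.
        if path in self._upper:
            return self._upper[path]
        return self._gofer.read(path)

    def write(self, path, data):
        # Writes land in the upper layer only; the host copy is untouched.
        self._upper[path] = data


host_files = {"/etc/hostname": b"host\n"}
fs = OverlayFS(Gofer(host_files))
fs.write("/etc/hostname", b"sandbox\n")
print(fs.read("/etc/hostname"), host_files["/etc/hostname"])
```

The sandbox sees its own modification while the host's data stays unchanged, which is the isolation property the overlay is meant to provide. A shared access mode would instead pass writes through to the lower layer.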
gVisor has its own user-space networking stack, netstack, written in Go. The Sentry uses netstack to handle almost all networking, including TCP connection state, control messages, and packet assembly, rather than relying on kernel code that shares much more state across containers. gVisor also provides an option to use host networking for higher performance.
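To illustrate what keeping TCP connection state inside the sandbox means, here is a heavily simplified Python sketch. It is not netstack and not a faithful TCP state machine (real TCP has more states, such as FIN_WAIT_1, FIN_WAIT_2 and TIME_WAIT). The `State` enum, the `TRANSITIONS` table, the event names and the `Stack` class are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    SYN_SENT = auto()
    ESTABLISHED = auto()
    FIN_WAIT = auto()


# Simplified client-side transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.CLOSED, "connect"): State.SYN_SENT,
    (State.SYN_SENT, "recv_syn_ack"): State.ESTABLISHED,
    (State.ESTABLISHED, "close"): State.FIN_WAIT,
    (State.FIN_WAIT, "recv_fin_ack"): State.CLOSED,
}


class Stack:
    """Toy per-sandbox network stack with its own connection table."""

    def __init__(self):
        self.conns = {}  # connection id -> State

    def event(self, conn_id, name):
        current = self.conns.get(conn_id, State.CLOSED)
        nxt = TRANSITIONS.get((current, name))
        if nxt is None:
            raise ValueError(f"event {name!r} invalid in state {current.name}")
        self.conns[conn_id] = nxt
        return nxt


# Two sandboxes, two independent stacks: no connection state is shared.
stack_a, stack_b = Stack(), Stack()
stack_a.event("c1", "connect")
stack_a.event("c1", "recv_syn_ack")
print(stack_a.conns, stack_b.conns)
```

Each `Stack` instance owns its connection table, mirroring the point that a per-sandbox user-space stack shares far less state across containers than a common kernel stack does.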
References: