Notes about VXLAN
This post collects notes on VXLAN (Virtual Extensible LAN).
Prerequisite
Motivation
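Limited VM scale
A single physical server may run dozens or even hundreds of virtual machines, but the MAC address table of an access switch has limited capacity and cannot keep up with MAC learning for that many VMs.
Weak network isolation
A traditional VLAN ID is only 12 bits wide, so it supports at most 4096 isolation domains. On a large cloud platform the number of tenants can far exceed this, and VLANs run out.
Limited VM migration range
VM migration usually requires the target host to be in the same Layer 2 broadcast domain. This confines migration to a small physical scope and rules out flexible migration across server rooms or data centers.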
Introduction
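VXLAN is a mainstream overlay technology. Using "MAC in UDP" encapsulation, it wraps a Layer 2 Ethernet frame in a UDP datagram and carries it across a Layer 3 IP network.
With this in place, VMs "wandering around" is no longer a concern:
VM MAC addresses are visible only to the VXLAN edge devices (VTEPs). Devices in the intermediate network do not need to learn them, which removes the MAC table bottleneck.
A 24-bit VNI (VXLAN Network Identifier) supports about 16 million isolated segments.
As long as IP reachability exists, a VM can migrate across any Layer 3 network, which effectively builds a "virtual large Layer 2 network".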
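The segment counts above follow directly from the field widths. A quick arithmetic check (the variable names are only for illustration):

```python
# Field widths as stated above: a VLAN ID is 12 bits, a VNI is 24 bits.
VLAN_ID_BITS = 12
VNI_BITS = 24

vlan_ids = 2 ** VLAN_ID_BITS    # 4096
vni_count = 2 ** VNI_BITS       # 16777216, roughly 16 million

# How many times larger the VNI space is than the VLAN ID space.
ratio = vni_count // vlan_ids   # 4096

print(vlan_ids, vni_count, ratio)
```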
Core components
VTEP
A VTEP (VXLAN tunnel endpoint) is an edge device on a VXLAN network and the start or end point of a VXLAN tunnel. The source VTEP encapsulates the original data frames sent by the source server into VXLAN packets and transmits them to the destination VTEP over the IP network. The destination VTEP then decapsulates the VXLAN packets back into the original data frames and forwards them to the destination server.
VNI
A VNI (VXLAN Network Identifier) is a user identifier similar to a VLAN ID. Each VNI identifies a tenant, and VMs with different VNIs cannot communicate with each other at Layer 2.
VXLAN Gateway
As with VLANs, hosts with different VNIs, or hosts on VXLAN and non-VXLAN networks, cannot communicate with each other directly. To make such communication possible, VXLAN introduces VXLAN gateways.
Packet format

VXLAN packet format (outer IPv4 header used as an example)
As shown in the preceding figure, a VXLAN tunnel endpoint (VTEP) adds the following headers to the original Ethernet frame (original L2 frame) sent by a VM:
- VXLAN header
A VXLAN header (8 bytes) contains a 24-bit VNI field, which is used to define different tenants on the VXLAN network. It also contains a VXLAN Flags field (8 bits, set to 00001000) and two reserved fields (24 bits and 8 bits, respectively).
- UDP header
The VXLAN header and the original Ethernet frame are used as UDP data. In the UDP header, the destination port number (VXLAN Port) is fixed at 4789, and the source port number (UDP Src. Port) is calculated using a hash algorithm based on the original Ethernet frame.
- Outer IP header
In the outer IP header, the source IP address (Outer Src. IP) is the IP address of the VTEP connected to the source VM, and the destination IP address (Outer Dst. IP) is the IP address of the VTEP connected to the destination VM.
- Outer MAC header
The outer MAC header is also called the outer Ethernet header. In this header, the source MAC address (Src. MAC Addr.) is the MAC address of the VTEP connected to the source VM, and the destination MAC address (Dst. MAC Addr.) is the MAC address of the next hop along the path to the destination VTEP.
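To make the header layout concrete, here is a minimal Python sketch that packs and parses the 8-byte VXLAN header described above and derives a UDP source port from the inner frame. The function names are hypothetical. The hash is also an assumption: the text only says "a hash algorithm", so CRC32 over the inner MAC addresses, mapped into the dynamic port range 49152-65535, is used here purely for illustration.

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789          # destination port, as stated above
VXLAN_FLAGS_VNI_VALID = 0x08   # flags byte 00001000


def build_vxlan_header(vni: int) -> bytes:
    """Pack flags(8) | reserved(24) | VNI(24) | reserved(8) into 8 bytes."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits left at zero.
    # Word 2: VNI in the top 24 bits, 8 reserved bits left at zero.
    return struct.pack("!II", VXLAN_FLAGS_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> int:
    """Return the VNI carried in an 8-byte VXLAN header."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAGS_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8


def pick_source_port(inner_frame: bytes) -> int:
    """Assumed hash: CRC32 of the inner dst+src MACs, mapped to 49152-65535."""
    digest = zlib.crc32(inner_frame[:12])
    return 49152 + digest % (65536 - 49152)


header = build_vxlan_header(5000)
print(header.hex())                 # 0800000000138800
print(parse_vxlan_header(header))   # 5000
```

Because the source port depends only on the inner frame, frames of the same flow always get the same port, while different flows tend to get different ports.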

In short, the underlay is the actual physical topology that does the routing, while the overlay is the virtual network interconnecting the end devices on which the overlay protocol is configured.
Looking at the packet itself, the overlay sees only the original frame (the blue portion), whereas the underlay sees the whole encapsulated frame (the red and blue portions).
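The size difference between the overlay frame and the underlay frame can be estimated from the header lengths. The sketch below assumes an outer IPv4 header without options and an outer Ethernet header without an 802.1Q tag. The constant and function names are illustrative only.

```python
# Header sizes in bytes. Assumptions: untagged outer Ethernet, IPv4 without options.
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
UDP = 8
VXLAN = 8   # as stated in the header description above

ENCAP_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + UDP + VXLAN   # 50 bytes


def underlay_frame_size(inner_frame_len: int) -> int:
    """Size of the encapsulated frame for a given original (inner) frame size."""
    return inner_frame_len + ENCAP_OVERHEAD


print(ENCAP_OVERHEAD)               # 50
print(underlay_frame_size(1514))    # 1564
```

Under these assumptions, every overlay frame grows by 50 bytes on the underlay, so the underlay has to be able to carry frames that much larger than the largest overlay frame.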
Workflow

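As shown in the figure, VM A sends a frame, and VTEP-1 receives it, looks up its table, finds that VM B sits behind the remote VTEP-2, and encapsulates the frame in a VXLAN packet. VTEP-2 receives the IP packet, sees that the destination UDP port is 4789, and recognizes it as a VXLAN packet. It strips the outer MAC, IP, UDP, and VXLAN headers and delivers the original Ethernet frame to VM B. VM B receives a standard Ethernet frame and is completely unaware of the "wrapping" that happened along the way.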
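The flow above can be sketched as a toy model in Python. Everything here is hypothetical and simplified: the `Vtep` and `VxlanPacket` classes, the statically filled lookup table, and the IP and MAC addresses are made up for illustration. Headers are kept as plain fields rather than real bytes, and unknown destinations are not handled.

```python
from dataclasses import dataclass, field

VXLAN_UDP_PORT = 4789


@dataclass
class VxlanPacket:
    outer_src_ip: str
    outer_dst_ip: str
    udp_dst_port: int
    vni: int
    inner_frame: bytes   # the original Ethernet frame, untouched


@dataclass
class Vtep:
    ip: str
    # (VNI, inner destination MAC) -> IP address of the remote VTEP
    remote_table: dict = field(default_factory=dict)

    def encapsulate(self, vni: int, inner_frame: bytes) -> VxlanPacket:
        dst_mac = inner_frame[:6]                       # destination MAC comes first
        remote_ip = self.remote_table[(vni, dst_mac)]   # table lookup
        return VxlanPacket(self.ip, remote_ip, VXLAN_UDP_PORT, vni, inner_frame)

    def decapsulate(self, packet: VxlanPacket) -> bytes:
        if packet.outer_dst_ip != self.ip or packet.udp_dst_port != VXLAN_UDP_PORT:
            raise ValueError("not a VXLAN packet for this VTEP")
        return packet.inner_frame                       # outer headers stripped


MAC_A = bytes.fromhex("020000000001")
MAC_B = bytes.fromhex("020000000002")
frame = MAC_B + MAC_A + b"\x08\x00" + b"hello"          # VM A -> VM B

vtep1 = Vtep("10.0.0.1", {(100, MAC_B): "10.0.0.2"})
vtep2 = Vtep("10.0.0.2")

packet = vtep1.encapsulate(100, frame)
delivered = vtep2.decapsulate(packet)
print(delivered == frame)   # True
```

The frame VM B receives is byte-for-byte the one VM A sent, which is the sense in which the VMs are unaware of the encapsulation.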
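Summary
VXLAN is not just a step in technical evolution but a natural choice for network architecture in the cloud era. Through encapsulation and tunneling, it builds a flexible, scalable, multi-tenant virtual Layer 2 network on top of an IP network, so that the network can truly follow the workload.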
References: