Introduction to cgroups and what they do

Control Groups (cgroups) provide a mechanism for easily managing and monitoring system resources. They partition resources such as CPU time, system memory, and disk and network bandwidth into groups, and then assign tasks to those groups.

Let me try to explain what control groups are and what they allow you to do. Say, for example, that you have a resource-intensive application on a server. Linux is great at sharing resources among all the processes on a system, but in some cases you want to allocate, or guarantee, a greater share to a specific application or set of applications. This is where control groups are useful.

For example, say we want to assign or isolate an application's resources. Let's create two groups: group #1 for our operating system and group #2 for our application. Then we can assign a resource profile to each group.

Let's focus on group #2 for a moment. Typically, when you create a group you already have a problem in mind. For the sake of this example, say we want to manage CPU, memory, disk, and network bandwidth for our application. I would create a group and assign resource limits to it: for instance, 80% of the CPU, 10 GB of memory, 80% of disk reads and writes, and 80% of network bandwidth. Keep in mind that the application knows nothing about these limits; they are enforced outside of it. Once the group is created, you simply add your application's process IDs (PIDs) to a file in the group, and the application is throttled automatically. This happens on the fly, without a reboot, and you can also adjust the limits on the fly.

Note that the percentage limits behave differently from the memory limit. With proportional controls such as CPU shares and block I/O weights, the application is allowed to spike above its percentage while other resources are idle, and it is throttled back to 80% only when there is resource contention. A memory limit such as the 10 GB one is a hard cap.

Monitoring is also built in from the start, so we can monitor resource consumption for any application assigned to this group: CPU cycles used, the memory profile, IOPS and bytes read from and written to disk, and network bandwidth used.

Let's step back and take a different example. Say we have an environment where we host virtual machines. Instead of just two groups, one for the operating system and one for the application, we can have many groups, one assigned to each virtual machine. If, for example, we are worried about a virtual machine saturating the network link or the disk IOPS, we can limit its impact with control groups, which is really handy.

cgroup concepts

Terminology

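  1. Task. In cgroups, a task is a process on the system.
  2. Control group. A control group is a set of processes grouped according to some criterion. Resource control in cgroups is always applied per control group. A process can join a control group and can also be migrated from one control group to another. The processes in a control group can use the resources that cgroups allocates to that group, and are subject to the limits that cgroups sets for it.
  3. Hierarchy. Control groups can be organized hierarchically, forming a tree of control groups. A child node in the tree is a child of its parent control group and inherits certain attributes from it.
  4. Subsystem. A subsystem is a resource controller; for example, the cpu subsystem is the controller that allocates CPU time. A subsystem takes effect only after it is attached to a hierarchy; once attached, it controls every control group in that hierarchy.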

Relationships

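  1. Whenever a new hierarchy is created, every task on the system is an initial member of that hierarchy's default cgroup. This cgroup, called the root cgroup, is created automatically along with the hierarchy, and every cgroup created later in that hierarchy is a descendant of it.
  2. A subsystem can be attached to at most one hierarchy.
  3. A hierarchy can have multiple subsystems attached to it.
  4. A task can be a member of multiple cgroups, but those cgroups must be in different hierarchies.
  5. When a process (task) on the system creates a child process (task), the child automatically becomes a member of its parent's cgroup. The child can later be moved to a different cgroup as needed, but it always starts out in its parent's cgroup.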

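The cgroup hierarchies shown in the figure above illustrate that the CPU and Memory subsystems each have their own independent hierarchy, yet are related to each other through the Task Group.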

The cgroups subsystems

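  • blkio – sets input/output limits on block devices, such as physical drives (disks, SSDs, USB devices, and so on).
  • cpu – uses the scheduler to provide cgroup tasks with access to the CPU.
  • cpuacct – automatically generates reports on the CPU resources used by the tasks in a cgroup.
  • cpuset – assigns individual CPUs (on multicore systems) and memory nodes to the tasks in a cgroup.
  • devices – allows or denies access to devices by the tasks in a cgroup.
  • freezer – suspends or resumes the tasks in a cgroup.
  • memory – sets limits on the memory used by the tasks in a cgroup and automatically generates reports on the memory resources those tasks use.
  • net_cls – tags network packets with a class identifier (classid) so that the Linux traffic controller (tc) can identify packets originating from a particular cgroup.
  • ns – the namespace subsystem.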

Using cgroups

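This tutorial uses cgroups on Ubuntu to limit a process's memory, as a way to verify that cgroups work. The commands assume the cgroup v1 layout, in which each subsystem has its own hierarchy mounted under /sys/fs/cgroup/<subsystem>; Ubuntu 21.10 and later use cgroup v2 by default, where /sys/fs/cgroup/memory does not exist.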

Installing the cgroup tools

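sudo apt-get install cgroup-bin

On newer Ubuntu releases this package has been replaced by cgroup-tools, which provides the same cgexec command.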

Writing the test program

/** mem-limit.c **/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {

    int i;
    char *p;

    // intro message
    printf("Starting ...\n");

    // loop 50 times, try and consume 50 MB of memory
    for (i = 0; i < 50; ++i) {

        // failure to allocate memory?
        if ((p = malloc(1<<20)) == NULL) {
            printf("Malloc failed at %d MB\n", i);
            return 0;
        }

        // take memory and tell user where we are at
        memset(p, 0, (1<<20));
        printf("Allocated %d to %d MB\n", i, i+1);
        getchar();

    }

    // exit message and return
    printf("Done!\n");
    return 0;

}

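The heart of the test program is the loop: each iteration requests 1 MB of memory, for 50 iterations in total. The memset() call matters because it writes to every page, which forces the kernel to actually back the allocation with physical memory (and charge it to the cgroup). getchar() pauses after each megabyte so you can watch memory usage grow step by step.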

gcc mem-limit.c -o mem-limit

Creating the cgroup

mkdir /sys/fs/cgroup/memory/test

Limit the memory of the group's processes to 5 MB (5 × 1024 × 1024 = 5242880 bytes).

echo 5242880 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes

Running the test program

Run the test program inside the cgroup.
The following command must be run as root; otherwise it fails with an error.

cgexec -g memory:test ./mem-limit

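As the two screenshots show, when the process had requested 8 MB of memory, 3 MB of swap was in use, and when it had requested 9 MB, 4 MB of swap was in use: everything above the 5 MB limit was pushed out to swap. memory.limit_in_bytes caps only the group's RAM usage, so the kernel swaps out the excess instead of refusing the allocation (if no swap is available, the process is killed by the OOM killer instead). This use of swap confirms that the cgroup limit is in effect.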
For reasons of space, only the memory subsystem example is covered here. To try the other subsystems, consult the official documentation.


References:

  1. CSDN, jesseyoung
  2. IBM, 周明耀
  3. 王喆锋, Linux cgroups 详解 (Linux cgroups explained)
  4. sysadmincasts
  5. Linux kernel Documentation
  6. Red Hat
  7. Linux-Kongress
  8. Control Groups - Official Ubuntu Documentation
  9. cnblogs, 轩脉刃de刀光剑影