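The task work mechanism lets kernel code queue callback functions on a specified task. The callbacks run when that task returns to user mode (or exits), in the context of that task.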

Parts of this article are reproduced from "Linux:task work 机制". The kernel code shown is from v4.18.

1. definition

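The task object task_struct has a field, task_works, that holds the head of the list of pending works. Each list element is a struct callback_head, which contains a next pointer and a pointer to the function to run.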

/**
 * struct callback_head - callback structure for use with RCU and task_work
 * @next: next update requests in a list
 * @func: actual update function to call after the grace period.
 */
struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *head);
};
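
Because the callback receives only the struct callback_head pointer, kernel callers usually embed the head inside a larger object and recover that object inside the callback. The following is a minimal userspace sketch of that pattern, not kernel code: struct my_work, my_work_func and demo_embedded_work are hypothetical names made up for illustration, and container_of is a simplified stand-in for the kernel macro of the same name.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the kernel's struct callback_head, for illustration. */
struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *head);
};

/* Simplified stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical work item: a payload plus the embedded list head. */
struct my_work {
    int value;                  /* input the callback wants to see */
    int result;                 /* filled in by the callback */
    struct callback_head head;  /* the part that gets linked into the list */
};

/* Callback: receives only the head and recovers the enclosing my_work. */
static void my_work_func(struct callback_head *head)
{
    struct my_work *w = container_of(head, struct my_work, head);

    assert(w != NULL);
    w->result = w->value * 2;
}

/* Invokes the callback the same way the list walker does: work->func(work). */
static int demo_embedded_work(int value)
{
    struct my_work w = { .value = value, .result = 0 };

    w.head.next = NULL;
    w.head.func = my_work_func;
    w.head.func(&w.head);
    return w.result;
}
```

Calling demo_embedded_work(21) runs the callback once through the embedded head and returns the doubled value.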

2. task_work_add

static struct callback_head work_exited; /* all we need is ->next == NULL */

/**
 * task_work_add - ask the @task to execute @work->func()
 * @task: the task which should run the callback
 * @work: the callback to run
 * @notify: send the notification if true
 *
 * Queue @work for task_work_run() below and notify the @task if @notify.
 * Fails if the @task is exiting/exited and thus it can't process this @work.
 * Otherwise @work->func() will be called when the @task returns from kernel
 * mode or exits.
 *
 * This is like the signal handler which runs in kernel mode, but it doesn't
 * try to wake up the @task.
 *
 * Note: there is no ordering guarantee on works queued here.
 *
 * RETURNS:
 * 0 if succeeds or -ESRCH.
 */
int
task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
{
    struct callback_head *head;

    do {
        head = READ_ONCE(task->task_works);
        if (unlikely(head == &work_exited))
            return -ESRCH;
        work->next = head;
    } while (cmpxchg(&task->task_works, head, work) != head);

    if (notify)
        set_notify_resume(task);
    return 0;
}

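Main work:

  1. Push the new element onto the list with a lock-free CAS loop. The new element becomes the head of the existing list. If the head is the work_exited sentinel, the task can no longer process work and the call fails with -ESRCH.
  2. If notify is true, set_notify_resume sets the TIF_NOTIFY_RESUME flag on the target task.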
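
The CAS push in step 1 can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel implementation: struct fake_task, fake_task_work_add and demo_push_order are hypothetical names, atomic_compare_exchange_weak stands in for the kernel's cmpxchg, the return value -1 stands in for -ESRCH, and the notify step is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *head);
};

/* Hypothetical stand-in for task_struct: only the list head is modelled. */
struct fake_task {
    _Atomic(struct callback_head *) task_works;
};

/* Sentinel meaning "this task has exited"; only its address matters. */
static struct callback_head work_exited;

/* Returns 0 on success, -1 if the task no longer accepts work. */
static int fake_task_work_add(struct fake_task *task,
                              struct callback_head *work)
{
    struct callback_head *head = atomic_load(&task->task_works);

    do {
        if (head == &work_exited)
            return -1;
        work->next = head;
        /* On failure, head is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(&task->task_works, &head, work));

    return 0;
}

/* Returns 1 if the list behaves as described in the text, 0 otherwise. */
static int demo_push_order(void)
{
    struct fake_task task;
    struct callback_head a = { 0 }, b = { 0 };

    atomic_init(&task.task_works, NULL);
    if (fake_task_work_add(&task, &a) || fake_task_work_add(&task, &b))
        return 0;

    /* b was added last, so it is the new head and points at a. */
    if (atomic_load(&task.task_works) != &b || b.next != &a || a.next != NULL)
        return 0;

    /* Once the sentinel is installed, further adds are refused. */
    atomic_store(&task.task_works, &work_exited);
    return fake_task_work_add(&task, &a) == -1;
}
```

The sketch shows why the list ends up in reverse insertion order: each successful push makes the new element the head and links it to the previous head.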

3. When task_work_run executes

3.1 with _TIF_NOTIFY_RESUME flag

3.1.1 exit_to_usermode_loop

static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
{
    ...
    if (cached_flags & _TIF_NOTIFY_RESUME) {
        clear_thread_flag(TIF_NOTIFY_RESUME);
        tracehook_notify_resume(regs);
        rseq_handle_notify_resume(NULL, regs);
    }
    ...

3.1.2 tracehook_notify_resume

/**
 * tracehook_notify_resume - report when about to return to user mode
 * @regs: user-mode registers of @current task
 *
 * This is called when %TIF_NOTIFY_RESUME has been set. Now we are
 * about to return to user mode, and the user state in @regs can be
 * inspected or adjusted. The caller in arch code has cleared
 * %TIF_NOTIFY_RESUME before the call. If the flag gets set again
 * asynchronously, this will be called again before we return to
 * user mode.
 *
 * Called without locks.
 */
static inline void tracehook_notify_resume(struct pt_regs *regs)
{
    /*
     * The caller just cleared TIF_NOTIFY_RESUME. This barrier
     * pairs with task_work_add()->set_notify_resume() after
     * hlist_add_head(task->task_works);
     */
    smp_mb__after_atomic();
    if (unlikely(current->task_works))
        task_work_run();

    mem_cgroup_handle_over_high();
}

task_work_run is called only when the current task's task_works is non-NULL, that is, when there is pending work to execute.

3.2 without _TIF_NOTIFY_RESUME flag

  • get_signal calls task_work_run
// Run the works queued through the task work mechanism.
// This is unrelated to signals; it piggybacks on the ret_to_user path.
int get_signal(struct ksignal *ksig)
{
    ...
    if (unlikely(current->task_works))
        task_work_run();
    ...
}

4. task_work_run

/**
 * task_work_run - execute the works added by task_work_add()
 *
 * Flush the pending works. Should be used by the core kernel code.
 * Called before the task returns to the user-mode or stops, or when
 * it exits. In the latter case task_work_add() can no longer add the
 * new work after task_work_run() returns.
 */
void task_work_run(void)
{
    struct task_struct *task = current;
    struct callback_head *work, *head, *next;

    for (;;) {
        /*
         * work->func() can do task_work_add(), do not set
         * work_exited unless the list is empty.
         */
        raw_spin_lock_irq(&task->pi_lock);
        do {
            work = READ_ONCE(task->task_works);
            head = !work && (task->flags & PF_EXITING) ?
                &work_exited : NULL;
        } while (cmpxchg(&task->task_works, work, head) != work);
        raw_spin_unlock_irq(&task->pi_lock);

        if (!work)
            break;

        do {
            next = work->next;
            work->func(work);
            work = next;
            cond_resched();
        } while (work);
    }
}
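
Main steps:

  1. Detach the whole task_works list with a CAS loop, which in this version runs while holding task->pi_lock. If the list is already empty and the task is exiting (PF_EXITING), the work_exited sentinel is installed instead, so later task_work_add calls fail with -ESRCH.
  2. The detached list is in reverse insertion order (see task_work_add). The v4.18 code shown here does not reverse it, so the works run newest first, which matches the "no ordering guarantee" note on task_work_add.
  3. Walk the detached list and call work->func(work) for each element, with cond_resched() after each call. The outer loop repeats because a callback may itself queue more work through task_work_add.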
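
The detach-and-run loop can be sketched in userspace as follows. This is a simplified illustration under stated assumptions, not the kernel code: struct fake_task, tagged_work, push, record and demo_run_order are hypothetical names, the exiting field stands in for the PF_EXITING check, C11 atomic_compare_exchange_weak stands in for cmpxchg, and the pi_lock and cond_resched() calls are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *head);
};

/* Hypothetical stand-in for task_struct. */
struct fake_task {
    _Atomic(struct callback_head *) task_works;
    bool exiting;               /* stands in for flags & PF_EXITING */
};

static struct callback_head work_exited;   /* "task has exited" sentinel */

/* Hypothetical work item; head is the first member so a cast recovers it. */
struct tagged_work {
    struct callback_head head;
    int id;
};

static int order[3];    /* ids in the order the callbacks ran */
static int n_run;

static void record(struct callback_head *head)
{
    struct tagged_work *w = (struct tagged_work *)head;

    order[n_run++] = w->id;
}

/* Push at the head of the list, as task_work_add does (no sentinel check). */
static void push(struct fake_task *task, struct callback_head *work)
{
    struct callback_head *head = atomic_load(&task->task_works);

    do {
        work->next = head;
    } while (!atomic_compare_exchange_weak(&task->task_works, &head, work));
}

static void fake_task_work_run(struct fake_task *task)
{
    struct callback_head *work, *head, *next;

    for (;;) {
        /* Swap the list out; install the sentinel only if it is empty. */
        work = atomic_load(&task->task_works);
        do {
            head = (!work && task->exiting) ? &work_exited : NULL;
        } while (!atomic_compare_exchange_weak(&task->task_works, &work, head));

        if (!work)
            break;

        /* Walk the detached list from its head: newest work first. */
        do {
            next = work->next;
            work->func(work);
            work = next;
        } while (work);
    }
}

/* Returns the run order encoded as digits, or -1 if no sentinel was set. */
static int demo_run_order(void)
{
    struct fake_task task;
    struct tagged_work w1 = { { NULL, record }, 1 };
    struct tagged_work w2 = { { NULL, record }, 2 };
    struct tagged_work w3 = { { NULL, record }, 3 };

    atomic_init(&task.task_works, NULL);
    task.exiting = true;
    n_run = 0;

    push(&task, &w1.head);
    push(&task, &w2.head);
    push(&task, &w3.head);
    fake_task_work_run(&task);

    if (atomic_load(&task.task_works) != &work_exited)
        return -1;
    return order[0] * 100 + order[1] * 10 + order[2];
}
```

In this sketch the works are pushed as 1, 2, 3 and run as 3, 2, 1, and the list head ends up pointing at the sentinel because the task was marked as exiting.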

References:

  1. Linux:task work 机制
  2. Linux Signal