Unix systems implement most interfaces between User Mode processes and hardware devices by means of system calls issued to the kernel.

1 POSIX APIs and System Calls

Let’s start by stressing the difference between an application programmer interface (API) and a system call. The former is a function definition that specifies how to obtain a given service, while the latter is an explicit request to the kernel made via a software interrupt.

Unix systems include several libraries of functions that provide APIs to programmers. Some of the APIs defined by the libc standard C library refer to wrapper routines (routines whose only purpose is to issue a system call). Usually, each system call has a corresponding wrapper routine, which defines the API that application programs should employ.
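To make the idea of a wrapper routine concrete, here is a minimal sketch, assuming a Linux system with glibc. It obtains the process ID twice: once through the getpid() wrapper and once through the generic syscall() library function, which takes the system call number explicitly. The two function names defined here are hypothetical and exist only for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the process ID through the libc wrapper routine. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Request the same service by naming the system call number directly. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions should report the same process ID, which suggests that the wrapper adds little beyond issuing the system call.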

The converse is not true, by the way: an API does not necessarily correspond to a specific system call. First of all, the API could offer its services directly in User Mode. (For something abstract such as math functions, there may be no reason to make system calls.) Second, a single API function could make several system calls. Moreover, several API functions could make the same system call, but wrap extra functionality around it. For instance, in Linux, the malloc(), calloc(), and free() APIs are implemented in the libc library. The code in this library keeps track of the allocation and deallocation requests and uses the brk() system call to enlarge or shrink the process heap.
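The division of labor between library bookkeeping and the kernel can be sketched with a toy allocator. This is not how libc implements malloc(); it is a deliberately simplified illustration that never frees memory. It assumes Linux with glibc and uses the sbrk() library function, which moves the program break on top of brk(). The names toy_alloc() and toy_total() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Bookkeeping that lives entirely in User Mode. */
static size_t toy_bytes_handed_out;

/* Hypothetical toy allocator: grows the heap and never releases it. */
void *toy_alloc(size_t size)
{
    /* Pad so that the returned block starts on a 16-byte boundary. */
    uintptr_t current_break = (uintptr_t)sbrk(0);
    size_t pad = (16u - (size_t)(current_break % 16u)) % 16u;
    /* Round the request up to a multiple of 16 bytes. */
    size_t rounded = (size + 15u) & ~(size_t)15u;

    /* The only step that involves the kernel: move the program break. */
    void *old_break = sbrk((intptr_t)(pad + rounded));
    if (old_break == (void *)-1)
        return NULL;

    toy_bytes_handed_out += rounded;
    return (char *)old_break + pad;
}

/* Report how many bytes the toy allocator has handed out so far. */
size_t toy_total(void)
{
    return toy_bytes_handed_out;
}
```

A real allocator would also reuse freed blocks and batch its requests to the kernel, so most calls to the API would not result in any system call at all.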

The POSIX standard refers to APIs and not to system calls.

2 System Call Handler and Service Routines

When a User Mode process invokes a system call, the CPU switches to Kernel Mode and starts the execution of a kernel function. In the 80x86 architecture, a Linux system call can be invoked in two different ways. The net result of both methods, however, is a jump to an assembly language function called the system call handler.

Because the kernel implements many different system calls, the User Mode process must pass a parameter called the system call number to identify the required system call; the eax register is used by Linux for this purpose.

Every system call returns an integer value to the caller. In the kernel, positive or 0 values denote a successful termination of the system call, while negative values denote an error condition.
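From User Mode, this convention is usually seen through the libc wrapper, which reports a failure by returning -1 and storing a positive error code in errno. The sketch below mimics that translation with a hypothetical helper, wrapper_style_result(), and then observes the real behavior by closing an invalid file descriptor. It is a simplification: it treats every negative value as an error and clears the error code on success, which real wrappers do not do.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: translate a kernel-style return value into the
 * "-1 plus error code" form that wrapper routines present to programs.
 */
long wrapper_style_result(long kernel_ret, int *err_out)
{
    if (kernel_ret < 0) {
        *err_out = (int)-kernel_ret;  /* negated value is the error code */
        return -1;
    }
    *err_out = 0;                     /* simplification for this sketch */
    return kernel_ret;
}

/* Close an invalid descriptor through the real wrapper and report errno. */
int close_invalid_fd_errno(void)
{
    errno = 0;
    if (close(-1) == -1)
        return errno;
    return 0;
}
```

On Linux, close_invalid_fd_errno() is expected to yield EBADF, matching what the helper produces when given -EBADF as the kernel-style value.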

The system call handler, which has a structure similar to that of the other exception handlers, performs the following operations:

  • Saves the contents of most registers in the Kernel Mode stack.
  • Handles the system call by invoking a corresponding C function called the system call service routine.
  • Exits from the handler: the registers are loaded with the values saved in the Kernel Mode stack, and the CPU is switched back from Kernel Mode to User Mode.

The name of the service routine associated with the xyz() system call is usually sys_xyz(); there are, however, a few exceptions to this rule.

Figure 10-1 illustrates the relationships between the application program that invokes a system call, the corresponding wrapper routine, the system call handler, and the system call service routine.

To associate each system call number with its corresponding service routine, the kernel uses a system call dispatch table.
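The idea of a dispatch table can be sketched in ordinary C as an array of function pointers indexed by the system call number. Everything here is hypothetical (the demo_ names, the two toy service routines, and the single-argument signature); the real table lives in the kernel and is consulted by the assembly language handler. Returning -ENOSYS for an unknown number follows the kernel convention of negative error values.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature shared by the toy service routines. */
typedef long (*service_routine_t)(long arg);

static long sys_demo_double(long arg) { return arg * 2; }
static long sys_demo_negate(long arg) { return -arg; }

/* The table position of each routine is its "system call number". */
static const service_routine_t demo_call_table[] = {
    sys_demo_double,   /* number 0 */
    sys_demo_negate,   /* number 1 */
};

/* Look up the service routine for a number and invoke it. */
long demo_dispatch(unsigned long number, long arg)
{
    size_t entries = sizeof demo_call_table / sizeof demo_call_table[0];

    if (number >= entries)
        return -ENOSYS;            /* no such system call */
    return demo_call_table[number](arg);
}
```

With this table, demo_dispatch(0, 21) runs the first routine and returns 42, while an out-of-range number never reaches any routine.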

3 Entering and Exiting a System Call

Native applications can invoke a system call in two different ways:

  • By executing the int $0x80 assembly language instruction.
  • By executing the sysenter assembly language instruction.

Similarly, the kernel can exit from a system call—thus switching the CPU back to User Mode—in two ways:

  • By executing the iret assembly language instruction.
  • By executing the sysexit assembly language instruction.

The int assembly language instruction is inherently slow because it performs several consistency and security checks. The sysenter instruction, which Intel documentation calls “Fast System Call,” provides a faster way to switch from User Mode to Kernel Mode.
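As a rough illustration of the first entry method, the sketch below issues int $0x80 directly with GCC-style inline assembly, placing the system call number in eax and reading the result back from the same register. It assumes a 32-bit x86 build with GCC or Clang; on any other target it falls back to the syscall() library function so that the code still compiles and runs. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: obtain the process ID without the getpid() wrapper. */
long getpid_via_int80_or_fallback(void)
{
#if defined(__i386__)
    long ret;

    /* eax carries the system call number in and the return value out. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"((long)SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Not 32-bit x86: let the library choose the entry mechanism. */
    return syscall(SYS_getpid);
#endif
}
```

Application code rarely does this by hand; the library wrappers normally pick whichever entry mechanism the system supports.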

4 Parameter Passing

Like ordinary functions, system calls often require some input/output parameters, which may consist of actual values (i.e., numbers), addresses of variables in the address space of the User Mode process, or even addresses of data structures including pointers to User Mode functions.

4.1 Verifying the Parameters

All system call parameters must be carefully checked before the kernel attempts to satisfy a user request.
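One kind of check that applies to address parameters is a coarse range test: does the whole buffer lie below the boundary that separates user addresses from kernel addresses? The sketch below is a User Mode model of such a test, not kernel code. The limit of 0xC0000000 is an assumption corresponding to the traditional 3 GB/1 GB split on 32-bit x86, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary between user and kernel addresses (32-bit x86 split). */
#define DEMO_USER_LIMIT UINT64_C(0xC0000000)

/*
 * Return true if the buffer [addr, addr + size) lies entirely below the
 * limit. The comparison is arranged so that addr + size is never computed,
 * which avoids wrapping around on very large values.
 */
bool demo_user_range_ok(uint64_t addr, uint64_t size)
{
    if (size > DEMO_USER_LIMIT)
        return false;
    return addr <= DEMO_USER_LIMIT - size;
}
```

A test like this only rules out addresses that can never belong to a process; it says nothing about whether the pages in the range are actually mapped.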

4.2 Accessing the Process Address Space

System call service routines often need to read or write data contained in the process’s address space.
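What happens when such an access targets an invalid address can be observed from User Mode. The following sketch, assuming Linux, asks write() to copy one byte from address 1, which is normally unmapped, into a pipe. The service routine cannot read the byte, so the call is expected to fail with EFAULT rather than crash the process. The function name is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Return the errno produced by writing from an unmapped user address. */
int write_from_bad_address_errno(void)
{
    int fds[2];
    /* volatile keeps the compiler from reasoning about the bogus pointer */
    volatile uintptr_t bad_address = 1;

    if (pipe(fds) != 0)
        return -1;

    errno = 0;
    ssize_t written = write(fds[1], (const void *)bad_address, 1);
    int saved_errno = (written == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return saved_errno;
}
```

A pipe is used because the kernel must actually read the source buffer to fill it; a device that discards its input might never touch the bad address.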

4.3 Generating the Exception Tables and the Fixup Code

5 Kernel Wrapper Routines

Although system calls are used mainly by User Mode processes, they can also be invoked by kernel threads, which cannot use library functions. To simplify the declarations of the corresponding wrapper routines, Linux defines a set of seven macros called _syscall0 through _syscall6.
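The flavor of such macros can be conveyed with a User Mode analogue. The two macros below are hypothetical and are not the kernel's definitions: where the kernel macros emit the assembly language needed to enter the system call handler, this sketch substitutes the syscall() library function, and it prefixes the generated names with demo_ to avoid clashing with libc. Each macro takes the return type, the system call name, and, for the one-parameter variant, the parameter's type and name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Generate a wrapper for a system call that takes no parameters. */
#define DEMO_SYSCALL0(type, name) \
    type demo_##name(void) \
    { \
        return (type)syscall(SYS_##name); \
    }

/* Generate a wrapper for a system call that takes one parameter. */
#define DEMO_SYSCALL1(type, name, type1, arg1) \
    type demo_##name(type1 arg1) \
    { \
        return (type)syscall(SYS_##name, arg1); \
    }

/* Expands to: long demo_getpid(void) { ... } */
DEMO_SYSCALL0(long, getpid)

/* Expands to: long demo_close(int fd) { ... } */
DEMO_SYSCALL1(long, close, int, fd)
```

The digit in each macro name is the number of parameters the generated wrapper accepts, which is why a family covering zero through six parameters has seven members.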