A disk cache is a software mechanism that allows the system to keep in RAM some data that is normally stored on a disk, so that further accesses to that data can be sat- isfied quickly without accessing the disk.
The dentry cache, which stores dentry objects representing filesystem pathnames, and the inode cache, which stores inode objects representing disk inodes. The page cache, which is a disk cache working on whole pages of data.
The page cache is the main disk cache used by the Linux kernel. In most cases, the kernel refers to the page cache when reading from or writing to disk. New pages are added to the page cache to satisfy User Mode processes’s read requests. If the page is not already in the cache, a new entry is added to the cache and filled with the data read from the disk.
Kernel designers have implemented the page cache to fulfill two main requirements:
- Quickly locate a specific page containing data relative to a given owner.
- Keep track of how every page in the cache should be handled when reading or writing its content.
A page does not necessarily contain physically adjacent disk blocks, so it cannot be identified by a device number and a block number. Instead, a page in the page cache is identified by an owner and by an index within the owner’s data—usually, an inode and an offset inside the corresponding file.
The core data structure of the page cache is the
address_space object, a data structure embedded in the inode object that owns the page. Many pages in the cache may refer to the same owner, thus they may be linked to the same
address_space object. This object also establishes a link between the owner’s pages and a set of methods that operate on these pages.
page cache vs buffer cache.
Starting from stable version 2.4.10, the buffer cache does not really exist anymore. In fact, for reasons of efficiency, block buffers are no longer allocated individually; instead, they are stored in dedicated pages called “buffer pages,” which are kept in the page cache.
A buffer page is a page of data associated with additional descriptors called “buffer heads,” whose main purpose is to quickly locate the disk address of each individual block in the page. In fact, the chunks of data stored in a page belonging to the page cache are not necessarily adjacent on disk.
bio 相对 buffer_head 的好处有：bio 可以更方便的使用高端内存，因为它只与 page 打交道，并不直接使用地址。bio 可以表示 direct I/O。对向量形式的 I/O支持更好，防止 I/O 被打散。但是 buffer_head 还是需要的，它用于映射磁盘块到内存，因为 bio 中并没有包含 kernel 需要的 buffer 状态的成员以及一些其它信息。