linux kernel中HList和Hashtable

HList

先来直观的比较下普通链表和哈希链表(HList)：

1.1 普通链表

普通链表的表头和节点相同

1
2
3

struct list_head {  
    struct list_head *next, *prev;  
};

详情请参考内核数据结构之链表。

1.2 哈希链表

HList的资料结构定义在 include/linux/types.h 之中:

哈希链表头

1
2
3

struct hlist_head {  
    struct hlist_node *first;  
};

哈希链表节点

1
2
3

struct hlist_node {  
    struct hlist_node *next, **pprev;  
};

hlist_head 和 hlist_node 用于 hash table 中 bucket 的实际操作，具有相同 hash value 的节点会放在同一条 hlist 中。为了节省空间，hlist_head 只使用一个 first 指针指向 hlist_node，没有指向尾节点的指针。

hlist_node 有两个指针，但是需要注意的是 pprev 是指向指针的指针，它指向的是前一个节点的 next指针。
因为这时候表头(hlist_head)和节点(hlist_node)的数据结构不同。如果使用struct hlist_node *prev，只适用于前一个为节点的情况，而不适用于前一个为表头的情况。如果每次操作都要考虑指针类型转换，会是一件麻烦的事情。所以，我们需要一种统一的操作，而不用考虑前一个元素是节点还是表头。pprev指向前一个元素的next指针，不用管前一个元素是节点还是表头。当我们需要操作前一个元素(节点或表头)，可以统一使用*(node->pprev)来访问和修改前一元素的next(或first)指针。

原理图如下：

1.3 相关操作

HList 的操作与 List 一样定义在: include/linux/list.h ，以 hlist_ 开头

Hashtable

2.1 定义

在 Linux kernel 3.7 之后采用由Sasha Levin设计的通用型 hash table (LWN: A generic hash table)，使用 DEFINE_HASHTABLE(name, bits)的macro来宣告 hash table:

name: the name of the hash table
bits: the number of bits of hash values

第二的参数 bits 比较特别，它代表的是 hash value 的有效位元数。若 bits = 3，hash value 的范围会是 0~7，若 bits = 10，hash value 的范围会是 0 ~ 1023。前者需要准备 8 个 buckets，后者则需要 1024 个 buckets，bits 参数会决定 hash table 的 buckets 的数量 (=2^bits)。hashtable实际上会以 hlist_head array 的方式来配置。

举例来说， DEFINE_HASHTABLE(htable, 3)会展开成:
struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };
这是一个有 8 个 buckets 的 hash table。
经由 hash function 将值映射到这 8 个 buckets 中，当冲突发生时，直接加到 hlist_head 后的串列中。

2.2 相关操作

hash table 相关的操作定义在 include/linux/hashtable.h

hash_init - initialize a hash table
hash_empty - check whether a hashtable is empty
hash_add - add an object to a hashtable
hash_del - remove an object from a hashtable
hash_for_each - iterate over a hashtable
hash_for_each_possible - iterate over all possible objects hashing to the same bucket

2.3 相关API

HLIST_HEAD_INIT: 静态初始化hlist_head
HLIST_HEAD: 静态初始化hlist_head
INIT_HLIST_HEAD: 动态初始化hlist_head
INIT_HLIST_NODE: 动态初始化hlist_node
hlist_unhashed: 判断hlist_node是否添加到hash链表中
hlist_empty: 判断hash链表是否为空
hlist_del: 在hash链表中删除一个节点
hlist_del_init: 在hash链表中删除一个节点
hlist_add_head: 在hash链表头添加一个节点
hlist_add_before: 在指定节点之前添加一个节点
hlist_add_behind: 在指定节点之后添加一个节点
hlist_add_fake
hlist_fake
hlist_is_singular_node: 判断hlist是否只有一个节点
hlist_move_list: 将一个hash链表从一个hlist_head移动到另外一个hlist_head中
hlist_entry: 根据hlist_node找到其外层结构体
hlist_entry_safe: 同上
hlist_for_each: 遍历hash链表
hlist_for_each_safe: 同上
hlist_for_each_entry: 遍历hash链表
hlist_for_each_entry_safe: 同上
hlist_for_each_entry_continue: 从当前节点之后遍历hash链表
hlist_for_each_entry_from: 从当前节点开始遍历hash链表

2.4 hash function

hashtable 预设的 hash function 定义在 include/linux/hash.h ，有 32 和 64 两个版本，如下:

/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define GOLDEN_RATIO_PRIME_32 0x9e370001UL

static inline u32 hash_32(u32 val, unsigned int bits)
{
    /* On some cpus multiply is faster, on others gcc will do shifts */
    u32 hash = val * GOLDEN_RATIO_PRIME_32;

    /* High bits are more random, so use them. */
    return hash >> (32 - bits);
}

/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL

static __always_inline u64 hash_64(u64 val, unsigned int bits)
{
    u64 hash = val;

#if defined(CONFIG_ARCH_HAS_FAST_MULTIPLIER) && BITS_PER_LONG == 64
    hash = hash * GOLDEN_RATIO_PRIME_64;
#else
    /*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
    u64 n = hash;
    n <<= 18;
    hash -= n;
    n <<= 33;
    hash -= n;
    n <<= 3;
    hash += n;
    n <<= 3;
    hash -= n;
    n <<= 4;
    hash += n;
    n <<= 2;
    hash += n;
#endif

    /* High bits are more random, so use them. */
    return hash >> (64 - bits);
}

2.5 Hashtable example

下面用例子来说明 hashtable 的使用:

struct object
{
    int id;
    char name[16];

    struct hlist_node node;
};

void hashtable_example()
{
    // define a hash table with 2^3(=8) buckets
    DEFINE_HASHTABLE(htable, 3);
    // => struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };

    struct object obj1 = {
        .id = 1,
        .name = "obj1",
    };
    hash_add(htable, &obj1.node, obj1.id);

    struct object obj2 = {
        .id = 2,
        .name = "obj2",
    };
    hash_add(htable, &obj2.node, obj2.id);

    struct object obj3 = {
        .id = 3,
        .name = "obj3",
    };
    hash_add(htable, &obj3.node, obj3.id);

    struct object obj9 = {
        .id = 9,
        .name = "obj9",
    };
    hash_add(htable, &obj9.node, obj9.id);
}

经过上面的操作之后，hash table 呈现如下:
obj1 和 obj9 的 hash value 冲突，放入同一个 bucket 的串列中

以 hash_for_each_possible()寻访 bucket 内所有的节点(hlist_node)，因为 hash value 可能会有冲突的关系，同一个 bucket 内可能会有不同 key value 的节点，所以需要检查是不是要查找的 key value。

int key = 1;
struct object* obj;
hash_for_each_possible(htable, obj, node, key) {
    if(obj->id == key) {
        printf("key=%d => %s\n", key, obj->name);
    }
}

函数的详细使用方法可以参考hashtable_example.c

#include <stdio.h>
#include "list.h"
#include "hash.h"
#include "hashtable.h"

struct object
{
    int id;
    char name[16];

    struct hlist_node node;
};

//------------------------------------------------------------------------------
void print_hash_values()
{
    u8 bits = 3;
    int num_buckets = (1<<bits);
    int i = 0;
    for(i=0; i<num_buckets; ++i) {
        printf("hash_min(%d)=%d\n", i, hash_min(i, bits));
    }
}

void hashtable_example()
{
    // define a hash table with 2^3(=8) buckets
    DEFINE_HASHTABLE(htable, 3);
    // => struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };

    struct object obj1 = {
        .id = 1,
        .name = "obj1",
    };
    hash_add(htable, &obj1.node, obj1.id);

    struct object obj2 = {
        .id = 2,
        .name = "obj2",
    };
    hash_add(htable, &obj2.node, obj2.id);

    struct object obj3 = {
        .id = 3,
        .name = "obj3",
    };
    hash_add(htable, &obj3.node, obj3.id);

    struct object obj9 = {
        .id = 9,
        .name = "obj9",
    };
    hash_add(htable, &obj9.node, obj9.id);


    // find
    int key = 1;
    struct object* obj;
    hash_for_each_possible(htable, obj, node, key) {
        if(obj->id == key) {
            printf("key=%d => %s\n", key, obj->name);
        }
    }
}

void hash_del_example()
{
    // define a hash table with 2^3(=8) buckets
    DEFINE_HASHTABLE(htable, 3);
    // => struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };

    struct object obj1 = {
        .id = 1,
        .name = "obj1",
    };
    hash_add(htable, &obj1.node, obj1.id);

    struct object obj2 = {
        .id = 2,
        .name = "obj2",
    };
    hash_add(htable, &obj2.node, obj2.id);

    struct object obj3 = {
        .id = 3,
        .name = "obj3",
    };
    hash_add(htable, &obj3.node, obj3.id);

    struct object obj9 = {
        .id = 9,
        .name = "obj9",
    };
    hash_add(htable, &obj9.node, obj9.id);


    // find and del
    int key = 1;
    struct object* obj;
    hash_for_each_possible(htable, obj, node, key) {
        if(obj->id == key) {
            hash_del(&obj->node);
        }
    }


    int bkt;
    struct object* cur;
    hash_for_each(htable, bkt, cur, node) {
        printf("bucket[%d]=> %s\n", bkt, cur->name);
    }
}


void hashtable_show_buckets()
{
    // define a hash table with 2^3(=8) buckets
    DEFINE_HASHTABLE(htable, 3);
    // => struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };

    struct object obj1 = {
        .id = 1,
        .name = "obj1",
    };
    hash_add(htable, &obj1.node, obj1.id);

    struct object obj2 = {
        .id = 2,
        .name = "obj2",
    };
    hash_add(htable, &obj2.node, obj2.id);

    struct object obj3 = {
        .id = 3,
        .name = "obj3",
    };
    hash_add(htable, &obj3.node, obj3.id);

    struct object obj9 = {
        .id = 9,
        .name = "obj9",
    };
    hash_add(htable, &obj9.node, obj9.id);


    // int bkt;
    // struct object* cur;
    // hash_for_each(htable, bkt, cur, node) {
    //     printf("bucket[%d]=> %s\n", bkt, cur->name);
    // }

    // show hashtable buckets
    int i=0;
    for (i = 0; i < HASH_SIZE(htable); ++i)
    {
        if (!hlist_empty(&htable[i]))
        {
            printf("bucket[%d]=> ", i);

            struct object* obj;
            hlist_for_each_entry(obj, &htable[i], node) {
                printf("%s, ", obj->name);
            }
            printf("\n");
        }
    }
}

// Every bucket in the hashtable is a linked list which will hold all objects
// that are hashed to the same bucket.
void hash_add_same_key_example()
{
    // define a hash table with 2^3(=8) buckets
    DEFINE_HASHTABLE(htable, 3);
    // => struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };

    struct object obj1 = {
        .id = 0,
        .name = "obj1",
    };
    hash_add(htable, &obj1.node, obj1.id);

    struct object obj2 = {
        .id = 0,
        .name = "obj2",
    };
    hash_add(htable, &obj2.node, obj2.id);

    struct object obj3 = {
        .id = 0,
        .name = "obj3",
    };
    hash_add(htable, &obj3.node, obj3.id);

    int bkt;
    struct object* cur;
    hash_for_each(htable, bkt, cur, node) {
        printf("bucket[%d]=> %s\n", bkt, cur->name);
    }
}

int main()
{
    hashtable_show_buckets();
    return 0;
}

参考资料:

文章目录