Trie的概念

在计算机科学中,trie,又称前缀树或字典树,是一种有序树,用于保存关联数组,其中的键通常是字符串。与二叉查找树不同,键不是直接保存在节点中,而是由节点在树中的位置决定。一个节点的所有子孙都有相同的前缀,也就是这个节点对应的字符串,而根节点对应空字符串。一般情况下,不是所有的节点都有对应的值,只有叶子节点和部分内部节点所对应的键才有相关的值。

看到概念,也许读者会觉得有点抽象,博主搜到一篇大神写的博客,顿时觉得茅塞顿开(ps:博主给的参考链接可能访问不了,请fq)。这篇文章真是太棒了,原文是英文,博主就不翻译了,相信读者们的英语水平,故直接转到本篇博文中,希望对大家会有所帮助。

Trie implementation in C

To implement the kind of storage which stores strings as the search keys , there is a need to have special data structures which can store the strings efficiently and the searching of data based on the string keys is easier, efficient and faster. One such data structure is a tree based implementation called Trie.

Trie is a data structure which can be used to implement a dictionary kind of application. It provides all the functionality to insert a string, search a string and delete a string from the dictionary. The insertion and deletion operation takes O(n) time where n is the length of the string to be deleted or inserted.
Some of the application of tries involve web based search engines, URL completion in autocomplete feature, Spell checker etc.

Structure of Trie(Specific to this implementation):

The trie implemented here consists of nodes. Each node has these fields:

  1. Key - Part of the string to be serached,inserted or deleted.

  2. Value - The value associated with a string (e.g In a dictionary it could be the meaning of the word which we are searching)

  3. Neighbour node address - It consists of the address of the neighbouring node at the same level.

  4. Previous neighbour address - It consists of the address of the previous node at the same level.

  5. Children node address - It consists of the address of the child nodes of the current node.

  6. Parent node address - It consists of the address of the parent node of the current node.

The additional nodes like Parent and Previous nodes are added to this implementation for making the search, and deletions easier.

Here is a diagrammatical view of a trie nodes I have used in this implementation. The field key is not represented in the diagram due to symmetry purposes.

这里写图片描述

Let us consider an example to understand tries in detail.

Suppose we have to implement a database for the HR department of an organisation in which we have to store an employee’s name and their ages. There is an assumption for this example that there each employee’s name is unique.So there is a strange policy in this organisation that any new employee which has a name that already exists in the organisation, it would not hire that new employee.

Let’s use this hypothetical example just to understand how tries work.

  • Consider we have a new employee named Andrew with age 36. Lets populate our trie for “andrew”.

这里写图片描述

  • Now add “tina”.

这里写图片描述

  • Add “argo”.

这里写图片描述

  • Add “tim”.

这里写图片描述

  • Add “t”.

这里写图片描述

  • Add “amy”.

这里写图片描述

  • Add “aramis”.

这里写图片描述

This is the complete Trie with all the entries. Now let us try deleting the names. I am not capturing the trivial cases.

  • Lets try deleting Argo.

这里写图片描述

  • Delete Tina

这里写图片描述

  • Delete Andrew

这里写图片描述

There is also a video from IIT Delhi which explains the tries. Tries Explained.

The implementation for this Trie is given below. Please provide your suggestions to further improve the implementation.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
typedef int trieVal_t;
typedef struct trieNode {
char key;
trieVal_t value;
struct trieNode *next;
struct trieNode *prev;
struct trieNode *children;
struct trieNode *parent;
} trieNode_t;
void TrieCreate(trieNode_t **root);
trieNode_t* TrieSearch(trieNode_t *root, const char *key);
void TrieAdd(trieNode_t **root, char *key, int data);
void TrieRemove(trieNode_t **root, char *key);
void TrieDestroy( trieNode_t* root );
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
/*trie.c*/
#include <stdio.h>
#include "trie.h"
#include <stdlib.h>
trieNode_t *TrieCreateNode(char key, int data);
void TrieCreate(trieNode_t **root)
{
*root = TrieCreateNode('\0', 0xffffffff);
}
trieNode_t *TrieCreateNode(char key, int data)
{
trieNode_t *node = NULL;
node = (trieNode_t *)malloc(sizeof(trieNode_t));
if(NULL == node)
{
printf("Malloc failed\n");
return node;
}
node->key = key;
node->next = NULL;
node->children = NULL;
node->value = data;
node->parent= NULL;
node->prev= NULL;
return node;
}
void TrieAdd(trieNode_t **root, char *key, int data)
{
trieNode_t *pTrav = NULL;
if(NULL == *root)
{
printf("NULL tree\n");
return;
}
#ifdef DEBUG
printf("\nInserting key %s: \n",key);
#endif
pTrav = (*root)->children;
if(pTrav == NULL)
{
/*First Node*/
for(pTrav = *root; *key; pTrav = pTrav->children)
{
pTrav->children = TrieCreateNode(*key, 0xffffffff);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
key++;
}
pTrav->children = TrieCreateNode('\0', data);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
return;
}
if(TrieSearch(pTrav, key))
{
printf("Duplicate!\n");
return;
}
while(*key != '\0')
{
if(*key == pTrav->key)
{
key++;
#ifdef DEBUG
printf("Traversing child: [%c]\n",pTrav->children->key);
#endif
pTrav = pTrav->children;
}
else
break;
}
while(pTrav->next)
{
if(*key == pTrav->next->key)
{
key++;
TrieAdd(&(pTrav->next), key, data);
return;
}
pTrav = pTrav->next;
}
if(*key)
{
pTrav->next = TrieCreateNode(*key, 0xffffffff);
}
else
{
pTrav->next = TrieCreateNode(*key, data);
}
pTrav->next->parent = pTrav->parent;
pTrav->next->prev = pTrav;
#ifdef DEBUG
printf("Inserting [%c] as neighbour of [%c] \n",pTrav->next->key, pTrav->key);
#endif
if(!(*key))
return;
key++;
for(pTrav = pTrav->next; *key; pTrav = pTrav->children)
{
pTrav->children = TrieCreateNode(*key, 0xffffffff);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
key++;
}
pTrav->children = TrieCreateNode('\0', data);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
return;
}
trieNode_t* TrieSearch(trieNode_t *root, const char *key)
{
trieNode_t *level = root;
trieNode_t *pPtr = NULL;
int lvl=0;
while(1)
{
trieNode_t *found = NULL;
trieNode_t *curr;
for (curr = level; curr != NULL; curr = curr->next)
{
if (curr->key == *key)
{
found = curr;
lvl++;
break;
}
}
if (found == NULL)
return NULL;
if (*key == '\0')
{
pPtr = curr;
return pPtr;
}
level = found->children;
key++;
}
}
void TrieRemove(trieNode_t **root, char *key)
{
trieNode_t *tPtr = NULL;
trieNode_t *tmp = NULL;
if(NULL == *root || NULL == key)
return;
tPtr = TrieSearch((*root)->children, key);
if(NULL == tPtr)
{
printf("Key [%s] not found in trie\n", key);
return;
}
#ifdef DEBUG
printf("Deleting key [%s] from trie\n", key);
#endif
while(1)
{
if( tPtr->prev && tPtr->next)
{
tmp = tPtr;
tPtr->next->prev = tPtr->prev;
tPtr->prev->next = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
break;
}
else if(tPtr->prev && !(tPtr->next))
{
tmp = tPtr;
tPtr->prev->next = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
break;
}
else if(!(tPtr->prev) && tPtr->next)
{
tmp = tPtr;
tPtr->next->prev = NULL;
tPtr->parent->children = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
break;
}
else
{
tmp = tPtr;
tPtr = tPtr->parent;
tPtr->children = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
}
#ifdef DEBUG
printf("Deleted key [%s] from trie\n", key);
#endif
}
void TrieDestroy( trieNode_t* root )
{
trieNode_t *tPtr = root;
trieNode_t *tmp = root;
while(tPtr)
{
while(tPtr->children)
tPtr = tPtr->children;
if( tPtr->prev && tPtr->next)
{
tmp = tPtr;
tPtr->next->prev = tPtr->prev;
tPtr->prev->next = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
else if(tPtr->prev && !(tPtr->next))
{
tmp = tPtr;
tPtr->prev->next = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
else if(!(tPtr->prev) && tPtr->next)
{
tmp = tPtr;
tPtr->parent->children = tPtr->next;
tPtr->next->prev = NULL;
tPtr = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
else
{
tmp = tPtr;
if(tPtr->parent == NULL)
{
/*Root*/
free(tmp);
return;
}
tPtr = tPtr->parent;
tPtr->children = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
/*triedriver.c*/
/*
* To Compile : gcc -o trie trie.c triedriver.c
* To run: ./trie
*/
#include <stdio.h>
#include <stdlib.h>
#include "trie.h"
int main()
{
trieNode_t *root = NULL;
trieNode_t* ret = NULL;
printf("Trie Example\n");
/*Create a trie*/
TrieCreate(&root);
TrieAdd(&root, "andrew", 1);
TrieAdd(&root, "tina", 2);
TrieAdd(&root, "argo", 3);
TrieAdd(&root, "timor", 5);
TrieRemove(&root, "tim");
TrieAdd(&root, "tim", 6);
TrieRemove(&root, "tim");
TrieAdd(&root, "ti", 6);
TrieAdd(&root, "amy", 7);
TrieAdd(&root, "aramis", 8);
ret = TrieSearch(root->children, "andrew");
if(NULL != ret)
printf("value %d\n", ret->value);
else
printf("not found\n");
/*Destroy the trie*/
TrieDestroy(root);
return 0;
}

In order to print the debug messages, use -DDEBUG while compiling with gcc:

gcc trie.c triedriver.c -DDEBUG -o trie


参考资料:

  1. 维基百科
  2. simplestcodings