Trie树的概念及其实现

Trie的概念

在计算机科学中，trie，又称前缀树或字典树，是一种有序树，用于保存关联数组，其中的键通常是字符串。与二叉查找树不同，键不是直接保存在节点中，而是由节点在树中的位置决定。一个节点的所有子孙都有相同的前缀，也就是这个节点对应的字符串，而根节点对应空字符串。一般情况下，不是所有的节点都有对应的值，只有叶子节点和部分内部节点所对应的键才有相关的值。

看到概念，也许读者会觉得有点抽象，博主搜到一篇大神写的博客，顿时觉得茅塞顿开（ps：博主给的参考链接可能访问不了，请fq）。这篇文章真是太棒了，原文是英文，博主就不翻译了，相信读者们的英语水平，故直接转到本篇博文中，希望对大家会有所帮助。

Trie implementation in C

To implement the kind of storage which stores strings as the search keys , there is a need to have special data structures which can store the strings efficiently and the searching of data based on the string keys is easier, efficient and faster. One such data structure is a tree based implementation called Trie.

Trie is a data structure which can be used to implement a dictionary kind of application. It provides all the functionality to insert a string, search a string and delete a string from the dictionary. The insertion and deletion operation takes O(n) time where n is the length of the string to be deleted or inserted.
Some of the application of tries involve web based search engines, URL completion in autocomplete feature, Spell checker etc.

** Structure of Trie(Specific to this implementation): **

The trie implemented here consists of nodes. Each node has these fields:

Key - Part of the string to be serached,inserted or deleted.
Value - The value associated with a string (e.g In a dictionary it could be the meaning of the word which we are searching)
Neighbour node address - It consists of the address of the neighbouring node at the same level.
Previous neighbour address - It consists of the address of the previous node at the same level.
Children node address - It consists of the address of the child nodes of the current node.
Parent node address - It consists of the address of the parent node of the current node.

The additional nodes like Parent and Previous nodes are added to this implementation for making the search, and deletions easier.

Here is a diagrammatical view of a trie nodes I have used in this implementation. The field key is not represented in the diagram due to symmetry purposes.

这里写图片描述

Let us consider an example to understand tries in detail.

Suppose we have to implement a database for the HR department of an organisation in which we have to store an employee’s name and their ages. There is an assumption for this example that there each employee’s name is unique.So there is a strange policy in this organisation that any new employee which has a name that already exists in the organisation, it would not hire that new employee.

Let’s use this hypothetical example just to understand how tries work.

Consider we have a new employee named Andrew with age 36. Lets populate our trie for “andrew”.

这里写图片描述

Now add “tina”.

这里写图片描述

Add “argo”.

这里写图片描述

Add “tim”.

这里写图片描述

Add “t”.

这里写图片描述

Add “amy”.

这里写图片描述

Add “aramis”.

这里写图片描述

This is the complete Trie with all the entries. Now let us try deleting the names. I am not capturing the trivial cases.

Lets try deleting Argo.

这里写图片描述

Delete Tina

这里写图片描述

Delete Andrew

这里写图片描述

There is also a video from IIT Delhi which explains the tries. Tries Explained.

The implementation for this Trie is given below. Please provide your suggestions to further improve the implementation.

typedef int trieVal_t;
typedef struct trieNode {
    char key;
    trieVal_t value;
    struct trieNode *next;
    struct trieNode *prev;
    struct trieNode *children;
    struct trieNode *parent;
} trieNode_t;
void TrieCreate(trieNode_t **root);
trieNode_t* TrieSearch(trieNode_t *root, const char *key);
void TrieAdd(trieNode_t **root, char *key, int data);
void TrieRemove(trieNode_t **root, char *key);
void TrieDestroy( trieNode_t* root );

/*trie.c*/
#include <stdio.h>
#include "trie.h"
#include <stdlib.h>
trieNode_t *TrieCreateNode(char key, int data);
void TrieCreate(trieNode_t **root)
{
 *root = TrieCreateNode('\0', 0xffffffff);
}
trieNode_t *TrieCreateNode(char key, int data)
{
 trieNode_t *node = NULL;
 node = (trieNode_t *)malloc(sizeof(trieNode_t));
 if(NULL == node)
 {
  printf("Malloc failed\n");
  return node;
 }
 node->key = key;
 node->next = NULL;
 node->children = NULL;
 node->value = data;
 node->parent= NULL;
 node->prev= NULL;
 return node;
}
void TrieAdd(trieNode_t **root, char *key, int data)
{
 trieNode_t *pTrav = NULL;
 if(NULL == *root)
 {
  printf("NULL tree\n");
  return;
 }
#ifdef DEBUG
 printf("\nInserting key %s: \n",key);
#endif
 pTrav = (*root)->children;
 if(pTrav == NULL)
 {
  /*First Node*/
  for(pTrav = *root; *key; pTrav = pTrav->children)
  {
   pTrav->children = TrieCreateNode(*key, 0xffffffff);
   pTrav->children->parent = pTrav;
#ifdef DEBUG
   printf("Inserting: [%c]\n",pTrav->children->key);
#endif
   key++;
  }
  pTrav->children = TrieCreateNode('\0', data);
  pTrav->children->parent = pTrav;
#ifdef DEBUG
  printf("Inserting: [%c]\n",pTrav->children->key);
#endif
  return;
 }
 if(TrieSearch(pTrav, key))
 {
  printf("Duplicate!\n");
  return;
 }
 while(*key != '\0')
 {
  if(*key == pTrav->key)
  {
   key++;
#ifdef DEBUG
   printf("Traversing child: [%c]\n",pTrav->children->key);
#endif
   pTrav = pTrav->children;
  }
  else
   break;
 }
 while(pTrav->next)
 {
  if(*key == pTrav->next->key)
  {
   key++;
   TrieAdd(&(pTrav->next), key, data);
   return;
  }
  pTrav = pTrav->next;
 }
 if(*key)
 {
  pTrav->next = TrieCreateNode(*key, 0xffffffff);
 }
 else
 {
  pTrav->next = TrieCreateNode(*key, data);
 }
 pTrav->next->parent = pTrav->parent;
 pTrav->next->prev = pTrav;
#ifdef DEBUG
 printf("Inserting [%c] as neighbour of [%c] \n",pTrav->next->key, pTrav->key);
#endif
 if(!(*key))
  return;
 key++;
 for(pTrav = pTrav->next; *key; pTrav = pTrav->children)
 {
  pTrav->children = TrieCreateNode(*key, 0xffffffff);
  pTrav->children->parent = pTrav;
#ifdef DEBUG
  printf("Inserting: [%c]\n",pTrav->children->key);
#endif
  key++;
 }
 pTrav->children = TrieCreateNode('\0', data);
 pTrav->children->parent = pTrav;
#ifdef DEBUG
 printf("Inserting: [%c]\n",pTrav->children->key);
#endif
 return;
}
trieNode_t* TrieSearch(trieNode_t *root, const char *key)
{
 trieNode_t *level = root;
 trieNode_t *pPtr = NULL;
 int lvl=0;
 while(1)
 {
  trieNode_t *found = NULL;
  trieNode_t *curr;
  for (curr = level; curr != NULL; curr = curr->next)
  {
   if (curr->key == *key)
   {
    found = curr;
    lvl++;
    break;
   }
  }
  if (found == NULL)
   return NULL;
  if (*key == '\0')
  {
   pPtr = curr;
   return pPtr;
  }
  level = found->children;
  key++;
 }
}
void TrieRemove(trieNode_t **root, char *key)
{
 trieNode_t *tPtr = NULL;
 trieNode_t *tmp = NULL;
 if(NULL == *root || NULL == key)
  return;
 tPtr = TrieSearch((*root)->children, key);
 if(NULL == tPtr)
 {
  printf("Key [%s] not found in trie\n", key);
  return;
 }
#ifdef DEBUG
 printf("Deleting key [%s] from trie\n", key);
#endif
 while(1)
 {
  if( tPtr->prev && tPtr->next)
  {
   tmp = tPtr;
   tPtr->next->prev = tPtr->prev;
   tPtr->prev->next = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
   break;
  }
  else if(tPtr->prev && !(tPtr->next))
  {
   tmp = tPtr;
   tPtr->prev->next = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
   break;
  }
  else if(!(tPtr->prev) && tPtr->next)
  {
   tmp = tPtr;
   tPtr->next->prev = NULL;
   tPtr->parent->children = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
   break;
  }
  else
  {
   tmp = tPtr;
   tPtr = tPtr->parent;
   tPtr->children = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
 }
#ifdef DEBUG
 printf("Deleted key [%s] from trie\n", key);
#endif
}
void TrieDestroy( trieNode_t* root )
{
 trieNode_t *tPtr = root;
 trieNode_t *tmp = root;
    while(tPtr)
 {
  while(tPtr->children)
   tPtr = tPtr->children;
  if( tPtr->prev && tPtr->next)
  {
   tmp = tPtr;
   tPtr->next->prev = tPtr->prev;
   tPtr->prev->next = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
  else if(tPtr->prev && !(tPtr->next))
  {
   tmp = tPtr;
   tPtr->prev->next = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
  else if(!(tPtr->prev) && tPtr->next)
  {
   tmp = tPtr;
   tPtr->parent->children = tPtr->next;
   tPtr->next->prev = NULL;
   tPtr = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
  else
  {
   tmp = tPtr;
   if(tPtr->parent == NULL)
   {
    /*Root*/
    free(tmp);
    return;
   }
   tPtr = tPtr->parent;
   tPtr->children = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
 }
}

/*triedriver.c*/
/*
 * To Compile : gcc -o trie trie.c triedriver.c
 * To run: ./trie
 */
#include <stdio.h>
#include <stdlib.h>
#include "trie.h"
int main()
{
    trieNode_t *root = NULL;
    trieNode_t* ret = NULL;
    printf("Trie Example\n");
    /*Create a trie*/
    TrieCreate(&root);
    TrieAdd(&root, "andrew", 1);
    TrieAdd(&root, "tina", 2);
    TrieAdd(&root, "argo", 3);
    TrieAdd(&root, "timor", 5);
    TrieRemove(&root, "tim");
    TrieAdd(&root, "tim", 6);
    TrieRemove(&root, "tim");
    TrieAdd(&root, "ti", 6);
    TrieAdd(&root, "amy", 7);
    TrieAdd(&root, "aramis", 8);
     ret = TrieSearch(root->children, "andrew");
    if(NULL != ret)
      printf("value %d\n", ret->value);
    else
      printf("not found\n");
    /*Destroy the trie*/
    TrieDestroy(root);
    return 0;
}

In order to print the debug messages, use -DDEBUG while compiling with gcc:

gcc trie.c triedriver.c -DDEBUG -o trie

参考资料：

文章目录

Trie的概念

Trie implementation in C