| 
 | NeoBio API | ||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Object | +--neobio.alignment.Trie
This class implements a trie, or a digital search tree. A trie is a multiway tree (each node can have multiple children) that represents a set of strings.
Each node contains data encapsulated in an object instance. Each edge spells out a character and each path from the root represents a string described by the characters labelling the traversed edges. Moreover, for each string represented, there is a unique path from the root.
The trie of the following example represents the strings 'a', 'd', 'b', 'ac', 'ba', 'be', 'bd', 'bad' and 'bae'.
      [0]
     --+--
    /  |  \
  a/  d|   \b
 [1]  [2]  [4]
  |       --+--
  |      /  |  \
 c|    a/  e|  d\
 [3]  [5]  [6]  [9]
     --+--
    /     \
  d/      e\
 [7]       [8]
 
It is easy to see that strings with common prefixes will branch off from each other at the first distinguishing character. This feature makes the trie a good data structure to identify and represent phrases of a text such as the ones induced by the Lempel-Ziv familiy of compression algorithms. For instance, the LZ78 version parses the text into phrases, where each phrase is the longest matching phrase seen previously plus one character.
In this implementation, each node is actually an instance of this class. To build a trie, one must first create the root using the public constructor:
 Trie root = new Trie (some_object);
 
Here some_object contains any relevant information encapsulated in an
 object instance. Typically, that's the only moment the public constructor is used. From
 now on, all new nodes will be added as a new child of one existing node using the
 add method:
 new_node = any_node.add (some_object, character);
 
Here character is the character that will label the edge from
 any_node to new_node. Note that this transition must not
 already exist, otherwise an exception is raised.
 
To find the longest prefix of a given string, we follow a path from the root down
 the tree, character by character, with the spellDown method:
 next_node = root;
 while (next_node != null)
 {
     current_node = next_node;
     char c = get next character from somewhere
     next_node = current_node.spellDown (c);
 }
 
spellDown follows the edge out of current_node labelled by
 the character c and returns the next node. If there is no such a path, it
 returns null.
To retrieve the information stored at any node, simply use the getData
 method.
In fact, there are many ways to implement a trie. To avoid wasting space with multiple pointers at each node, this implementation uses an approach with a linked list of siblings. Each node actually contains a pointer to one of its children and a pointer to one of its siblings only. Together with the pointers, each node also stores the character that labels the edge to the pointed node.
 
 [0]
  |
 a|  d     b
 [1]---[2]---[4]
  |           |
 c|          a|  e     d
 [3]         [5]---[6]---[9]
              |
             d|  e
             [7]---[8]
 
In this way, a trie is similar to a binary tree. Although this implementation is
 more efficient in terms of space, the search for a label with a given character leaving
 a node n is no more constant but proportional to the number of children of
 n. In the previous example, it is necessary to traverse three edges to
 reach node 9 from node 4 with character d.
This class is used by the FactorSequence to build a linked list of factors of a sequence in a LZ78 fashion, i.e. where each factor is the longest factor previously seen plus one character.
FactorSequence| Field Summary | |
| protected  java.lang.Object | dataThis node's stored data. | 
| protected  Trie | siblingA pointer to this node's next sibling. | 
| protected  Trie | sonA pointer to the first of this node's children. | 
| protected  char | to_siblingThe character that labels the edge from this node to the sibling pointer by sibling. | 
| protected  char | to_sonThe character that labels the edge from this node to the child node pointer by son. | 
| Constructor Summary | |
| Trie(java.lang.Object data)Creates a new trie node with the specified data. | |
| Method Summary | |
|  Trie | add(java.lang.Object data,
       char c)Adds a new child to this node. | 
| protected  Trie | addSibling(java.lang.Object data,
                     char c)Adds a sibling to this node. | 
|  java.lang.Object | getData()Returns the data associated with this node. | 
|  Trie | spellDown(char c)Follows a path from this node to one of its children by spelling the character supplied as an argument. | 
| protected  Trie | spellRight(char c)Follows a path from this node to one of its sibling by spelling the character supplied as an argument. | 
| Methods inherited from class java.lang.Object | 
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
| Field Detail | 
protected Trie son
protected char to_son
son.
protected Trie sibling
protected char to_sibling
sibling.
protected java.lang.Object data
| Constructor Detail | 
public Trie(java.lang.Object data)
add method.
data - the data that will be associated with the new node| Method Detail | 
public java.lang.Object getData()
public Trie add(java.lang.Object data,
                char c)
data argument, and the edge from this node to the new node will be
 labelled by the character argument. If this node already have an edge labelled with
 this character, an exception is raised. Otherwise, the new node created and
 returned.
 If this node have no child, a new node is created straight away. Otherwise, the task is assigned to its first child that will add the new node as a sibling.
data - the data that will be associated with the new nodec - the character that will label the edge from this node to the new node
java.lang.IllegalStateException - if this node already have an edge labelled by
 c
protected Trie addSibling(java.lang.Object data,
                          char c)
data argument, and the edge from this node to the new node will be
 labelled by the character argument. If this node already have a sibling with this
 character, an exception is raised. Otherwise, the new node is created and returned.
 If this node have no direct sibling, a new node is created straight away. Otherwise, the task is assigned to its next sibling.
data - the data that will be associated with the new nodec - the character that will label the edge from this node to the new node
java.lang.IllegalStateException - if this node already have an edge labelled by
 cpublic Trie spellDown(char c)
null is returned.
 Otherwise, the reached child node is returned.
 If this node's direct child is reached with an edge labelled by the character, it is returned straight away. Otherwise, it is assigned the task of finding another sibling labelled with that character.
c - the character that labels the path to be followed to this node's child
cprotected Trie spellRight(char c)
null is returned.
 Otherwise, the reached sibling node is returned.
 If this node's direct sibling is reached with an edge labelled by the character, it is returned straight away. Otherwise, it is assigned the task of finding another sibling labelled with that character.
c - the character that labels the path to be followed to the sibling
c| 
 |  | ||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||