Ternary Search Tree

At first glance, a TernarySearchTree may not seem to gain much over a Binary Search Tree [BinaryTree]: it just seems to use more space and algorithmic complexity. In the BST you just stick at the current node if the n'th character matched, although you can't advance n right away when you go left/right. Likewise, it may seem to simply be a StringTrie where the data structure to look up individual letters is a BinaryTree, rather than an array.

In a TST, each comparison is much simpler than in a BST, and more complex than in a naive StringTrie. Each node requires only a constant amount of storage (one character, for a TST containing strings of characters), and that comparison is a constant-time operation (in most cases, a single op). The recursion is truly trivial. Compare this to a binary search tree, where you may have to compare multiple characters at any node, each node must contain an entire string, etc. Think of it this way: a BST containing the set { "foo", "bar", "foobar" } must contain those three strings, as such (standard sorting, given insertion order):

foo / \ bar foobar

The corresponding TST (which is, less sensitive to insertion order) is (with $ as the end-of-string character, less than any other character):

        f
       /|
      b o
      | |
      a o         // This comment is here because
      | |\       //  trailing backslashes
      r $ b     //   can confuse wiki.
      |   |
      $   a
          |
          r
          |
          $

Many more nodes, but

the nodes are much simpler,
the comparisons are much simpler,
redundant prefixes are automatically eliminated,
no double-dereferences are necessary.

For tree-based sets, you have three common options. The two most often considered are the Binary Search Tree and StringTries (the type of thing where to find strings of alphabetic characters, you start at a root node representing the string of length zero, which has 26 children, which all have 26 children, which...). They have different space and time properties of course, and there are places where each belongs. The TST is a good tradeoff, often offering better performance than the BST on certain data sets, without the ballooning storage space of a full trie.

TSTs broadly offer better performance than a BST, and takes up less space that a Trie. One can refer to that as an average. Or one can refer to it as a solution to two problems with the BST and the StringTrie.

References:

http://www.codeproject.com/cs/algorithms/tst.asp
- A reasonably good paper on TSTs, with quite good example code in CeeSharp.
http://www.octavian.org/cs/software.html
- A great example of ternary search tries written in CeeLanguage
http://www.javaworld.com/javaworld/jw-02-2001/jw-0216-ternary.html
- Nice diagrams in this article.
http://citeseer.ist.psu.edu/bentley97fast.html
- "Fast Algorithms for Sorting and Searching Strings" by Bentley & Sedgewick (1997).

Contributors: AdamBerger, GuyMurphy, KarlKnechtel, WilliamUnderwood, and others

CategoryDataStructure