2-3 Trees as Search Trees

Balanced search trees are found in many flavors and have been the major data structure used for structures called dictionaries, where insertion, deletion, and searching must take place. In 1970, John Hopcroft introduced 2-3 search trees as an improvement on existing balanced binary trees. Later they were generalized to B-trees by Bayer and McCreight. B-trees were then simplified by Bayer to form red-black trees.

The rules for 2-3 search trees are rather elementary.

- All data appears at the leaves.
- Data elements are ordered from left (minimum) to right (maximum).
- Every path through the tree is the same length.
- Interior nodes have two or three subtrees.

Thus if there are n records (or data items) stored in the leaves of a 2-3 search tree, then the tree is between log_{3}n and log_{2}n in height.

Interior nodes hold no records, but contain some information about the keys of elements stored in their subtrees. Consider figure 1 below.

Figure 1 - A 2-3 Tree Interior Node

The interior node may contain two numbers or keys:

rlow = smallest key in the R (right) subtree

if it has three subtrees, and one key (mlow) if it has two subtrees.

An example of an entire 2-3 tree is provided as figure 2.

Figure 2 - A 2-3 Search Tree

The information stored at the nodes and the rule that the elements are in order from left to right, provide us with a mechanism for searching the tree. In the above example, if we were looking for 89, we would check the subtree minima stored at the root and go to the right subtree since the minimum value in that subtree is 65 and we are looking for something bigger. At the top node in the right subtree we look in the middle subtree since its minimum value is 89 - exactly what we are looking for. Note that if we were searching for a 94, we would traverse the same path since 89 is smaller than the target and there is no right subtree to go to at that level.

Thus searching involves traversing the tree by traveling through the only subtrees that can contain the target element. In other words, we go to the rightmost subtree whose minimum value is less than the target. There is an exception - if we are looking for something that is smaller than any element in the tree. In that case we merely travel down the leftmost branches of the tree.

Here in figure 3 are the specifications for the problem of traversing the tree looking for a target value.

Figure 3 - Traversal Specifications

For our journey down the path to the leaf most likely to have the same value as our target x we shall use a loop. As we mentioned above, we wish to stay in the rightmost subtree that contains elements no greater than x. In our algorithm, the top, or root of this subtree will be pointed to by the pointer p. Consider the loop invariant shown in figure 4.

Figure 4 - Traversal Loop Invariant

It should be evident that if we traverse the tree keeping the invariant true, we must eventually arrive at a leaf containing the target element - if the target is indeed in the search tree.

But, we need some conventions in order to specify the algorithm precisely. We shall use the interior node variables intuitively as functions as well as values. For example: mlow(p) is the value of mlow at the interior node pointed to by p. We also use tree parts such as leftkid(p), middlekid(p), and rightkid(p) as methods that return pointers to the appropriate subtrees of that rooted at p.

The major portion of the search algorithm is given as Figure 5.

Figure 5 - Searching Algorithm

Proving correctness involves two things. First, we must be convinced that the algorithm will eventually halt. This is not difficult if we note that at every stage, p is set to a node further down the tree and thus must eventually reach a leaf since the tree is finite.

Next, we must show that the loop is correct, that is, executing it takes us from the precondition to the postcondition. This involves a proof by induction on the number of times the while-loop is executed. Verifying the loop in this algorithm consists of three parts:

** Entry:**
the invariant is true when we enter the loop. This should be obvious when we recall that p points to the root of the entire tree.

** Execution:**
the invariant remains true as we execute the loop. If p is the root of a subtree of which obeys the invariant, the if statement assigns p to its rightmost subtree whose minimum value is not greater than x.

** Exit:**
the postcondition is true upon exit from the loop. If the invariant is true, and we are at a leaf, then the postcondition is true since it is the same as the invariant applied to a leaf.

It is easy now to see that if x is in the tree, it is at the leaf pointed to by p. We now swiftly employ this location algorithm to develop the entire search or find algorithm presented below in figure 6.

Figure 6 - Search Algorithm

and note that if x is in the tree, this algorithm finds it.

Since all we do is traverse from the root to the leaf, the complexity of this algorithm is the height of the tree: somewhere between log_{3}n and log_{2}n for a tree with n leaves. This is O(log n).

Now let us develop the methods necessary to insert a node in a 2-3 tree. Examine the 2-3 tree below in figure 7 and note where nodes containing the values 5 and 35 should appear in the collection of leaves.

Figure 7 - A 2-3 Search Tree With Leaves to Insert

If we search for a 5 in the tree, the locate routine returns a value of p pointing to the leaf containing the 3. Inserting the 5 is rather simple. We take the parent of p (i.e., the interior node holding the <mlow, rlow> pair <8, ->) and as illustrated below in figure 8, just

- make the leaf holding 8 the right child,
- set rlow to 8,
- attach a new leaf with 5 in it as middle child, and
- set mlow to 5.

Inserting a 35 is quite another matter. When we search for 35, the locate routine ends up at the leaf containing the 26. This means that 35 must go just to the right of it. But, since the parent of 26 already has three children, we must take these plus the new leaf and divide them between two nodes. This is done in Figure 8.

Figure 8 - A 2-3 Search Tree With an Interior Node to Insert

A new interior node was created and the leaves containing 35 and 44 were connected to it. All interior node values have been set properly. We must now attach the new node.

One very important fact should be noted at this point.

.We need not change any interior node values at nodes other than the parents of the nodes we are attaching

This is because nodes higher in the tree have interior values (mlow and rlow) which come from the __least__ element in the leftmost subtree of the parent interior node. For example in figure 8 the root holds values 17 and 65. These do not occur at any other level.

Let us continue with our insertion problem. We must now insert the new interior node in our tree. We know several things about it.

- It holds the pair <44, ->
- It should be inserted to the right of the node containing <26, ->
- The smallest leaf in its tree is 35

Since the parent of <26, -> has three children, we must split it as we did before and then connect the new node (in figure 9 below this holds the pair <65, ->) into the tree. Note that its smallest leaf contains the 35.

Another aspect surfaces here. Since the node that was split was the root, we must create a new root and set its mlow value to 35. The final result appears as figure 9.

Figure 9 - The Final 2-3 Search Tree

Let us recap. If a new node is to be inserted as the child of a node with two children, everything is fairly straightforward. When the proposed parent of the insertee has three children already, things are not so simple. A new node must be created and the four nodes in question (the three children and the insertee) are attached to them.

If we always insert nodes to the right of another node, there are just three cases of this type. These are enumerated in figure 10.

Figure 10- Insertion Cases for Nodes with 3 Children

This does, however bring up the special case when one seeks to insert a leaf that holds a value smaller than anything in the tree. Here we merely swap values with the leftmost leaf and attach as before. Figure 11 illustrates this.

Figure 11 - Dealing with a Minimum Value

Now to create the insert routine. We first run the locate routine and find the leaf that is to the immediate left of the value we wish to insert, unless our insertee is smaller than anything in the tree. Then we construct a leaf, place the element to be inserted in it and attach the leaf to the tree.

We shall need several new routines for our 2-3 trees. The first is the function createleaf(x) which returns the location of a new leaf node containing x and the second is attach(t, p, q, value(q)) which attaches the leaf located at q to the 2-3 tree t. Examine the insert routine of figure 12.** **

Figure 12 - Insert Algorithm

Note that the insert routine took care of the case where x was less than anything in the tree (as shown in figure 11). The attach routine which will be developed below must be designed to attach subtrees and thus works with both interior and leaf nodes. Figure 13 contains the specifications for the attach routine. Note that the precondition follows from the locate routine's postcondition.

Figure 13 - Attach Specifications

Of special interest is the fact that the subtree rooted at q should go just to the right of that rooted at p in order to add it to t and maintain t as a 2-3 search tree. Thus we wish to connect q to p's parent.

First, though, we must ask if p points to the root of t. If so, then we are forced to construct a new root and attach both p and q to it. We also need new method at this point to do the work. In the algorithm of figure 14, the routine connect(p, r, position) does the actual connecting of p's tree to r at the left, middle, or right. The diagram at the right of figure 14 shows the result of attaching everything to the new root.** **

Figure 14 - Inserting at the Root

If p has a parent, then this parent has either two or three subtrees under the rules for 2-3 search trees. Figure 15 presents the algorithm for attaching a new node to one that has two subtrees. At the right are the initial configurations for the two cases that occur when this happens.

Figure 15 - Inserting With Two Subtrees

The configurations which occur when the parent of p has three children are enumerated in Figure 10. The parent must be split and the portion on the right is attached to the tree.

First we set things up by creating a new interior node to which we shall attach two of the subtrees of the parent of p.

Figure 16 - Creating an Interior Node

Then we take the cases from right to left. First is the situation where the subtree rooted at q must be inserted to the right of the subtrees of s, the parent of the subtree rooted at p. Note that we remember the smallest value in the new tree rooted at r.

Figure 17 - New Node on Right

In the remaining two cases, q's tree will be inserted on the left of the right child of s, the parent of p. So, we can immediately move this right subtree over to the new node (r) and proceed.

Figure 18 - Attaching to New Node

At this stage, note that the subtree rooted at v has been attached to the new node as shown in figure 19. Also, the rlow value of s is now the mlow value for r. Now we consider the first case, when p is the center child of s and q falls between the largest and second largest of the children of s.

Figure 19 - New Node Second from Right

And, finally we come to the last case, where p's tree is the leftmost subtree of s.

Figure 20 - New Node Second from Left

After all of the subtrees have been connected to r, we must remove any evidence of s's old right subtree and attach r to the parent of s. This is done in figure 21.

Figure 21 - Finishing the Connection

Showing correctness involves tracing through the algorithm and verifying that the new leaf has been added to the tree and that it remains a 2-3 search tree. Since the attach routine is recursive, this involves verifying that its precondition is true at the end when it is called to attach the new interior node to the tree.

Complexity of insertion is rather straightforward. It is the sum of that of the locate and attach routines. The locate routine requires O(logn) steps and the attach routine attaches a subtree one taller in height each time it is called. (By the way, this fact also insures that the process halts.) This means that its complexity is at worst the height of the tree. Thus inserting an element is of complexity O(logn).