15.1 B-trees

It is important to note that the "B" in the name B-tree does NOT stand for "Binary" but for "Balanced." Confusing the two is a common beginners' mistake. The reader should also be warned that there are other kinds of "balanced" trees than B-trees.

15.1.1 B-trees Defined

The trees considered so far have always had one (potential) data items in each node, and each node had the same number of (potential) descendents. By contrast, B-trees may have more than one data item in a node, and the number of descendents of a node depends on whether the node is a root or not and how large the tree happens to be. B-trees are wider than binary trees, and not as deep. They are automatically kept balanced by the use of carefully constructed insertion and deletion algorithms so that the height of all the leaf elements is the same. This assists in such tasks as searching for data. Each B-tree has a number associated with it, called its order, that determines the maximum amount of data in a node.

A B-tree of order n has - a root node containing from 1 to 2n elements - from n to 2n elements in all other nodes - from 0 to k+1 descendents for a node having k elements - all leaves at the same level.

Conceptually, each node contains a number k of data items as well as the linkage to k+1 other nodes. It could be thought of as a sequence

p0, d1, p1, d2, p2, d3, ... dn, pn, ... d2n, p2n

where each pi is pointer to a successor node and each di is a data item. Only the first k data positions and k+1 pointer positions are in use at a given time. Observe that if all 2n data positions are occupied, there are 2n+1 descendent nodes. Several implementations are possible. One could have, for instance:

 CONST
  min = order;
  max = 2*order;
TYPE
  range = [1 .. max];
  BTree = POINTER TO BNode;
  NodeItem = 
    RECORD
      data : ItemType;
      rPoint : BTree;
    END;
  BTreeNode = 
    RECORD
      lPoint : BTree;  (* the zeroth pointer; left of first item *)
      data : ARRAY Range OF NodeItem;
      lastUsed : Range;
    END;

or, still using arrays:

TYPE
  dRange = [1 .. max];
  pRange = [1 .. max+1];
  BTree = POINTER TO BNode;
  BTreeNode = 
    RECORD
      successor : ARRAY Range OF BTree;
      data : ARRAY Range OF ItemType;
      lastUsed : Range;
    END;

Alternately, each node could be constructed as a linked list of pointers and data.

In order to enter the data into a B-tree, one uses the same "left-is-less and right-is-greater" rule as for binary search trees. Moreover, the data in each node is kept sorted from lowest to highest. The B-tree is kept balanced by splitting nodes when they become full. To illustrate, suppose a B-tree of order 2 (maximum data items per node is four) has cardinal data arriving in the order:

15, 4, 6, 12, 11, 17, 20, 30, 31, 5, 16, 37

The first four items are sorted into the root node until it looks like:

No successor pointers are shown yet, because none are needed. The arrival of the next item causes the middle one of the five to be passed up a level (occupying a new root) and the node with the other four to be split in two. Everything is arranged with the "left-is-less and right-is-greater" rule and the tree now looks like this:

The next three items (17, 20, and 30) all go into the second node, and on the arrival of the number 30, it splits as well, promoting the middle item up to its parent. Since there is room for it there, a new root need not be created. If there is not room for the promoted item, a split may take place on that level and an item (not necessarily the same one) promoted up one more level. Such splits and promotions either stop at a level where there is room, or propagate all the way to the root, and are either entered there or cause a new root to be created.

The last three items (31, 5, 16, and 37) contribute to the filling of the nodes on the second level, but do not cause any more splits.

To maintain the shallow structure of the B-tree, deletion of data items must proceed conversely. If a node with more than n items has one deleted, the structure remains the same. However, if a deletion would cause the number of data items in a node to become fewer than the minimum n (the order) then the node is combined with the node on its left or right and the parent data item separating them is pulled down. If the combined number items is exactly 2n, the deletion is finished. If it is more, the node is split, causing a different item than before to be promoted back up. For instance, assuming that both the 15 and 16 were deleted, and the rule is "combine with the node containing the larger data if you can", the result would be:

On a further deletion of the items 37 and then 12, its node finds there are only n items in the node with greater data, so it borrows from the one with the lesser data instead, producing:

If 17 is deleted, its node cannot borrow from the node with the greater data or the lesser data (in that order), so it combines with the one with the lesser data and the result is:

If the last item in the root node is ever deleted, the two successor nodes are combined (and then split if necessary). If the result is a single node of 2n elements, it becomes the root.

15.1.2 Defining A Semi-Generic B-tree

Since in "real life" the data type to be entered will not likely be as simple as a cardinal, provision is made in the following for the easy replacement of the type to be entreed. The illustration above seems to indicate that inorder traversal is the only one worth using, so no provision is made for preorder and postorder traversals.

DEFINITION MODULE BTrees;

(******************
  Design by R. Sutcliffe copyright 1995-1996
  Initial Code by Gordon Tisher
  June 9, 1995; last modified June 23, 1995
  Modified by R. Sutcliffe 1996 10 11
  This module provides a B-Tree ADT.
 ******************)

FROM DataADT IMPORT
  KeyFieldType, DataType, ActionProc;

(* rename the imported types to the local ones. This makes things more generic and easier to change. *)
TYPE
  ItemType = DataType;
  KeyType = KeyFieldType;
  
CONST
  order = 2;

TYPE
  BTree;
  BTreeState = (allRight, empty, entreeFailed);

PROCEDURE Status (tree : BTree) : BTreeState;
(* Pre : t is a valid initialized BTree
   Post : The State of the tree is returned *)

PROCEDURE Init (VAR tree : BTree);
(* Allocates memory for a new tree sets state to allRight *)

PROCEDURE Destroy (VAR tree : BTree);
(* disposes the whole tree *)

PROCEDURE Add (VAR tree : BTree; data : ItemType);
(* Adds a new item to the tree. If successful sets state to allRight, else to entreeFailed *)
  
PROCEDURE Delete (VAR tree : BTree; key : KeyType);
(* deletes an item whose key is defined equal to _key_ from the tree. If successful sets state to allRight; if tree was empty sets state to empty *)
  
PROCEDURE Depth (tree : BTree) : CARDINAL;
(* returns the number of levels in the tree *)

PROCEDURE Search (tree : BTree; key : KeyType; VAR data : ItemType) : BOOLEAN;
(* if found, returns TRUE and _data_ returns item at that point  *)

PROCEDURE WriteTree (tree : BTree);
(* writes out the tree.  This is for testing purposes and likely would not be included in a finished product. *)

PROCEDURE Traverse (tree : BTree; Proc : ActionProc);
(* Pre : tree is a valid initialized BTree
   Post : the nodes are traversed inorder and Proc is performed on each data item. *)
     
END BTrees.

Contents