# Structural Recursion

Last time, we discussed the concept of recursion, where an abstraction
uses itself as part of its implementation, and we saw its application
to procedural abstractions. We turn our attention now to *structural
recursion*, where we use recursion in defining the representation of
an abstract data type.

## Recursive Lists

The data representation of a linked list is an example of structural
recursion: the `Node`

type uses itself in its own representation:

```
struct Node {
int datum;
Node *next;
};
```

This representation satisfies the requirements for a recursive abstraction:

The empty list is the base case, represented by a null pointer.

A non-empty list is a recursive case; it can be subdivided into the first node and the rest of the list, which represents a list in its own right.

Independent of its representation, a linked list can actually be defined recursively as either an empty list, or a datum followed by a smaller list.

Given the recursive definition of a list, it is natural to process a list with a recursive function. The base case of the recursive function will be the minimum-size list allowed by the function, and larger lists will be handled in the recursive case. As an example, the following is a recursive function to compute the length of a list:

```
// REQUIRES: node represents a valid list
// EFFECTS: Computes the length of the list that starts at the
// given node.
int length(const Node *list) {
if (list == nullptr) { // empty list
return 0;
} else { // non-empty list
return 1 + length(list->next); // list->next is a smaller list
}
}
```

The length of an empty list is 0. The length of a non-empty list is
one more than the length of the rest of the list; the elements in the
list consist of the initial datum and the elements in the rest of the
list. We use the `length()`

function itself as an abstraction to
compute the number of elements in the remainder of the list, taking
the recursive leap of faith that it will compute the right answer.

As another example, consider the problem of finding the maximum
element in a list. Unlike for `length()`

, the minimal list is not an
empty one, since an empty list has no elements in it. Instead, the
minimum required size is a list with a single datum, in which case the
maximum element is just that lone datum, constituting our base case.
For a larger list, we can break it down recursively as follows:

Find the maximum element in the rest of the list. This element is at least as large as any other element in the rest of the list.

Compare the first element to the max of the remainder. The larger of the two is transitively at least as large as the elements in the rest of the list.

The following implements this algorithm:

```
// REQUIRES: node represents a valid, non-empty list
// EFFECTS: Returns the maximum element in the list that starts at
// the given node.
int list_max(const Node *list) {
if (list->next == nullptr) { // list has only one element
return list->datum;
} else { // list has more than one element
return std::max(list->datum, // compare first datum to
list_max(list->next)); // max of rest of list
}
}
```

The base case is a list with one element. Such a list has an empty
`next`

list, so we check for that and return the list’s lone datum.
The recursive case computes the max of the rest of the list and then
uses `std::max()`

to determine the maximum of that item and the
first item in the list. As always, we take the recursive leap of
faith, assuming that the recursive call to `list_max()`

computes the
right answer for the smaller list.

## Trees

A list is a linear-recursive data structure: each non-empty list is
subdivided into a datum and a single smaller list. A *tree* is a data
structure that is, naturally, tree recursive. A non-empty tree is
subdivided into a datum and several smaller trees. In this course, we
only consider *binary trees*, where non-empty trees are subdivided
into exactly two smaller trees.

The term *tree* stems [1] from the fact that its branching structure
resembles that of a botanical tree. Terminology with respect to tree
data structures borrows from both botanical and family trees.

The

*root*is the node that originates the branching structure. In our diagrams, the root is pictured at the top of the tree.A non-empty tree consists of a

*parent*node and two*child*nodes. For a binary tree, there is a*left child*and a*right child*. Nodes that have the same parent are*siblings*.A node whose children are all empty is a

*leaf*.The size of a tree is the number of elements it contains.

The

*height*of a tree is the number of levels at which it has elements. Equivalently, it is the length of the longest path from the root to a leaf node.

Algorithms on trees are often written as tree-recursive functions, so that the recursive case makes more than one recursive call. The general strategy is to directly compute the result for the smallest tree allowed by the function, constituting the base case. The recursive case makes recursive calls to compute the results for the left and right children, then combines those results with the datum at the root to compute the answer for the whole tree.

As an example, the following algorithm computes the size of a tree:

The size of an empty tree is zero.

The size of a non-empty tree is the size of the left child, plus the size of the right child, plus one for the root datum.

Before we can implement this algorithm in code, we need a data
representation. As with a list, we use a `Node`

struct, but it now
has a datum and pointers to left and right `Node`

s:

```
struct Node {
int datum;
Node *left;
Node *right;
};
```

Like a list, we use a null pointer to represent an empty tree.

We can now implement the `size()`

function:

```
// REQUIRES: node represents a valid tree
// EFFECTS: Returns the number of elements in the tree represented
// by the given node.
int size(const Node *tree) {
if (!tree) { // empty tree
return 0;
} else { // non-empty tree
return 1 + size(tree->left) + size(tree->right);
}
}
```

As with other recursive functions, we take the recursive leap of faith
that `size()`

computes the right answer on smaller trees.

The height of a tree is the number of levels it contains. An empty tree contains no levels, so its height is zero. For a non-empty tree, we exploit the alternate definition of height, that it is the length of the longest path from root to leaf. The longest such patch is just one node longer than the longest path in the child subtrees, since the root adds one additional node to the part of the path contained in a child. Thus, we compute the height as follows:

```
// REQUIRES: node represents a valid tree
// EFFECTS: Returns the height of the tree represented by the given
// node.
int height(const Node *tree) {
if (!tree) { // empty tree
return 0;
} else { // non-empty tree
return 1 + std::max(height(tree->left),
height(tree->right));
}
}
```

We use `std::max()`

to obtain the longer path from the two child
trees, then add one to the result to account for the root node.

### Tree Traversals

The following function prints every element in a tree to standard output:

```
// REQUIRES: node represents a valid tree
// MODIFIES: cout
// EFFECTS: Prints each element in the given tree to standard out,
// with each element followed by a space.
void print(const Node *tree) {
if (tree) { // non-empty tree
cout << tree->datum << " ";
print(tree->left);
print(tree->right);
}
}
```

This algorithm processes a root datum before processing any of the
data in the children. It thus implements a *preorder traversal*. For
the tree in Figure 82, it prints the elements in the
order `6 4 1 7 2 9 5`

.

Moving the print statement results in alternate orderings. The
following implements an *inorder traversal*:

```
if (tree) { // non-empty tree
print(tree->left);
cout << tree->datum << " ";
print(tree->right);
}
```

The data in the left child is processed first, then the root datum,
then the data in the right child. For the tree with height 5, it
prints `1 4 6 2 5 9 7`

.

A *postorder* traversal processes the root datum after the data in the
children:

```
if (tree) { // non-empty tree
print(tree->left);
print(tree->right);
cout << tree->datum << " ";
}
```

For the same tree, the postorder traversal prints `1 4 5 9 2 7 6`

.

The structure of a binary tree can be uniquely determined by knowing any two of these orderings.

## Binary Search Trees (BSTs)

A *binary search tree (BST)* is a binary tree whose elements are
stored in an order that maintains a sorting invariant. Specifically, a
binary search tree is either:

empty, or

a root datum with two subtrees such that:

The two subtrees are themselves binary search trees.

Every element in the left subtree is strictly less than the root datum.

Every element in the right subtree is strictly greater than the root datum. [2]

Our binary search trees do not permit duplicate elements, since we will be using them to build set and map abstractions. If a BST does permit duplicates, it will use a modified definition that allows items equal to the root in one of the subtrees.

Figure 84 shows an example of a binary search tree.

Every element in the left subtree is less that the root datum, and every element in the right subtree is greater than the root. The left and right subtrees meet the requirements for a binary search tree. Thus, the whole tree is a binary search tree.

Figure 85 depicts trees that do not meet the definition of a binary search tree.

In the tree on the left, the left subtree contains the element 7, which is not smaller than the root element 5. This violates the second condition in the recursive definition above of a BST. In the tree on the right, the right subtree is empty, which is a valid binary search tree. However, the left subtree is not a valid BST, since it does not meet the sorting invariant for a BST. Thus, the tree on the right is not a binary search tree.

A binary search tree is thus named because searching for an element can be done efficiently, in time proportional to the height of the tree rather than the size. A search algorithm need only recurse on one side of the tree at each level. For example, in searching for the element 2 in the BST in Figure 84, the element must be in the left subtree, since 2 is less than the root element 6. Within the left subtree, it must again be to the left, since 2 is less than 5. Within the next subtree, the 2 must be to the right of the 1, leading us to the actual location of the 2.

More generally, the following algorithm determines whether or not a BST contains a particular value:

If the tree is empty, it does not contain the value.

Otherwise, if the root datum is equal to the value, the tree contains the element.

If the value is less than the root element, it must be in the left subtree if it is in the tree, so we repeat the search on the left subtree.

Otherwise, the value is greater than the root element, so it is in the right subtree if it is in the tree at all. Thus, we repeat the search on the right subtree.

The first two cases above are base cases, since they directly compute an answer. The latter two are recursive cases, and we take the recursive leap of faith that the recursive calls will compute the correct result on the subtrees.

The algorithm leads to the following implementation on our tree representation:

```
// REQUIRES: node represents a valid binary search tree
// EFFECTS: Returns whether or not the given value is in the tree
// represented by node.
bool contains(const Node *node, int value) {
if (!node) { // empty tree
return false;
} else if (node->datum == value) { // non-empty tree, equal to root
return true;
} else if (value < node->datum) { // less than root
return contains(node->left, value);
} else { // greater than root
return contains(node->right, value);
}
}
```

This implementation is linear recursive, since at most one recursive
call is made by each invocation of `contains()`

. Furthermore, every
recursive call is a tail call, so the implementation is tail
recursive. This example illustrates that the body of a linear- or
tail-recursive function may contain more than one recursive call, as
long as at most one of those calls is actually made.

Let us consider another algorithm, that of finding the maximum element of a BST, which requires there to be at least one element in the tree. If there is only one element, then the lone, root element is the maximum. The root is also the maximum element when the right subtree is empty; everything in the left subtree is smaller than the root, making the root the largest item. On the other hand, when the right tree is not empty, every element in that subtree is larger than the root and everything in the left subtree. Then the largest element in the whole tree is the same as the largest element in the right subtree.

Summarizing this, we have the following algorithm:

If the right subtree is empty, then the root is the maximum.

Otherwise, the maximum item is the maximum element in the right subtree.

We implement the algorithm as follows:

```
// REQUIRES: node represents a valid non-empty binary search tree
// EFFECTS: Returns the maximum element in the tree represented by
// node.
int tree_max(const Node *node) {
if (!node->right) { // base case
return node->datum;
} else { // recursive case
return tree_max(node->right);
}
}
```

As with `contains()`

, `tree_max()`

is tail recursive, and it runs
in time proportional to the height of the tree.

### The `BinarySearchTree`

Interface

In the full linked-list definition we saw in Linked Lists, we defined a separate `IntList`

class to act as
the interface for the list, defining `Node`

as a member of that
class. We then generalized the element type, resulting in a `List`

class template. We follow the same strategy here by defining a
`BinarySearchTree`

class template as the interface for a BST.

```
template <typename T>
class BinarySearchTree {
public:
// EFFECTS: Constructs an empty BST.
BinarySearchTree();
// EFFECTS: Constructs a copy of the given tree.
BinarySearchTree(const BinarySearchTree &other);
// EFFECTS: Replaces the contents of this tree with a copy of the
// given tree.
BinarySearchTree & operator=(const BinarySearchTree &other);
// EFFECTS: Destructs this tree.
~BinarySearchTree();
// EFFECTS: Returns whether this tree is empty.
bool empty() const;
// EFFECTS: Returns the number of elements in this tree.
int size() const;
// EFFECTS: Returns whether the given item is contained in this
// tree.
bool contains(const T &value) const;
// REQUIRES: value is not in this tree
// EFFECTS: Inserts the given item into this tree.
void insert(const T &value);
private:
// Represents a single node of a tree.
struct Node {
T datum;
Node *left;
Node *right;
// INVARIANTS: left and right are either null or pointers to
// valid Nodes
};
// The root node of this tree.
Node *root;
// INVARIANTS: root is either null or a pointer to a valid Node
};
```

As with a list, we define `Node`

as a nested class of
`BinarySearchTree`

to encapsulate it within the latter ADT. Since it
is an implementation detail and not part of the BST interface, we
define `Node`

as a private member.

The `contains()`

member function differs from the one we defined
before; the member function just takes in a data item, while our
previous definition operates on a node as well. We define the latter
as a private helper function and call it with the root node:

```
template <typename T>
class BinarySearchTree {
...
public:
bool contains(const T &value) const {
return contains_impl(root, value);
}
private:
bool contains_impl(const Node *node, const T &value) {
if (!node) {
return false;
} else if (node->datum == value) {
return true;
} else if (value < node->datum) {
return contains_impl(node->left, value);
} else {
return contains_impl(node->right, value);
}
}
Node *root;
};
```

Observe that `contains_impl()`

does not refer to a `BinarySearchTree`

or any of its members. Thus, there is no need for it to have a `this`

pointer to a `BinarySearchTree`

object. We can declare the function
as a *static member function* to eliminate the `this`

pointer.

```
template <typename T>
class BinarySearchTree {
...
public:
bool contains(const T &value) const {
return contains_impl(root, value);
}
private:
static bool contains_impl(const Node *node, const T &value) {
if (!node) {
return false;
} else if (node->datum == value) {
return true;
} else if (value < node->datum) {
return contains_impl(node->left, value);
} else {
return contains_impl(node->right, value);
}
}
Node *root;
};
```

Just like a static member variable is associated with a class rather than an individual object, a static member function is also not associated with an individual object, and it cannot refer to non-static member variables.

A public static member function can be called from outside the class using the scope-resolution operator, the same syntax for referring to a static member variable:

```
BinarySearchTree<int>::contains_impl(nullptr, -1);
// compile error because contains_impl() is not public
```

### BST-Based Set

Previously, we have seen array-based set implementations, one that used an unsorted array and another a sorted array. We can also implement a set using a binary search tree to store the data:

```
template <typename T>
class BSTSet {
public:
// EFFECTS: Inserts the given value into this set, if it is not
// already in the set.
void insert(const T &value) {
if (!elts.contains(value)) {
elts.insert(value);
}
}
// EFFECTS: Returns whether value is in this set.
bool contains(const T &value) const {
return elts.contains(value);
}
// EFFECTS: Returns the size of this set.
int size() const {
return elts.size();
}
private:
BinarySearchTree<T> elts;
};
```

If the underlying BST is *balanced*, meaning that each subtree within
the BST has close to the same number of elements in its left and right
child, then the height of the tree is in \(\textrm{O}(\log n)\),
where \(n\) is the size of the tree. Thus, the `contains()`

and
`insert()`

operations take logarithmic time, rather than the linear
time they would take on an unsorted set.

Unfortunately, our BST implementation does not guarantee that it will be balanced. In fact, inserting items in increasing order results in a maximally unbalanced tree as in Figure 86, resembling a list rather than a tree structure.

There are more complicated binary-search-tree implementations that do guarantee balance, but they are beyond the scope of this course.