# Structural Recursion¶

Last time, we discussed the concept of recursion, where an abstraction
uses itself as part of its implementation, and we saw its application
to procedural abstractions. We turn our attention now to *structural
recursion*, where we use recursion in defining the representation of
an abstract data type.

## Recursive Lists¶

The data representation of a linked list is an example of structural
recursion: the `Node`

type uses itself in its own representation:

```
struct Node {
int datum;
Node *next;
};
```

This representation satisfies the requirements for a recursive abstraction:

The empty list is the base case, represented by a null pointer.

A non-empty list is a recursive case; it can be subdivided into the first node and the rest of the list, which represents a list in its own right.

Independent of its representation, a linked list can actually be defined recursively as either an empty list, or a datum followed by a smaller list.

Given the recursive definition of a list, it is natural to process a list with a recursive function. The base case of the recursive function will be the minimum-size list allowed by the function, and larger lists will be handled in the recursive case. As an example, the following is a recursive function to compute the length of a list:

```
// REQUIRES: node represents a valid list
// EFFECTS: Computes the length of the list that starts at the
// given node.
int length(const Node *list) {
if (list == nullptr) { // empty list
return 0;
} else { // non-empty list
return 1 + length(list->next); // list->next is a smaller list
}
}
```

The length of an empty list is 0. The length of a non-empty list is
one more than the length of the rest of the list; the elements in the
list consist of the initial datum and the elements in the rest of the
list. We use the `length()`

function itself as an abstraction to
compute the number of elements in the remainder of the list, taking
the recursive leap of faith that it will compute the right answer.

As another example, consider the problem of finding the maximum
element in a list. Unlike for `length()`

, the minimal list is not an
empty one, since an empty list has no elements in it. Instead, the
minimum required size is a list with a single datum, in which case the
maximum element is just that lone datum, constituting our base case.
For a larger list, we can break it down recursively as follows:

Find the maximum element in the rest of the list. This element is at least as large as any other element in the rest of the list.

Compare the first element to the max of the remainder. The larger of the two is transitively at least as large as the elements in the rest of the list.

The following implements this algorithm:

```
// REQUIRES: node represents a valid, non-empty list
// EFFECTS: Returns the maximum element in the list that starts at
// the given node.
int list_max(const Node *list) {
if (list->next == nullptr) { // list has only one element
return list->datum;
} else { // list has more than one element
return std::max(list->datum, // compare first datum to
list_max(list->next)); // max of rest of lest
}
}
```

The base case is a list with one element. Such a list has an empty
`next`

list, so we check for that and return the list’s lone datum.
The recursive case computes the max of the rest of the list and then
uses `std::max()`

to determine the maximum of that item and the
first item in the list. As always, we take the recursive leap of
faith, assuming that the recursive call to `list_max()`

computes the
right answer for the smaller list.

## Trees¶

A list is a linear-recursive data structure: each non-empty list is
subdivided into a datum and a single smaller list. A *tree* is a data
structure that is, naturally, tree recursive. A non-empty tree is
subdivided into a datum and several smaller trees. In this course, we
only consider *binary trees*, where non-empty trees are subdivided
into exactly two smaller trees.

The term *tree* stems 1 from the fact that its branching structure
resembles that of a botanical tree. Terminology with respect to tree
data structures borrows from both botanical and family trees.

- 1
Pun intended.

The

*root*is the node that originates the branching structure. In our diagrams, the root is pictured at the top of the tree.A non-empty tree consists of a

*parent*node and two*child*nodes. For a binary tree, there is a*left child*and a*right child*. Nodes that have the same parent are*siblings*.A node whose children are all empty is a

*leaf*.The size of a tree is the number of elements it contains.

The

*height*of a tree is the number of levels at which it has elements. Equivalently, it is the length of the longest path from the root to a leaf node.

Algorithms on trees are often written as tree-recursive functions, so that the recursive case makes more than one recursive call. The general strategy is to directly compute the result for the smallest tree allowed by the function, constituting the base case. The recursive case makes recursive calls to compute the results for the left and right children, then combines those results with the datum at the root to compute the answer for the whole tree.

As an example, the following algorithm computes the size of a tree:

The size of an empty tree is zero.

The size of a non-empty tree is the size of the left child, plus the size of the right child, plus one for the root datum.

Before we can implement this algorithm in code, we need a data
representation. As with a list, we use a `Node`

struct, but it now
has a datum and pointers to left and right `Node`

s:

```
struct Node {
int datum;
Node *left;
Node *right;
};
```

Like a list, we use a null pointer to represent an empty tree.

We can now implement the `size()`

function:

```
// REQUIRES: node represents a valid tree
// EFFECTS: Returns the number of elements in the tree represented
// by the given node.
int size(const Node *tree) {
if (!tree) { // empty tree
return 0;
} else { // non-empty tree
return 1 + size(tree->left) + size(tree->right);
}
}
```

As with other recursive functions, we take the recursive leap of faith
that `size()`

computes the right answer on smaller trees.

The height of a tree is the number of levels it contains. An empty tree contains no levels, so its height is zero. For a non-empty tree, we exploit the alternate definition of height, that it is the length of the longest path from root to leaf. The longest such patch is just one node longer than the longest path in the child subtrees, since the root adds one additional node to the part of the path contained in a child. Thus, we compute the height as follows:

```
// REQUIRES: node represents a valid tree
// EFFECTS: Returns the height of the tree represented by the given
// node.
int height(const Node *tree) {
if (!tree) { // empty tree
return 0;
} else { // non-empty tree
return 1 + std::max(height(tree->left),
height(tree->right));
}
}
```

We use `std::max()`

to obtain the longer path from the two child
trees, then add one to the result to account for the root node.

### Tree Traversals¶

The following function prints every element in a tree to standard output:

```
// REQUIRES: node represents a valid tree
// MODIFIES: cout
// EFFECTS: Prints each element in the given tree to standard out,
// with each element followed by a space.
void print(const Node *tree) {
if (tree) { // non-empty tree
cout << tree->datum << " ";
print(tree->left);
print(tree->right);
}
}
```

This algorithm processes a root datum before processing any of the
data in the children. It thus implements a *preorder traversal*. For
the tree in Figure 84, it prints the elements in the
order `6 4 1 7 2 9 5`

.

Moving the print statement results in alternate orderings. The
following implements an *inorder traversal*:

```
if (tree) { // non-empty tree
print(tree->left);
cout << tree->datum << " ";
print(tree->right);
}
```

The data in the left child is processed first, then the root datum,
then the data in the right child. For the tree with height 5, it
prints `1 4 6 2 5 9 7`

.

A *postorder* traversal processes the root datum after the data in the
children:

```
if (tree) { // non-empty tree
print(tree->left);
print(tree->right);
cout << tree->datum << " ";
}
```

For the same tree, the postorder traversal prints `1 4 5 9 2 7 6`

.

The structure of a binary tree can be uniquely determined by knowing any two of these orderings.

## Binary Search Trees (BSTs)¶

A *binary search tree (BST)* is a binary tree whose elements are
stored in an order that maintains a sorting invariant. Specifically, a
binary search tree is either:

empty, or

a root datum with two subtrees such that:

The two subtrees are themselves binary search trees.

Every element in the left subtree is strictly less than the root datum.

Every element in the right subtree is strictly greater than the root datum. 2

- 2
Our binary search trees do not permit duplicate elements, since we will be using them to build set and map abstractions. If a BST does permit duplicates, it will use a modified definition that allows items equal to the root in one of the subtrees.

Figure 86 shows an example of a binary search tree.

Every element in the left subtree is less that the root datum, and every element in the right subtree is greater than the root. The left and right subtrees meet the requirements for a binary search tree. Thus, the whole tree is a binary search tree.

Figure 87 depicts trees that do not meet the definition of a binary search tree.

In the tree on the left, the left subtree contains the element 7, which is not smaller than the root element 5. This violates the second condition in the recursive definition above of a BST. In the tree on the right, the right subtree is empty, which is a valid binary search tree. However, the left subtree is not a valid BST, since it does not meet the sorting invariant for a BST. Thus, the tree on the right is not a binary search tree.

A binary search tree is thus named because searching for an element can be done efficiently, in time proportional to the height of the tree rather than the size. A search algorithm need only recurse on one side of the tree at each level. For example, in searching for the element 2 in the BST in Figure 86, the element must be in the left subtree, since 2 is less than the root element 6. Within the left subtree, it must again be to the left, since 2 is less than 5. Within the next subtree, the 2 must be to the right of the 1, leading us to the actual location of the 2.

More generally, the following algorithm determines whether or not a BST contains a particular value:

If the tree is empty, it does not contain the value.

Otherwise, if the root datum is equal to the value, the tree contains the element.

If the value is less than the root element, it must be in the left subtree if it is in the tree, so we repeat the search on the left subtree.

Otherwise, the value is greater than the root element, so it is in the right subtree if it is in the tree at all. Thus, we repeat the search on the right subtree.

The first two cases above are base cases, since they directly compute an answer. The latter two are recursive cases, and we take the recursive leap of faith that the recursive calls will compute the correct result on the subtrees.

The algorithm leads to the following implementation on our tree representation:

```
// REQUIRES: node represents a valid binary search tree
// EFFECTS: Returns whether or not the given value is in the tree
// represented by node.
bool contains(const Node *node, int value) {
if (!node) { // empty tree
return false;
} else if (node->datum == value) { // non-empty tree, equal to root
return true;
} else if (value < node->datum) { // less than root
return contains(node->left, value);
} else { // greater than root
return contains(node->right, value);
}
}
```

This implementation is linear recursive, since at most one recursive
call is made by each invocation of `contains()`

. Furthermore, every
recursive call is a tail call, so the implementation is tail
recursive. This example illustrates that the body of a linear- or
tail-recursive function may contain more than one recursive call, as
long as at most one of those calls is actually made.

Let us consider another algorithm, that of finding the maximum element of a BST, which requires there to be at least one element in the tree. If there is only one element, then the lone, root element is the maximum. The root is also the maximum element when the right subtree is empty; everything in the left subtree is smaller than the root, making the root the largest item. On the other hand, when the right tree is not empty, every element in that subtree is larger than the root and everything in the left subtree. Then the largest element in the whole tree is the same as the largest element in the right subtree.

Summarizing this, we have the following algorithm:

If the right subtree is empty, then the root is the maximum.

Otherwise, the maximum item is the maximum element in the right subtree.

We implement the algorithm as follows:

```
// REQUIRES: node represents a valid non-empty binary search tree
// EFFECTS: Returns the maximum element in the tree represented by
// node.
int tree_max(const Node *node) {
if (!node->right) { // base case
return node->datum;
} else { // recursive case
return tree_max(node->right);
}
}
```

As with `contains()`

, `tree_max()`

is tail recursive, and it runs
in time proportional to the height of the tree.

### The `BinarySearchTree`

Interface¶

In the full linked-list definition we saw in Linked Lists, we defined a separate `IntList`

class to act as
the interface for the list, defining `Node`

as a member of that
class. We then generalized the element type, resulting in a `List`

class template. We follow the same strategy here by defining a
`BinarySearchTree`

class template as the interface for a BST.

```
template <typename T>
class BinarySearchTree {
public:
// EFFECTS: Constructs an empty BST.
BinarySearchTree();
// EFFECTS: Constructs a copy of the given tree.
BinarySearchTree(const BinarySearchTree &other);
// EFFECTS: Replaces the contents of this tree with a copy of the
// given tree.
BinarySearchTree & operator=(const BinarySearchTree &other);
// EFFECTS: Destructs this tree.
~BinarySearchTree();
// EFFECTS: Returns whether this tree is empty.
bool empty() const;
// EFFECTS: Returns the number of elements in this tree.
int size() const;
// EFFECTS: Returns whether the given item is contained in this
// tree.
bool contains(const T &value) const;
// REQUIRES: value is not in this tree
// EFFECTS: Inserts the given item into this tree.
void insert(const T &value);
private:
// Represents a single node of a tree.
struct Node {
T datum;
Node *left;
Node *right;
// INVARIANTS: left and right are either null or pointers to
// valid Nodes
};
// The root node of this tree.
Node *root;
// INVARIANTS: root is either null or a pointer to a valid Node
};
```

As with a list, we define `Node`

as a nested class of
`BinarySearchTree`

to encapsulate it within the latter ADT. Since it
is an implementation detail and not part of the BST interface, we
define `Node`

as a private member.

The `contains()`

member function differs from the one we defined
before; the member function just takes in a data item, while our
previous definition operates on a node as well. We define the latter
as a private helper function and call it with the root node:

```
template <typename T>
class BinarySearchTree {
...
public:
bool contains(const T &value) const {
return contains_impl(root, value);
}
private:
bool contains_impl(const Node *node, const T &value) {
if (!node) {
return false;
} else if (node->datum == value) {
return true;
} else if (value < node->datum) {
return contains_impl(node->left, value);
} else {
return contains_impl(node->right, value);
}
}
Node *root;
};
```

Observe that `contains_impl()`

does not refer to a `BinarySearchTree`

or any of its members. Thus, there is no need for it to have a `this`

pointer to a `BinarySearchTree`

object. We can declare the function
as a *static member function* to eliminate the `this`

pointer.

```
template <typename T>
class BinarySearchTree {
...
public:
bool contains(const T &value) const {
return contains_impl(root, value);
}
private:
static bool contains_impl(const Node *node, const T &value) {
if (!node) {
return false;
} else if (node->datum == value) {
return true;
} else if (value < node->datum) {
return contains_impl(node->left, value);
} else {
return contains_impl(node->right, value);
}
}
Node *root;
};
```

Just like a static member variable is associated with a class rather than an individual object, a static member function is also not associated with an individual object, and it cannot refer to non-static member variables.

A public static member function can be called from outside the class using the scope-resolution operator, the same syntax for referring to a static member variable:

```
BinarySearchTree<int>::contains_impl(nullptr, -1);
// compile error because contains_impl() is not public
```

### BST-Based Set¶

Previously, we have seen array-based set implementations, one that used an unsorted array and another a sorted array. We can also implement a set using a binary search tree to store the data:

```
template <typename T>
class BSTSet {
public:
// EFFECTS: Inserts the given value into this set, if it is not
// already in the set.
void insert(const T &value) {
if (!elts.contains(value)) {
elts.insert(value);
}
}
// EFFECTS: Returns whether value is in this set.
bool contains(const T &value) const {
return elts.contains(value);
}
// EFFECTS: Returns the size of this set.
int size() const {
return elts.size();
}
private:
BinarySearchTree<T> elts;
};
```

If the underlying BST is *balanced*, meaning that each subtree within
the BST has close to the same number of elements in its left and right
child, then the height of the tree is in \(\textrm{O}(\log n)\),
where \(n\) is the size of the tree. Thus, the `contains()`

and
`insert()`

operations take logarithmic time, rather than the linear
time they would take on an unsorted set.

Unfortunately, our BST implementation does not guarantee that it will be balanced. In fact, inserting items in increasing order results in a maximally unbalanced tree as in Figure 88, resembling a list rather than a tree structure.

There are more complicated binary-search-tree implementations that do guarantee balance, but they are beyond the scope of this course.

## Maps and Pairs¶

A *map* is a data structure that maps *keys* to *values*. It is thus
an *associative* data structure, since it associates a value with each
key. As an example, we may want to associate exam scores with
individual test takers:

aliceywu |
100 |

akamil |
23 |

taligoro |
100 |

jjuett |
73 |

We can store these data in a standard-library map:

```
std::map<std::string, int> scores;
scores["aliceywu"] = 100;
scores["akamil"] = 23;
scores["taligoro"] = 100;
scores["jjuett"] = 73;
cout << scores["akamil"] << endl; // prints 23
cout << scores["aliceywu"] << endl; // prints 100
```

The `map`

class template has two required template parameters, the
first for the key type and the second for the value type. We are
mapping string IDs to integer scores, so we use a ```
map<string,
int>
```

. We can then use the subscript operator with a key to insert a
key-value pair into the map, or to look up the value associated with a
key.

A key feature of the standard-library map is that it is not erroneous
to look up a non-existent key. Instead, the map inserts and returns a
*value-initialized* datum for the key. For primitive types, value
initialization produces the zero value for the type. (This is distinct
from default initialization, which produces an undefined value.)

```
cout << scores["gmatute"] << endl; // prints 0
```

A map is similar to a set in that the keys in the map are unique; each key is associated with at most one value. However, the values themselves are not unique. The map above has two keys that are associated with the value 100.

Given the similarity with a set, we can also implement a map using a binary search tree. However, rather than using the key-value pair for ordering and uniqueness, we need to use just the key, and the value merely tags along for the ride, as shown in Figure 89.

To make this work, we need a heterogeneous data type for the `datum`

member of a node, so that it can store separate key and value items.
We can define our own struct or class, or we can use the `pair`

template in the standard `<utility>`

library.

```
std::pair<int, bool> p1; // default constructor value initializes both items
p1.first -= 5; // first now has value -5
p1.second = false;
std::pair<string, int> p2 = { "hello", 4 }; // explicit initialization
cout << p2.first << ": " << p2.second << endl; // prints hello: 4
```

A pair stores a first and second item, which can be of different
types. The `pair`

template is parameterized by the types of these
items.

We can use a pair and a BST to implement a map:

```
template <typename Key_type, typename Value_type>
class Map {
public:
// EFFECTS: Returns whether this map is empty.
bool empty() const;
// EFFECTS: Returns the number of key-value pairs in this map.
size_t size() const;
// EFFECTS: Returns the value associated with the given key. If
// the key is not in the map, inserts the key,
// associating it with a value-initialized object of type
// Value_type, and returns the newly inserted value.
Value_type& operator[](const Key_type& key);
private:
// Type alias for a key-value pair.
using Pair_type = std::pair<Key_type, Value_type>;
// Comparator that compares pairs by their key components.
class PairLess {
public:
bool operator()(...) const;
};
// Data representation.
BinarySearchTree<Pair_type, PairLess> entries;
};
```

For this to work, we need a `BinarySearchTree`

that can take a
custom comparator, to compare key-value pairs by just the value
component. This comparator can be defaulted to `std::less`

, which
compares elements according to the `<`

operator:

```
template <typename T, typename Compare=std::less<T>>
class BinarySearchTree {
...
};
```

All comparisons within the tree now must use an object of the
`Compare`

type. We leave the details as an exercise for the reader.

### Iterating over a Map¶

The standard-library map defines an iterator that produces key-value pairs upon dereference. The following is an example that constructs a map, counting how many times each unique word occurs in a vector. It then iterates over the map to print out the counts.

```
void print_word_counts(const std::vector<std::string> &words) {
std::map<std::string, int> word_counts;
// Each time a word is seen, add 1 to its entry in the map
for (const auto &word : words) {
word_counts[word] += 1;
}
// Print out the results by iterating through the map.
for (const auto &key_value : word_counts) { // key-value pairs
const auto &word = key_value.first;
const auto &count = key_value.second;
cout << word << "occurred " << count << " times." << endl;
}
}
```

When incrementing a word’s count, we do not need to check whether the word is in the map; if not, the map automatically inserts the word into the map with a value-initialized count of zero, which we then increment to one.

We use range-based for loops to iterate over both the vector and the
map, declaring a type-deduced reference to each element. A reference
avoids making a copy, which is nontrivial for strings. The iteration
over the map produces key-value pairs (```
std::pair<std::string,
int>
```

), and we access the `first`

and `second`

members to obtain
the word and count, respectively.