# Binary Search Trees (BSTs)

Last time, we saw binary trees, which are a recursive data structure
that is either empty, or consists of an element and two subtrees. A
*binary search tree (BST)* is a binary tree whose elements are stored
in an order that maintains a sorting invariant. Specifically, a binary
search tree is either:

empty, or

a root datum with two subtrees such that:

The two subtrees are themselves binary search trees.

Every element in the left subtree is strictly less than the root datum.

Every element in the right subtree is strictly greater than the root datum. [1]

Figure 85 shows an example of a binary search tree.

Every element in the left subtree is less that the root datum, and every element in the right subtree is greater than the root. The left and right subtrees meet the requirements for a binary search tree. Thus, the whole tree is a binary search tree.

Figure 86 depicts trees that do not meet the definition of a binary search tree.

In the tree on the left, the left subtree contains the element 7, which is not smaller than the root element 5. This violates the second condition in the recursive definition above of a BST. In the tree on the right, the right subtree is empty, which is a valid binary search tree. However, the left subtree is not a valid BST, since it does not meet the sorting invariant for a BST. Thus, the tree on the right is not a binary search tree.

A binary search tree is thus named because searching for an element can be done efficiently, in time proportional to the height of the tree rather than the size. A search algorithm need only recurse on one side of the tree at each level. For example, in searching for the element 2 in the BST in Figure 85, the element must be in the left subtree, since 2 is less than the root element 6. Within the left subtree, it must again be to the left, since 2 is less than 5. Within the next subtree, the 2 must be to the right of the 1, leading us to the actual location of the 2.

More generally, the following algorithm determines whether or not a BST contains a particular value:

If the tree is empty, it does not contain the value.

Otherwise, if the root datum is equal to the value, the tree contains the element.

If the value is less than the root element, it must be in the left subtree if it is in the tree, so we repeat the search on the left subtree.

Otherwise, the value is greater than the root element, so it is in the right subtree if it is in the tree at all. Thus, we repeat the search on the right subtree.

The first two cases above are base cases, since they directly compute an answer. The latter two are recursive cases, and we take the recursive leap of faith that the recursive calls will compute the correct result on the subtrees.

The algorithm leads to the following implementation on our tree representation:

```
// REQUIRES: node represents a valid binary search tree
// EFFECTS: Returns whether or not the given value is in the tree
// represented by node.
bool contains(const Node *node, int value) {
if (!node) { // empty tree
return false;
} else if (node->datum == value) { // non-empty tree, equal to root
return true;
} else if (value < node->datum) { // less than root
return contains(node->left, value);
} else { // greater than root
return contains(node->right, value);
}
}
```

This implementation is linear recursive, since at most one recursive
call is made by each invocation of `contains()`

. Furthermore, every
recursive call is a tail call, so the implementation is tail
recursive. This example illustrates that the body of a linear- or
tail-recursive function may contain more than one recursive call, as
long as at most one of those calls is actually made.

Let us consider another algorithm, that of finding the maximum element of a BST, which requires there to be at least one element in the tree. If there is only one element, then the lone, root element is the maximum. The root is also the maximum element when the right subtree is empty; everything in the left subtree is smaller than the root, making the root the largest item. On the other hand, when the right tree is not empty, every element in that subtree is larger than the root and everything in the left subtree. Then the largest element in the whole tree is the same as the largest element in the right subtree.

Summarizing this, we have the following algorithm:

If the right subtree is empty, then the root is the maximum.

Otherwise, the maximum item is the maximum element in the right subtree.

We implement the algorithm as follows:

```
// REQUIRES: node represents a valid non-empty binary search tree
// EFFECTS: Returns the maximum element in the tree represented by
// node.
int tree_max(const Node *node) {
if (!node->right) { // base case
return node->datum;
} else { // recursive case
return tree_max(node->right);
}
}
```

As with `contains()`

, `tree_max()`

is tail recursive, and it runs
in time proportional to the height of the tree.

## The `BinarySearchTree`

Interface

In the full linked-list definition we saw in Linked Lists, we defined a separate `IntList`

class to act as
the interface for the list, defining `Node`

as a member of that
class. We then generalized the element type, resulting in a `List`

class template. We follow the same strategy here by defining a
`BinarySearchTree`

class template as the interface for a BST.

```
template <typename T>
class BinarySearchTree {
public:
// EFFECTS: Constructs an empty BST.
BinarySearchTree();
// EFFECTS: Constructs a copy of the given tree.
BinarySearchTree(const BinarySearchTree &other);
// EFFECTS: Replaces the contents of this tree with a copy of the
// given tree.
BinarySearchTree & operator=(const BinarySearchTree &other);
// EFFECTS: Destructs this tree.
~BinarySearchTree();
// EFFECTS: Returns whether this tree is empty.
bool empty() const;
// EFFECTS: Returns the number of elements in this tree.
int size() const;
// EFFECTS: Returns whether the given item is contained in this
// tree.
bool contains(const T &value) const;
// REQUIRES: value is not in this tree
// EFFECTS: Inserts the given item into this tree.
void insert(const T &value);
private:
// Represents a single node of a tree.
struct Node {
T datum;
Node *left;
Node *right;
// INVARIANTS: left and right are either null or pointers to
// valid Nodes
};
// The root node of this tree.
Node *root;
// INVARIANTS: root is either null or a pointer to a valid Node
};
```

As with a list, we define `Node`

as a nested class of
`BinarySearchTree`

to encapsulate it within the latter ADT. Since it
is an implementation detail and not part of the BST interface, we
define `Node`

as a private member.

The `contains()`

member function differs from the one we defined
before; the member function just takes in a data item, while our
previous definition operates on a node as well. We define the latter
as a private helper function and call it with the root node:

```
template <typename T>
class BinarySearchTree {
...
public:
bool contains(const T &value) const {
return contains_impl(root, value);
}
private:
bool contains_impl(const Node *node, const T &value) {
if (!node) {
return false;
} else if (node->datum == value) {
return true;
} else if (value < node->datum) {
return contains_impl(node->left, value);
} else {
return contains_impl(node->right, value);
}
}
Node *root;
};
```

Observe that `contains_impl()`

does not refer to a `BinarySearchTree`

or any of its members. Thus, there is no need for it to have a `this`

pointer to a `BinarySearchTree`

object. We can declare the function
as a *static member function* to eliminate the `this`

pointer.

```
template <typename T>
class BinarySearchTree {
...
public:
bool contains(const T &value) const {
return contains_impl(root, value);
}
private:
static bool contains_impl(const Node *node, const T &value) {
if (!node) {
return false;
} else if (node->datum == value) {
return true;
} else if (value < node->datum) {
return contains_impl(node->left, value);
} else {
return contains_impl(node->right, value);
}
}
Node *root;
};
```

Just like a static member variable is associated with a class rather than an individual object, a static member function is also not associated with an individual object, and it cannot refer to non-static member variables.

A public static member function can be called from outside the class using the scope-resolution operator, the same syntax for referring to a static member variable:

```
BinarySearchTree<int>::contains_impl(nullptr, -1);
// compile error because contains_impl() is not public
```

## BST-Based Set

Previously, we have seen array-based set implementations, one that used an unsorted array and another a sorted array. We can also implement a set using a binary search tree to store the data:

```
template <typename T>
class BSTSet {
public:
// EFFECTS: Inserts the given value into this set, if it is not
// already in the set.
void insert(const T &value) {
if (!elts.contains(value)) {
elts.insert(value);
}
}
// EFFECTS: Returns whether value is in this set.
bool contains(const T &value) const {
return elts.contains(value);
}
// EFFECTS: Returns the size of this set.
int size() const {
return elts.size();
}
private:
BinarySearchTree<T> elts;
};
```

If the underlying BST is *balanced*, meaning that each subtree within
the BST has close to the same number of elements in its left and right
child, then the height of the tree is in \(\textrm{O}(\log n)\),
where \(n\) is the size of the tree. Thus, the `contains()`

and
`insert()`

operations take logarithmic time, rather than the linear
time they would take on an unsorted set.

Unfortunately, our BST implementation does not guarantee that it will be balanced. In fact, inserting items in increasing order results in a maximally unbalanced tree as in Figure 87, resembling a list rather than a tree structure.

There are more complicated binary-search-tree implementations that do guarantee balance, but they are beyond the scope of this course.

# Maps and Pairs

A *map* is a data structure that maps *keys* to *values*. It is thus
an *associative* data structure, since it associates a value with each
key. As an example, we may want to associate exam scores with
individual test takers:

aliceywu |
100 |

akamil |
23 |

taligoro |
100 |

jjuett |
73 |

We can store these data in a standard-library map:

```
std::map<std::string, int> scores;
scores["aliceywu"] = 100;
scores["akamil"] = 23;
scores["taligoro"] = 100;
scores["jjuett"] = 73;
cout << scores["akamil"] << endl; // prints 23
cout << scores["aliceywu"] << endl; // prints 100
```

The `map`

class template has two required template parameters, the
first for the key type and the second for the value type. We are
mapping string IDs to integer scores, so we use a ```
map<string,
int>
```

. We can then use the subscript operator with a key to insert a
key-value pair into the map, or to look up the value associated with a
key.

A key feature of the standard-library map is that it is not erroneous
to look up a non-existent key. Instead, the map inserts and returns a
*value-initialized* datum for the key. For primitive types, value
initialization produces the zero value for the type. (This is distinct
from default initialization, which produces an undefined value.)

```
cout << scores["gmatute"] << endl; // prints 0
```

A map is similar to a set in that the keys in the map are unique; each key is associated with at most one value. However, the values themselves are not unique. The map above has two keys that are associated with the value 100.

Given the similarity with a set, we can also implement a map using a binary search tree. However, rather than using the key-value pair for ordering and uniqueness, we need to use just the key, and the value merely tags along for the ride, as shown in Figure 88.

To make this work, we need a heterogeneous data type for the `datum`

member of a node, so that it can store separate key and value items.
We can define our own struct or class, or we can use the `pair`

template in the standard `<utility>`

library.

```
std::pair<int, bool> p1; // default constructor value initializes both items
p1.first -= 5; // first now has value -5
p1.second = false;
std::pair<string, int> p2 = { "hello", 4 }; // explicit initialization
cout << p2.first << ": " << p2.second << endl; // prints hello: 4
```

A pair stores a first and second item, which can be of different
types. The `pair`

template is parameterized by the types of these
items.

We can use a pair and a BST to implement a map:

```
template <typename Key_type, typename Value_type>
class Map {
public:
// EFFECTS: Returns whether this map is empty.
bool empty() const;
// EFFECTS: Returns the number of key-value pairs in this map.
size_t size() const;
// EFFECTS: Returns the value associated with the given key. If
// the key is not in the map, inserts the key,
// associating it with a value-initialized object of type
// Value_type, and returns the newly inserted value.
Value_type& operator[](const Key_type& key);
private:
// Type alias for a key-value pair.
using Pair_type = std::pair<Key_type, Value_type>;
// Comparator that compares pairs by their key components.
class PairLess {
public:
bool operator()(...) const;
};
// Data representation.
BinarySearchTree<Pair_type, PairLess> entries;
};
```

For this to work, we need a `BinarySearchTree`

that can take a
custom comparator, to compare key-value pairs by just the value
component. This comparator can be defaulted to `std::less`

, which
compares elements according to the `<`

operator:

```
template <typename T, typename Compare=std::less<T>>
class BinarySearchTree {
...
};
```

All comparisons within the tree now must use an object of the
`Compare`

type. We leave the details as an exercise for the reader.