Maps and Pairs

A map is a data structure that maps keys to values. It is thus an associative data structure, since it associates a value with each key. As an example, we may want to associate exam scores with individual test takers:

aliceywu

100

akamil

23

taligoro

100

jjuett

73

We can store these data in a standard-library map:

std::map<std::string, int> scores;
scores["aliceywu"] = 100;
scores["akamil"] = 23;
scores["taligoro"] = 100;
scores["jjuett"] = 73;
cout << scores["akamil"] << endl;    // prints 23
cout << scores["aliceywu"] << endl;  // prints 100

The map class template has two required template parameters, the first for the key type and the second for the value type. We are mapping string IDs to integer scores, so we use a map<string, int>. We can then use the subscript operator with a key to insert a key-value pair into the map, or to look up the value associated with a key.

A key feature of the standard-library map is that it is not erroneous to look up a non-existent key. Instead, the map inserts and returns a value-initialized datum for the key. For primitive types, value initialization produces the zero value for the type. (This is distinct from default initialization, which produces an undefined value.)

cout << scores["gmatute"] << endl;  // prints 0

A map is similar to a set in that the keys in the map are unique; each key is associated with at most one value. However, the values themselves are not unique. The map above has two keys that are associated with the value 100.

Given the similarity with a set, we can also implement a map using a binary search tree. However, rather than using the key-value pair for ordering and uniqueness, we need to use just the key, and the value merely tags along for the ride, as shown in Figure 89.

_images/21_map_representation.svg

Figure 89 A map represented as a binary search tree.

To make this work, we need a heterogeneous data type for the datum member of a node, so that it can store separate key and value items. We can define our own struct or class, or we can use the pair template in the standard <utility> library.

std::pair<int, bool> p1;   // default constructor value initializes both items
p1.first -= 5;                                       // first now has value -5
p1.second = false;

std::pair<string, int> p2 = { "hello", 4 };         // explicit initialization
cout << p2.first << ": " << p2.second << endl;              // prints hello: 4

A pair stores a first and second item, which can be of different types. The pair template is parameterized by the types of these items.

We can use a pair and a BST to implement a map:

template <typename Key_type, typename Value_type>
class Map {
public:
  // EFFECTS: Returns whether this map is empty.
  bool empty() const;

  // EFFECTS: Returns the number of key-value pairs in this map.
  size_t size() const;

  // EFFECTS: Returns the value associated with the given key. If
  //          the key is not in the map, inserts the key,
  //          associating it with a value-initialized object of type
  //          Value_type, and returns the newly inserted value.
  Value_type& operator[](const Key_type& key);

private:
  // Type alias for a key-value pair.
  using Pair_type = std::pair<Key_type, Value_type>;

  // Comparator that compares pairs by their key components.
  class PairLess {
  public:
    bool operator()(...) const;
  };

  // Data representation.
  BinarySearchTree<Pair_type, PairLess> entries;
};

For this to work, we need a BinarySearchTree that can take a custom comparator, to compare key-value pairs by just the value component. This comparator can be defaulted to std::less, which compares elements according to the < operator:

template <typename T, typename Compare=std::less<T>>
class BinarySearchTree {
  ...
};

All comparisons within the tree now must use an object of the Compare type. We leave the details as an exercise for the reader.

Type Deduction and Range-Based For Loops

Suppose we wanted to write a function template that prints out the elements of sequence. We could write it to work with iterators:

template <typename Iter_type>
void print_all(Iter_type begin, Iter_type end) {
  for (Iter_type it = begin; it != end; ++it) {
    cout << *it << endl;
  }
}

On the other hand, we desire an interface that works directly on a sequence container:

template <typename Sequence>
void print_all(const Sequence &sequence) {
  for (typename Sequence::Iterator it = sequence.begin();
       it != sequence.end(); ++it) {
    cout << *it << endl;
  }
}

Declaring the iterator type is very verbose, as it consists of both a qualified name and the typename keyword, since the qualified name is a dependent type. Furthermore, the declaration makes the assumption that the iterator type is named Iterator, which is not true for standard-library containers that use lower-case type names.

Rather than writing out the type of it explicitly, we can ask the compiler to deduce it for us by using the auto keyword:

template <typename Sequence>
void print_all(const Sequence &sequence) {
  for (auto it = sequence.begin(); it != sequence.end(); ++it) {
    cout << *it << endl;
  }
}

The compiler deduces the type from its initialization. If the return type of sequence.begin() is List<int>::Iterator, then the type of it is deduced to have type List<int>::Iterator. On the other hand, if the return type is vector<Duck>::iterator, then the type of it is deduced to be vector<Duck>::iterator. The return type need not be a nested type; if it is char *, then it will be deduced to have type char *. Thus, not only does type deduction save us keystrokes for complicated types, it also abstracts over how the types are defined.

Range-Based For Loops

A range-based for loop is a special syntax for iterating over sequences that support traversal by iterator. Rather than writing out each piece of the traversal, we can have the compiler generate it for us:

vector<int> vec = { 1, 2, 3, 4, 5 };
for (int item : vec) {
  cout << item << endl;
}

The syntax of a range-based for loop is:

for (<type> <variable> : <sequence>)
  <body>

In the example above, the variable is named item and has type int, and vec is the sequence over which the loop iterates. The compiler automatically converts this into a traversal by iterator:

for (auto it = vec.begin(); it != vec.end(); ++it) {
  int item = *it;
  cout << item << endl;
}

The variable item is initialized in each iteration from an element in the sequence. Then the code in the body of the range-based for loop follows.

The following loop attempts to set every element in the vector to the value 42:

for (int item : vec) {
  item = 42;             // attempt to set element to 42
}

However, this loop does not actually modify any elements. To understand why, let us take a look at its translation into a traversal by iterator:

for (auto it = vec.begin(); it != vec.end(); ++it) {
  int item = *it;
  item = 42;
}

The loop is actually modifying a copy of each element rather than an element itself. To avoid making a copy, we can declare the loop variable to be a reference:

for (int &item : vec) {
  item = 42;             // actually set element to 42
}

This translates to:

for (auto it = vec.begin(); it != vec.end(); ++it) {
  int &item = *it;
  item = 42;
}

This successfully modifies each element to have the value 42.

Range-based for loops are often used in combination with auto, as in the following:

vector<int> vec = { 1, 2, 3, 4, 5 };

for (auto &item : vec) {
  item = 42;
}

for (auto item : vec) {
  cout << item << endl;
}

The first loop declares item as a reference, so that it aliases an element in the sequence. The second loop does not declare item as a reference, so it produces a copy of an element in the sequence. The following is the translation of both loops:

vector<int> vec = { 1, 2, 3, 4, 5 };

for (auto it = vec.begin(); it != vec.end(); ++it) {
  int &item = *it;                                     // alias
  item = 42;
}

for (auto it = vec.begin(); it != vec.end(); ++it) {
  int item = *it;                                      // copy
  item = 42;
}

With a range-based for loop, we can simplify print_all():

template <typename Sequence>
void print_all(const Sequence &sequence) {
  for (auto &item : sequence) {
    cout << item << endl;
  }
}

We can call it on any sequence [1] that supports traversal by iterator, as long as the element type can be inserted into an output stream:

vector<string> vec = { "hello", "world!" };
print_all(vec);

Iterating over a Map

The standard-library map defines an iterator that produces key-value pairs upon dereference. The following is an example that constructs a map, counting how many times each unique word occurs in a vector. It then iterates over the map to print out the counts.

void print_word_counts(const std::vector<std::string> &words) {
  std::map<std::string, int> word_counts;

  // Each time a word is seen, add 1 to its entry in the map
  for (const auto &word : words) {
    word_counts[word] += 1;
  }

  // Print out the results by iterating through the map.
  for (const auto &key_value : word_counts) {     // key-value pairs
    const auto &word = key_value.first;
    const auto &count = key_value.second;
    cout << word << "occurred " << count << " times." << endl;
  }
}

When incrementing a word’s count, we do not need to check whether the word is in the map; if not, the map automatically inserts the word into the map with a value-initialized count of zero, which we then increment to one.

We use range-based for loops to iterate over both the vector and the map, declaring a type-deduced reference to each element. A reference avoids making a copy, which is nontrivial for strings. The iteration over the map produces key-value pairs (std::pair<std::string, int>), and we access the first and second members to obtain the word and count, respectively.