Maps and Pairs
A map is a data structure that maps keys to values. It is thus an associative data structure, since it associates a value with each key. As an example, we may want to associate exam scores with individual test takers:
aliceywu |
100 |
akamil |
23 |
taligoro |
100 |
jjuett |
73 |
We can store these data in a standard-library map:
std::map<std::string, int> scores;
scores["aliceywu"] = 100;
scores["akamil"] = 23;
scores["taligoro"] = 100;
scores["jjuett"] = 73;
cout << scores["akamil"] << endl; // prints 23
cout << scores["aliceywu"] << endl; // prints 100
The map
class template has two required template parameters, the
first for the key type and the second for the value type. We are
mapping string IDs to integer scores, so we use a map<string,
int>
. We can then use the subscript operator with a key to insert a
key-value pair into the map, or to look up the value associated with a
key.
A key feature of the standard-library map is that it is not erroneous to look up a non-existent key. Instead, the map inserts and returns a value-initialized datum for the key. For primitive types, value initialization produces the zero value for the type. (This is distinct from default initialization, which produces an undefined value.)
cout << scores["gmatute"] << endl; // prints 0
A map is similar to a set in that the keys in the map are unique; each key is associated with at most one value. However, the values themselves are not unique. The map above has two keys that are associated with the value 100.
Given the similarity with a set, we can also implement a map using a binary search tree. However, rather than using the key-value pair for ordering and uniqueness, we need to use just the key, and the value merely tags along for the ride, as shown in Figure 89.
Figure 89 A map represented as a binary search tree.
To make this work, we need a heterogeneous data type for the datum
member of a node, so that it can store separate key and value items.
We can define our own struct or class, or we can use the pair
template in the standard <utility>
library.
std::pair<int, bool> p1; // default constructor value initializes both items
p1.first -= 5; // first now has value -5
p1.second = false;
std::pair<string, int> p2 = { "hello", 4 }; // explicit initialization
cout << p2.first << ": " << p2.second << endl; // prints hello: 4
A pair stores a first and second item, which can be of different
types. The pair
template is parameterized by the types of these
items.
We can use a pair and a BST to implement a map:
template <typename Key_type, typename Value_type>
class Map {
public:
// EFFECTS: Returns whether this map is empty.
bool empty() const;
// EFFECTS: Returns the number of key-value pairs in this map.
size_t size() const;
// EFFECTS: Returns the value associated with the given key. If
// the key is not in the map, inserts the key,
// associating it with a value-initialized object of type
// Value_type, and returns the newly inserted value.
Value_type& operator[](const Key_type& key);
private:
// Type alias for a key-value pair.
using Pair_type = std::pair<Key_type, Value_type>;
// Comparator that compares pairs by their key components.
class PairLess {
public:
bool operator()(...) const;
};
// Data representation.
BinarySearchTree<Pair_type, PairLess> entries;
};
For this to work, we need a BinarySearchTree
that can take a
custom comparator, to compare key-value pairs by just the value
component. This comparator can be defaulted to std::less
, which
compares elements according to the <
operator:
template <typename T, typename Compare=std::less<T>>
class BinarySearchTree {
...
};
All comparisons within the tree now must use an object of the
Compare
type. We leave the details as an exercise for the reader.
Type Deduction and Range-Based For Loops
Suppose we wanted to write a function template that prints out the elements of sequence. We could write it to work with iterators:
template <typename Iter_type>
void print_all(Iter_type begin, Iter_type end) {
for (Iter_type it = begin; it != end; ++it) {
cout << *it << endl;
}
}
On the other hand, we desire an interface that works directly on a sequence container:
template <typename Sequence>
void print_all(const Sequence &sequence) {
for (typename Sequence::Iterator it = sequence.begin();
it != sequence.end(); ++it) {
cout << *it << endl;
}
}
Declaring the iterator type is very verbose, as it consists of both a
qualified name and the typename
keyword, since the qualified name
is a dependent type. Furthermore, the declaration makes the assumption
that the iterator type is named Iterator
, which is not true for
standard-library containers that use lower-case type names.
Rather than writing out the type of it
explicitly, we can ask the
compiler to deduce it for us by using the auto
keyword:
template <typename Sequence>
void print_all(const Sequence &sequence) {
for (auto it = sequence.begin(); it != sequence.end(); ++it) {
cout << *it << endl;
}
}
The compiler deduces the type from its initialization. If the return
type of sequence.begin()
is List<int>::Iterator
, then the type
of it
is deduced to have type List<int>::Iterator
. On the
other hand, if the return type is vector<Duck>::iterator
, then the
type of it
is deduced to be vector<Duck>::iterator
. The return
type need not be a nested type; if it is char *
, then it
will
be deduced to have type char *
. Thus, not only does type deduction
save us keystrokes for complicated types, it also abstracts over how
the types are defined.
Range-Based For Loops
A range-based for loop is a special syntax for iterating over sequences that support traversal by iterator. Rather than writing out each piece of the traversal, we can have the compiler generate it for us:
vector<int> vec = { 1, 2, 3, 4, 5 };
for (int item : vec) {
cout << item << endl;
}
The syntax of a range-based for loop is:
for (<type> <variable> : <sequence>)
<body>
In the example above, the variable is named item
and has type
int
, and vec
is the sequence over which the loop iterates. The
compiler automatically converts this into a traversal by iterator:
for (auto it = vec.begin(); it != vec.end(); ++it) {
int item = *it;
cout << item << endl;
}
The variable item
is initialized in each iteration from an element
in the sequence. Then the code in the body of the range-based for
loop follows.
The following loop attempts to set every element in the vector to the value 42:
for (int item : vec) {
item = 42; // attempt to set element to 42
}
However, this loop does not actually modify any elements. To understand why, let us take a look at its translation into a traversal by iterator:
for (auto it = vec.begin(); it != vec.end(); ++it) {
int item = *it;
item = 42;
}
The loop is actually modifying a copy of each element rather than an element itself. To avoid making a copy, we can declare the loop variable to be a reference:
for (int &item : vec) {
item = 42; // actually set element to 42
}
This translates to:
for (auto it = vec.begin(); it != vec.end(); ++it) {
int &item = *it;
item = 42;
}
This successfully modifies each element to have the value 42.
Range-based for loops are often used in combination with auto
, as
in the following:
vector<int> vec = { 1, 2, 3, 4, 5 };
for (auto &item : vec) {
item = 42;
}
for (auto item : vec) {
cout << item << endl;
}
The first loop declares item
as a reference, so that it aliases an
element in the sequence. The second loop does not declare item
as
a reference, so it produces a copy of an element in the sequence. The
following is the translation of both loops:
vector<int> vec = { 1, 2, 3, 4, 5 };
for (auto it = vec.begin(); it != vec.end(); ++it) {
int &item = *it; // alias
item = 42;
}
for (auto it = vec.begin(); it != vec.end(); ++it) {
int item = *it; // copy
item = 42;
}
With a range-based for loop, we can simplify print_all()
:
template <typename Sequence>
void print_all(const Sequence &sequence) {
for (auto &item : sequence) {
cout << item << endl;
}
}
We can call it on any sequence 1 that supports traversal by iterator, as long as the element type can be inserted into an output stream:
vector<string> vec = { "hello", "world!" };
print_all(vec);
- 1
We can even call
print_all()
on an array, as long as it still is an array:string arr[] = { "hello", "world!" }; print_all(arr);
This is because a range-based for loop actually calls
std::begin()
andstd::end()
on the sequence, rather than directly calling the respective member functions. For arrays,std::begin()
andstd::end()
return pointers, while for class types, they call thebegin()
andend()
member functions, respectively.
Iterating over a Map
The standard-library map defines an iterator that produces key-value pairs upon dereference. The following is an example that constructs a map, counting how many times each unique word occurs in a vector. It then iterates over the map to print out the counts.
void print_word_counts(const std::vector<std::string> &words) {
std::map<std::string, int> word_counts;
// Each time a word is seen, add 1 to its entry in the map
for (const auto &word : words) {
word_counts[word] += 1;
}
// Print out the results by iterating through the map.
for (const auto &key_value : word_counts) { // key-value pairs
const auto &word = key_value.first;
const auto &count = key_value.second;
cout << word << "occurred " << count << " times." << endl;
}
}
When incrementing a word’s count, we do not need to check whether the word is in the map; if not, the map automatically inserts the word into the map with a value-initialized count of zero, which we then increment to one.
We use range-based for loops to iterate over both the vector and the
map, declaring a type-deduced reference to each element. A reference
avoids making a copy, which is nontrivial for strings. The iteration
over the map produces key-value pairs (std::pair<std::string,
int>
), and we access the first
and second
members to obtain
the word and count, respectively.