Implementing Iterators
Last time, we saw how to iterate over the elements of a linked list from within the list class itself:
template <typename T>
void List<T>::print(std::ostream &os) const {
for (Node *node_ptr = first; node_ptr != nullptr;
node_ptr = node_ptr->next) {
os << node_ptr->datum << " ";
}
}
The strategy is to start with the first node in the list, then follow
the nodes’ next
pointers until reaching the null sentinel. We
can try to follow the same strategy from outside the class:
List<int> list;
...
for (Node *node_ptr = list.first; node_ptr != nullptr;
node_ptr = node_ptr->next) {
cout << node_ptr->datum << endl;
}
However, this strategy doesn’t work from outside the list because the
nested Node
class is private – it is an implementation detail, so
outside code should not depend on it anyway. The first
member is
also private, so the initialization of node_ptr
is invalid.
Instead, we need to provide a different interface for iterating over a
list.
We have seen two patterns for iterating over a sequential container, traversal by index and traversal by pointer. In traversal by index, we use a loop variable of integral type, increment it in each iteration, and access an element by using the index as an offset into the container:
const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by index
for (int i = 0; i < SIZE; ++i) {
cout << array[i] << endl;
}
To use this pattern with a container, it must support random access,
the ability to directly access an element through an index.
Array-based sequential containers generally support random access. For
built-in arrays, it translates directly to pointer arithmetic.
Array-based class types such as std::array
and std::vector
overload the subscript operator to index into the underlying array,
which then turns into pointer arithmetic.
Linked lists do not provide random access; since they are not
array-based, accessing an element in the middle of the list cannot be
done with pointer arithmetic, but must traverse from the start of the
list as in print()
. Thus, traversal by index is not an appropriate
pattern for a linked list.
Traversal by pointer, on the other hand, starts with a pointer to the beginning of a sequence and iterates by moving that pointer forward to each subsequent element, until that pointer reaches past the end of the sequence:
const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by pointer
for (int *ptr = array; ptr != array + SIZE; ++ptr) {
cout << *ptr << endl;
}
A similar pattern would work for a linked list: start off with something like a pointer that “points” to the beginning of the list, move it forward one element at a time, and stop when we get past the end of the list:
List<int> list;
...
for (Iterator it = list.begin(); it != list.end(); ++it) {
cout << *it << endl;
}
We use an object called an iterator to iterate over the elements. The pattern above is called traversal by iterator, and it is a generalization of traversal by pointer.
To traverse with an iterator, it must provide the following operations: [1]
dereference (prefix
*
)increment (prefix
++
)equality checks (
==
and!=
)
In addition, the container itself must provide two member functions:
begin()
returns an iterator to the start of the sequenceend()
returns a “past-the-end” iterator that represents a position that is past the last element of the sequence
An iterator is a class-type object that has the same interface as a pointer. We provide the same interface by overloading the required operators:
template <typename T>
class Iterator {
public:
T & operator*() const;
Iterator &operator++();
bool operator==(Iterator rhs) const;
bool operator!=(Iterator rhs) const;
};
The unary *
operator is overloaded to return the element the
iterator is pointing to by reference. The prefix ++
operator moves
the iterator to point to the next element in the sequence, and it
returns the iterator itself by reference. This allows the operator to
be chained:
++++it;
The equality operators determine if the receiver points to the same
element as the rhs
iterator. Unlike most class types, we pass
iterators by value – they are generally small, and it is standard
practice in C++ to make copies of them when we pass them to a
function, just like we would for pointers.
Before we proceed to implement the operators, we need a data
representation. The representation of an iterator is specific to a
particular kind of container. For a linked list, traversal by iterator
is just an abstraction over the traversal in print()
:
A list iterator is an abstraction over a pointer to a node.
The call
list.begin()
returns an iterator constructed fromfirst
.The past-the-end iterator returned by
list.end()
is represented by a null pointer.Comparing two iterators compares their underlying node pointers.
Incrementing an iterator moves its node pointer to the next node using the original node’s
next
member.Derferencing the iterator obtains the
datum
member of the underlying node.
Thus, we represent an iterator with just a pointer to a node.
template <typename T>
class Iterator {
...
private:
Node *node_ptr;
};
As mentioned above, we use a null pointer for an iterator that is past
the end of a list. The end condition for the traversal in print()
is when the node pointer is null – after the traversal reaches the
last node, it sets the pointer to the value of the last node’s
next
member, which is null according to the list’s representation
invariant. Thus, it makes sense to use a null pointer to represent a
past-the-end iterator.
Now that we have a representation, we should consider representation
invariants. It is the case that node_ptr
will either be null or
point to a valid list node when the iterator is created. However, we
will see that an iterator may be invalidated, which will result in
node_ptr
pointing to an invalid node. Thus, there is no invariant
that will hold for a list iterator’s representation.
Iterator Definition
Before we proceed to implement Iterator
, observe the following
issues with its definition:
Node
is not a top-level type, but a member of theList
class template, so it cannot be named from the outside without the scope-resolution operator.The
Node
struct is private, so it cannot be accessed from outside code.The iterator type is associated with
List
, so it should be encapsulated within theList
template.
We can solve these issues by defining Iterator
as a member of the
List
template:
template <typename T>
class List {
...
private:
struct Node {
int datum;
Node *next;
};
public:
class Iterator {
public:
T & operator*() const;
Iterator &operator++();
bool operator==(Iterator rhs) const;
bool operator!=(Iterator rhs) const;
private:
Node *node_ptr;
};
private:
Node *first;
Node *last;
};
We must define the Iterator
class after the definition of the
Node
struct so that Node
is in scope when it is referenced in
the Iterator
class. [2] Iterator
itself is part of the
interface for List
, so it is defined as public within List
.
The scope of a class member begins at the member declaration and includes the rest of the class body, all member-function bodies, and all member-initializer lists.
We can now implement the member functions of Iterator
.
Dereference and Increment Operators
The dereference operator requires that the iterator is dereferenceable, which means that it is pointing to a valid element in the container. We cannot in general check that this is the case, but we can check whether the iterator is a past-the-end iterator:
// REQUIRES: this is a dereferenceable iterator
// EFFECTS: Returns the element that this iterator points to.
template <typename T>
T & List<T>::Iterator::operator*() const {
assert(node_ptr); // check whether this is a past-the-end iterator
return node_ptr->datum;
}
The operator*()
function is a member of Iterator
, which itself
is a member of List<T>
– thus, we need two scope-resolution
operators when referring to the function. After asserting that the
iterator is not past the end, the function just returns the datum
member of the underlying node. The return is by reference, so that the
element can be modified through the iterator:
List<int> Iterator it = ...;
*it = 3; // LHS of assignment must be an object
The dereference operator does not modify the iterator. In addition, while the function returns an object by reference to non-const, modifying that object does not modify the iterator, since the object is not a member of the iterator itself. Thus, the dereference operator can be declared as a const member function.
The operator++()
function modifies the iterator by moving it to
“point to” the next element, so it cannot be declared as const. As
with dereference, the past-the-end iterator cannot be incremented –
node_ptr
would be null, and there wouldn’t be a next
pointer
to move the iterator to.
// REQUIRES: this is a dereferenceable iterator
// EFFECTS: Returns the element that this iterator points to.
template <typename T>
typename List<T>::Iterator & List<T>::Iterator::operator++() {
assert(node_ptr); // check whether this is a past-the-end iterator
node_ptr = node_ptr->next;
return *this;
}
The function moves the iterator to the next element by following the
next
pointer of the current node. Once the iterator has been
moved, the function returns the iterator by reference, in keeping with
the pattern for prefix increment. [3]
The postfix increment operator can be overloaded as well. To
distinguish its signature from prefix, C++ uses a dummy int
parameter for postfix:
template <typename T>
typename List<T>::Iterator List<T>::Iterator::operator++(int) {
assert(node_ptr); // check whether this is a past-the-end iterator
Iterator tmp = *this; // make a copy of this iterator
node_ptr = node_ptr->next;
return tmp; // return the copy
}
We need not provide a parameter name, since we are not actually using the parameter object. In keeping with the contract for postfix increment, the function returns a copy of the original iterator by value.
The typename
Keyword
The return type of operator++()
is a reference to
List<T>::Iterator
; Iterator
is a member of a template that is
dependent on the template parameter T
. C++ requires the
typename
keyword before a dependent name that refers to a type, so
that the compiler knows that it is a type and not a value. [4]
The reasons are beyond the scope of this course, but they have to do with the way the compiler instantiates templates.
The following illustrates when the typename
keyword is required:
template <typename U>
void func() {
IntList list; // not a qualified type
IntList::Iterator it1; // outer class does not depend on template parameter U
List<int>::Iterator it2; // outer class does not depend on template parameter U
typename List<U>::Iterator it3; // outer class depends on template parameter U
int capacity =
SortedSet<U>::MAX_CAPACITY; // member is not a type
}
An unqualified name, one without the scope-resolution operator,
never needs the typename
keyword. In a qualified name, if the
outer type does not depend on a template parameter, then no
typename
keyword is required. If the outer type does depend on a
template parameter, then the typename
keyword is required when the
inner name refers to a type. If the inner name does not refer to a
type, the typename
keyword is erroneous to use, since it
explicitly tells the compiler that the inner name is a type.
In practice, compilers can often determine when the typename
keyword is required, and many C++ programmers rely on the compiler to
tell them that it is needed rather than learning the rules:
$ g++ --std=c++17 foo.cpp
foo.cpp:5:20: error: expected ';' after expression
List<U>::Iterator it;
^
;
foo.cpp:5:21: error: use of undeclared identifier 'it'
List<U>::Iterator it;
^
foo.cpp:5:3: error: missing 'typename' prior to dependent type name
'List<int>::Iterator'
List<U>::Iterator it;
^~~~~~~~~~~~~~~~~
foo.cpp:9:3: note: in instantiation of function template specialization
'func<int>' requested here
func<int>();
^
3 errors generated.
Equality Comparisons
Two iterators are defined as equal if either they are both past the
end, or they “point to” the same element in the same list. Thus, they
are equal exactly when their node_ptr
members point to the same
node.
Comparing two iterators does not require them to be dereferenceable or even pointing to a valid element. Thus, the operators do not have a REQUIRES clause.
template <typename T>
bool List<T>::Iterator::operator==(Iterator rhs) const {
return node_ptr == rhs.node_ptr;
}
template <typename T>
bool List<T>::Iterator::operator!=(Iterator rhs) const {
return node_ptr != rhs.node_ptr;
}
We do not need to qualify Iterator
in the parameter type – once
the compiler knows that we are defining a member of Iterator
, we
can refer to the class with an unqualified name.
Creating Iterators
We have defined the operator overloads for Iterator
. However, we
have not provided a means of creating an Iterator
object. Without
a user-defined constructor, we do get an implicit default constructor,
but it just initializes node_ptr
with a junk value. Instead, we
define the following two constructors:
an explicit default constructor that makes the iterator a past-the-end iterator
a constructor that takes in a pointer to a node
The latter is a private constructor – the Node
class is not part
of the interface for the list, so the constructor that takes a node
pointer also is not part of the interface, and the outside world would
not be able to call it even if it were.
The constructor definitions are as follows:
template <typename T>
class List {
...
public:
class Iterator {
public:
// EFFECTS: Constructs a past-the-end iterator.
Iterator() : node_ptr(nullptr) {}
...
private:
// EFFECTS: Constructs an iterator from the given node pointer.
Iterator(Node *node_ptr_in) : node_ptr(node_ptr_in) {}
Node *node_ptr;
};
};
We can now implement begin()
and end()
member functions in
List
that return a start and past-the-end iterator, respectively.
template <typename T>
class List {
...
public:
// EFFECTS: Returns an iterator that points to the first element,
// or a past-the-end iterator if this list is empty.
Iterator begin() {
return Iterator(first);
}
// EFFECTS: Returns a past-the-end iterator.
Iterator end() {
return Iterator();
}
};
The begin()
function returns an iterator that points to the first
element in the list. However, if the list is empty, first
is null;
this is the representation of a past-the-end iterator, so begin()
returns such an iterator when the list is empty. The end()
function returns a default-constructed past-the-end iterator.
Unfortunately, the implementation of begin()
does not compile:
./List.hpp:122:12: error: calling a private constructor of class
'List<int>::Iterator'
return Iterator(first);
^
A private member is only accessible from within the scope of the class
that defines the member. The private Iterator
constructor is a
member of Iterator
, but it is being accessed from outside the
Iterator
class. On the other hand, Iterator
is within the
scope of List
, so it can access private List
members such as
Node
. Thus, a nested class can access private members of the outer
class, but not vice versa.
Friend Declarations
The solution to the problem above is a friend declaration, in which a class gives an outside entity access to the class’s private members.
template <typename T>
class List {
...
public:
class Iterator {
friend class List;
...
};
};
A friend declaration can appear anywhere directly within a class body, and it tells the compiler that the given entity is allowed to access private members. The entity may be a function or a type; for instance, it is common practice to define the insertion operator as a friend:
class Card {
...
private:
std::string rank;
std::string suit;
friend std::ostream & operator<<(std::ostream &os, const Card &card);
};
std::ostream & operator<<(std::ostream &os, const Card &card) {
return os << card.rank << " of " << card.suit;
}
Friendship is given, not taken. For instance, class C
may declare
F
as a friend:
friend class F;
This allows F
to access the private members of C
, but it does
not allow C
to access the private members of F
.
The friend declaration completes our iterator implementation, so we can now perform a traversal by iterator:
List<int> list;
for (int i = 1; i < 5; ++i) {
list.push_back(i);
}
for (List<int>::Iterator it = list.begin(); it != list.end(); ++it) {
cout << *it << endl;
}
This is just an abstraction over a loop that uses a node pointer directly:
for (Node *node_ptr = list.first; node_ptr != nullptr;
node_ptr = node_ptr->next) {
cout << node_ptr->datum << endl;
}
The iterator starts at the first
member, just like the traversal
using a node pointer. Incrementing the iterator moves it to the next
node, the same as in the loop above. Dereferencing the iterator
accesses the respective node’s datum
member, as in the body of the
loop above. The traversal ends when the iterator reaches a null
pointer, just like the node-pointer loop. Thus, the iterator provides
the same traversal as can be done from within the List
template,
but with the interface of a pointer rather than having the user rely
on implementation details.
Generic Iterator Functions
Iterators provide a common interface for traversing through a sequence of elements, and the standard-library sequences all support the iterator interface.
vector<int> vec = { 1, 2, 3, 4 };
for (vector<int>::iterator it = vec.begin(); it != vec.end(); ++it) {
cout << *it << endl;
}
The common interface allows us to write generic functions that work on any sequence. For instance, we can write a function template for finding the maximum element in a sequence:
// REQUIRES: end is after or equal to begin
// EFFECTS: Returns an iterator to the maximum element in [begin, end).
// Returns begin when the sequence is empty.
template <typename Iter_type>
Iter_type max_element(Iter_type begin, Iter_type end) {
Iter_type max_so_far = begin;
for (; begin != end; ++begin) {
if (*max_so_far < *begin) {
max_so_far = begin;
}
}
return max_so_far;
}
The max_element()
function template takes a begin and end iterator
of any type and returns an iterator to the maximum element that is
between the two iterators. As is standard in C++, the end iterator is
an exclusive bound – the function stops when it reaches end
and
does not dereference it. Then max_element()
implements a standard
find-the-max algorithm: start with the first element as the maximum so
far, then update the max so far when a larger item is encountered. The
iterators must be dereferenced to get to the elements to compare them,
and the element type must support the <
operator.
We can use max_element()
with any iterator and element type, as
long as the element type supports <
:
vector<int> vec;
.. // fill vec with numbers
cout << *max_element(vec.begin(), vec.end()) << endl;
List<int> list;
.. // fill list with numbers
cout << *max_element(list.begin(), list.end()) << endl;
List<Card> cards;
.. // fill cards with Cards
cout << *max_element(cards.begin(), cards.end()) << endl;
int const SIZE = 10;
double arr[SIZE];
.. // arr fill with numbers
cout << *max_element(arr, arr + SIZE) << endl;
As usual with function templates, the compiler deduces the template
parameter from the function arguments. The last example illustrates
that we can even use max_element()
with an array; since a pointer
has the same interface as an iterator, we just need to provide
max_element()
with a pointer to the first element and another that
is just past the end.
The standard <algorithm>
library contains many function templates
such as std::max_element()
that operate on iterators. However, the
standard-library function templates require an iterator to define
several member type aliases to work:
template <typename T>
class List {
...
public:
class Iterator {
...
public:
using iterator_category = std::input_iterator_tag;
using value_type = T;
using difference_type = void;
using pointer = void;
using reference = T &;
};
};
A discussion of these type aliases is beyond the scope of this course.
As another example, the following is a function template that
determines whether a sequence contains a duplicate item. It compares
every pair of elements with the ==
operator to determine if any
are equal:
template <typename Iter_type>
bool no_duplicates(Iter_type begin, Iter_type end) {
for (; begin != end; ++begin) {
Iter_type other = begin; // copy iterator to current element
++other; // move copy one element forward
for (; other != end; ++other) {
if (*begin == *other) { // compare element with those that come after
return false;
}
}
}
return true;
}
Iterator Invalidation
Recall that a dangling pointer points to an invalid object, one that is no longer alive. Since iterators represent indirection to sequence elements, an iterator can also end up pointing to an invalid object. Such an iterator is said to be invalidated. The following is an example:
List<int> list;
list.push_back(3);
list.push_back(-5);
list.push_back(2);
list.push_back(4);
List<int>::Iterator iter = list.begin();
cout << *iter << endl;
list.pop_front(); // invalidates iter
cout << *iter << endl; // UNDEFINED BEHAVIOR
The code constructs a list and inserts elements into it. It then creates an iterator that is pointing to the first element before proceeding to remove that element. This invalidates the iterator, so that dereferencing the iterator results in undefined behavior.
In general, modifying a sequence can invalidate existing iterators. An
iterator may be invalidated even if the element it is pointing to is
not removed. For instance, a vector iterator is invalidated when a
grow operation occurs, since the element the iterator is pointing to
moves to a different location. A member function’s documentation
should indicate whether it may invalidate iterators, as in the
cppreference.com documentation for
std::vector::push_back()
:
If the new
size()
is greater thancapacity()
then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
Type Deduction
Suppose we wanted to write a function template that prints out the elements of a sequence. We could write it to work with iterators:
template <typename Iter_type>
void print_all(Iter_type begin, Iter_type end) {
for (Iter_type it = begin; it != end; ++it) {
cout << *it << endl;
}
}
On the other hand, we desire an interface that works directly on a sequence container:
template <typename Sequence>
void print_all(const Sequence &sequence) {
for (typename Sequence::Iterator it = sequence.begin();
it != sequence.end(); ++it) {
cout << *it << endl;
}
}
Declaring the iterator type is very verbose, as it consists of both a
qualified name and the typename
keyword, since the qualified name
is a dependent type. Furthermore, the declaration makes the assumption
that the iterator type is named Iterator
, which is not true for
standard-library containers that use lower-case type names.
Rather than writing out the type of it
explicitly, we can ask the
compiler to deduce it for us by using the auto
keyword:
template <typename Sequence>
void print_all(const Sequence &sequence) {
for (auto it = sequence.begin(); it != sequence.end(); ++it) {
cout << *it << endl;
}
}
The compiler deduces the type from its initialization. If the return
type of sequence.begin()
is List<int>::Iterator
, then the type
of it
is deduced to have type List<int>::Iterator
. On the
other hand, if the return type is vector<Duck>::iterator
, then the
type of it
is deduced to be vector<Duck>::iterator
. The return
type need not be a nested type; if it is char *
, then it
will
be deduced to have type char *
. Thus, not only does type deduction
save us keystrokes for complicated types, it also abstracts over how
the types are defined.