Implementing Iterators

Last time, we saw how to iterate over the elements of a linked list from within the list class itself:

template <typename T>
void List<T>::print(std::ostream &os) const {
  for (Node *node_ptr = first; node_ptr != nullptr;
       node_ptr = node_ptr->next) {
    os << node_ptr->datum << " ";
  }
}

The strategy is to start with the first node in the list, then follow the nodes’ next pointers until reaching the null sentinel. We can try to follow the same strategy from outside the class:

List<int> list;
...
for (Node *node_ptr = list.first; node_ptr != nullptr;
     node_ptr = node_ptr->next) {
  cout << node_ptr->datum << endl;
}

However, this strategy doesn’t work from outside the list because the nested Node class is private – it is an implementation detail, so outside code should not depend on it anyway. The first member is also private, so the initialization of node_ptr is invalid. Instead, we need to provide a different interface for iterating over a list.

We have seen two patterns for iterating over a sequential container, traversal by index and traversal by pointer. In traversal by index, we use a loop variable of integral type, increment it in each iteration, and access an element by using the index as an offset into the container:

const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by index
for (int i = 0; i < SIZE; ++i) {
  cout << array[i] << endl;
}

To use this pattern with a container, it must support random access, the ability to directly access an element through an index. Array-based sequential containers generally support random access. For built-in arrays, it translates directly to pointer arithmetic. Array-based class types such as std::array and std::vector overload the subscript operator to index into the underlying array, which then turns into pointer arithmetic.

Linked lists do not provide random access; since they are not array-based, accessing an element in the middle of the list cannot be done with pointer arithmetic, but must traverse from the start of the list as in print(). Thus, traversal by index is not an appropriate pattern for a linked list.

Traversal by pointer, on the other hand, starts with a pointer to the beginning of a sequence and iterates by moving that pointer forward to each subsequent element, until that pointer reaches past the end of the sequence:

const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by pointer
for (int *ptr = array; ptr != array + SIZE; ++ptr) {
  cout << *ptr << endl;
}

A similar pattern would work for a linked list: start off with something like a pointer that “points” to the beginning of the list, move it forward one element at a time, and stop when we get past the end of the list:

List<int> list;
...
for (Iterator it = list.begin(); it != list.end(); ++it) {
  cout << *it << endl;
}

We use an object called an iterator to iterate over the elements. The pattern above is called traversal by iterator, and it is a generalization of traversal by pointer.

To traverse with an iterator, it must provide the following operations: [1]

dereference (prefix *)
increment (prefix ++)
equality checks (== and !=)

In addition, the container itself must provide two member functions:

begin() returns an iterator to the start of the sequence
end() returns a “past-the-end” iterator that represents a position that is past the last element of the sequence

An iterator is a class-type object that has the same interface as a pointer. We provide the same interface by overloading the required operators:

template <typename T>
class Iterator {
public:
  T & operator*() const;

  Iterator &operator++();

  bool operator==(Iterator rhs) const;

  bool operator!=(Iterator rhs) const;
};

The unary * operator is overloaded to return the element the iterator is pointing to by reference. The prefix ++ operator moves the iterator to point to the next element in the sequence, and it returns the iterator itself by reference. This allows the operator to be chained:

++++it;

The equality operators determine if the receiver points to the same element as the rhs iterator. Unlike most class types, we pass iterators by value – they are generally small, and it is standard practice in C++ to make copies of them when we pass them to a function, just like we would for pointers.

Before we proceed to implement the operators, we need a data representation. The representation of an iterator is specific to a particular kind of container. For a linked list, traversal by iterator is just an abstraction over the traversal in print():

A list iterator is an abstraction over a pointer to a node.
The call list.begin() returns an iterator constructed from first.
The past-the-end iterator returned by list.end() is represented by a null pointer.
Comparing two iterators compares their underlying node pointers.
Incrementing an iterator moves its node pointer to the next node using the original node’s next member.
Derferencing the iterator obtains the datum member of the underlying node.

Thus, we represent an iterator with just a pointer to a node.

template <typename T>
class Iterator {
  ...
private:
  Node *node_ptr;
};

Figure 71 A list iterator is an abstraction of a pointer to a node.

As mentioned above, we use a null pointer for an iterator that is past the end of a list. The end condition for the traversal in print() is when the node pointer is null – after the traversal reaches the last node, it sets the pointer to the value of the last node’s next member, which is null according to the list’s representation invariant. Thus, it makes sense to use a null pointer to represent a past-the-end iterator.

Figure 72 A list past-the-end iterator has a null pointer as its stored node pointer.

Now that we have a representation, we should consider representation invariants. It is the case that node_ptr will either be null or point to a valid list node when the iterator is created. However, we will see that an iterator may be invalidated, which will result in node_ptr pointing to an invalid node. Thus, there is no invariant that will hold for a list iterator’s representation.

Iterator Definition

Before we proceed to implement Iterator, observe the following issues with its definition:

Node is not a top-level type, but a member of the List class template, so it cannot be named from the outside without the scope-resolution operator.
The Node struct is private, so it cannot be accessed from outside code.
The iterator type is associated with List, so it should be encapsulated within the List template.

We can solve these issues by defining Iterator as a member of the List template:

template <typename T>
class List {
  ...
private:
  struct Node {
    int datum;
    Node *next;
  };

public:
  class Iterator {
  public:
    T & operator*() const;

    Iterator &operator++();

    bool operator==(Iterator rhs) const;

    bool operator!=(Iterator rhs) const;
  private:
    Node *node_ptr;
  };

private:
  Node *first;
  Node *last;
};

We must define the Iterator class after the definition of the Node struct so that Node is in scope when it is referenced in the Iterator class. [2] Iterator itself is part of the interface for List, so it is defined as public within List.

We can now implement the member functions of Iterator.

Dereference and Increment Operators

The dereference operator requires that the iterator is dereferenceable, which means that it is pointing to a valid element in the container. We cannot in general check that this is the case, but we can check whether the iterator is a past-the-end iterator:

// REQUIRES: this is a dereferenceable iterator
// EFFECTS:  Returns the element that this iterator points to.
template <typename T>
T & List<T>::Iterator::operator*() const {
  assert(node_ptr);  // check whether this is a past-the-end iterator
  return node_ptr->datum;
}

The operator*() function is a member of Iterator, which itself is a member of List<T> – thus, we need two scope-resolution operators when referring to the function. After asserting that the iterator is not past the end, the function just returns the datum member of the underlying node. The return is by reference, so that the element can be modified through the iterator:

List<int> Iterator it = ...;
*it = 3;                      // LHS of assignment must be an object

The dereference operator does not modify the iterator. In addition, while the function returns an object by reference to non-const, modifying that object does not modify the iterator, since the object is not a member of the iterator itself. Thus, the dereference operator can be declared as a const member function.

The operator++() function modifies the iterator by moving it to “point to” the next element, so it cannot be declared as const. As with dereference, the past-the-end iterator cannot be incremented – node_ptr would be null, and there wouldn’t be a next pointer to move the iterator to.

// REQUIRES: this is a dereferenceable iterator
// EFFECTS:  Returns the element that this iterator points to.
template <typename T>
typename List<T>::Iterator & List<T>::Iterator::operator++() {
  assert(node_ptr);  // check whether this is a past-the-end iterator
  node_ptr = node_ptr->next;
  return *this;
}

The function moves the iterator to the next element by following the next pointer of the current node. Once the iterator has been moved, the function returns the iterator by reference, in keeping with the pattern for prefix increment. [3]

The `typename` Keyword

The return type of operator++() is a reference to List<T>::Iterator; Iterator is a member of a template that is dependent on the template parameter T. C++ requires the typename keyword before a dependent name that refers to a type, so that the compiler knows that it is a type and not a value. [4]

The following illustrates when the typename keyword is required:

template <typename U>
void func() {
  IntList list;                     // not a qualified type
  IntList::Iterator it1;    // outer class does not depend on template parameter U
  List<int>::Iterator it2;  // outer class does not depend on template parameter U
  typename List<U>::Iterator it3;   // outer class depends on template parameter U
  int capacity =
    SortedSet<U>::MAX_CAPACITY;     // member is not a type
}

An unqualified name, one without the scope-resolution operator, never needs the typename keyword. In a qualified name, if the outer type does not depend on a template parameter, then no typename keyword is required. If the outer type does depend on a template parameter, then the typename keyword is required when the inner name refers to a type. If the inner name does not refer to a type, the typename keyword is erroneous to use, since it explicitly tells the compiler that the inner name is a type.

In practice, compilers can often determine when the typename keyword is required, and many C++ programmers rely on the compiler to tell them that it is needed rather than learning the rules:

$ g++ --std=c++17 foo.cpp
foo.cpp:5:20: error: expected ';' after expression
  List<U>::Iterator it;
                   ^
                   ;
foo.cpp:5:21: error: use of undeclared identifier 'it'
  List<U>::Iterator it;
                    ^
foo.cpp:5:3: error: missing 'typename' prior to dependent type name
      'List<int>::Iterator'
  List<U>::Iterator it;
  ^~~~~~~~~~~~~~~~~
foo.cpp:9:3: note: in instantiation of function template specialization
      'func<int>' requested here
  func<int>();
  ^
3 errors generated.

Equality Comparisons

Two iterators are defined as equal if either they are both past the end, or they “point to” the same element in the same list. Thus, they are equal exactly when their node_ptr members point to the same node.

_images/17_list_iterator_comparison.svg — Figure 73 Two iterators are equal when their node pointers store the same address, pointing to the same node.

Comparing two iterators does not require them to be dereferenceable or even pointing to a valid element. Thus, the operators do not have a REQUIRES clause.

template <typename T>
bool List<T>::Iterator::operator==(Iterator rhs) const {
  return node_ptr == rhs.node_ptr;
}

template <typename T>
bool List<T>::Iterator::operator!=(Iterator rhs) const {
  return node_ptr != rhs.node_ptr;
}

We do not need to qualify Iterator in the parameter type – once the compiler knows that we are defining a member of Iterator, we can refer to the class with an unqualified name.

Creating Iterators

We have defined the operator overloads for Iterator. However, we have not provided a means of creating an Iterator object. Without a user-defined constructor, we do get an implicit default constructor, but it just initializes node_ptr with a junk value. Instead, we define the following two constructors:

an explicit default constructor that makes the iterator a past-the-end iterator
a constructor that takes in a pointer to a node

The latter is a private constructor – the Node class is not part of the interface for the list, so the constructor that takes a node pointer also is not part of the interface, and the outside world would not be able to call it even if it were.

The constructor definitions are as follows:

template <typename T>
class List {
  ...
public:
  class Iterator {
  public:
    // EFFECTS: Constructs a past-the-end iterator.
    Iterator() : node_ptr(nullptr) {}

    ...
  private:
    // EFFECTS: Constructs an iterator from the given node pointer.
    Iterator(Node *node_ptr_in) : node_ptr(node_ptr_in) {}

    Node *node_ptr;
  };
};

We can now implement begin() and end() member functions in List that return a start and past-the-end iterator, respectively.

template <typename T>
class List {
  ...
public:
  // EFFECTS: Returns an iterator that points to the first element,
  //          or a past-the-end iterator if this list is empty.
  Iterator begin() {
    return Iterator(first);
  }

  // EFFECTS: Returns a past-the-end iterator.
  Iterator end() {
    return Iterator();
  }
};

The begin() function returns an iterator that points to the first element in the list. However, if the list is empty, first is null; this is the representation of a past-the-end iterator, so begin() returns such an iterator when the list is empty. The end() function returns a default-constructed past-the-end iterator.

Unfortunately, the implementation of begin() does not compile:

./List.hpp:122:12: error: calling a private constructor of class
      'List<int>::Iterator'
    return Iterator(first);
           ^

A private member is only accessible from within the scope of the class that defines the member. The private Iterator constructor is a member of Iterator, but it is being accessed from outside the Iterator class. On the other hand, Iterator is within the scope of List, so it can access private List members such as Node. Thus, a nested class can access private members of the outer class, but not vice versa.

Friend Declarations

The solution to the problem above is a friend declaration, in which a class gives an outside entity access to the class’s private members.

template <typename T>
class List {
  ...
public:
  class Iterator {
    friend class List;
    ...
  };
};

A friend declaration can appear anywhere directly within a class body, and it tells the compiler that the given entity is allowed to access private members. The entity may be a function or a type; for instance, it is common practice to define the insertion operator as a friend:

class Card {
  ...
  private:
    std::string rank;
    std::string suit;

    friend std::ostream & operator<<(std::ostream &os, const Card &card);
};

std::ostream & operator<<(std::ostream &os, const Card &card) {
  return os << card.rank << " of " << card.suit;
}

Friendship is given, not taken. For instance, class C may declare F as a friend:

friend class F;

This allows F to access the private members of C, but it does not allow C to access the private members of F.

The friend declaration completes our iterator implementation, so we can now perform a traversal by iterator:

List<int> list;
for (int i = 1; i < 5; ++i) {
  list.push_back(i);
}

for (List<int>::Iterator it = list.begin(); it != list.end(); ++it) {
  cout << *it << endl;
}

This is just an abstraction over a loop that uses a node pointer directly:

for (Node *node_ptr = list.first; node_ptr != nullptr;
     node_ptr = node_ptr->next) {
  cout << node_ptr->datum << endl;
}

The iterator starts at the first member, just like the traversal using a node pointer. Incrementing the iterator moves it to the next node, the same as in the loop above. Dereferencing the iterator accesses the respective node’s datum member, as in the body of the loop above. The traversal ends when the iterator reaches a null pointer, just like the node-pointer loop. Thus, the iterator provides the same traversal as can be done from within the List template, but with the interface of a pointer rather than having the user rely on implementation details.

Generic Iterator Functions

Iterators provide a common interface for traversing through a sequence of elements, and the standard-library sequences all support the iterator interface.

vector<int> vec = { 1, 2, 3, 4 };
for (vector<int>::iterator it = vec.begin(); it != vec.end(); ++it) {
  cout << *it << endl;
}

The common interface allows us to write generic functions that work on any sequence. For instance, we can write a function template for finding the maximum element in a sequence:

// REQUIRES: end is after or equal to begin
// EFFECTS:  Returns an iterator to the maximum element in [begin, end).
//           Returns begin when the sequence is empty.
template <typename Iter_type>
Iter_type max_element(Iter_type begin, Iter_type end) {
  Iter_type max_so_far = begin;
  for (; begin != end; ++begin) {
    if (*max_so_far < *begin) {
      max_so_far = begin;
    }
  }
  return max_so_far;
}

The max_element() function template takes a begin and end iterator of any type and returns an iterator to the maximum element that is between the two iterators. As is standard in C++, the end iterator is an exclusive bound – the function stops when it reaches end and does not dereference it. Then max_element() implements a standard find-the-max algorithm: start with the first element as the maximum so far, then update the max so far when a larger item is encountered. The iterators must be dereferenced to get to the elements to compare them, and the element type must support the < operator.

We can use max_element() with any iterator and element type, as long as the element type supports <:

vector<int> vec;
.. // fill vec with numbers
cout << *max_element(vec.begin(), vec.end()) << endl;

List<int> list;
.. // fill list with numbers
cout << *max_element(list.begin(), list.end()) << endl;

List<Card> cards;
.. // fill cards with Cards
cout << *max_element(cards.begin(), cards.end()) << endl;

int const SIZE = 10;
double arr[SIZE];
.. // arr fill with numbers
cout << *max_element(arr, arr + SIZE) << endl;

As usual with function templates, the compiler deduces the template parameter from the function arguments. The last example illustrates that we can even use max_element() with an array; since a pointer has the same interface as an iterator, we just need to provide max_element() with a pointer to the first element and another that is just past the end.

The standard <algorithm> library contains many function templates such as std::max_element() that operate on iterators. However, the standard-library function templates require an iterator to define several member type aliases to work:

template <typename T>
class List {
  ...
public:
  class Iterator {
    ...
  public:
    using iterator_category = std::input_iterator_tag;
    using value_type = T;
    using difference_type = void;
    using pointer = void;
    using reference = T &;
  };
};

A discussion of these type aliases is beyond the scope of this course.

As another example, the following is a function template that determines whether a sequence contains a duplicate item. It compares every pair of elements with the == operator to determine if any are equal:

template <typename Iter_type>
bool no_duplicates(Iter_type begin, Iter_type end) {
  for (; begin != end; ++begin) {
    Iter_type other = begin;        // copy iterator to current element
    ++other;                        // move copy one element forward
    for (; other != end; ++other) {
      if (*begin == *other) {       // compare element with those that come after
        return false;
      }
    }
  }
  return true;
}

Iterator Invalidation

Recall that a dangling pointer points to an invalid object, one that is no longer alive. Since iterators represent indirection to sequence elements, an iterator can also end up pointing to an invalid object. Such an iterator is said to be invalidated. The following is an example:

List<int> list;
list.push_back(3);
list.push_back(-5);
list.push_back(2);
list.push_back(4);

List<int>::Iterator iter = list.begin();
cout << *iter << endl;

list.pop_front();       // invalidates iter

cout << *iter << endl;  // UNDEFINED BEHAVIOR

The code constructs a list and inserts elements into it. It then creates an iterator that is pointing to the first element before proceeding to remove that element. This invalidates the iterator, so that dereferencing the iterator results in undefined behavior.

_images/17_invalidated_list_iterator.svg — Figure 74 Removing the element that an iterator is pointing to invalidates the iterator.

In general, modifying a sequence can invalidate existing iterators. An iterator may be invalidated even if the element it is pointing to is not removed. For instance, a vector iterator is invalidated when a grow operation occurs, since the element the iterator is pointing to moves to a different location. A member function’s documentation should indicate whether it may invalidate iterators, as in the cppreference.com documentation for std::vector::push_back():

If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.

Type Deduction

Suppose we wanted to write a function template that prints out the elements of a sequence. We could write it to work with iterators:

template <typename Iter_type>
void print_all(Iter_type begin, Iter_type end) {
  for (Iter_type it = begin; it != end; ++it) {
    cout << *it << endl;
  }
}

On the other hand, we desire an interface that works directly on a sequence container:

template <typename Sequence>
void print_all(const Sequence &sequence) {
  for (typename Sequence::Iterator it = sequence.begin();
       it != sequence.end(); ++it) {
    cout << *it << endl;
  }
}

Declaring the iterator type is very verbose, as it consists of both a qualified name and the typename keyword, since the qualified name is a dependent type. Furthermore, the declaration makes the assumption that the iterator type is named Iterator, which is not true for standard-library containers that use lower-case type names.

Rather than writing out the type of it explicitly, we can ask the compiler to deduce it for us by using the auto keyword:

template <typename Sequence>
void print_all(const Sequence &sequence) {
  for (auto it = sequence.begin(); it != sequence.end(); ++it) {
    cout << *it << endl;
  }
}

The compiler deduces the type from its initialization. If the return type of sequence.begin() is List<int>::Iterator, then the type of it is deduced to have type List<int>::Iterator. On the other hand, if the return type is vector<Duck>::iterator, then the type of it is deduced to be vector<Duck>::iterator. The return type need not be a nested type; if it is char *, then it will be deduced to have type char *. Thus, not only does type deduction save us keystrokes for complicated types, it also abstracts over how the types are defined.