Linked Lists

A sequential container is a container that holds elements and allows them to be accessed in sequential order. The simplest sequential container is a built-in array, which stores its elements contiguously in memory:

const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by pointer
for (int *ptr = array; ptr != array + SIZE; ++ptr) {
  cout << *ptr << endl;
}

We can also traverse over arrays by index, using random access to get to an element:

const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by index
for (int i = 0; i < SIZE; ++i) {
  cout << array[i] << endl;
}

The fact that the elements are stored contiguously makes random access efficient: it translates directly to pointer arithmetic, so that it takes the same amount of time to access an element in the middle of the array as one at the beginning.

cout << *(array + i) << endl;

However, arrays have significant limitations: their size is fixed upon creation, and for non-dynamic arrays, the size must be known at compile time.

A vector is a class template that provides an abstraction over a dynamically allocated array. Like UnsortedSet, it grows as needed by swapping out an old array for a new one. A vector also provides an interface for iterating over the elements in order:

vector<int> data;
// add elements to the vector
for (int i = 1; i <= 5; ++i) {
  data.push_back(i);
}
// iterate over elements in order with traversal by iterator
for (vector<int>::iterator it = data.begin(); it != vec.end(); ++it) {
  cout << *it << endl;
}

We will cover iterators and traversal by iterator next time.

Sequence abstractions based on contiguous memory have the drawback of inefficient insertion at the beginning or middle of the sequence. For instance, to insert an item at a particular position, we need to first move the elements at that position and later out of the way, as shown in Figure 63.

_images/16_contiguous_insert_middle.svg

Figure 63 Inserting in the middle of a contiguous data structure requires elements to be shifted out of the way.

Thus, insertion at the beginning or middle is a linear-time operation.

On the other hand, if we give up the abstraction of contiguous memory, we can place a new item anywhere in memory. Since the elements are no longer stored contiguously, we cannot move to the next element by just incrementing an address. Instead, we have to explicitly keep track of where the next element is through a pointer.

_images/16_noncontiguous_insert_middle.svg

Figure 64 A noncontiguous data structure can use pointers to keep track of each element’s location. Inserting in the middle can be done by just modifying a pointer.

As illustrated in Figure 64, inserting in the middle now just involves changing the value of a pointer.

For each element in our sequence, we need to keep track of both the datum itself as well as a pointer to the next piece of the sequence. This is heterogeneous data, so we use a node class-type object to keep track of the two values:

struct Node {
  int datum;
  Node *next;
};

We define the Node type as plain-old data, using the struct keyword and accessing the members directly. For now, we will stick to int as the element type; later, we will convert our abstraction into a template. The remaining Node member is a pointer to the next Node in our sequence.

The sequence of Node objects in Figure 65 represents the elements ( 1 2 3 4 ).

_images/16_nodes.svg

Figure 65 Nodes representing a sequence of four elements. A null pointer represents the end of the sequence.

We use a null pointer as a sentinel denoting the end of the sequence, storing that in the next pointer of the last node. We refer to the sequence of nodes as a linked list, as it consists of a list of items linked together by pointers.

Rather than exposing the nodes directly as part of an abstract data type, we define an IntList ADT that internally maintains a sequence of nodes. It need only keep track of the location of the first Node, since we can get to the others by following next pointers:

class IntList {
  ...
private:
  Node *first;
}
_images/16_list.svg

Figure 66 An IntList object and its nodes.

Since the Node objects are stored indirectly from the IntList object, they must be in dynamic memory, and IntList must ensure that their memory is properly managed.

Before we consider the representation of IntList further, we first write some code that uses IntList in order to work out its interface:

int main() {
  IntList list;             // ( )
  list.push_front(1);       // ( 1 )
  list.push_front(2);       // ( 2 1 )
  list.push_front(3);       // ( 3 2 1 )

  cout << list.front();     // 3

  list.front() = 4;         // ( 4 2 1 )

  list.print(cout);         // 4 2 1

  list.pop_front();         // ( 2 1 )
  list.pop_front();         // ( 1 )
  list.pop_front();         // ( )

  cout << list.empty();     // true (or 1)
}

The class should have a default constructor that makes the list empty. It should also have a push_front() member function to insert an item at the beginning, as well as front() to retrieve the first item. The latter must support both reading and writing the first element. The remaining member functions are print(), pop_front() to remove the first item, and empty(). The following is the resulting interface:

class IntList {
public:
  // EFFECTS: Constructs an empty list.
  IntList();

  // EFFECTS:  Returns true if the list is empty.
  bool empty() const;

  // REQUIRES: the list is not empty
  // EFFECTS:  Returns (by reference) the first element.
  int & front();

  // EFFECTS:  Inserts datum at the front of the list.
  void push_front(int datum);

  // REQUIRES: the list is not empty
  // EFFECTS:  Removes the first element.
  void pop_front();

  // MODIFIES: os
  // EFFECTS:  Prints the items in the list, each followed by a space.
  void print(std::ostream &os) const;

  ...
};

Notice that Node appears nowhere in the interface. It is an implementation detail, so it should be defined as a private member of IntList. This makes it a nested class, which is a class (or struct) defined as a member of another class.

class IntList {
  ...
private:
  struct Node {
    int datum;
    Node *next;
  };

  Node *first;
};

Now that we have a representation, the next step is to determine the representation invariants. We have already decided to use a null pointer to denote the end of the sequence of nodes. Similarly, we can store a null pointer in first to represent an empty list. Thus, the invariants are as follows: [1]

  • first is either null or a pointer to a valid Node

  • the next member of all but the last node points to another valid Node

  • the next member of the last node is null

We can now proceed to implement the constructor and the remaining member functions.

The constructor must ensure that the list is initialized to satisfy the representation invariants. Since the default constructor makes the list empty, it must initialize first to be null:

IntList::IntList() : first(nullptr) {}

The empty() function just needs to check whether first is null:

bool IntList::empty() const {
  return first == nullptr;     // or just return !first;
}

For front(), we will first assert the REQUIRES clause that the list not be empty. If that is satisfied, the representation invariants tell us that first is pointing to a valid node. Its datum member holds the first element.

int & IntList::front() {
  assert(!empty());
  return first->datum;
}

The return by reference is necessary to allow code such as:

list.front() = 4;

The left-hand side of an assignment must be an object, not a temporary value, so front() must return an object by reference.

For push_front(), we need to consider two cases:

  • The list is empty, in which case first is null.

  • The list is not empty, in which case first points to a valid node.

_images/16_push_front.svg

Figure 67 Inserting an element to the front of an empty and non-empty list.

In both cases, we construct a new Node in dynamic memory, set its datum to be the value that is being inserted, set its next to be the existing value of first, and set first to point to the new Node.

void IntList::push_front(int datum) {
  Node *p = new Node;
  p->datum = datum;
  p->next = first;
  first = p;
}

A more succinct way to accomplish this is with an initializer list that directly initializes each member of the new Node; this is permitted since Node is a C-style ADT.

void IntList::push_front(int datum) {
  first = new Node{ datum, first };
}

In pop_front(), we again assert the REQUIRES clause. We should then consider two possible cases:

  • The list has one item, in which case it will end up as empty.

  • The list has more than one item, and it will not end up empty.

In the former case, the sole node is the last one, and the representation invariants tell us that its next is null. Thus, setting first to the node’s next suitably makes the list empty. In the second case, the first node’s next is pointing to the second node in the list, so first should end up pointing at that node when the first one is removed. Thus, in both cases, we need to assign the value of first->next to first.

We also need to properly manage dynamic memory; in particular, we must delete the node that is being removed, as it is memory that no longer will be used.

Consider the following implementation of pop_front():

void IntList::pop_front() {
  assert(!empty());
  delete first;
  first = first->next;
}

This code dereferences first through the arrow operator after the underlying object has been deleted, resulting in undefined behavior. Let’s try to fix this by reordering the statements:

void IntList::pop_front() {
  assert(!empty());
  first = first->next;
  delete first;
}

Now, the assignment to first orphans what previously was the first node; we no longer have a means to get to that object, resulting in a memory leak. The code then proceeds to delete what used to be the second node.

Neither ordering works correctly on its own. What we need is a temporary to keep track of the first node, so that we can change the value of first but still be able to delete the node that is being removed. The following are two correct implementations:

void IntList::pop_front() {
  assert(!empty());
  Node *victim = first;           // temporary keeps track of old first
  first = first->next;
  delete victim;
}
void IntList::pop_front() {
  assert(!empty());
  Node *new_first = first->next;  // temporary keeps track of new first
  delete first;
  first = new_first;
}

Figure 68 illustrates the first implementation of pop_front().

_images/16_pop_front.svg

Figure 68 Removing an element from the front of a list.

Traversing a Linked List

Iterating over a linked list’s elements from outside the class requires an iterator abstraction that we will see next time: the Node struct is private, so external code cannot use Nodes to iterate through the list. Code within IntList, on the other hand, does have access to Node, so it can traverse the list by starting at the first member and following each node’s next pointer until reaching the null sentinel. The following print() member function uses this strategy:

void IntList::print(std::ostream &os) const {
  for (Node *node_ptr = first; node_ptr; node_ptr = node_ptr->next) {
    os << node_ptr->datum << " ";
  }
}

The loop initializes node_ptr as a copy of first. If the list is empty, first is null, so node_ptr will be initialized to null as well. The truth value of a null pointer is false, so the condition of the loop will evaluate to false and the loop will exit. (Alternatively, node_ptr can be compared to nullptr instead: node_ptr != nullptr.)

If the list is not empty, node_ptr will not be initially null, and its truth value will be true. The body then executes, and it uses the -> operator to access the datum member of the node that node_ptr points to. The loop update then copies the node’s next pointer into node_ptr, moving node_ptr to point to the next node. When the iteration reaches the last node in the list, its next is null, so the update sets node_ptr to be null. This results in the loop condition being false, so the loop exits.

_images/16_node_traversal.svg

Figure 69 Traversal of the nodes in a list, starting at first, following next pointers, and ending at the null sentinel.

Linked List Big Three

Inserting an element into a list allocates a Node in dynamic memory, so the list must properly manage the node objects to avoid a memory leak. The compiler-generated destructor does not free the nodes’ memory, so we need to write a custom destructor instead. The law of the big three then tells us that we need to write a custom copy constructor and assignment operator as well.

Both the destructor and the assignment operator must free the list’s resources, and both the copy constructor and assignment operator perform a deep copy of another list’s resources. To avoid code duplication, we write a pop_all() member function to free all the elements and their nodes, and a push_all() function to copy all the elements from another list. These functions are not part of the interface, so we write them as private members:

class IntList {
  ...
private:
  // EFFECTS:  Removes all the elements from this list.
  void pop_all();

  // EFFECTS:  Adds all elements from other into this list.
  void push_all(const IntList &other);
};

With these two functions, we can write the big three as follows:

IntList::IntList(const IntList &other) : IntList() {
  push_all(other);
}

IntList & IntList::operator=(const IntList &rhs) {
  if (this != &rhs) {
    pop_all();
    push_all(rhs);
  }
  return *this;
}

IntList::~IntList() {
  pop_all();
}

The copy constructor delegates to the default constructor to make the list empty, then calls push_all() to copy all the elements from the other list. The assignment operator does a self-assignment check, calls pop_all() to remove all the existing elements, and invokes push_all() to copy the elements from the other list. The destructor just calls pop_all() to remove all the elements and free their associated nodes.

The implementation of pop_all() is straightforward: we already have a pop_front() function that removes a single item, so we just need to repeatedly call it until the list is empty:

void IntList::pop_all() {
  while (!empty()) {
    pop_front();
  }
}

The push_all() function needs to iterate over each of the elements in the other list, adding them one by one to the current list. We can follow the same iteration pattern we used in print(), and we have a push_front() function that adds a single element:

void IntList::push_all(const IntList &other) {
  for (Node *node_ptr = other.first; node_ptr; node_ptr = node_ptr->next) {
    push_front(node_ptr->datum);
  }
}

However, this implementation ends up inserting the elements in reverse order: if other contains the elements ( 1 2 3 ), the code would insert 1, then 2 before that, then 3 before that, resulting in ( 3 2 1 ).

Rather than inserting each element at the beginning of the list, we need to insert at the end with a push_back() function.

void IntList::push_all(const IntList &other) {
  for (Node *node_ptr = other.first; node_ptr; node_ptr = node_ptr->next) {
    push_back(node_ptr->datum);
  }
}

Insertion and Removal at the End

With our current list representation, push_back() must traverse the entire list to insert the new element:

void IntList::push_back(int datum) {
  Node *new_node = new Node{ datum, nullptr };
  if (empty()) {
    first = new_node;
  } else {
    Node *node_ptr = first;
    for (; node_ptr->next; node_ptr = node_ptr->next);  // find last node
    node_ptr->next = new_node;        // set last node's next to new_node
  }
}

This is a very inefficient algorithm; the core problem is our list implementation only keeps track of the first node, but we need to insert after the last node. We can make push_back() much more efficient by changing our list representation to keep track of the last node as well:

class IntList {
  ...
private:
  Node *first;
  Node *last;
};
_images/16_double_ended_list.svg

Figure 70 Double-ended list representation with first and last pointers.

We update our representation invariants so that an empty list is represented by both last and first being null, and that last points to the last node in the sequence for a non-empty list.

We can then write push_back() as follows:

void IntList::push_back(int datum) {
  Node *new_node = new Node{ datum, nullptr };
  if (empty()) {
    first = last = new_node;
  } else {
    last = last->next = new_node;
  }
}

In the case of inserting into an empty list, we need to set both first and last pointing at the new node. (We would need to modify pop_front() to do this as well.) When inserting at the end of a non-empty list, we have to set the next pointer of what used to be the last node to point to the new node, and we must also update the last member of the IntList so that it points to the new node.

Now that we have push_back(), the logical next step is to provide pop_back() as well. Unfortunately, pop_back() requires setting the last pointer to point to what used to be the second-to-last node, and we can only get to that node by traversing the entire list from the beginning. Here, the problem is that our nodes only have a next pointer, so they allow us to traverse in the forward direction but not in reverse. We can modify our representation once more to enable backwards traversal by adding a prev pointer to each node:

class IntList {
  ...
private:
  struct Node {
    int datum;
    Node *prev;
    Node *next;
  };

  Node *first;
  Node *last;
};
_images/16_doubly_linked_list.svg

Figure 71 Doubly linked list with next and prev pointers in each node.

This is now a doubly linked list, since each node has two links, one to the previous node and one to the next node. Our original list is a singly linked list.

The standard library provides both implementations: std::forward_list is a singly linked list, while std::list is a doubly linked list. The former uses less space than the latter, since the nodes don’t have a pointer to the previous node, but it does not provide the ability to iterate backwards over the list.

List Template

We can generalize the IntList class to hold objects of other types by making it a template. Each instantiation is still homogeneous: List<int> only holds ints, List<string> only holds strings, and so on.

template <typename T>
class List {
public:
  List();

  void empty() const;

  T & front();

  void push_front(const T &datum);

  void pop_front();

  void push_back(const T &datum);

  void pop_back();

  ...

private:
  struct Node {
    T datum;
    Node *prev;
    Node *next;
  };

  Node *first;
  Node *last;
};

By placing the Node struct inside the List template, each instantiation of List will have its own Node type; List<int>::Node, List<string>::Node, and List<Duck>::Node are all distinct types.

Now that our container is a template, we pass elements by reference. For List<int>, the elements are small enough to copy, but for class-type elements such as in List<string> or List<Duck>, we want to avoid making unnecessary copies. A good rule of thumb is to pass a function parameter by reference if the function parameter’s type is a template parameter, since the latter may be instantiated with a large class type.