Linked Lists
A sequential container is a container that holds elements and allows them to be accessed in sequential order. The simplest sequential container is a built-in array, which stores its elements contiguously in memory:
const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by pointer
for (int *ptr = array; ptr != array + SIZE; ++ptr) {
cout << *ptr << endl;
}
We can also traverse over arrays by index, using random access to get to an element:
const int SIZE = 5;
int array[SIZE] = { 1, 2, 3, 4, 5 };
// iterate over elements in order with traversal by index
for (int i = 0; i < SIZE; ++i) {
cout << array[i] << endl;
}
The fact that the elements are stored contiguously makes random access efficient: it translates directly to pointer arithmetic, so that it takes the same amount of time to access an element in the middle of the array as one at the beginning.
cout << *(array + i) << endl;
However, arrays have significant limitations: their size is fixed upon creation, and for non-dynamic arrays, the size must be known at compile time.
A vector is a class template that provides an abstraction over a
dynamically allocated array. Like UnsortedSet
, it grows as needed
by swapping out an old array for a new one. A vector also provides an
interface for iterating over the elements in order:
vector<int> data;
// add elements to the vector
for (int i = 1; i <= 5; ++i) {
data.push_back(i);
}
// iterate over elements in order with traversal by iterator
for (vector<int>::iterator it = data.begin(); it != vec.end(); ++it) {
cout << *it << endl;
}
We will cover iterators and traversal by iterator next time.
Sequence abstractions based on contiguous memory have the drawback of inefficient insertion at the beginning or middle of the sequence. For instance, to insert an item at a particular position, we need to first move the elements at that position and later out of the way, as shown in Figure 62.
Thus, insertion at the beginning or middle is a linear-time operation.
On the other hand, if we give up the abstraction of contiguous memory, we can place a new item anywhere in memory. Since the elements are no longer stored contiguously, we cannot move to the next element by just incrementing an address. Instead, we have to explicitly keep track of where the next element is through a pointer.
As illustrated in Figure 63, inserting in the middle now just involves changing the value of a pointer.
For each element in our sequence, we need to keep track of both the datum itself as well as a pointer to the next piece of the sequence. This is heterogeneous data, so we use a node class-type object to keep track of the two values:
struct Node {
int datum;
Node *next;
};
We define the Node
type as plain-old data, using the struct
keyword and accessing the members directly. For now, we will stick to
int
as the element type; later, we will convert our abstraction
into a template. The remaining Node
member is a pointer to the
next Node
in our sequence.
The sequence of Node
objects in Figure 64 represents the
elements ( 1 2 3 4 )
.
We use a null pointer as a sentinel denoting the end of the
sequence, storing that in the next
pointer of the last node. We
refer to the sequence of nodes as a linked list, as it consists
of a list of items linked together by pointers.
Rather than exposing the nodes directly as part of an abstract
data type, we define an IntList
ADT that internally maintains
a sequence of nodes. It need only keep track of the location of
the first Node
, since we can get to the others by following
next
pointers:
class IntList {
...
private:
Node *first;
}
Since the Node
objects are stored indirectly from the IntList
object, they must be in dynamic memory, and IntList
must ensure
that their memory is properly managed.
Before we consider the representation of IntList
further, we first
write some code that uses IntList
in order to work out its
interface:
int main() {
IntList list; // ( )
list.push_front(1); // ( 1 )
list.push_front(2); // ( 2 1 )
list.push_front(3); // ( 3 2 1 )
cout << list.front(); // 3
list.front() = 4; // ( 4 2 1 )
list.print(cout); // 4 2 1
list.pop_front(); // ( 2 1 )
list.pop_front(); // ( 1 )
list.pop_front(); // ( )
cout << list.empty(); // true (or 1)
}
The class should have a default constructor that makes the list empty.
It should also have a push_front()
member function to insert an
item at the beginning, as well as front()
to retrieve the first
item. The latter must support both reading and writing the first
element. The remaining member functions are print()
,
pop_front()
to remove the first item, and empty()
. The
following is the resulting interface:
class IntList {
public:
// EFFECTS: Constructs an empty list.
IntList();
// EFFECTS: Returns true if the list is empty.
bool empty() const;
// REQUIRES: the list is not empty
// EFFECTS: Returns (by reference) the first element.
int & front();
// EFFECTS: Inserts datum at the front of the list.
void push_front(int datum);
// REQUIRES: the list is not empty
// EFFECTS: Removes the first element.
void pop_front();
// MODIFIES: os
// EFFECTS: Prints the items in the list, each followed by a space.
void print(std::ostream &os) const;
...
};
Notice that Node
appears nowhere in the interface. It is an
implementation detail, so it should be defined as a private member of
IntList
. This makes it a nested class, which is a class (or
struct) defined as a member of another class.
class IntList {
...
private:
struct Node {
int datum;
Node *next;
};
Node *first;
};
Now that we have a representation, the next step is to determine the
representation invariants. We have already decided to use a null
pointer to denote the end of the sequence of nodes. Similarly, we can
store a null pointer in first
to represent an empty list. Thus,
the invariants are as follows: [1]
first
is either null or a pointer to a validNode
the
next
member of all but the last node points to another validNode
the
next
member of the last node is null
We can now proceed to implement the constructor and the remaining member functions.
The constructor must ensure that the list is initialized to satisfy
the representation invariants. Since the default constructor makes the
list empty, it must initialize first
to be null:
IntList::IntList() : first(nullptr) {}
The empty()
function just needs to check whether first
is
null:
bool IntList::empty() const {
return first == nullptr; // or just return !first;
}
For front()
, we will first assert the REQUIRES clause that the
list not be empty. If that is satisfied, the representation invariants
tell us that first
is pointing to a valid node. Its datum
member holds the first element.
int & IntList::front() {
assert(!empty());
return first->datum;
}
The return by reference is necessary to allow code such as:
list.front() = 4;
The left-hand side of an assignment must be an object, not a temporary
value, so front()
must return an object by reference.
For push_front()
, we need to consider two cases:
The list is empty, in which case
first
is null.The list is not empty, in which case
first
points to a valid node.
In both cases, we construct a new Node
in dynamic memory, set its
datum
to be the value that is being inserted, set its next
to
be the existing value of first
, and set first
to point to the
new Node
.
void IntList::push_front(int datum) {
Node *p = new Node;
p->datum = datum;
p->next = first;
first = p;
}
A more succinct way to accomplish this is with an initializer list
that directly initializes each member of the new Node
; this is
permitted since Node
is a C-style ADT.
void IntList::push_front(int datum) {
first = new Node{ datum, first };
}
In pop_front()
, we again assert the REQUIRES clause. We should
then consider two possible cases:
The list has one item, in which case it will end up as empty.
The list has more than one item, and it will not end up empty.
In the former case, the sole node is the last one, and the
representation invariants tell us that its next
is null. Thus,
setting first
to the node’s next
suitably makes the list
empty. In the second case, the first node’s next
is pointing to
the second node in the list, so first
should end up pointing at
that node when the first one is removed. Thus, in both cases, we need
to assign the value of first->next
to first
.
We also need to properly manage dynamic memory; in particular, we must delete the node that is being removed, as it is memory that no longer will be used.
Consider the following implementation of pop_front()
:
void IntList::pop_front() {
assert(!empty());
delete first;
first = first->next;
}
This code dereferences first
through the arrow operator after the
underlying object has been deleted, resulting in undefined behavior.
Let’s try to fix this by reordering the statements:
void IntList::pop_front() {
assert(!empty());
first = first->next;
delete first;
}
Now, the assignment to first
orphans what previously was the first
node; we no longer have a means to get to that object, resulting in a
memory leak. The code then proceeds to delete what used to be the
second node.
Neither ordering works correctly on its own. What we need is a
temporary to keep track of the first node, so that we can change the
value of first
but still be able to delete the node that is being
removed. The following are two correct implementations:
void IntList::pop_front() {
assert(!empty());
Node *victim = first; // temporary keeps track of old first
first = first->next;
delete victim;
}
void IntList::pop_front() {
assert(!empty());
Node *new_first = first->next; // temporary keeps track of new first
delete first;
first = new_first;
}
Figure 67 illustrates the first implementation of
pop_front()
.
Traversing a Linked List
Iterating over a linked list’s elements from outside the class
requires an iterator abstraction that we will
see next time: the Node
struct is private, so external code cannot
use Node
s to iterate through the list. Code within IntList
,
on the other hand, does have access to Node
, so it can traverse
the list by starting at the first
member and following each node’s
next
pointer until reaching the null sentinel. The following
print()
member function uses this strategy:
void IntList::print(std::ostream &os) const {
for (Node *node_ptr = first; node_ptr; node_ptr = node_ptr->next) {
os << node_ptr->datum << " ";
}
}
The loop initializes node_ptr
as a copy of first
. If the list
is empty, first
is null, so node_ptr
will be initialized to
null as well. The truth value of a null pointer is false, so the
condition of the loop will evaluate to false and the loop will exit.
(Alternatively, node_ptr
can be compared to nullptr
instead:
node_ptr != nullptr
.)
If the list is not empty, node_ptr
will not be initially null, and
its truth value will be true. The body then executes, and it uses the
->
operator to access the datum
member of the node that
node_ptr
points to. The loop update then copies the node’s
next
pointer into node_ptr
, moving node_ptr
to point to
the next node. When the iteration reaches the last node in the list,
its next
is null, so the update sets node_ptr
to be null. This
results in the loop condition being false, so the loop exits.
Linked List Big Three
Inserting an element into a list allocates a Node
in dynamic
memory, so the list must properly manage the node objects to avoid a
memory leak. The compiler-generated destructor does not free the
nodes’ memory, so we need to write a custom destructor instead. The
law of the big three then tells us that
we need to write a custom copy constructor and assignment operator as
well.
Both the destructor and the assignment operator must free the list’s
resources, and both the copy constructor and assignment operator
perform a deep copy of another list’s resources. To avoid code
duplication, we write a pop_all()
member function to free all the
elements and their nodes, and a push_all()
function to copy all
the elements from another list. These functions are not part of the
interface, so we write them as private members:
class IntList {
...
private:
// EFFECTS: Removes all the elements from this list.
void pop_all();
// EFFECTS: Adds all elements from other into this list.
void push_all(const IntList &other);
};
With these two functions, we can write the big three as follows:
IntList::IntList(const IntList &other) : IntList() {
push_all(other);
}
IntList & IntList::operator=(const IntList &rhs) {
if (this != &rhs) {
pop_all();
push_all(rhs);
}
return *this;
}
IntList::~IntList() {
pop_all();
}
The copy constructor delegates to the
default constructor to make the list empty, then calls push_all()
to copy all the elements from the other list. The assignment operator
does a self-assignment check, calls pop_all()
to remove all the
existing elements, and invokes push_all()
to copy the elements
from the other list. The destructor just calls pop_all()
to remove
all the elements and free their associated nodes.
The implementation of pop_all()
is straightforward: we already
have a pop_front()
function that removes a single item, so we just
need to repeatedly call it until the list is empty:
void IntList::pop_all() {
while (!empty()) {
pop_front();
}
}
The push_all()
function needs to iterate over each of the elements
in the other list, adding them one by one to the current list. We can
follow the same iteration pattern we used in print()
, and we have
a push_front()
function that adds a single element:
void IntList::push_all(const IntList &other) {
for (Node *node_ptr = other.first; node_ptr; node_ptr = node_ptr->next) {
push_front(node_ptr->datum);
}
}
However, this implementation ends up inserting the elements in reverse
order: if other
contains the elements ( 1 2 3 )
, the code
would insert 1, then 2 before that, then 3 before that, resulting in
( 3 2 1 )
.
Rather than inserting each element at the beginning of the list, we
need to insert at the end with a push_back()
function.
void IntList::push_all(const IntList &other) {
for (Node *node_ptr = other.first; node_ptr; node_ptr = node_ptr->next) {
push_back(node_ptr->datum);
}
}
Insertion and Removal at the End
With our current list representation, push_back()
must traverse
the entire list to insert the new element:
void IntList::push_back(int datum) {
Node *new_node = new Node{ datum, nullptr };
if (empty()) {
first = new_node;
} else {
Node *node_ptr = first;
for (; node_ptr->next; node_ptr = node_ptr->next); // find last node
node_ptr->next = new_node; // set last node's next to new_node
}
}
This is a very inefficient algorithm; the core problem is our list
implementation only keeps track of the first node, but we need to
insert after the last node. We can make push_back()
much more
efficient by changing our list representation to keep track of the
last node as well:
class IntList {
...
private:
Node *first;
Node *last;
};
We update our representation invariants so that an empty list is
represented by both last
and first
being null, and that
last
points to the last node in the sequence for a non-empty list.
We can then write push_back()
as follows:
void IntList::push_back(int datum) {
Node *new_node = new Node{ datum, nullptr };
if (empty()) {
first = last = new_node;
} else {
last = last->next = new_node;
}
}
In the case of inserting into an empty list, we need to set both
first
and last
pointing at the new node. (We would need to
modify pop_front()
to do this as well.) When inserting at the end
of a non-empty list, we have to set the next
pointer of what used
to be the last node to point to the new node, and we must also update
the last
member of the IntList
so that it points to the new
node.
Now that we have push_back()
, the logical next step is to provide
pop_back()
as well. Unfortunately, pop_back()
requires setting
the last
pointer to point to what used to be the second-to-last
node, and we can only get to that node by traversing the entire list
from the beginning. Here, the problem is that our nodes only have a
next
pointer, so they allow us to traverse in the forward
direction but not in reverse. We can modify our representation once
more to enable backwards traversal by adding a prev
pointer to
each node:
class IntList {
...
private:
struct Node {
int datum;
Node *prev;
Node *next;
};
Node *first;
Node *last;
};
This is now a doubly linked list, since each node has two links, one to the previous node and one to the next node. Our original list is a singly linked list.
The standard library provides both implementations:
std::forward_list
is a singly linked list, while std::list
is
a doubly linked list. The former uses less space than the latter,
since the nodes don’t have a pointer to the previous node, but it does
not provide the ability to iterate backwards over the list.
List Template
We can generalize the IntList
class to hold objects of other types
by making it a template. Each instantiation is still homogeneous:
List<int>
only holds int
s, List<string>
only holds
string
s, and so on.
template <typename T>
class List {
public:
List();
void empty() const;
T & front();
void push_front(const T &datum);
void pop_front();
void push_back(const T &datum);
void pop_back();
...
private:
struct Node {
T datum;
Node *prev;
Node *next;
};
Node *first;
Node *last;
};
By placing the Node
struct inside the List
template, each
instantiation of List
will have its own Node
type;
List<int>::Node
, List<string>::Node
, and List<Duck>::Node
are all distinct types.
Now that our container is a template, we pass elements by reference.
For List<int>
, the elements are small enough to copy, but for
class-type elements such as in List<string>
or List<Duck>
, we
want to avoid making unnecessary copies. A good rule of thumb is to
pass a function parameter by reference if the function parameter’s
type is a template parameter, since the latter may be instantiated
with a large class type.