The Big Three
We saw last time that copying an UnsortedSet
ultimately results in
undefined behavior. Before we fix the issue, we need to understand how
C++ copies class-type objects and what mechanisms it provides to
control the behavior of a copy.
By default, a copy just copies over the members one by one, as we saw when we first discussed class-type objects:
Person elise = { "Elise", 22, true };
Person tali = elise;
The result is shown in Figure 58.
The code above is an example of making a copy when initializing a new object. There are several forms of syntax we can use for initialization as a copy:
Person tali = elise;
Person tali(elise);
Person tali{elise};
Person tali = {elise};
Initialization as a copy also occurs when we pass an object by value:
void func(Person p);
int main() {
Person elise = { "Elise", 22, true };
func(elise);
}
The parameter p
is associated with an object that lives in the
activation record for func()
, and the object is initialized when
the function is called. It is initialized as a copy of the argument
value elise
.
Just like we can pass an object by value, we can also return an object by value, which generally makes a copy:
Person func2() {
Person elise = { "Elise", 22, true };
return elise;
}
int main() {
Person tali = func2();
}
Copy Constructor
We also previously discussed that a constructor is always called when
a class-type object is created (except for C-style ADTs when the
members are initialized directly, like elise
above). Copy
initialization of a class-type object also invokes a constructor,
specifically the copy constructor for the class. The following is an
example of explicitly defining a copy constructor:
class Example {
public:
Example(int x_in, double y_in)
: x(x_in), y(y_in) {}
Example(const Example &other)
: x(other.x), y(other.y) {
cout << "Example copy ctor: " << x << ", " << y << endl;
}
int get_x() const {
return x;
}
double get_y() const {
return y;
}
private:
int x;
double y;
};
The second constructor above is the copy constructor, and it takes a reference to an existing object as the parameter. The parameter must be passed by reference – otherwise, a copy will be done to initialize the parameter, which itself will invoke the copy constructor, which will call the copy constructor to initialize its parameter, and so on.
We have instrumented the copy constructor to print a message when it is called. Thus, the code
int main() {
Example e1(2, -1.3);
Example e2 = e1;
}
will print the following when run:
Example copy ctor: 2, -1.3
The program invokes the copy constructor to initialize e2
from
e1
.
The C++ compiler provides an implicit copy constructor if a
user-defined one is not present. The implicit copy constructor just
does a member-by-member copy, so in the case of Example
, it acts
like the following:
Example(const Example &other)
: x(other.x), y(other.y) {}
Assignment Operator
Assignment is another situation where an object is copied; unlike initialization, however, assignment copies into an existing object rather than a new one. The following is an example:
int main() {
Example e1(2, -1.3);
Example e2(3, 4.1);
e2 = e1;
}
An assignment expression consists of a left-hand-side object, the
=
operator, and a right-hand-side object or value. The expression
evaluates the right-hand side, copies it into the left-hand-side
object, and then evaluates back to the left-hand-side object. We can
then use the result in a larger expression:
cout << (e2 = e1).get_x() << endl; // prints 2
Like most operators, the assignment operator can be overloaded; the overload must be a member function of the type of the left-hand side.
class Example {
public:
Example & operator=(const Example &rhs);
...
};
The function takes in an Example
by reference to const,
corresponding to the right-hand operand of the assignment. [1] The
return value will be the left-hand-side object itself, and it needs to
be returned by reference rather than by value (the latter would make a
copy rather than returning the object itself). Thus, the return type
is Example &
.
The following is a definition of the overloaded assignment operator that just does a member-by-member copy:
Example & Example::operator=(const Example &rhs) {
x = rhs.x;
y = rhs.y;
return *this;
}
The two members are individually copied from the right-hand side. The
left-hand-side object must be returned; we need to dereference the
this
pointer to get to the object to which it is pointing.
Like the copy constructor, the compiler provides an implicit definition of the assignment operator if a user-defined one is not present. Like the implicitly defined copy constructor, the implicit assignment operator performs a member-by-member copy.
Shallow and Deep Copies
For most class types, a member-by-member copy is sufficient, and there
is no need to write a custom copy constructor or assignment operator.
However, for a type that manages a dynamic resource, a
member-by-member copy generally results in incorrect sharing of a
resource. For example, consider the following code that copies an
UnsortedSet
:
int main() {
UnsortedSet<int> s1;
for (int i = 0; i < 5; ++i) {
s1.insert(i);
}
UnsortedSet<int> s2 = s1;
cout << s1 << endl; // prints { 0, 1, 2, 3, 4 }
cout << s2 << endl; // prints { 0, 1, 2, 3, 4 }
s2.insert(5); // causes a grow
cout << s2 << endl; // prints { 0, 1, 2, 3, 4, 5 }
cout << s1 << endl; // UNDEFINED BEHAVIOR
}
The initialization of s2
calls the implicit copy constructor,
which does a member-by-member copy, as if it were written as follows:
template <typename T>
UnsortedSet<T>::UnsortedSet(const UnsortedSet &other)
: elements(other.elements), capacity(other.capacity),
num_elements(other.num_elements) {}
The result is that s1.elements
and s2.elements
point to the
same array, as depicted by Figure 59.
Inserting 5 into s2
causes a grow operation, which creates a new
array and deletes the old one. The result is shown in
Figure 60.
Then printing out s1
accesses the old, deleted array, resulting in
undefined behavior.
The fundamental problem is that the implicitly defined member-by-member copy is a shallow copy: it copies the pointer to the array of elements, rather than following it and copying the array as a whole. This violates the representation invariant that each set has its own array. Instead, we need a deep copy, where we make a copy of the underlying resource rather than having the two sets share the same resource. To obtain a deep copy, we need to provide a custom implementation of the copy constructor:
template <typename T>
class UnsortedSet {
public:
UnsortedSet(const UnsortedSet &other);
...
};
template <typename T>
UnsortedSet<T>::UnsortedSet(const UnsortedSet &other)
: elements(new T[other.capacity]), // create new array
capacity(other.capacity), // shallow copy non-resources
num_elements(other.num_elements) {
for (int i = 0; i < num_elements; ++i) { // copy over the elements
elements[i] = other.elements[i];
}
}
Rather than copying the elements
pointer, we initialize the new
set’s member to point to the start of a dynamic array of the same
capacity as other
‘s. The members that don’t represent resources
are just copied directly (capacity
and num_elements
). The body
of the constructor copies each element from the old set to the new
one. The result is that each set has its own, independent copy of
the elements, as shown in Figure 61.
The custom copy constructor provides a deep copy in the case of initializing a new set as a copy. We need a deep copy in assignment as well:
s2 = s1;
Thus, we need a custom assignment operator in addition to a custom copy constructor. The following is a first attempt at defining one:
template <typename T>
class UnsortedSet {
public:
UnsortedSet & operator=(const UnsortedSet &rhs);
...
};
template <typename T>
UnsortedSet<T> & UnsortedSet<T>::operator=(const UnsortedSet &rhs) {
delete[] elements; // delete old array
elements = new T[rhs.capacity]; // make new array of the required size
capacity = rhs.capacity; // shallow copy non-resources
num_elements = rhs.num_elements;
for (int i = 0; i < num_elements; ++i) { // copy over the elements
elements[i] = rhs.elements[i];
}
return *this; // return LHS object
}
The function first deletes the old array before creating a new one. It
may be the case that the size of the rhs
set is larger than the
capacity of the set receiving the copy, in which case creating a new
array is necessary. The function then makes a shallow copy of the
non-resources, followed by copying over each of the elements. Finally,
it returns the left-hand-side object.
While this assignment operator works in most cases, it fails in the
case of self assignment. An expression such as s2 = s2
will
delete s2.elements
before proceeding to access the elements in the
subsequent loop. Instead, the assignment should have no effect when
both operands are the same object, so we need to check for this case
before doing any work. We do so as follows:
template <typename T>
UnsortedSet<T> & UnsortedSet<T>::operator=(const UnsortedSet &rhs) {
if (this == &rhs) { // self-assignment check
return *this;
}
delete[] elements; // delete old array
elements = new T[rhs.capacity]; // make new array of the required size
capacity = rhs.capacity; // shallow copy non-resources
num_elements = rhs.num_elements;
for (int i = 0; i < num_elements; ++i) { // copy over the elements
elements[i] = rhs.elements[i];
}
return *this; // return LHS object
}
The this
pointer points to the left-hand-side operand, while the
parameter rhs
is an alias for the right-hand-side operand. We need
to obtain the address of rhs
to compare to the address stored in
the this
pointer. If the two addresses are the same, then the two
operands are the same object, so we immediately return the
left-hand-side object. (We cannot return rhs
because it is
declared as a reference to const, while the return type is not.)
The Law of the Big Three
We have seen that the implicitly defined copy constructor and assignment operator both do a shallow copy, and that this behavior is incorrect for classes that manage a dynamic resource. Instead, we need a deep copy, which requires us to write our own custom versions of the two functions.
We also saw last time that resource management requires us to write our own custom destructor as well. It is not a coincidence that we needed to write custom versions of all three of the destructor, copy constructor, and assignment operator. The Law of the Big Three (also know as the Rule of Three) is a rule of thumb in C++ that if a custom version of any of the destructor, copy constructor, or assignment operator is required, almost certainly all three need to be custom. We refer to these three members as the big three.
By default, C++ provides implicit definitions of each of the big three:
The implicitly defined destructor does no special cleanup; it is equivalent to a destructor with an empty body. Like other destructors, it does implicitly destroy the members as well as the base class, if there is one.
The implicitly defined copy constructor does a member-by-member shallow copy.
The implicitly defined assignment operator does a member-by-member shallow copy.
When a class manages a resource, however, the programmer must provide custom versions of the big three:
The destructor should free the resources.
The copy constructor should make a deep copy of the resources and shallow copy the non-resources.
The assignment operator should:
Do a self-assignment check.
Free the old resources.
Make a deep copy of the right-hand-side object’s resources.
Shallow copy the non-resources from the right-hand-side object.
Return the left-hand-side object with
*this
.
Example: Calls to the Big Three
To better understand when the big three are invoked, we instrument the big three for the following class to print a message:
class Foo {
public:
Foo(const string &str_in) : str(str_in) {
cout << "Foo ctor " << str << endl;
}
Foo(const Foo &other) : str(other.str) {
cout << "Foo copy ctor " << str << endl;
}
Foo & operator=(const Foo &rhs) {
cout << "Foo assign " << rhs.str << " to " << str << endl;
str = rhs.str;
return *this;
}
~Foo() {
cout << "Foo dtor " << str << endl;
}
private:
string str;
};
The Foo
class has a str
member variable that we will use to
distinguish between different Foo
objects. The constructors,
assignment operator, and destructor all print the value of str
.
Let us take a look at a small program that uses Foo
:
void func(const Foo &x, Foo y) {
Foo z = x;
}
int main() {
Foo a("apple");
Foo b("banana");
Foo c("craisin");
func(a, b);
Foo c2 = c;
c2 = c;
}
This prints the following when run:
Foo ctor apple
Foo ctor banana
Foo ctor craisin
Foo copy ctor banana
Foo copy ctor apple
Foo dtor apple
Foo dtor banana
Foo copy ctor craisin
Foo assign craisin to be craisin
Foo dtor craisin
Foo dtor craisin
Foo dtor banana
Foo dtor apple
In main()
, the variables a
, b
, and c
are constructed
in order, resulting in the Foo ctor
outputs. Then func()
is
called. The first parameter is passed by reference, so no copy is
made. The second parameter is passed by value, so the parameter object
is initialized as a copy of the argument. This invokes the copy
constructor, which prints Foo copy ctor banana
. Within the body of
func()
, z
is initialized as a copy of x
, resulting in
Foo copy ctor apple
. When func()
returns, the local objects
are destructed in reverse order of construction: z
is destructed
first, producing Foo dtor apple
, then y
is destructed,
printing Foo dtor banana
.
Continuing in main()
, c2
is initialized as a copy of c
,
invoking the copy constructor and printing Foo copy ctor craisin
.
In the next line, c2
already exists, so the expression is an
assignment and calls the assignment operator, which prints Foo
assign craisin to craisin
. At the end of main()
, the local
objects are destructed in reverse order: c2
, then c
, then
b
, and then a
. The last four lines of output are the result.
Destructors and Polymorphism
We saw previously that applying the delete
operator to a pointer
follows the pointer and kills the object at the given address. If
the object is of class type, its destructor is run. However, a subtle
issue arises when the static and dynamic type of the object do not
match: does the destructor use static or dynamic binding? Consider
the following example:
class Base {
public:
virtual void add(int x) = 0;
~Base() {
cout << "Base dtor" << endl;
}
};
class Derived : public Base {
public:
void add(int x) override {
items.push_back(x);
}
~Derived() {
cout << "Derived dtor" << endl;
}
private:
vector<int> items;
};
int main() {
Base *bptr = new Derived;
bptr->add(3);
delete bptr;
}
Running this code results in:
$ ./a.out
Base dtor
The destructor is statically bound, so ~Base()
is invoked rather
than ~Derived()
. This is problematic: even though Derived
itself is not managing dynamic memory, its member variable items
is. Since the destructor for Derived
was not called, the members
introduced by Derived
were also not destructed. Running Valgrind on the program illustrates the issue:
$ valgrind --leak-check=full ./a.out
==13359== Memcheck, a memory error detector
==13359== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13359== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==13359== Command: ./a.out
==13359==
Base dtor
==13359==
==13359== HEAP SUMMARY:
==13359== in use at exit: 4 bytes in 1 blocks
==13359== total heap usage: 4 allocs, 3 frees, 73,764 bytes allocated
==13359==
==13359== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==13359== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13359== by 0x1098CD: __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) (in /home/akamil/tmp/a.out)
==13359== by 0x109782: std::allocator_traits<std::allocator<int> >::allocate(std::allocator<int>&, unsigned long) (in /home/akamil/tmp/a.out)
==13359== by 0x1095BB: std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) (in /home/akamil/tmp/a.out)
==13359== by 0x109163: void std::vector<int, std::allocator<int> >::_M_realloc_insert<int const&>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int const&) (in /home/akamil/tmp/a.out)
==13359== by 0x109035: std::vector<int, std::allocator<int> >::push_back(int const&) (in /home/akamil/tmp/a.out)
==13359== by 0x108F65: Derived::add(int) (in /home/akamil/tmp/a.out)
==13359== by 0x108E63: main (in /home/akamil/tmp/a.out)
==13359==
==13359== LEAK SUMMARY:
==13359== definitely lost: 4 bytes in 1 blocks
==13359== indirectly lost: 0 bytes in 0 blocks
==13359== possibly lost: 0 bytes in 0 blocks
==13359== still reachable: 0 bytes in 0 blocks
==13359== suppressed: 0 bytes in 0 blocks
==13359==
==13359== For counts of detected and suppressed errors, rerun with: -v
==13359== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
The error message indicates that the memory allocated by the items
vector is never freed.
The solution is to use dynamic binding for the destructor instead by declaring it as virtual:
class Base {
public:
virtual void add(int x) = 0;
virtual ~Base() {
cout << "Base dtor" << endl;
}
};
class Derived : public Base {
public:
void add(int x) override {
items.push_back(x);
}
~Derived() { // virtual implicitly inherited
cout << "Derived dtor" << endl;
}
private:
vector<int> items;
};
int main() {
Base *bptr = new Derived;
bptr->add(3);
delete bptr;
}
Now the following is printed when the code is run:
$ ./a.out
Derived dtor
Base dtor
Valgrind also no longer reports an error.
In general, a base class should always declare the destructor to be virtual if the class will be used polymorphically (meaning that a base-class pointer or reference may be bound to a derived-class object). If the base class destructor has no work to do, it may be defaulted. The derived class destructor need not be explicitly defined, since “virtualness” is inherited even if the destructor is implicitly defined by the compiler.
class Base {
public:
virtual void add(int x) = 0;
virtual ~Base() = default; // use implicitly defined dtor, but make it virtual
};
class Derived : public Base {
public:
void add(int x) override {
items.push_back(x);
}
// implicitly defined dtor inherits virtual
private:
vector<int> items;
};
int main() {
Base *bptr = new Derived;
bptr->add(3);
delete bptr;
}
Valgrind does not show a memory leak:
$ valgrind --leak-check=full ./a.out
==13479== Memcheck, a memory error detector
==13479== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13479== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==13479== Command: ./a.out
==13479==
==13479==
==13479== HEAP SUMMARY:
==13479== in use at exit: 0 bytes in 0 blocks
==13479== total heap usage: 3 allocs, 3 frees, 72,740 bytes allocated
==13479==
==13479== All heap blocks were freed -- no leaks are possible
==13479==
==13479== For counts of detected and suppressed errors, rerun with: -v
==13479== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)