Container ADTs and Polymorphism

We continue our discussion of container ADTs by examining several forms of polymorphism in the context of containers. We will start by looking at operator overloading, a form of ad hoc polymorphism, and then proceed to discuss parametric polymorphism.

Operator Overloading

C++ follows the philosophy that user-defined types should have the same access to language facilities as built-in types. Since operators can be used with built-in atomic types, C++ allows operators to be applied to class types through operator overloading.

Most operators in C++ can be overloaded. An operator overload requires at least one of the operands to be of class type [1] – the behavior of operators on atomic types cannot be changed.

When the compiler encounters an operator where at least one operand is of class type, it looks for a function whose name is operator followed by the symbol for the actual operator. For example, if + is applied to two IntSets, the compiler looks for a function with name operator+ that can be applied to two IntSet objects. For most operators, the function can either be a top-level function or a member of the type of the left-most operand, if it is of class type.

The following is a member function that defines the + operation to compute the union of two IntSets:

class IntSet {
  ...
public:
  IntSet operator+(const IntSet &rhs) const;
};

IntSet IntSet::operator+(const IntSet &rhs) const {
  IntSet result = *this;
  for (int i = 0; i < rhs.num_elements; ++i) {
    result.insert(rhs.elements[i]);
  }
  return result;
}

The function first copies over the receiver object into a new local IntSet. It then iterates over the elements in the other IntSet, inserting them into the result. As we saw last time, the insert() function only inserts an element if the set does not already contain the item.

We can now call the member function directly, or we can apply the + operator instead. When we do so, the compiler finds the overloaded operator+() and calls it for us.

int main() {
  IntSet set1;
  set1.insert(32);
  set1.insert(42);
  set1.insert(7);

  IntSet set2;
  set2.insert(12);
  set2.insert(-3);
  set2.insert(42);

  IntSet set3 = set1.operator+(set2);
  set3.print(cout);           // prints { 32, 42, 7, 12, -3 }

  IntSet set4 = set1 + set2;
  set4.print(cout);           // prints { 32, 42, 7, 12, -3 }
}

In theory, we could instead define the overloaded operator as a top-level function, declared as follows:

IntSet operator+(const IntSet &lhs, const IntSet &rhs);

However, we did not define an interface for iterating through the elements of an IntSet from outside the class, so we cannot actually implement this function.

Overloaded operators can take arguments by value or by reference, like any other function. In most cases, we pass the arguments by reference to avoid making a copy.

Though most operators can be overloaded either as top-level or member functions, there are some cases where we must use a top-level function:

The first operand is of atomic type. Atomic types are not classes, so they do not have member functions.
The first operand is of class type, but we do not have access to the class definition, so we cannot define a new member function.

An example of the latter is overloading the stream-insertion operator, where we do not have access to the definition of ostream. The following overloads insertion of an IntSet:

std::ostream & operator<<(std::ostream &os, const IntSet &set) {
  set.print(os);
  return os;
}

We saw previously that inserting to a stream evaluates back to the stream object. To support this properly, our overload returns the given stream object. It must return the object by reference:

Streams cannot be copied, so the code would not compile if it returned a stream by value.
Even if streams could be copied, we want to return the original stream object itself, not a copy.
Even if a copy would work, we would end up with object slicing, since os actually will refer to an object of a class that derives from ostream.

The parameters are in the same order as the operands, from left to right. The function need only call the print() member function on the IntSet and then return the given ostream object. Then we can insert an IntSet directly into a stream:

int main() {
  IntSet set;
  set.insert(32);
  set.insert(42);
  cout << set << endl;  // prints { 32, 42 }
}

In other cases, we need to define an operator as a member function:

If the overload needs access to private members, a member function would have access because it is part of the class. [2]

[2]
In the future, we will see that we can use a friend declaration to give an outside class or function access to private members. Friend declarations are sometimes used with operator overloads.
Some operators can only be overloaded as member functions: the assignment operator (=), the function-call operator (()), the subscript operator ([]), and the arrow operator (->). (We will see examples of overloading the first two operators later.)

As an example, let’s overload the subscript operator to check whether the given value is in the set. The following does so:

class IntSet {
  ...
public:
  bool operator[](int value) const;
};

bool IntSet::operator[](int value) const {
  return contains(value);
}

The following is an example of applying the operator:

int main() {
  IntSet set;
  set.insert(32);
  cout << set[32] << endl;  // prints 1 (true)
  cout << set[42] << endl;  // prints 0 (false)
}

Parametric Polymorphism

The IntSet container only holds elements that are of type int. Suppose we wanted another container that holds char values. One solution is to copy and paste the IntSet code, then change int to char everywhere it refers to the element type:

class CharSet {
public:
  // Maximum size of a set.
  static const int MAX_SIZE = 10;

  // EFFECTS:  Initializes this set to be empty.
  CharSet();

  // REQUIRES: size() < MAX_SIZE
  // MODIFIES: *this
  // EFFECTS:  Adds value to the set, if it isn't already in the set.
  void insert(char value);

  // MODIFIES: *this
  // EFFECTS:  Removes value from the set, if it is in the set.
  void remove(char value);

  // EFFECTS:  Returns whether value is in the set.
  bool contains(char value) const;

  // EFFECTS:  Returns the number of elements.
  int size() const;

  // EFFECTS:  Prints out the set in an arbitrary order.
  void print(std::ostream &os) const;

private:
  char elements[MAX_SIZE];
  int num_elements;
  // INVARIANTS:
  // 0 <= num_elements && num_elements <= MAX_SIZE
  // the first num_elements items in elements are the items in the set
  // the first num_elements items in elements contain no duplicates
};

This is not a very satisfying solution. It leads to duplication of nearly identical code. Furthermore, if we then wanted a set of doubles, we would have to define an almost identical DoubleSet, and so on.

We already know how to avoid code duplication when we have a value that can be different: add a function parameter that allows the user to specify the value they care about. The problem here is that our entity that differs is not a value, but a type. While function parameters can represent different argument values, we need another mechanism for specifying type arguments. The mechanism that C++ provides is a template.

A template is a model for producing code. We write a generic version, parameterized by one or more template parameters. The compiler then instantiates a specific version of the code by substituting arguments for the parameters and compiling the resulting code. We specify a template and its parameters by placing a template header before the entity that we are defining.

template <typename T>
class UnsortedSet {
  ...
};

The template header can go on the same line or the previous one: whitespace generally does not matter in C++. The header begins with the template keyword, followed by a parameter list surrounded by angle brackets. Within the parameter list, we introduce a template parameter by specifying the kind of entity the parameter can represent. The typename keyword indicates that the parameter is a type parameter. [3] The parameter name then follows. Here, we have chosen the name T, since we don’t have further information about the type. (Value_type or Element_type are other common names to use with a container.)

A template may have more than one parameter, and it can also have a parameter that is of integral type. The following is how the std::array template is declared:

template <typename T, std::size_t N>
class array;

Since the entity that follows the template header is a class, it is a class template. It takes two arguments, one for the element type and the other for the size of the container, which must be a compile-time constant. We can then create a std::array of 10 ints as follows:

std::array<int, 10> items;

The syntax for using a class template is to follow the name of the template with an argument list enclosed by angle brackets. We can similarly create and use an unsorted set of chars:

UnsortedSet<char> char_set;
char_set.insert('e');
char_set.insert('a');
char_set.insert('e');
cout << char_set << endl;   // prints { e, a }

In order for this to work, we write the definition of UnsortedSet to use the template parameter T for the element type. The scope of a template parameter is the entire entity that follows; if it is a class, then the scope is the entire class definition.

template <typename T>
class UnsortedSet {
public:
  // Maximum size of a set.
  static const int MAX_SIZE = 10;

  // EFFECTS:  Initializes this set to be empty.
  UnsortedSet();

  // REQUIRES: size() < MAX_SIZE
  // MODIFIES: *this
  // EFFECTS:  Adds value to the set, if it isn't already in the set.
  void insert(const T &value);

  // MODIFIES: *this
  // EFFECTS:  Removes value from the set, if it is in the set.
  void remove(const T &value);

  // EFFECTS:  Returns whether value is in the set.
  bool contains(const T &value) const;

  // EFFECTS:  Returns the number of elements.
  int size() const;

  // EFFECTS:  Prints out the set in an arbitrary order.
  void print(std::ostream &os) const;

private:
  T elements[MAX_SIZE];
  int num_elements;
  // INVARIANTS:
  // 0 <= num_elements && num_elements <= MAX_SIZE
  // the first num_elements items in elements are the items in the set
  // the first num_elements items in elements contain no duplicates
};

The actual type argument used to instantiate UnsortedSet may be something small like int, or it may be a large class type such as string. Thus, it is good practice to pass objects of a template parameter by reference rather than by value, avoiding copying potentially large objects.

When we use UnsortedSet with a particular type argument, the compiler actually plugs the argument in for T, generating code for that instantiation and compiling it with the rest of the program. For example, UnsortedSet<string> is instantiated as follows:

class UnsortedSet<string> {
public:
  // Maximum size of a set.
  static const int MAX_SIZE = 10;

  // EFFECTS:  Initializes this set to be empty.
  UnsortedSet();

  // REQUIRES: size() < MAX_SIZE
  // MODIFIES: *this
  // EFFECTS:  Adds value to the set, if it isn't already in the set.
  void insert(const string &value);

  // MODIFIES: *this
  // EFFECTS:  Removes value from the set, if it is in the set.
  void remove(const string &value);

  // EFFECTS:  Returns whether value is in the set.
  bool contains(const string &value) const;

  // EFFECTS:  Returns the number of elements.
  int size() const;

  // EFFECTS:  Prints out the set in an arbitrary order.
  void print(std::ostream &os) const;

private:
  string elements[MAX_SIZE];
  int num_elements;
  // INVARIANTS:
  // 0 <= num_elements && num_elements <= MAX_SIZE
  // the first num_elements items in elements are the items in the set
  // the first num_elements items in elements contain no duplicates
};

Function Templates

We can also define a function as a template, resulting in a function template. For example, the following computes the maximum of two items of the same type:

template <typename T>
T max(const T &value1, const T &value2) {
  return value2 > value1 ? value2 : value1;
}

As with a class template, we define a function template by preceding it with a template header, which introduces one or more template parameters. We can then use the parameter anywhere in the function definition. Here, our function template takes two arguments of the same type and compares them. The ?: operator is a conditional. It evaluates its first operand, and if the result is true, it produces the second operand. Otherwise, it produces the third operand.

Similar to a class template, we can use a function template by following its name with an argument list enclosed by angle brackets: [4]

int main() {
  int i1 = 3;
  int i2 = -3;
  cout << max<int>(i1, i2) << endl;
  cout << max<double>(3.1, 7.5) << endl;
}

Unlike a class template, however, the compiler is able in most cases to deduce the template argument from the arguments to the function call. In the first call above, the arguments are both of type int, so the compiler can deduce that we want int as the template argument. Similarly, the arguments are both of type double in the second call, so the compiler can deduce we want double. Thus, we can leave off the explicit template arguments:

int main() {
  int i1 = 3;
  int i2 = -3;
  cout << max(i1, i2) << endl;    // deduced as max<int>(i1, i2)
  cout << max(3.1, 7.5) << endl;  // deduced as max<double>(3.1, 7.5)
}

The max() function template can only be applied to a type that supports the > operator. If we try to call it on objects that don’t support the operator, the result is a compile error:

int main() {
  Duck d1("Donald");
  Duck d2("Scrooge");
  Duck best_duck = max(d1, d2);
  cout << best_duck.get_name() << " wins!" << endl;
}

This results in an error like the following:

main.cpp: In instantiation of 'T max(const T&, const T&) [with T = Duck]':
main.cpp:20:30:   required from here
main.cpp:14:17: error: no match for 'operator>' (operand types are 'const Duck' and 'const Duck')
   return value2 > value1 ? value2 : value1;
          ~~~~~~~^~~~~~~~

The offending line of code is the one marked as “required from here” – it is the line that attempts to call max() on two Duck objects. The compiler actually instantiates max<Duck>(), but the resulting instantiation produces errors when the compiler tries to compile it.

Compiling Templates

We saw previously that the C++ compiler only needs access to declarations when compiling code that uses a class or a function defined in some other source file. However, this is not the case for class and function templates. The compiler must actually instantiate the definitions for each set of template arguments, so it must have access to the full definition of a template.

To provide the compiler with access to the full definition of a template wherever it is used, we must arrange for the header file to contain the full definition. We can just define the template directly in the header file itself; it is still good practice to separate the declarations from the definitions for the benefit of anyone using our code.

// max.hpp

// EFFECTS: Returns the maximum of value1 and value2.
template <typename T>
T max(const T &value1, const T &value2);

...

template <typename T>
T max(const T &value1, const T &value2) {
  return value2 > value1 ? value2 : value1;
}

A better organization is to separate the definitions into their own file; common convention is to use a suffix like .tpp for this file. We can then use a #include directive to pull the code into the header file:

// max.hpp

// EFFECTS: Returns the maximum of value1 and value2.
template <typename T>
T max(const T &value1, const T &value2);

#include "max.tpp"

// max.tpp

template <typename T>
T max(const T &value1, const T &value2) {
  return value2 > value1 ? value2 : value1;
}

Code that uses the max module would then just #include "max.hpp", which would transitively include max.tpp.

Include Guards

In complex programs, it is often the case that a single file inadvertently #includes the same header file more than once. For instance, it may be that module A depends on module B and thus includes its header. Then module C depends on both A and B, so it includes both their headers. This results in module C including B’s header twice, which can cause compiler errors that a function or a class is defined more than once.

To avoid problems with a header being included more than once, headers generally have include guards that arrange for the compiler to ignore the code if the header is included a second time. The following is an example of include guards:

#ifndef MAX_HPP
#define MAX_HPP

// max.hpp

// EFFECTS: Returns the maximum of value1 and value2.
template <typename T>
T max(const T &value1, const T &value2);

#include "max.tpp"

#endif /* MAX_HPP */

The #ifndef and #define directives are the opening of the include guard, and the #endif closes it. The code within is ignored if the header is included again. [5]

Member-Function Templates

Defining a member function within the definition for a class template is no different than a member function within a class. Defining a member function outside the definition of a class template differs from that of a regular class; we must inform the compiler that the function is a member of a class template.

template <typename T>
bool UnsortedSet<T>::contains(const T &value) const {
  return indexOf(value) != -1;
}

template <typename T>
void UnsortedSet<T>::insert(const T &value) {
  assert(size() < MAX_SIZE);
  if (!contains(value)) {
    elements[num_elements] = value;
    ++num_elements;
  }
}

Here, we must tell the compiler that contains() is a member of UnsortedSet<T>. However, before we can use the name T, we need to introduce it with a template header. The scope of a template header is the entity that immediately follows; thus, each member-function definition needs its own template header.

Insertion-Operator Template

Now that we have an UnorderedSet class template, we can write an insertion operator that works on any instantiation by defining it as a function template:

template <typename T>
std::ostream & operator<<(std::ostream &os, const UnsortedSet<T> &set) {
  set.print(os);
  return os;
}

The function template is parameterized by the element type of an UnorderedSet, and we use the template parameter as part of the type for the second function parameter. The compiler is able to deduce the template argument when we insert a set into a stream:

UnsortedSet<char> char_set;
char_set.insert('e');
char_set.insert('a');
char_set.insert('e');
cout << char_set << endl;   // prints { e, a }

Another Example

Let us write a function template that copies elements from an array into a set. We would like to use it as follows:

int main() {
  UnsortedSet<int> set1;
  int arr1[4] = { 1, 2, 3, 2 };
  fill_from_array(set1, arr1, 4);
  cout << set1 << endl;              // prints { 1, 2, 3 }

  UnsortedSet<char> set2;
  char arr2[3] = { 'a', 'b', 'a' };  // prints { a, b }
  fill_from_array(set2, arr2, 3);
}

The fill_from_array() template must be parameterized by the element type, and we use that type for both the set and array function parameters:

template <typename T>
void fill_from_array(UnsortedSet<T> &set, const T *arr, int size) {
  for (int i = 0; i < size; ++i) {
    set.insert(arr[i]);
  }
}

The set is passed by reference, to avoid making a copy and to allow the function to modify the original set. The array decays into a pointer when it is passed, and we declare the pointer as a pointer to const, since we do not need to modify the array elements. The body of the function template just iterates over the array, inserting each element into the set; the set ignores insertion of duplicate values, so there is no need to check whether the set already contains a value before inserting it.

Static vs. Dynamic Polymorphism

The template mechanism gives us parametric polymorphism; a template type parameter can take on the form of any type. The compiler instantiates the code at compile time, so we also refer to this as static polymorphism. Compare this with subtype polymorphism, where the program resolves virtual function calls at runtime; we therefore use dynamic polymorphism to also refer to it.

In designing an ADT, we have the choice of static polymorphism, dynamic polymorphism, or both. For instance, SortedSet and UnsortedSet share the same interface, so we could define a common base interface and make use of dynamic binding. However, dynamic polymorphism comes with a runtime performance cost; a virtual function call generally requires two extra pointer dereferences compared to a non-virtual call. For a container, a program may make many calls to insert items, remove items, check whether an item is in the container, and so on. Thus, the general convention in C++ is to avoid dynamic polymorphism with containers.

Another reason why dynamic polymorphism is not a good fit for a container is that we usually make the decision of which container to use when we write our code, not when we run it. However, it is still good practice to write our code in such a way that we can easily change the container type we are using. Ideally, we would have a single location in our code that needs to change if we decide to use something else. We’ve seen this pattern already when it comes to a value that is used in many places: define a constant or variable, and use the associated name rather than the value directly. Thus, introducing a new name for an entity is a powerful form of abstraction, allowing us to avoid hardcoding the entity everywhere in our code. C++ gives us this capability for types with type aliases.

Type Aliases

A type alias is a new name for an existing type. We can introduce a type alias with a using declaration:

using UnsortedIntSet = UnsortedSet<int>;

The syntax for introducing a type alias with a using declaration is the using keyword, followed by the name we want to introduce, the = symbol, and the existing type that we want the name to alias. The example above introduces the name UnsortedIntSet as an alias for the type UnsortedSet<int>.

A simple alias can also be defined with the typedef keyword:

typedef UnsortedSet<int> UnsortedIntSet;

The syntax is in the reverse order of a using declaration: the existing type first, then the new alias.

A using declaration (but not a typedef) can also define an alias template, which introduces a new name that is a template for a set of types. The following is an example:

template <typename T>
using Set = UnsortedSet<T>;

Similar to any template, we introduce an alias template by placing a template header with the requisite parameters before the using declaration. Then we can use Set in the same way we would a class template:

template <typename T>
using Set = UnsortedSet<T>;

template <typename T>
void fill_from_array(Set<T> &set, const T *arr, int n);

int main() {
  Set<int> set1;
  int arr1[4] = { 1, 2, 3, 2 };
  fill_from_array(set1, arr1, 4);

  Set<char> set2;
  char arr2[3] = { 'a', 'b', 'a' };
  fill_from_array(set2, arr2, 3);
}

The compiler treats Set<int> and Set<char> as UnsortedSet<int> and UnsortedSet<char>, since Set is an alias for UnsortedSet. If we decide to use sorted sets instead, there is just one place we need to change:

template <typename T>
using Set = SortedSet<T>;

Now the compiler will treat Set<int> and Set<char> as SortedSet<int> and SortedSet<char>.

The Factory Pattern

With dynamic polymorphism, we can defer the decision of which derived class to use until runtime. For example, the following code asks the user for what color bird they want, then calls a factory function to create the actual object:

int main() {
  string color;
  cin >> color;
  Bird *bird = Bird_factory(color, "Myrtle");
  cout << "Bird " << bird->get_name() << " is " << bird->get_age()
       << " and says: " << bird->talk() << endl;
  delete bird;

  cin >> color;
  bird = Bird_factory(color, "Heihei");
  cout << "Bird " << bird->get_name() << " is " << bird->get_age()
       << " and says: " << bird->talk() << endl;
  delete bird;
}

The factory function creates an object of the appropriate derived type and returns a pointer to it. It cannot create the object in local memory – a local object dies when the function that created it returns, so it would not live past the call to Bird_factory(). Instead, we create it in dynamic memory with the new operator:

Bird * Bird_factory(const string &color, const string &name) {
  if (color == "blue") {
    return new BlueBird(name);
  } else if (color == "black") {
    return new Raven(name);
  }
  ...
}

We then use the delete operator on the object’s address when we no longer need it.

We will start looking at dynamic memory in more detail next time.

Liskov Substitution Principle

When designing an inheritance hierarchy, an important property is substitutability of a derived-class object for a base-class one. This notion is formalized by the Liskov substitution principle, which states that in order for type \(S\) to be a subtype of type \(T\), the following requirement must be met:

Any property of objects of type \(T\) should also be a property of objects of type \(S\).

—Barbara Liskov, MIT

This implies that in any code that depends on \(T\)’s interface, an object of type \(S\) can be substituted without any undesirable effects. If the requirement above is not satisfied, \(S\) would be a derived type of \(T\) but not a subtype.

A classic example is a ToyDuck class that derives from Duck:

class ToyDuck : public Duck {
  ...
public:
  // EFFECTS: Prints out quack if battery level >= 10.
  void talk() const override {
    if (battery_level >= 10) {
      cout << "quack" << endl;
      --battery_level;
    }
  }
};

Code that uses a Duck may assume that a quacking noise gets printed out each time talk() is called. A ToyDuck would violate this expectation, since it does not print anything out if the ToyDuck has insufficient battery. Thus, ToyDuck is a derived type of Duck, but not a subtype.

_images/lsp.jpg — Figure 49 Liskov substitution principle. Credit: Derick Bailey, retrieved from Stack Overflow