Polymorphism

The word polymorphism literally means “many forms.” In the context of programming, polymorphism refers to the ability of a piece of code to behave differently depending on the context in which it is used. Appropriately, there are several forms of polymorphism:

  • ad hoc polymorphism, which refers to function overloading

  • parametric polymorphism in the form of templates

  • subtype polymorphism, which allows a derived-class object to be used where a base-class object is expected

The unqualified term “polymorphism” usually refers to subtype polymorphism.

We proceed to discuss ad hoc and subtype polymorphism, deferring parametric polymorphism until later.

Function Overloading

Ad hoc polymorphism refers to function overloading, which is the ability to use a single name to refer to many different functions in a single scope. C++ allows both top-level functions and member functions to be overloaded. The following is an example of overloaded member functions:

class Base {
public:
  void foo(int a);
  int foo(string b);
};

int main() {
  Base b;
  b.foo(42);
  b.foo("test");
}

When we invoke an overloaded function, the compiler resolves the function call by comparing the types of the arguments to the parameters of the candidate functions and finding the best match. The call b.foo(42) calls the member function foo() with parameter int, since 42 is an int. The call b.foo("test") calls the function with parameter string. The argument "test" actually has type const char * (once the string literal decays to a pointer), but a const char * can be implicitly converted to a string and not to an int, so the string overload is the only one that matches.
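The same matching process applies to top-level functions. The following is a minimal sketch, using a hypothetical describe() function that is not part of the examples above; each overload returns a label so that the caller can see which one was selected:

```cpp
#include <string>

// Hypothetical overload set (not from the text); each overload reports
// which parameter type was selected.
std::string describe(int) { return "int"; }
std::string describe(double) { return "double"; }
std::string describe(const std::string &) { return "string"; }

// Exercises the overloads with three different argument types.
std::string describe_all() {
  return describe(42) + " " + describe(4.2) + " " + describe("test");
}
```

Here, 42 is an exact match for the int overload, 4.2 is an exact match for the double overload, and "test" can only be converted to the std::string parameter.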

In C++, functions can only be overloaded when defined within the same scope. If functions of the same name are defined in a different scope, then those that are defined in a closer scope hide the functions defined in a further scope:

class Derived : public Base {
public:
  int foo(int a);
  double foo(double b);
};

int main() {
  Derived d;
  d.foo("test"); // ERROR
}

When handling the member access d.foo, under the name-lookup process we saw last time, the compiler finds the name foo in Derived. It then applies function-overload resolution; however, none of the functions with name foo can be invoked on a const char *, resulting in a compile error. The functions inherited from Base are not considered, since they were defined in a different scope.
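Hiding applies even when an argument would match a base-class function exactly. The following sketch uses simplified stand-ins for Base and Derived whose functions return labels (an assumption made for illustration):

```cpp
#include <string>

// Simplified stand-ins for Base and Derived; the functions return labels
// (an assumption made for illustration) instead of doing real work.
class Base {
public:
  std::string foo(int) const { return "Base::foo(int)"; }
};

class Derived : public Base {
public:
  // Declaring any foo in Derived hides every foo inherited from Base.
  std::string foo(double) const { return "Derived::foo(double)"; }
};

// Lookup stops at Derived, so the int argument is converted to double.
std::string call_unqualified() {
  Derived d;
  return d.foo(42);
}

// A qualified name starts the lookup in Base instead.
std::string call_qualified() {
  Derived d;
  return d.Base::foo(42);
}
```

The unqualified call selects Derived::foo(double) even though Base::foo(int) would be an exact match, since the Base version is never considered.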

Function overloading requires the signatures of the functions to differ, so that overload resolution can choose the overload with the most appropriate signature. Here, “signature” refers to the function name and parameter types – the return type is not part of the signature and is not considered in overload resolution.

class Person {
public:
  void greet();
  void greet(int x);            // OK
  void greet(string x);         // OK
  void greet(int x, string s);  // OK
  void greet(string s, int x);  // OK
  bool greet();                 // ERROR: signature the same as the first overload
  void greet() const;           // OK: implicit this parameter different
};

For member functions, the const keyword after the parameter list is part of the signature – it changes the implicit this parameter from being a pointer to non-const to a pointer to const. Thus, it is valid for two member-function overloads to differ solely in whether or not they are declared as const.
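As a sketch of how the compiler chooses between such a pair, consider a hypothetical variant of Person whose greet() overloads return a label. The helper functions greet_mutable() and greet_readonly() are also assumptions made for illustration:

```cpp
#include <string>

// Hypothetical class with two overloads that differ only in const.
class Person {
public:
  std::string greet() { return "non-const"; }
  std::string greet() const { return "const"; }
};

// The receiver is non-const, so the non-const overload is the better match.
std::string greet_mutable(Person &p) { return p.greet(); }

// The receiver is const, so only the const overload can be called.
std::string greet_readonly(const Person &p) { return p.greet(); }
```

Passing the same Person object to both helpers selects different overloads, since the choice depends on whether the receiver expression is const.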

Subtype Polymorphism

Subtype polymorphism allows a derived-class object to be used where a base-class one is expected. In order for this to work, however, we need indirection. Consider what happens if we directly copy a Chicken object into a Bird:

int main() {
  Chicken chicken("Myrtle");
  // ...
  Bird bird = chicken;
}

While C++ allows this, the value of a Chicken does not necessarily fit into a Bird object, since a Chicken has more member variables than a Bird. The copy above results in object slicing – the members defined by Bird are copied, but the Chicken ones are not, as illustrated in Figure 37.

Figure 37 Object slicing copies only the members defined by the base class.

To avoid slicing, we need indirection through a reference or a pointer, so that we avoid making a copy:

Bird &bird_ref = chicken;
Bird *bird_ptr = &chicken;

The above initializes bird_ref as an alias for the chicken object. Similarly, bird_ptr is initialized to hold the address of the chicken object. In either case, a copy is avoided.
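The difference between a sliced copy and an alias can be observed by modifying each one. The following is a minimal sketch; the members of Bird and Chicken here are assumptions made for illustration, not the definitions used elsewhere:

```cpp
#include <string>

// Simplified stand-ins for Bird and Chicken; the members are assumptions
// made for illustration.
class Bird {
public:
  Bird(const std::string &name_in) : name(name_in) {}
  std::string name;
};

class Chicken : public Bird {
public:
  Chicken(const std::string &name_in) : Bird(name_in), roads_crossed(0) {}
  int roads_crossed;
};

// Modifies a sliced copy; the original Chicken is left untouched.
std::string rename_copy() {
  Chicken chicken("Myrtle");
  Bird bird = chicken;       // slices: only the Bird part is copied
  bird.name = "Heihei";
  return chicken.name;
}

// Modifies through a reference; the change is visible in the Chicken.
std::string rename_through_reference() {
  Chicken chicken("Myrtle");
  Bird &bird_ref = chicken;  // alias, no copy
  bird_ref.name = "Heihei";
  return chicken.name;
}
```

The first function returns "Myrtle", since only the copy was renamed. The second returns "Heihei", since bird_ref refers to the chicken object itself.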

C++ allows a reference or pointer of a base type to refer to an object of a derived type. It allows implicit upcasts, which are conversions that go upward in the inheritance hierarchy, such as from Chicken to Bird, as in the examples above. On the other hand, implicit downcasts are prohibited:

Chicken &chicken_ref = bird_ref;   // ERROR: implicit downcast
Chicken *chicken_ptr = bird_ptr;   // ERROR: implicit downcast

The implicit downcasts are prohibited by C++ even though bird_ref and bird_ptr actually refer to Chicken objects. In the general case, they can refer to objects that aren’t of Chicken type, such as Duck or just plain Bird objects. Since the conversions may be unsafe, they are disallowed by the C++ standard.

While implicit downcasts are prohibited, we can do explicit downcasts with static_cast:

Chicken &chicken_ref = static_cast<Chicken &>(bird_ref);
Chicken *chicken_ptr = static_cast<Chicken *>(bird_ptr);

These conversions are unchecked at runtime, so we need to be certain from the code that the underlying object is a Chicken.
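The following sketch shows an explicit downcast inside a function. The classes are simplified stand-ins, and the roads_crossed() member and roads_crossed_by() helper are hypothetical:

```cpp
// Simplified stand-ins; the member functions are assumptions.
class Bird {
public:
  int legs() const { return 2; }
};

class Chicken : public Bird {
public:
  int roads_crossed() const { return 3; }
};

// Correct only if the caller guarantees that bird really is a Chicken.
int roads_crossed_by(Bird &bird) {
  Chicken &chicken_ref = static_cast<Chicken &>(bird);
  return chicken_ref.roads_crossed();
}
```

Calling roads_crossed_by() on a Chicken object is fine. Calling it on a plain Bird or a Duck compiles as well, but the resulting behavior is undefined.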

In order to be able to bind a base-class reference or pointer to a derived-class object, the inheritance relationship must be accessible. From outside the classes, this means that the derived class must publicly inherit from the base class. Otherwise, the outside world is not allowed to take advantage of the inheritance relationship. Consider this example:

class A {
};

class B : A {     // default is private when using the class keyword
};

int main() {
  B b;
  A *a_ptr = &b;  // ERROR: inheritance relationship is private
}

This results in a compiler error:

main.cpp:9:16: error: cannot cast 'B' to its private base class 'A'
    A *a_ptr = &b;  // ERROR: inheritance relationship is private
               ^
main.cpp:4:13: note: implicitly declared private here
  class B : A {     // default is private when using the class keyword
            ^
1 error generated.
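One way to check accessibility without triggering a compile error is the standard std::is_convertible type trait, which goes beyond what we have covered so far. The following is a sketch under that assumption; the names PrivateB, PublicB, and upcast() are hypothetical:

```cpp
#include <type_traits>

class A {};

class PrivateB : A {};         // private inheritance (the class default)
class PublicB : public A {};   // public inheritance

// From outside the classes, only public inheritance permits the upcast.
static_assert(!std::is_convertible<PrivateB *, A *>::value,
              "private base: upcast is not accessible");
static_assert(std::is_convertible<PublicB *, A *>::value,
              "public base: upcast is allowed");

// OK: implicit upcast through an accessible inheritance relationship.
A *upcast(PublicB *b) { return b; }
```

Adding the public keyword to the declaration of B in the example above would likewise allow the initialization of a_ptr.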

Static and Dynamic Binding

Subtype polymorphism allows us to pass a derived-class object to a function that expects a base-class object:

void Image_init(Image* img, istream& is);

int main() {
  Image *image = /* ... */;
  istringstream input(/* ... */);
  Image_init(image, input);
}

Here, we have passed an istringstream object to a function that expects an istream. Extracting from the stream will use the functionality that istringstream defines for extraction.

Another common use case is to have a container of base-class pointers, each of which points to different derived-class objects:

void all_talk(Bird *array[], int length) {
  for (int i = 0; i < length; ++i) {
    array[i]->talk();
  }
}

int main() {
  Chicken c1 = /* ... */;
  Duck d = /* ... */;
  Chicken c2 = /* ... */;
  Bird *array[] = { &c1, &d, &c2 };
  all_talk(array, 3);
}

Unfortunately, given the way we defined the talk() member function of Bird last time, this code will not use the derived-class versions of the function. Instead, all three calls to talk() will use the Bird version:

$ ./main.exe
tweet
tweet
tweet

In the invocation array[i]->talk(), the declared type of the receiver, the object that is receiving the member-function call, is different from the actual runtime type. The declared or static type is Bird, while the runtime or dynamic type is Chicken when i == 0. This disparity can only exist when we have indirection, either through a reference or a pointer.

For a particular member function, C++ gives us the option of either static binding, where the compiler determines which function to call based on the static type of the receiver, or dynamic binding, where the program also takes the dynamic type into account at runtime. The default is static binding, since it is more efficient and can be done entirely at compile time.

In order to get dynamic binding instead, we need to declare the member function as virtual in the base class:

class Bird {
  ...
  virtual void talk() const {
    cout << "tweet" << endl;
  }
};

Now when we call the all_talk() function above, the program will use the dynamic type of the receiver in the invocation array[i]->talk():

$ ./main.exe
bawwk
quack
bawwk

The virtual keyword is necessary in the base class, but optional in the derived classes. It can only be applied to the declaration within a class; if the function is subsequently defined outside of the class, the definition cannot include the virtual keyword:

class Bird {
  ...
  virtual void talk() const;
};

void Bird::talk() const {
  cout << "tweet" << endl;
}
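The pieces of this section can be combined into one small program. The following is a sketch under some assumptions: constructors and other members are omitted, and talk() returns its sound rather than printing it, so that the result of all_talk() can be inspected:

```cpp
#include <string>

// Sketch of the hierarchy; constructors and other members are omitted,
// and talk() returns its sound (an assumption) instead of printing it.
class Bird {
public:
  virtual std::string talk() const { return "tweet"; }
};

class Chicken : public Bird {
public:
  std::string talk() const { return "bawwk"; }  // virtual is implied
};

class Duck : public Bird {
public:
  std::string talk() const { return "quack"; }  // virtual is implied
};

// Collects the sounds of all the birds, separated by spaces.
std::string all_talk(Bird *birds[], int length) {
  std::string result;
  for (int i = 0; i < length; ++i) {
    if (i > 0) {
      result += " ";
    }
    result += birds[i]->talk();  // dynamically bound
  }
  return result;
}
```

Given an array that holds the addresses of a Chicken, a Duck, and another Chicken, all_talk() produces "bawwk quack bawwk". Removing virtual from Bird::talk() would make every call use the Bird version.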

dynamic_cast

With dynamic binding, the only change we need to make to our code is to add the virtual keyword when declaring the base-class member function. No changes are required to the actual function calls (e.g. in all_talk()).

Consider an alternative to dynamic binding, where we manually check the runtime type of an object to call the appropriate function. In C++, a dynamic_cast conversion checks the dynamic type of the receiver object:

Chicken chicken("Myrtle");
Bird *b_ptr = &chicken;
Chicken *c_ptr = dynamic_cast<Chicken *>(b_ptr);
if (c_ptr) {  // check for null
  // do something chicken-specific
}

If the dynamic type is not actually a Chicken, the conversion results in a null pointer. Otherwise, it results in the address of the Chicken object. Thus, we can check for null after the conversion to determine if it succeeded.
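The following sketch shows both outcomes of the conversion. The cross_road() member and the try_cross_road() helper are hypothetical, and the functions return strings so that the result can be checked:

```cpp
#include <string>

// Minimal polymorphic hierarchy; the member functions are assumptions.
class Bird {
public:
  virtual std::string talk() const { return "tweet"; }
};

class Chicken : public Bird {
public:
  std::string talk() const { return "bawwk"; }
  std::string cross_road() const { return "crossed"; }
};

class Duck : public Bird {
public:
  std::string talk() const { return "quack"; }
};

// Calls the Chicken-specific function only if the bird really is a Chicken.
std::string try_cross_road(const Bird *b_ptr) {
  const Chicken *c_ptr = dynamic_cast<const Chicken *>(b_ptr);
  if (c_ptr) {  // null unless the object is a Chicken (or derived from it)
    return c_ptr->cross_road();
  }
  return "not a chicken";
}
```

Passing the address of a Chicken results in "crossed", while passing the address of a Duck or a plain Bird results in "not a chicken".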

There are two significant issues with dynamic_cast:

  1. It generally results in messy and unmaintainable code. For instance, we would need to modify all_talk() as follows to use dynamic_cast rather than dynamic binding:

    void all_talk(Bird * birds[], int length) {
      for (int i = 0; i < length; ++i) {
        Chicken *c_ptr = dynamic_cast<Chicken*>(birds[i]);
        if (c_ptr) {
          c_ptr->talk();
        }
        Duck *d_ptr = dynamic_cast<Duck*>(birds[i]);
        if (d_ptr) {
          d_ptr->talk();
        }
        Eagle *e_ptr = dynamic_cast<Eagle*>(birds[i]);
        if (e_ptr) {
          e_ptr->talk();
        }
        ...
      }
    }
    

    We would need a branch for every derived type of Bird, and we would have to add a new branch every time we wrote a new derived class. The code also takes time that is linear in the number of derived classes.

  2. In C++, dynamic_cast can only be applied to classes that are polymorphic, meaning that they define or inherit at least one virtual member function. Thus, we need to use virtual one way or another.

Code that uses dynamic_cast is usually considered to be poorly written. Almost universally, it can be rewritten to use dynamic binding instead.

Member Lookup Revisited

We have already seen that when a member is accessed on an object, the compiler first looks in the object’s class for a member of that name before proceeding to its base class. With indirection, the following is the full lookup process:

  1. The compiler looks up the member in the static type of the receiver object, using the lookup process we discussed before (starting in the class itself, then looking in the base class if necessary). It is an error if no member of the given name is found in the static type or its base types.

  2. If the member found is an overloaded function, then the arguments of the function call are used to determine which overload is called.

  3. If the member is a variable or non-virtual function (including static member functions, which we will see later), the access is statically bound at compile time.

  4. If the member is a virtual function, the access uses dynamic binding. At runtime, the program will look for a function of the same signature, starting at the dynamic type of the receiver, then proceeding to its base type if necessary.

As indicated above, dynamic binding requires two conditions to be met to use the derived-class version of a function:

  • The member function found at compile time using the static type must be virtual.

  • The derived-class function must have the same signature as the function found at compile time.

When these conditions are met, the derived-class function overrides the base-class one – it will be used instead of the base-class function when the dynamic type of the receiver is the derived class. If these conditions are not met, the derived-class function hides the base-class one – it will only be used if the static type of the receiver is the derived class.

As an example, consider the following class hierarchy:

class Top {
public:
  int f1() const {
    return 1;
  }

  virtual int f2() const {
    return 2;
  }
};

class Middle : public Top {
public:
  int f1() const {
    return 3;
  }

  virtual int f2() const {
    return 4;
  }
};

class Bottom : public Middle {
public:
  int f1() const {
    return 5;
  }

  virtual int f2() const {
    return 6;
  }
};

Each class has a non-virtual f1() member function; since the function is non-virtual, the derived-class versions hide the ones in the base classes. The f2() function is virtual, so the derived-class ones override the base-class versions.

The following are some examples of invoking these functions:

int main() {
  Top top;
  Middle mid;
  Bottom bot;
  Top *top_ptr = &bot;
  Middle *mid_ptr = &mid;

  cout << top.f2() << endl;       // prints 2
  cout << mid.f1() << endl;       // prints 3
  cout << top_ptr->f1() << endl;  // prints 1
  cout << top_ptr->f2() << endl;  // prints 6
  cout << mid_ptr->f2() << endl;  // prints 4
  mid_ptr = &bot;
  cout << mid_ptr->f1() << endl;  // prints 3
  cout << mid_ptr->f2() << endl;  // prints 6
}

We discuss each call in turn:

  • There is no indirection in the calls top.f2() and mid.f1(), so there is no difference between the static and dynamic types of the receivers. The former calls the Top version of f2(), resulting in 2, while the latter calls the Middle version of f1(), producing 3.

  • The static type of the receiver in top_ptr->f1() and top_ptr->f2() is Top, while the dynamic type is Bottom. Since f1() is non-virtual, static binding is used, resulting in 1. On the other hand, f2() is virtual, so dynamic binding uses the Bottom version, producing 6.

  • In the first call to mid_ptr->f2(), both the static and dynamic types of the receiver are Middle, so Middle's version is used regardless of whether f2() is virtual. The result is 4.

  • The assignment mid_ptr = &bot changes the dynamic type of the receiver to Bottom in calls on mid_ptr. The static type remains Middle, so the call mid_ptr->f1() results in 3. The second call to mid_ptr->f2(), however, uses dynamic binding, so the Bottom version of f2() is called, resulting in 6.
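The same rules apply when the indirection is through a reference rather than a pointer. The following sketch repeats the hierarchy in condensed form and adds hypothetical helper functions whose parameters fix the static type of the receiver:

```cpp
class Top {
public:
  int f1() const { return 1; }
  virtual int f2() const { return 2; }
};

class Middle : public Top {
public:
  int f1() const { return 3; }
  virtual int f2() const { return 4; }
};

class Bottom : public Middle {
public:
  int f1() const { return 5; }
  virtual int f2() const { return 6; }
};

// Hypothetical helpers: the static type of the receiver is Top or Middle,
// whatever object the reference is bound to.
int f1_via_top(const Top &t) { return t.f1(); }        // statically bound
int f2_via_top(const Top &t) { return t.f2(); }        // dynamically bound
int f1_via_middle(const Middle &m) { return m.f1(); }  // statically bound
```

With a Bottom object as the argument, f1_via_top() results in 1, f2_via_top() results in 6, and f1_via_middle() results in 3, matching the pointer-based calls above.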

The override Keyword

A common mistake when attempting to override a function is to inadvertently change the signature, so that the derived-class version hides rather than overrides the base-class one. The following is an example:

class Chicken : public Bird {
  ...
  virtual void talk() {
    cout << "bawwk" << endl;
  }
};

int main() {
  Chicken chicken("Myrtle");
  Bird *b_ptr = &chicken;
  b_ptr->talk();
}

This code compiles, but it prints tweet when run. Under the lookup process above, the program looks for an override of Bird::talk() at runtime. However, no such override exists – Chicken::talk() has a different signature, since it is not const. Thus, the dynamic lookup finds Bird::talk() and calls it instead.

Rather than having the code compile and then behave incorrectly, we can ask the compiler to detect bugs like this with the override keyword. Specifically, we can place the override keyword after the signature of a member function to let the compiler know we intended to override a base-class member function. If the derived-class function doesn’t actually do so, the compiler will report this:

class Chicken : public Bird {
  ...
  void talk() override {
    cout << "bawwk" << endl;
  }
};

Here, we have removed the virtual keyword, since it is already implied by override – only a virtual function can be overridden, and the “virtualness” is inherited from the base class. Since we are missing the const, the compiler reports the following:

main.cpp:39:15: error: non-virtual member function marked 'override' hides
      virtual member function
  void talk() override {
              ^
main.cpp:20:16: note: hidden overloaded virtual function 'Bird::talk' declared
      here: different qualifiers (const vs none)
  virtual void talk() const {
               ^
1 error generated.

Adding in the const fixes the issue:

class Chicken : public Bird {
  ...
  void talk() const override {
    cout << "bawwk" << endl;
  }
};

int main() {
  Chicken chicken("Myrtle");
  Bird *b_ptr = &chicken;
  b_ptr->talk();
}

The code now prints bawwk.

Abstract Classes and Interfaces

In some cases, there isn’t enough information in a base class to define a particular member function, but we still want that function to be part of the interface provided by all its derived classes. In the case of Bird, for example, we may want a get_wingspan() function that returns the average wingspan for a particular kind of bird. There isn’t a default value that makes sense to put in the Bird class. Instead, we declare get_wingspan() as a pure virtual function, without any implementation in the base class:

class Bird {
  ...
  virtual int get_wingspan() const = 0;
};

The syntax for declaring a function as pure virtual is to put = 0; after its signature. This is just syntax – we aren’t actually setting its value to 0.

Since Bird is now missing part of its implementation, we can no longer create objects of Bird type. The Bird class is said to be abstract. We can still declare Bird references and pointers, however, since that doesn’t create a Bird object. We can then have such references and pointers refer to derived-class objects:

Bird bird("Big Bird");      // ERROR: Bird is abstract
Chicken chicken("Myrtle");  // OK, as long as Chicken is not abstract
Bird &bird_ref = chicken;   // OK
Bird *bird_ptr = &chicken;  // OK

In order for a derived class to not be abstract itself, it must provide implementations of the pure virtual functions in its base classes:

class Chicken : public Bird {
  ...
  int get_wingspan() const override {
    return 20;  // inches
  }
};

With a virtual function, a base class provides its derived classes with the option of overriding the function’s behavior. With a pure virtual function, the base class requires its derived classes to override the function, since the base class does not provide an implementation itself. If a derived class fails to override the function, the derived class is itself abstract, and objects of that class cannot be created.

We can also define an interface, which is a class that consists only of pure virtual functions. Such a class provides no implementation; rather, it merely declares the functions that its derived classes must override. The following is an example:

class Shape {
public:
  virtual double area() const = 0;
  virtual double perimeter() const = 0;
  virtual void scale(double s) = 0;
};

With subtype polymorphism, we end up with two use cases for inheritance:

  • implementation inheritance, where a derived class inherits functionality from a base class

  • interface inheritance, where a derived class inherits the interface of its base class, but not necessarily any implementation

Deriving from a base class that isn’t an interface results in both implementation and interface inheritance. Deriving from an interface results in just interface inheritance. The latter is useful to work with a hierarchy of types through a common interface, using a base-class reference or pointer, even if the derived types don’t share any implementation.
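As a sketch of interface inheritance, the following defines two hypothetical classes, Rectangle and Square, that implement the Shape interface without sharing any implementation. The total_area() function is also an assumption made for illustration:

```cpp
// The Shape interface, repeated so that the example is self-contained.
class Shape {
public:
  virtual double area() const = 0;
  virtual double perimeter() const = 0;
  virtual void scale(double s) = 0;
};

// Hypothetical implementation of the interface.
class Rectangle : public Shape {
public:
  Rectangle(double width_in, double height_in)
    : width(width_in), height(height_in) {}
  double area() const override { return width * height; }
  double perimeter() const override { return 2 * (width + height); }
  void scale(double s) override { width *= s; height *= s; }
private:
  double width;
  double height;
};

// A second implementation that shares no code with Rectangle.
class Square : public Shape {
public:
  explicit Square(double side_in) : side(side_in) {}
  double area() const override { return side * side; }
  double perimeter() const override { return 4 * side; }
  void scale(double s) override { side *= s; }
private:
  double side;
};

// Works with any mix of shapes through the common interface.
double total_area(const Shape *shapes[], int length) {
  double total = 0;
  for (int i = 0; i < length; ++i) {
    total += shapes[i]->area();  // dynamically bound
  }
  return total;
}
```

Since total_area() depends only on Shape, it continues to work unchanged if we later add another class that implements the interface.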