Abstract Data Types in C++

Now that we’ve seen the concept of abstract data types (ADTs), we proceed to examine the mechanisms C++ provides for defining an ADT. Unlike C, C++ allows the data and functions of an ADT to be defined together. It also enables an ADT to prevent access to internal implementation details, as well as to guarantee that an object is appropriately initialized when it is created.

Our convention in this course is to use the word struct to refer to C-style ADTs, as well as to use the struct keyword to define them. We use the word class to refer to C++ ADTs and use the class keyword to define them. We will discuss the technical difference between the two keywords momentarily.

A C++ class includes both member variables, which define the data representation, as well as member functions that operate on the data. The following is a Triangle class in the C++ style:

class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in);

  double perimeter() const {
    return this->a + this->b + this->c;
  }

  void scale(double s) {
    this->a *= s;
    this->b *= s;
    this->c *= s;
  }
};

The class has member variables for the length of each side, defining the data representation. We defer discussion of the public: and Triangle(...) lines for now. Below those lines are member functions for computing the perimeter of a triangle and scaling it by a factor.

The following is an example of creating and using a Triangle object:

int main() {
  Triangle t1(3, 4, 5);
  t1.scale(2);
  cout << t1.perimeter() << endl;
}

We initialize a triangle by passing in the side lengths as part of its declaration. We can then scale a triangle by using the same dot syntax we saw for accessing a member variable: <object>.<function>(<arguments>).

Before we discuss the details of what the code is doing, let us compare elements of the C-style definition and use of the triangle ADT with the C++ version. The following contrasts the definition of an ADT function between the two styles:

C-Style Struct

void Triangle_scale(Triangle *tri, double s) {
  tri->a *= s;
  tri->b *= s;
  tri->c *= s;
}

C++ Class

class Triangle {
  void scale(double s) {
    this->a *= s;
    this->b *= s;
    this->c *= s;
  }
};

The following compares how objects are created and manipulated:

C-Style Struct

Triangle t1;
Triangle_init(&t1, 3, 4, 5);
Triangle_scale(&t1, 2);

C++ Class

Triangle t1(3, 4, 5);
t1.scale(2);

With the C-style struct, we defined a top-level Triangle_scale() function whose first argument is a pointer to the Triangle object we want to scale. With a C++ class, on the other hand, we define a scale() member function within the Triangle class itself. There is no need to prepend Triangle_, since it is clear that scale() is a member of the Triangle class. The member function also does not explicitly declare a pointer parameter – instead, the C++ language adds an implicit this parameter that is a pointer to the Triangle object we are working on. We can then use the this pointer in the same way we used the explicit tri pointer in the C style.

As for using a Triangle object, in the C style, we had to separately create the Triangle object and then initialize it with a call to Triangle_init(). In the C++ style, object creation and initialization are combined – we will see how later. When invoking an ADT function, in the C case we have to explicitly pass the address of the object we are working on. With the C++ syntax, the object is part of the syntax – it appears on the left-hand side of the dot, so the compiler automatically passes its address as the this pointer of the scale() member function.

Figure 34: Memory layout when scaling a triangle in the C and C++ styles.
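The sketch below puts both styles in one file so they can be compared directly. The struct is renamed `TriangleStruct` (a hypothetical name, used only so it does not clash with the class), and the body of `Triangle_init()` is an assumption, since this section does not show it. The assertions on `s` mirror a REQUIRES clause of s > 0 and are an addition for the sketch.

```cpp
#include <cassert>

// C style: the data representation is a plain struct.
// TriangleStruct is a hypothetical name so both styles fit in one file.
struct TriangleStruct {
  double a;
  double b;
  double c;
};

// Assumed definition of Triangle_init() for this sketch.
void Triangle_init(TriangleStruct *tri, double a_in, double b_in, double c_in) {
  tri->a = a_in;
  tri->b = b_in;
  tri->c = c_in;
}

// The object to scale is passed explicitly as a pointer.
void Triangle_scale(TriangleStruct *tri, double s) {
  assert(s > 0);  // REQUIRES: s > 0
  tri->a *= s;
  tri->b *= s;
  tri->c *= s;
}

double Triangle_perimeter(const TriangleStruct *tri) {
  return tri->a + tri->b + tri->c;
}

// C++ style: data and functions are defined together.
class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}

  // t.scale(s) passes &t as the implicit this pointer.
  void scale(double s) {
    assert(s > 0);  // REQUIRES: s > 0
    this->a *= s;
    this->b *= s;
    this->c *= s;
  }

  double perimeter() const {
    return this->a + this->b + this->c;
  }
};
```

With these definitions, calling `Triangle_init(&t1, 3, 4, 5)` and `Triangle_scale(&t1, 2)` on a `TriangleStruct t1`, or writing `Triangle t2(3, 4, 5); t2.scale(2);`, produces a triangle with perimeter 24 either way.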

The following contrasts the definitions of a function that treats the ADT object as const:

C-Style Struct

double Triangle_perimeter(const Triangle *tri) {
  return tri->a + tri->b + tri->c;
}

C++ Class

class Triangle {
  double perimeter() const {
    return this->a + this->b + this->c;
  }
};

In the C style, we add the const keyword to the left of the * when declaring the explicit pointer parameter, resulting in tri being a pointer to const. In the C++ style, we don’t have an explicit parameter where we can add the const keyword. Instead, we place the keyword after the parameter list for the function. The compiler will then make the implicit this parameter a pointer to const, as if it were declared with the type const Triangle *. This allows us to call the member function on a const Triangle:

const Triangle t1(3, 4, 5);
cout << t1.perimeter() << endl;  // OK: this pointer is a pointer to const
t1.scale(2);                     // ERROR: conversion from const to non-const

As with accessing member variables, we can use the arrow operator to invoke a member function through a pointer:

Triangle t1(3, 4, 5);
const Triangle *ptr = &t1;
cout << ptr->perimeter() << endl;  // OK: this pointer is a pointer to const
ptr->scale(2);                     // ERROR: conversion from const to non-const
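Const member functions matter most when an object is passed by reference to const, which is a common way to pass class-type objects to functions. In the sketch below, the free function `has_larger_perimeter()` is hypothetical. It can call `perimeter()` on its parameters, but a call to `scale()` would be rejected by the compiler.

```cpp
#include <cassert>

class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}

  // const member function: this has type const Triangle *
  double perimeter() const {
    return a + b + c;
  }

  // non-const member function: this has type Triangle *
  void scale(double s) {
    assert(s > 0);  // REQUIRES: s > 0
    a *= s;
    b *= s;
    c *= s;
  }
};

// Hypothetical helper: compares two triangles without modifying them.
bool has_larger_perimeter(const Triangle &t1, const Triangle &t2) {
  // t1.scale(2);  // would not compile: t1 is a reference to const
  return t1.perimeter() > t2.perimeter();  // OK: perimeter() is const
}
```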

Implicit this->

Since member variables and member functions are both located within the scope of a class, C++ allows us to refer to members from within a member function without the explicit this-> syntax. The compiler automatically inserts the member dereference for us:

class Triangle {
  double a;
  double b;
  double c;

  ...

  double perimeter() const {
    return a + b + c; // Equivalent to: this->a + this->b + this->c
  }
};

This is also the case for invoking other member functions. For instance, the following defines and uses functions to get each side length:

class Triangle {
  double a;
  double b;
  double c;

  ...

  double side1() const {
    return a;
  }

  double side2() const {
    return b;
  }

  double side3() const {
    return c;
  }

  double perimeter() const {
    return side1() + side2() + side3();
    // Equivalent to: this->side1() + this->side2() + this->side3()
  }
};

In both cases, the compiler can tell that we are referring to members of the class and therefore inserts the this->. However, if there are names in a closer scope that conflict with the member names, we must use this-> ourselves. The following is an example:

class Triangle {
  double a;

  ...

  void set_side1(double a) {
    this->a = a;
  }
};

Here, the unqualified a refers to the parameter a, since it is declared in a narrower scope than the member variable. We can still refer to the member a by qualifying its name with this->.

In general, we should avoid declaring variables in a local scope that hide names in an outer scope. Doing so in a constructor or set function is often considered acceptable, but it should be avoided elsewhere.
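Leaving out this-> in a set function like the one above still compiles, but it silently does the wrong thing. The stripped-down sketch below keeps only one side and adds a hypothetical constructor and `get_side1()` so the effect can be observed. The hypothetical `set_side1_buggy()` assigns the parameter to itself and leaves the member unchanged; some compilers warn about such a self-assignment.

```cpp
#include <cassert>

class Triangle {
  double a;

public:
  Triangle(double a_in) : a(a_in) {}

  double get_side1() const {
    return a;
  }

  // Correct: this->a names the member, plain a names the parameter.
  void set_side1(double a) {
    assert(a > 0);  // REQUIRES: a > 0
    this->a = a;
  }

  // Bug: both uses of a refer to the parameter, so the member is unchanged.
  void set_side1_buggy(double a) {
    a = a;
  }
};
```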

Member Accessibility

The data representation of an ADT is usually an implementation detail (plain old data being an exception). With C-style structs, however, we have to rely on programmers to respect convention and avoid accessing member variables directly. With C++ classes, the language provides us a mechanism for enforcing this convention: declaring members as private prevents access from outside the class, while declaring them as public allows outside access. [1] We give a set of members a particular access level by placing private: or public: before the members – that access level applies to subsequent members until a new access specifier is encountered, and any number of specifiers may appear in a class. The following is an example:

class Triangle {
private:
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in);

  double perimeter() const {
    return a + b + c;
  }

  void scale(double s) {
    a *= s;
    b *= s;
    c *= s;
  }
};

In this example, the members a, b, and c are declared as private, while Triangle(), perimeter(), and scale() are declared as public. Private members, whether variables or functions, can be accessed from within the class, even if they are members of a different object of that class. They cannot be accessed from outside the class:

int main() {
  Triangle t1(3, 4, 5);            // OK: Triangle() is public
  t1.scale(2);                     // OK: scale() is public
  cout << t1.perimeter() << endl;  // OK: perimeter() is public

  // Die triangle! DIE!
  t1.a = -1;                       // ERROR: a is private
}
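Access control is enforced per class, not per object, so a member function may use the private members of any Triangle it is given. The member function `has_same_sides()` below is a hypothetical addition that compares sides directly, with no get functions. The commented-out free function shows the same access failing outside the class.

```cpp
#include <cassert>

class Triangle {
private:
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    assert(0 < a && 0 < b && 0 < c);  // positive side lengths
  }

  // Hypothetical member: reads other.a, other.b, and other.c directly.
  // This is allowed because other is also a Triangle.
  bool has_same_sides(const Triangle &other) const {
    return a == other.a && b == other.b && c == other.c;
  }
};

// Outside the class, the same access does not compile:
// bool same_a(const Triangle &t1, const Triangle &t2) {
//   return t1.a == t2.a;  // ERROR: a is private
// }
```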

With the class keyword, the default access level is private. Thus, the private: at the beginning of the Triangle definition is redundant, and the following is equivalent:

class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in);

  ...
};

We have seen previously that members declared within a struct are accessible from outside the struct. In fact, the only difference between the struct and class keywords when defining a class type is the default access level: public for struct but private for class. [2] However, we use the two keywords for different conventions in this course.
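The sketch below illustrates that the keywords differ only in default access. `TriangleS`, `TriangleC`, and `TriangleP` are hypothetical names. The first two are equivalent once `TriangleC` states `public:` explicitly, and the third uses struct together with `private:` to hide its members, which compiles fine even though it does not follow this course's convention.

```cpp
#include <cassert>

// With struct, members are public unless stated otherwise.
struct TriangleS {
  double a;
  double b;
  double c;
};

// With class, members are private unless stated otherwise, so an explicit
// public: is needed to make this type equivalent to TriangleS.
class TriangleC {
public:
  double a;
  double b;
  double c;
};

// Conversely, struct can define a type with private members.
struct TriangleP {
  TriangleP(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    assert(0 < a && 0 < b && 0 < c);
  }

  double perimeter() const {
    return a + b + c;
  }

private:
  double a;
  double b;
  double c;
};
```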

Constructors

A constructor is similar to a member function, except that its purpose is to initialize a class-type object. In most cases, C++ guarantees that a constructor is called when creating an object of class type. [3] The following examples all call a constructor:

Triangle t1;                      // calls zero-argument (default) constructor
Triangle t2(3, 4, 5);             // calls three-argument constructor
Triangle t3 = Triangle(3, 4, 5);  // calls three-argument constructor
// examples with "uniform initialization syntax":
Triangle t4{3, 4, 5};             // calls three-argument constructor
Triangle t5 = {3, 4, 5};          // calls three-argument constructor
Triangle t6 = Triangle{3, 4, 5};  // calls three-argument constructor

As can be seen above, there are many forms of syntax for initializing a Triangle object, all of which call a constructor. When no arguments are provided, the zero-argument, or default, constructor is called. We will discuss this constructor in more detail later.

The following does not call a constructor:

Triangle t7();   // declares a function called t7 that returns a Triangle

In fact, it doesn’t create an object at all. Instead, it declares a function named t7 that takes no arguments and returns a Triangle. A function declaration can appear at local scope, so this is interpreted as a function declaration regardless of whether it is at local or global scope.
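One way to confirm that no object is created is to ask the compiler for the declared type. The sketch below places the declaration at global scope, where the same parsing rule applies, and checks its type with `std::is_same` from `<type_traits>` in a `static_assert`. It also shows two forms that do call the default constructor, assuming a default constructor that produces a 1x1x1 triangle.

```cpp
#include <cassert>
#include <type_traits>

class Triangle {
  double a;
  double b;
  double c;

public:
  // Default constructor: makes a 1x1x1 triangle.
  Triangle() : a(1), b(1), c(1) {}

  double perimeter() const {
    return a + b + c;
  }
};

// Looks like an object definition, but declares a function named t7 that
// takes no arguments and returns a Triangle. No object is created.
Triangle t7();

// The compiler agrees: t7 has function type Triangle().
static_assert(std::is_same<decltype(t7), Triangle()>::value,
              "t7 is a function, not a Triangle");

// To call the default constructor, leave off the parentheses or use braces.
Triangle t8;
Triangle t9{};
```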

So far, we declared a single constructor for Triangle as follows:

class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in);
};

The syntax is similar to declaring a member function, except:

  • There is no return type.

  • The name of the constructor is the same as the name of the class.

Like a member function, the constructor has an implicit this parameter that points to the object being initialized. As in a member function, we can leave out this-> to access a member, as long as there is no local name that hides the member. The following is a definition of the Triangle constructor:

class Triangle {
  double a;
  double b;
  double c;

public:
  // poor constructor implementation
  Triangle(double a_in, double b_in, double c_in) {
    a = a_in;
    b = b_in;
    c = c_in;
  }
};

However, there is a problem with the definition above: the statements in the body of the constructor perform assignment, not initialization. Thus, the member variables are actually default initialized and then assigned new values. In this case, it is a minor issue, but it can be more significant in other cases. In particular, there are several kinds of variables that allow initialization but not assignment:

  • arrays

  • references

  • const variables

  • class-type variables that disable assignment (e.g. streams)

Another case where initialization followed by assignment is problematic is for types where both initialization and assignment perform nontrivial operations – we lose efficiency by doing both when we only need initialization.

C++ provides two mechanisms for initializing a member variable:

  • directly in the declaration of the variable, similar to initializing a non-member variable

  • through a member-initializer list

A member-initializer list is syntax specific to a constructor. It is a list of initializations that appear between a colon symbol and the constructor body:

class Triangle {
  double a;
  double b;
  double c;

public:
  // good constructor implementation
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}
};

An individual initialization consists of a member-variable name, followed by an initialization expression enclosed by parentheses (or curly braces). The constructor above initializes the member a to the value of a_in, b to the value of b_in, and c to the value of c_in. The constructor body is empty, since it has no further work to do.
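A member-initializer list is required when a member is one of the kinds listed earlier that allow initialization but not assignment. The hypothetical `LabeledTriangle` below has const side lengths and a reference member `label`; a constructor that tried to set them in its body would not compile.

```cpp
#include <cassert>
#include <string>

// Hypothetical class with members that can be initialized but not assigned.
class LabeledTriangle {
  const double a;  // const: cannot be assigned after initialization
  const double b;
  const double c;
  const std::string &label;  // reference: must be bound when created;
                             // the referred-to string must outlive this object

public:
  LabeledTriangle(double a_in, double b_in, double c_in,
                  const std::string &label_in)
    : a(a_in), b(b_in), c(c_in), label(label_in) {
    assert(0 < a && 0 < b && 0 < c);
  }

  // This version would not compile: the const members and the reference
  // must be initialized before the body runs, and they cannot be assigned.
  // LabeledTriangle(double a_in, double b_in, double c_in,
  //                 const std::string &label_in) {
  //   a = a_in;  // ERROR
  //   label = label_in;  // would assign through the reference, not bind it
  // }

  double perimeter() const {
    return a + b + c;
  }

  const std::string &get_label() const {
    return label;
  }
};
```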

If a member is initialized in both its declaration and a member-initializer list, the latter takes precedence, so that the initialization in the member declaration is ignored.
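In the sketch below, each side is initialized to 1 at its declaration. The constructor that leaves the sides out of its member-initializer list uses those values, while the three-argument constructor overrides them. Writing the default in the declarations is only one possible design; the default constructor later in this section uses a member-initializer list instead.

```cpp
#include <cassert>

class Triangle {
  // Initialized at declaration: used by any constructor that does not
  // mention these members in its member-initializer list.
  double a = 1;
  double b = 1;
  double c = 1;

public:
  // Relies on the declaration initializers: a 1x1x1 triangle.
  Triangle() {}

  // The member-initializer list takes precedence, so the "= 1"
  // initializers are ignored for a, b, and c.
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    assert(0 < a && 0 < b && 0 < c);
  }

  double perimeter() const {
    return a + b + c;
  }
};
```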

Default Initialization and Default Constructors

Every object in C++ is initialized upon creation, whether the object is of class type or not. If no explicit initialization is provided, it undergoes default initialization. Default initialization does the following:

  • Objects of atomic type (e.g. int, double, pointers) are default initialized by doing nothing. This means they retain whatever value was already there in memory. Put another way, atomic objects have undefined values when they are default initialized.

  • An array is default initialized by in turn default initializing its elements. Thus, an array of atomic objects is default initialized by doing nothing, resulting in undefined element values.

  • A class-type object is default initialized by calling the default constructor, which is the constructor that takes no arguments. If no such constructor exists, or if it is inaccessible (e.g. it is private), a compile-time error results.

    An array of class-type objects is default initialized by calling the default constructor on each element. Thus, the element type must have an accessible default constructor in order to create an array of that type.

Within a class, if a member variable is neither initialized at declaration nor in the member-initializer list of a constructor, it is default initialized.
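As a sketch of the array rule above, the code below default initializes an array of three Triangle objects, assuming a default constructor that produces a 1x1x1 triangle. The class `Segment` and the function `total_perimeter()` are hypothetical. Because `Segment` has no default constructor, an array of it cannot be default initialized.

```cpp
#include <cassert>

class Triangle {
  double a;
  double b;
  double c;

public:
  // Default constructor: makes a 1x1x1 triangle.
  Triangle() : a(1), b(1), c(1) {}

  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    assert(0 < a && 0 < b && 0 < c);
  }

  double perimeter() const {
    return a + b + c;
  }
};

// Hypothetical class with no default constructor.
class Segment {
  double length;

public:
  Segment(double length_in) : length(length_in) {}
};

// Hypothetical helper: sums the perimeters of an array of triangles.
double total_perimeter(const Triangle arr[], int size) {
  double total = 0;
  for (int i = 0; i < size; ++i) {
    total += arr[i].perimeter();
  }
  return total;
}

// Segment segments[3];  // would not compile: no default constructor
```

For example, `Triangle triangles[3];` calls `Triangle()` once per element, so `total_perimeter(triangles, 3)` returns 9.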

The default constructor is so named because it is invoked in default initialization, and it takes no arguments. We can define a default constructor for Triangle as follows, making the design decision to initialize the object as a 1x1x1 equilateral triangle:

class Triangle {
  double a;
  double b;
  double c;

public:
  // default constructor
  Triangle()
    : a(1), b(1), c(1) {}

  // non-default constructor
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}
};

A class can have multiple constructors. This is a form of function overloading, which we will return to in the future. The compiler determines which constructor to invoke based on the arguments that are provided when creating a Triangle object:

Triangle t1;            // 1x1x1 -- calls zero-argument (default) constructor
Triangle t2(3, 4, 5);   // 3x4x5 -- calls three-argument constructor

Implicit Default Constructor

If a class declares no constructors at all, the compiler provides an implicit default constructor. The behavior of this constructor is as if it were empty, so that it default initializes each member variable:

struct Person {
  string name;
  int age;
  bool is_ninja;
  // implicit default constructor
  // Person() {}  // default initializes each member variable
};

int main() {
  Person elise;           // calls implicit default constructor
  cout << elise.name;     // prints nothing: string's default ctor makes it empty
  cout << elise.age;      // prints undefined value
  cout << elise.is_ninja; // prints undefined value
}

If a class declares any constructors whatsoever, no implicit default constructor is provided:

class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in);

  double perimeter() const {
    return a + b + c;
  }

  void scale(double s) {
    a *= s;
    b *= s;
    c *= s;
  }
};

int main() {
  Triangle t1;  // ERROR: no implicit or explicit default constructor
}

In this case, if we want our type to have a default constructor, we have to explicitly write one:

class Triangle {
  double a;
  double b;
  double c;

public:
  // explicit default constructor
  Triangle()
    : a(1), b(1), c(1) {}

  // non-default constructor
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}

  double perimeter() const {
    return a + b + c;
  }

  void scale(double s) {
    a *= s;
    b *= s;
    c *= s;
  }
};

int main() {
  Triangle t1;  // OK: explicit default constructor
}

Get and Set Functions

With C++ classes, member variables are usually declared private, since they are implementation details. However, many C++ ADTs provide a means of accessing the abstract data through get and set functions (also called getters and setters or accessor functions). These are provided as part of the interface as an abstraction over the underlying data. The following are examples for Triangle:

class Triangle {
  double a;
  double b;
  double c;

public:
  // EFFECTS: Returns side a of the triangle.
  double get_a() const {
    return a;
  }

  // REQUIRES: a_in > 0 && a_in < get_b() + get_c()
  // MODIFIES: *this
  // EFFECTS:  Sets side a of the triangle to a_in.
  void set_a(double a_in) {
    a = a_in;
  }
};

If the implementation changes, the interface can remain the same, so that outside code is unaffected:

class Triangle {
  double side1;  // new names
  double side2;
  double side3;

public:
  // EFFECTS: Returns side a of the triangle.
  double get_a() const {  // same interface
    return side1;         // different implementation
  }

  // REQUIRES: a_in > 0 && a_in < get_b() + get_c()
  // MODIFIES: *this
  // EFFECTS:  Sets side a of the triangle to a_in.
  void set_a(double a_in) {
    side1 = a_in;
  }
};

With a set function, we’ve introduced a new location from which the representation can be modified. We need to ensure that the representation invariants are still met. We can do so by writing and using a private function to check the invariants:

class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    check_invariants();
  }

  void set_a(double a_in) {
    a = a_in;
    check_invariants();
  }

private:
  void check_invariants() const {
    assert(0 < a && 0 < b && 0 < c);
    assert(a + b > c && a + c > b && b + c > a);
  }
};

It is good practice to check the invariants anywhere the representation can be modified. Here, we have done so in both the constructor and in the set function.
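The same reasoning applies to every other member function that modifies the representation. In the sketch below, `scale()` also calls `check_invariants()`, so a call such as `scale(-1)` would trip an assertion (when assertions are enabled). Declaring `check_invariants()` as a const member function is a design choice here, since it only reads the members.

```cpp
#include <cassert>

class Triangle {
  double a;
  double b;
  double c;
  // INVARIANTS: a > 0 && b > 0 && c > 0
  //             a + b > c && a + c > b && b + c > a

public:
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    check_invariants();
  }

  double get_a() const {
    return a;
  }

  void set_a(double a_in) {
    a = a_in;
    check_invariants();
  }

  // scale() also modifies the representation, so it checks too.
  void scale(double s) {
    a *= s;
    b *= s;
    c *= s;
    check_invariants();
  }

  double perimeter() const {
    return a + b + c;
  }

private:
  // const: checking the invariants does not modify the object.
  void check_invariants() const {
    assert(0 < a && 0 < b && 0 < c);
    assert(a + b > c && a + c > b && b + c > a);
  }
};
```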

Information Hiding

Good abstraction design uses encapsulation, which groups together both the data and functions of an ADT. With a class, we get encapsulation by defining both member variables and member functions.

A proper abstraction also provides information hiding, which separates interface from implementation. Access specifiers such as private allow us to prevent the outside world from accessing implementation details.

We can further hide information from the sight of the users of an ADT by physically separating the code for the interface from the code for the implementation. The standard mechanism to do so in C++ is to place declarations in header files and definitions in source files. With a class, we place a class definition that only contains member declarations in the header file:

// Triangle.hpp

// A class that represents a triangle ADT.
class Triangle {
public:
  // EFFECTS: Initializes this to a 1x1x1 triangle.
  Triangle();

  // EFFECTS: Initializes this with the given side lengths.
  Triangle(double a_in, double b_in, double c_in);

  // EFFECTS: Returns the perimeter of this triangle.
  double perimeter() const;

  // REQUIRES: s > 0
  // MODIFIES: *this
  // EFFECTS:  Scales the sides of this triangle by the factor s.
  void scale(double s);

private:
  double a;
  double b;
  double c;
  // INVARIANTS:
  // positive side lengths: a > 0 && b > 0 && c > 0
  // triangle inequality: a + b > c && a + c > b && b + c > a
};

It is also generally preferable to declare the public members of the class before private members, so that users do not have to skip over implementation details to find the public interface.

We then define the constructors and member functions outside of the class definition, in the corresponding source file. In order to define a member function outside of a class, we need two things:

  1. A declaration of the function within the class, so that the compiler (and other programmers) can tell that the member exists.

  2. Syntax in the definition that tells the compiler that the function is a member of the associated class and not a top-level function.

The latter is accomplished by prefixing the member name with the class name, followed by the scope-resolution operator:

// Triangle.cpp
#include "Triangle.hpp"

Triangle::Triangle()
  : a(1), b(1), c(1) {}

Triangle::Triangle(double a_in, double b_in, double c_in)
  : a(a_in), b(b_in), c(c_in) {}

double Triangle::perimeter() const {
  return a + b + c;
}

void Triangle::scale(double s) {
  a *= s;
  b *= s;
  c *= s;
}

This tells the compiler that the two constructors, as well as the perimeter() and scale() functions, are members of the Triangle class.

Testing a C++ ADT

We test a C++ ADT by writing test cases that live outside of the ADT itself. C++ forces us to respect the interface, since the implementation details are private:

// Triangle_tests.cpp
#include "Triangle.hpp"
#include "unit_test_framework.hpp"

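TEST(test_triangle_basic) {
  // assumes Triangle.hpp also declares area(), get_a(), and set_a(),
  // which the header shown above omits
  Triangle t(3, 4, 5);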
  ASSERT_EQUAL(t.area(), 6);
  ASSERT_EQUAL(t.get_a(), 3);  // must use get and set functions
  t.set_a(4);
  ASSERT_EQUAL(t.get_a(), 4);
}

TEST_MAIN()

Member-Initialization Order

Member variables are always initialized in the order in which they are declared in the class. This is the case regardless of whether some members are initialized at the declaration point and others are not, or whether a constructor’s member-initializer list is out of order:

class Triangle {
  double a;
  double b;
  double c;

public:
  Triangle(double a_in, double b_in, double c_in)
    : b(b_in), c(c_in), a(a_in) {    // this ordering is ignored
  }
};

Here, a is initialized first to a_in, then b to b_in, then c to c_in; the ordering in the member-initializer list is ignored. Some compilers will generate a warning if the order differs between the member declarations and the member-initializer list:

$ g++ --std=c++17 -Wall Triangle.cpp
Triangle.cpp:8:16: warning: field 'c' will be initialized after field 'a'
      [-Wreorder]
    : b(b_in), c(c_in), a(a_in) {    // this ordering is ignored
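One way to observe the order directly is to give the members a type whose constructor records when it runs. In the sketch below, `Logged`, `init_log`, and `Demo` are hypothetical names. The member-initializer list names `c` first, yet the log records the declaration order. Compilers may emit a -Wreorder warning like the one above for this code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log of member constructions, in the order they happen.
std::vector<std::string> init_log;

// Hypothetical member type that records its own construction.
struct Logged {
  explicit Logged(const std::string &name) {
    init_log.push_back(name);
  }
};

class Demo {
  Logged a;
  Logged b;
  Logged c;

public:
  // Listed out of order on purpose; the actual order is still a, b, c.
  Demo() : c("c"), b("b"), a("a") {}
};
```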

Delegating Constructors

When a class has multiple constructors, it can be useful to invoke one constructor from another. This allows us to avoid code duplication, and it also makes our code more maintainable by reducing the number of places where we hardcode implementation details.

In order to delegate to another constructor, we must do so in the member-initializer list. The member-initializer list must consist solely of the call to the other constructor:

class Triangle {
  double a;
  double b;
  double c;

public:
  // EFFECTS: Initializes this to be an equilateral triangle with
  //          the given side length.
  Triangle(double side_in)
    : Triangle(side_in, side_in, side_in) {} // delegate to 3-argument constructor

  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}
};

The delegation must be in the member-initializer list. If we invoke a different constructor from within the body, it does not do delegation; rather, it creates a new, temporary object and then throws it away:

Triangle(double side_in) {   // default initializes members
  Triangle(side_in, side_in, side_in);  // creates a new Triangle object that
                                        // lives in the activation record for
                                        // this constructor
}
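The sketch below contrasts the two versions. `BrokenTriangle` is a hypothetical class whose members are initialized to 0 at their declarations purely so the mistake can be observed; with ordinary default initialization, reading its sides would produce undefined values.

```cpp
#include <cassert>

// Delegates correctly in the member-initializer list.
class Triangle {
  double a;
  double b;
  double c;

public:
  // EFFECTS: Initializes this to be an equilateral triangle with
  //          the given side length.
  Triangle(double side_in)
    : Triangle(side_in, side_in, side_in) {}

  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    assert(0 < a && 0 < b && 0 < c);
  }

  double perimeter() const {
    return a + b + c;
  }
};

// Hypothetical broken version: sides start at 0 only to make the bug visible.
class BrokenTriangle {
  double a = 0;
  double b = 0;
  double c = 0;

public:
  BrokenTriangle(double side_in) {
    // Creates and discards a temporary object; *this is not affected.
    BrokenTriangle(side_in, side_in, side_in);
  }

  BrokenTriangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}

  double perimeter() const {
    return a + b + c;
  }
};
```

Constructing `Triangle good(2)` yields perimeter 6, while `BrokenTriangle bad(2)` still has perimeter 0, because the three-argument constructor only initialized the temporary.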