Abstract Data Types in C++
Now that we’ve seen the concept of abstract data types (ADTs), we proceed to examine the mechanisms C++ provides for defining an ADT. Unlike C, C++ allows the data and functions of an ADT to be defined together. It also enables an ADT to prevent access to internal implementation details, as well as to guarantee that an object is appropriately initialized when it is created.
Our convention in this course is to use the word struct to refer to C-style ADTs, as well as to use the struct keyword to define them. We use the word class to refer to C++ ADTs and use the class keyword to define them. We will discuss the technical difference between the two keywords momentarily.
A C++ class includes both member variables, which define the data representation, and member functions that operate on the data. The following is a Triangle class in the C++ style:
class Triangle {
  double a;
  double b;
  double c;
public:
  Triangle(double a_in, double b_in, double c_in);
  double perimeter() const {
    return this->a + this->b + this->c;
  }
  void scale(double s) {
    this->a *= s;
    this->b *= s;
    this->c *= s;
  }
};
The class has member variables for the length of each side, defining the data representation. We defer discussion of the public: and Triangle(...) lines for now. Below those lines are member functions for computing the perimeter of a triangle and scaling it by a factor.
The following is an example of creating and using a Triangle object:
int main() {
  Triangle t1(3, 4, 5);
  t1.scale(2);
  cout << t1.perimeter() << endl;
}
We initialize a triangle by passing in the side lengths as part of its declaration. We can then scale a triangle by using the same dot syntax we saw for accessing a member variable, in the form <object>.<function>(<arguments>).
Before we discuss the details of what the code is doing, let us compare elements of the C-style definition and use of the triangle ADT with the C++ version. The following contrasts the definition of an ADT function between the two styles:
C-style struct:
void Triangle_scale(Triangle *tri, double s) {
  tri->a *= s;
  tri->b *= s;
  tri->c *= s;
}
C++ class:
class Triangle {
  void scale(double s) {
    this->a *= s;
    this->b *= s;
    this->c *= s;
  }
};
The following compares how objects are created and manipulated:
C-style struct:
Triangle t1;
Triangle_init(&t1, 3, 4, 5);
Triangle_scale(&t1, 2);
C++ class:
Triangle t1(3, 4, 5);
t1.scale(2);
With the C-style struct, we defined a top-level Triangle_scale() function whose first argument is a pointer to the Triangle object we want to scale. With a C++ class, on the other hand, we define a scale() member function within the Triangle class itself. There is no need to prepend Triangle_, since it is clear that scale() is a member of the Triangle class. The member function also does not explicitly declare a pointer parameter; instead, the C++ language adds an implicit this parameter that is a pointer to the Triangle object we are working on. We can then use the this pointer in the same way we used the explicit tri pointer in the C style.
As for using a Triangle object, in the C style we had to separately create the Triangle object and then initialize it with a call to Triangle_init(). In the C++ style, object creation and initialization are combined; we will see how later. When invoking an ADT function in the C style, we have to explicitly pass the address of the object we are working on. With the C++ syntax, the object is part of the call itself: it appears on the left-hand side of the dot, so the compiler automatically passes its address as the this pointer of the scale() member function.
The following contrasts the definitions of a function that treats the ADT object as const:
C-style struct:
double Triangle_perimeter(const Triangle *tri) {
  return tri->a + tri->b + tri->c;
}
C++ class:
class Triangle {
  double perimeter() const {
    return this->a + this->b + this->c;
  }
};
In the C style, we add the const keyword to the left of the * when declaring the explicit pointer parameter, resulting in tri being a pointer to const. In the C++ style, we don't have an explicit parameter where we can add the const keyword. Instead, we place the keyword after the parameter list for the function. The compiler will then make the implicit this parameter a pointer to const, as if it were declared with the type const Triangle *. This allows us to call the member function on a const Triangle:
const Triangle t1(3, 4, 5);
cout << t1.perimeter() << endl; // OK: this pointer is a pointer to const
t1.scale(2); // ERROR: conversion from const to non-const
As with accessing member variables, we can use the arrow operator to invoke a member function through a pointer:
Triangle t1(3, 4, 5);
const Triangle *ptr = &t1;
cout << ptr->perimeter() << endl; // OK: this pointer is a pointer to const
ptr->scale(2); // ERROR: conversion from const to non-const
Implicit this->
Since member variables and member functions are both located within the scope of a class, C++ allows us to refer to members from within a member function without the explicit this-> syntax. The compiler automatically inserts the member dereference for us:
class Triangle {
  double a;
  double b;
  double c;
  ...
  double perimeter() const {
    return a + b + c;  // Equivalent to: this->a + this->b + this->c
  }
};
This is also the case for invoking other member functions. For instance, the following defines and uses functions to get each side length:
class Triangle {
  double a;
  double b;
  double c;
  ...
  double side1() const {
    return a;
  }
  double side2() const {
    return b;
  }
  double side3() const {
    return c;
  }
  double perimeter() const {
    return side1() + side2() + side3();
    // Equivalent to: this->side1() + this->side2() + this->side3()
  }
};
In both cases, the compiler can tell that we are referring to members of the class and therefore inserts the this-> for us. However, if there are names in a closer scope that conflict with the member names, we must use this-> ourselves. The following is an example:
class Triangle {
  double a;
  ...
  void set_side1(double a) {  // returns nothing, so the return type is void
    this->a = a;              // this->a is the member; a is the parameter
  }
};
Here, the unqualified a refers to the parameter a, since it is declared in a narrower scope than the member variable. We can still refer to the member a by qualifying its name with this->.
In general, we should avoid declaring variables in a local scope that hide names in an outer scope. Doing so in a constructor or set function is often considered acceptable, but it should be avoided elsewhere.
Member Accessibility
The data representation of an ADT is usually an implementation detail (plain old data being an exception). With C-style structs, however, we have to rely on programmers to respect convention and avoid accessing member variables directly. With C++ classes, the language provides us a mechanism for enforcing this convention: declaring members as private prevents access from outside the class, while declaring them as public allows outside access. [1] We give a set of members a particular access level by placing private: or public: before the members. That access level applies to subsequent members until a new access specifier is encountered, and any number of specifiers may appear in a class. The following is an example:
class Triangle {
private:
  double a;
  double b;
  double c;
public:
  Triangle(double a_in, double b_in, double c_in);
  double perimeter() const {
    return a + b + c;
  }
  void scale(double s) {
    a *= s;
    b *= s;
    c *= s;
  }
};
In this example, the members a, b, and c are declared as private, while Triangle(), perimeter(), and scale() are declared as public. Private members, whether variables or functions, can be accessed from within the class, even if they are members of a different object of that class. They cannot be accessed from outside the class:
int main() {
  Triangle t1(3, 4, 5);            // OK: Triangle() is public
  t1.scale(2);                     // OK: scale() is public
  cout << t1.perimeter() << endl;  // OK: perimeter() is public
  // Die triangle! DIE!
  t1.a = -1;                       // ERROR: a is private
}
With the class keyword, the default access level is private. Thus, the private: at the beginning of the Triangle definition is redundant, and the following is equivalent:
class Triangle {
  double a;
  double b;
  double c;
public:
  Triangle(double a_in, double b_in, double c_in);
  ...
};
We have seen previously that members declared within a struct are accessible from outside the struct. In fact, the only difference between the struct and class keywords when defining a class type is the default access level: public for struct but private for class. [2] However, we use the two keywords for different conventions in this course.
This also applies to inheritance. We will see next time that private inheritance is the default for a class, and we will need to use the public keyword to override this. The default for a struct is public inheritance.
Constructors
A constructor is similar to a member function, except that its purpose is to initialize a class-type object. In most cases, C++ guarantees that a constructor is called when creating an object of class type. [3]
[3] The exception is aggregate initialization, where an initializer list is used to directly initialize the members of a class-type object. This is only possible for aggregates, which are class types that have a restricted set of features. Our pattern for C-style structs obeys the rules for an aggregate, though we define init functions instead of using aggregate initialization. Our convention for C++ classes results in class types that are not aggregates, so objects of such types can only be initialized through a constructor. We can still use initializer-list syntax for a non-aggregate; it will call a constructor with the values in the initializer list as arguments.
The following examples all call a constructor:
Triangle t1; // calls zero-argument (default) constructor
Triangle t2(3, 4, 5); // calls three-argument constructor
Triangle t3 = Triangle(3, 4, 5); // calls three-argument constructor
// examples with "uniform initialization syntax":
Triangle t4{3, 4, 5}; // calls three-argument constructor
Triangle t5 = {3, 4, 5}; // calls three-argument constructor
Triangle t6 = Triangle{3, 4, 5}; // calls three-argument constructor
As can be seen above, there are many forms of syntax for initializing a Triangle object, all of which call a constructor. When no arguments are provided, the zero-argument, or default, constructor is called. We will discuss this constructor in more detail later.
The following does not call a constructor:
Triangle t7(); // declares a function called t7 that returns a Triangle
In fact, it doesn't create an object at all. Instead, it declares a function named t7 that takes no arguments and returns a Triangle. A function declaration can appear at local scope, so this is interpreted as a function declaration regardless of whether it is at local or global scope.
So far, we have declared a single constructor for Triangle as follows:
class Triangle {
  double a;
  double b;
  double c;
public:
  Triangle(double a_in, double b_in, double c_in);
};
The syntax is similar to declaring a member function, except:
There is no return type.
The name of the constructor is the same as the name of the class.
Like a member function, the constructor has an implicit this parameter that points to the object being initialized. As in a member function, we can leave out this-> to access a member, as long as there is no local name that hides the member. The following is a definition of the Triangle constructor:
class Triangle {
  double a;
  double b;
  double c;
public:
  // poor constructor implementation
  Triangle(double a_in, double b_in, double c_in) {
    a = a_in;
    b = b_in;
    c = c_in;
  }
};
However, there is a problem with the definition above: the statements in the body of the constructor perform assignment, not initialization. Thus, the member variables are actually default initialized and then assigned new values. In this case, it is a minor issue, but it can be more significant in other cases. In particular, there are several kinds of variables that allow initialization but not assignment:
arrays
references
const variables
class-type variables that disable assignment (e.g. streams)
Another case where initialization followed by assignment is problematic is for types where both initialization and assignment perform nontrivial operations – we lose efficiency by doing both when we only need initialization.
C++ provides two mechanisms for initializing a member variable:
directly in the declaration of the variable, similar to initializing a non-member variable
through a member-initializer list
A member-initializer list is syntax specific to a constructor. It is a list of initializations that appear between a colon symbol and the constructor body:
class Triangle {
  double a;
  double b;
  double c;
public:
  // good constructor implementation
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}
};
An individual initialization consists of a member-variable name, followed by an initialization expression enclosed by parentheses (or curly braces). The constructor above initializes the member a to the value of a_in, b to the value of b_in, and c to the value of c_in. The constructor body is empty, since it has no further work to do.
If a member is initialized in both its declaration and a member-initializer list, the latter takes precedence, so that the initialization in the member declaration is ignored.
Default Initialization and Default Constructors
Every object in C++ is initialized upon creation, whether the object is of class type or not. If no explicit initialization is provided, it undergoes default initialization. Default initialization does the following:
Objects of atomic type (e.g. int, double, pointers) are default initialized by doing nothing. This means they retain whatever value was already there in memory. Put another way, atomic objects have undefined values when they are default initialized.
An array is default initialized by in turn default initializing its elements. Thus, an array of atomic objects is default initialized by doing nothing, resulting in undefined element values.
A class-type object is default initialized by calling the default constructor, which is the constructor that takes no arguments. If no such constructor exists, or if it is inaccessible (e.g. it is private), a compile-time error results.
An array of class-type objects is default initialized by calling the default constructor on each element. Thus, the element type must have an accessible default constructor in order to create an array of that type.
Within a class, if a member variable is neither initialized at declaration nor in the member-initializer list of a constructor, it is default initialized.
The default constructor is so named because it is invoked in default initialization, and it takes no arguments. We can define a default constructor for Triangle as follows, making the design decision to initialize the object as a 1x1x1 equilateral triangle:
class Triangle {
  double a;
  double b;
  double c;
public:
  // default constructor
  Triangle()
    : a(1), b(1), c(1) {}
  // non-default constructor
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}
};
A class can have multiple constructors. This is a form of function overloading, which we will return to in the future. The compiler determines which constructor to invoke based on the arguments that are provided when creating a Triangle object:
Triangle t1; // 1x1x1 -- calls zero-argument (default) constructor
Triangle t2(3, 4, 5); // 3x4x5 -- calls three-argument constructor
Implicit Default Constructor
If a class declares no constructors at all, the compiler provides an implicit default constructor. The behavior of this constructor is as if it were empty, so that it default initializes each member variable:
struct Person {
  string name;
  int age;
  bool is_ninja;
  // implicit default constructor
  // Person() {}  // default initializes each member variable
};
int main() {
  Person elise;            // calls implicit default constructor
  cout << elise.name;      // prints nothing: default ctor for string makes it empty
  cout << elise.age;       // prints undefined value
  cout << elise.is_ninja;  // prints undefined value
}
If a class declares any constructors whatsoever, no implicit default constructor is provided:
class Triangle {
  double a;
  double b;
  double c;
public:
  Triangle(double a_in, double b_in, double c_in);
  double perimeter() const {
    return a + b + c;
  }
  void scale(double s) {
    a *= s;
    b *= s;
    c *= s;
  }
};
int main() {
  Triangle t1;  // ERROR: no implicit or explicit default constructor
}
In this case, if we want our type to have a default constructor, we have to explicitly write one:
class Triangle {
  double a;
  double b;
  double c;
public:
  // explicit default constructor
  Triangle()
    : a(1), b(1), c(1) {}
  // non-default constructor
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}
  double perimeter() const {
    return a + b + c;
  }
  void scale(double s) {
    a *= s;
    b *= s;
    c *= s;
  }
};
int main() {
  Triangle t1;  // OK: explicit default constructor
}
Get and Set Functions
With C++ classes, member variables are usually declared private, since they are implementation details. However, many C++ ADTs provide a means of accessing the abstract data through get and set functions (also called getters and setters or accessor functions). These are provided as part of the interface as an abstraction over the underlying data. The following are examples for Triangle:
class Triangle {
  double a;
  double b;
  double c;
public:
  // EFFECTS: Returns side a of the triangle.
  double get_a() const {
    return a;
  }
  // REQUIRES: a_in > 0 && a_in < get_b() + get_c()
  // MODIFIES: *this
  // EFFECTS: Sets side a of the triangle to a_in.
  void set_a(double a_in) {
    a = a_in;
  }
};
If the implementation changes, the interface can remain the same, so that outside code is unaffected:
class Triangle {
  double side1;  // new names
  double side2;
  double side3;
public:
  // EFFECTS: Returns side a of the triangle.
  double get_a() const {  // same interface
    return side1;         // different implementation
  }
  // REQUIRES: a_in > 0 && a_in < get_b() + get_c()
  // MODIFIES: *this
  // EFFECTS: Sets side a of the triangle to a_in.
  void set_a(double a_in) {
    side1 = a_in;
  }
};
With a set function, we’ve introduced a new location from which the representation can be modified. We need to ensure that the representation invariants are still met. We can do so by writing and using a private function to check the invariants:
class Triangle {
  double a;
  double b;
  double c;
public:
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {
    check_invariants();
  }
  void set_a(double a_in) {
    a = a_in;
    check_invariants();
  }
private:
  // const: checking the invariants only reads the representation
  void check_invariants() const {
    assert(0 < a && 0 < b && 0 < c);
    assert(a + b > c && a + c > b && b + c > a);
  }
};
It is good practice to check the invariants anywhere the representation can be modified. Here, we have done so in both the constructor and in the set function.
Information Hiding
Good abstraction design uses encapsulation, which groups together both the data and functions of an ADT. With a class, we get encapsulation by defining both member variables and member functions.
A proper abstraction also provides information hiding, which separates interface from implementation. Access specifiers such as private allow us to prevent the outside world from accessing implementation details.
We can further hide information from the sight of the users of an ADT by physically separating the code for the interface from the code for the implementation. The standard mechanism to do so in C++ is to place declarations in header files and definitions in source files. With a class, we place a class definition that only contains member declarations in the header file:
// Triangle.hpp
// A class that represents a triangle ADT.
class Triangle {
public:
  // EFFECTS: Initializes this to a 1x1x1 triangle.
  Triangle();
  // EFFECTS: Initializes this with the given side lengths.
  Triangle(double a_in, double b_in, double c_in);
  // EFFECTS: Returns the perimeter of this triangle.
  double perimeter() const;
  // REQUIRES: s > 0
  // MODIFIES: *this
  // EFFECTS: Scales the sides of this triangle by the factor s.
  void scale(double s);
private:
  double a;
  double b;
  double c;
  // INVARIANTS:
  // positive side lengths: a > 0 && b > 0 && c > 0
  // triangle inequality: a + b > c && a + c > b && b + c > a
};
It is also generally preferable to declare the public members of the class before private members, so that users do not have to skip over implementation details to find the public interface.
We then define the constructors and member functions outside of the class definition, in the corresponding source file. In order to define a member function outside of a class, we need two things:
A declaration of the function within the class, so that the compiler (and other programmers) can tell that the member exists.
Syntax in the definition that tells the compiler that the function is a member of the associated class and not a top-level function.
The latter is accomplished by prefixing the member name with the class name, followed by the scope-resolution operator:
// Triangle.cpp
#include "Triangle.hpp"
Triangle::Triangle()
  : a(1), b(1), c(1) {}
Triangle::Triangle(double a_in, double b_in, double c_in)
  : a(a_in), b(b_in), c(c_in) {}
double Triangle::perimeter() const {
  return a + b + c;
}
void Triangle::scale(double s) {
  a *= s;
  b *= s;
  c *= s;
}
This tells the compiler that the two constructors, as well as the perimeter() and scale() functions, are members of the Triangle class.
Testing a C++ ADT
We test a C++ ADT by writing test cases that live outside of the ADT itself. C++ forces us to respect the interface, since the implementation details are private:
// Triangle_tests.cpp
#include "Triangle.hpp"
#include "unit_test_framework.hpp"
TEST(test_triangle_basic) {
  Triangle t(3, 4, 5);
  ASSERT_EQUAL(t.area(), 6);
  ASSERT_EQUAL(t.get_a(), 3);  // must use get and set functions
  t.set_a(4);
  ASSERT_EQUAL(t.get_a(), 4);
}
TEST_MAIN()
Member-Initialization Order
Member variables are always initialized in the order in which they are declared in the class. This is the case regardless of whether some members are initialized at the declaration point and others are not, or whether a constructor's member-initializer list is out of order:
class Triangle {
  double a;
  double b;
  double c;
public:
  Triangle(double a_in, double b_in, double c_in)
    : b(b_in), c(c_in), a(a_in) {  // this ordering is ignored
  }
};
Here, a is initialized first to a_in, then b to b_in, then c to c_in; the ordering in the member-initializer list is ignored. Some compilers will generate a warning if the order differs between the member declarations and the member-initializer list:
$ g++ --std=c++17 -Wall Triangle.cpp
Triangle.cpp:8:16: warning: field 'c' will be initialized after field 'a' [-Wreorder]
: b(b_in), c(c_in), a(a_in) { // this ordering is ignored
Delegating Constructors
When a class has multiple constructors, it can be useful to invoke one constructor from another. This allows us to avoid code duplication, and it also makes our code more maintainable by reducing the number of places where we hardcode implementation details.
In order to delegate to another constructor, we must do so in the member-initializer list. The member-initializer list must consist solely of the call to the other constructor:
class Triangle {
  double a;
  double b;
  double c;
public:
  // EFFECTS: Initializes this to be an equilateral triangle with
  //          the given side length.
  Triangle(double side_in)
    : Triangle(side_in, side_in, side_in) {}  // delegate to 3-argument constructor
  Triangle(double a_in, double b_in, double c_in)
    : a(a_in), b(b_in), c(c_in) {}
};
The delegation must be in the member-initializer list. If we invoke a different constructor from within the body, it does not do delegation; rather, it creates a new, temporary object and then throws it away:
Triangle(double side_in) {                // default initializes members
  Triangle(side_in, side_in, side_in);    // creates a new Triangle object that
                                          // lives in the activation record for
                                          // this constructor
}