# Abstract Data Types in C¶

Recall that abstraction is the idea of separating *what* something is
from *how* it works, by separating interface from implementation.
Previously, we saw procedural abstraction, which applies abstraction to computational processes.
With procedural abstraction, we use functions based on their signature
and documentation without having to know details about their
definition.

The concept of abstraction can be applied to data as well. An
*abstract data type (ADT)* separates the interface of a data type from
its implementation, and it encompasses both the data itself as well as
functionality on the data. An example of an ADT is the `string`

type
in C++, used in the following code:

```
string str1 = "hello";
string str2 = "jello";
cout << str1 << endl;
if (str1.length() == str2.length()) {
cout << "Same length!" << endl;
}
```

This code creates two strings and initializes them to represent
different values, prints out one of them, and compares the lengths of
both – all without needing to any details about the implementation of
`string`

. Rather, it relies solely on the interface provided by the
`string`

abstraction.

A `string`

is an example of a full-featured C++ ADT, providing
customized initialization, overloaded operations such as the
stream-insertion operator, member functions, and so on. We will start
with the simpler model of C ADTs, deferring C++ ADTs until next time.

The C language only has support for structs with data members (i.e. member variables). While this is sufficient to represent the data of an ADT, the functions that operate on the ADT must be defined separately from the struct. The following is the data definition of an ADT to represent triangles:

```
// A triangle ADT.
struct Triangle {
double a;
double b;
double c;
};
int main() {
Triangle t1 = { 3, 4, 5 };
Triangle t2 = { 2, 2, 2 };
}
```

The `Triangle`

struct contains three member variables, one for each
side of the triangle, each represented by a `double`

. The example in
`main()`

creates and initializes two `Triangle`

structs, resulting
in the memory layout in Figure 37.

An ADT also includes functions that operate on the data. We can define functions to compute the perimeter of a triangle or to modify it by scaling each of the sides by a given factor:

```
// REQUIRES: tri points to a valid Triangle
// EFFECTS: Returns the perimeter of the given Triangle.
double Triangle_perimeter(const Triangle *tri) {
return tri->a + tri->b + tri->c;
}
// REQUIRES: tri points to a valid Triangle; s > 0
// MODIFIES: *tri
// EFFECTS: Scales the sides of the Triangle by the factor s.
void Triangle_scale(Triangle *tri, double s) {
tri->a *= s;
tri->b *= s;
tri->c *= s;
}
```

Our naming convention for functions that are part of a C-style ADT is
to prepend the function name with the name of the ADT, `Triangle`

in
this case. The first parameter is a pointer to the actual `Triangle`

object the function works on. If the object need not be modified, we
declare the pointer as a pointer to const.

The following demonstrates how to use the `Triangle`

ADT functions:

```
int main() {
Triangle t1 = { 3, 4, 5 };
Triangle_scale(&t1, 2); // sides are now 6, 8, 10
cout << Triangle_perimeter(&t1) << endl; // prints 24
}
```

The code creates a `Triangle`

as a local variable and initializes it
with sides 3, 4, and 5. It then scales the sides by a factor of 2 by
calling `Triangle_scale()`

. Since that function takes a pointer to
the actual triangle, we use the address-of operator to obtain and pass
the address of `t1`

, as shown in Figure 38.

The function scales each side of the triangle, resulting in `t1`

having sides of 6, 8, and 10. We then call `Triangle_perimeter()`

on
the address of `t1`

, which computes the value 24.

In this example, the code in `main()`

need not worry about the
implementation of `Triangle_scale()`

or `Triangle_perimeter()`

.
Instead, it relies on abstraction, using the functions for what they
do rather than how they do it. However, in initializing `t1`

itself,
the code *is* relying on implementation details – specifically, that
a `Triangle`

is implemented as three `double`

members that
represent the lengths of the sides. If the implementation were to
change to represent a triangle as two sides and the angle between
them, for instance, then the behavior of the code in `main()`

would
change, and it would no longer print 24. Thus, we need to abstract the
initialization of a `Triangle`

, avoiding having to initialize each
member directly. We do so by defining a `Triangle_init()`

function:

```
// REQUIRES: tri points to a Triangle object
// MODIFIES: *tri
// EFFECTS: Initializes the triangle with the given side lengths.
void Triangle_init(Triangle *tri, double a_in,
double b_in, double c_in) {
tri->a = a_in;
tri->b = b_in;
tri->c = c_in;
}
int main() {
Triangle t1;
Triangle_init(&t1, 3, 4, 5);
Triangle_scale(&t1, 2);
cout << Triangle_perimeter(&t1) << endl;
}
```

The user of the `Triangle`

ADT creates an object without an explicit
initialization and then calls `Triangle_init()`

on its address to
initialize it, providing the side lengths. After that call, the
`Triangle`

has been properly initialized and can be used with the
other ADT functions. Now if the implementation of `Triangle`

changes, as long as the interface remains the same, the code in
`main()`

will work as before. The code within the ADT, in the
`Triangle_...`

functions, will need to change, but outside code that
uses the ADT will not. The following illustrates an implementation of
`Triangle`

that represents a triangle by two sides and an angle:

```
// A triangle ADT.
struct Triangle {
double side1;
double side2;
double angle;
};
// REQUIRES: tri points to a Triangle object
// MODIFIES: *tri
// EFFECTS: Initializes the triangle with the given side lengths.
void Triangle_init(Triangle *tri, double a_in,
double b_in, double c_in) {
tri->side1 = a_in;
tri->side2 = b_in;
tri->angle = std::acos((std::pow(a_in, 2) + std::pow(b_in, 2) -
std::pow(c_in, 2)) /
(2 * a_in * b_in));
}
// REQUIRES: tri points to a valid Triangle
// EFFECTS: Returns the first side of the given Triangle.
double Triangle_side1(const Triangle *tri) {
return tri->side1;
}
// REQUIRES: tri points to a valid Triangle
// EFFECTS: Returns the second side of the given Triangle.
double Triangle_side2(const Triangle *tri) {
return tri->side2;
}
// REQUIRES: tri points to a valid Triangle
// EFFECTS: Returns the third side of the given Triangle.
double Triangle_side3(const Triangle *tri) {
return std::sqrt(std::pow(tri->side1, 2) +
std::pow(tri->side2, 2) -
2 * tri->side1 * tri->side2 * std::acos(tri->angle));
}
// REQUIRES: tri points to a valid Triangle
// EFFECTS: Returns the perimeter of the given Triangle.
double Triangle_perimeter(const Triangle *tri) {
return Triangle_side1(tri) + Triangle_side2(tri) + Triangle_side3(tri);
}
// REQUIRES: tri points to a valid Triangle; s > 0
// MODIFIES: *tri
// EFFECTS: Scales the sides of the Triangle by the factor s.
void Triangle_scale(Triangle *tri, double s) {
tri->side1 *= s;
tri->side2 *= s;
}
```

Here, we have added *accessor* or *getter* functions for each of the
sides, allowing a user to obtain the side lengths without needing to
know implementation details. Even within the ADT itself, we have used
`Triangle_side3()`

from within `Triangle_perimeter()`

to avoid
code duplication.

The REQUIRES clauses of the ADT functions make a distiction between
`Triangle`

objects and *valid* `Triangle`

objects. The former
refers to an object that is of type `Triangle`

but may not have been
properly initialized, while the latter refers to a `Triangle`

object
that has been initialized by a call to `Triangle_init()`

. Except for
`Triangle_init()`

, the ADT functions all work on valid
`Triangle`

s.

Now that we have a full definition of a C-style ADT, we adhere to the
following convention for working with one: the user of a C-style ADT
may only interact with the ADT through its interface, meaning the
functions defined as part of the ADT’s interface. The user is
generally **prohibited** from accessing struct member variables
directly, as those are implementation details of the ADT. **This
convention also holds in testing an ADT**, since tests should only
exercise the behavior of an ADT and not its implementation.

## Representation Invariants¶

When designing an abstract data type, we must build a data
representation on top of existing types. Usually, there will be cases
where the underlying data representation permits combinations of
values that do not make sense for our ADT. For example, not every
combination of three `double`

s represents a valid triangle – a
`double`

may have a negative value, but a triangle may not have a
side with negative length. The space of values that represent valid
instances of a triangle abstraction is a subset of the set of values
that can be represented by three `double`

s, as illustrated in
Figure 39.

Thus, when designing an ADT, we need to determine the set of values
that are valid for the ADT. We do so by specifying *representation
invariants* for our ADT, which describe the conditions that must be
met in order to make an object valid. For a triangle represented as a
`double`

for each side, the following representation invariants must
hold:

The length of each side must be positive.

The

*triangle inequality*must hold: the sum of any two sides must be strictly greater than the remaining side.

Often, we document the representation invariants as part of the ADT’s data definition:

```
// A triangle ADT.
struct Triangle {
double a;
double b;
double c;
// INVARIANTS: a > 0 && b > 0 && c > 0 &&
// a + b > c && a + c > b && b + c > a
};
```

We then enforce the invariants when constructing or modifying an ADT object by encoding them into the REQUIRES clauses of our functions. We can use assertions to check for them as well, where possible:

```
// REQUIRES: tri points to a Triangle object;
// each side length is positive (a > 0 && b > 0 && c > 0);
// the sides meet the triangle inequality
// (a + b > c && a + c > b && b + c > a)
// MODIFIES: *tri
// EFFECTS: Initializes the triangle with the given side lengths.
void Triangle_init(Triangle *tri, double a, double b, double c) {
assert(a > 0 && b > 0 && c > 0); // positive lengths
assert(a + b > c && a + c > b && b + c > a); // triangle inequality
tri->a = a;
tri->b = b;
tri->c = c;
}
// REQUIRES: tri points to a valid Triangle; s > 0
// MODIFIES: *tri
// EFFECTS: Scales the sides of the Triangle by the factor s.
void Triangle_scale(Triangle *tri, double s) {
assert(s > 0); // positive lengths
tri->a *= s;
tri->b *= s;
tri->c *= s;
}
```

## Plain Old Data¶

As mentioned above, we adhere to the convention of only interacting
with an ADT through its interface. Usually, this means that we do not
access the data members of an ADT in outside code. However,
occasionally we have the need for an ADT that provides no more
functionality than grouping its members together. Such an ADT is just
*plain old data (POD)* 1, without any functions that operate on
that data, and we define its interface to be the same as its
implementation.

- 1
We use the term “plain old data” in the generic sense and not as the specific C++ term. C++ has a generalization of POD types called aggregates. Technically, the

`Person`

struct we saw last time is an aggregate but not a POD. What we mention here for POD types generally applies to aggregates as well.

The following is an example of a `Pixel`

struct used as a POD:

```
// A pixel that represents red, green, and blue color values.
struct Pixel {
int r; // red
int g; // green
int b; // blue
};
int main() {
Pixel p = { 255, 0, 0 };
cout << p.r << " " << p.g << " " << p.b << endl;
}
```

The `Pixel`

ADT consists of just a data representation with no
further functionality. Since it is a POD, its interface and
implementation are the same, so it is acceptable to access its members
directly.

## Abstraction Layers¶

As with procedural abstraction, data abstraction is also defined in layers, with each layer interacting solely with the interface of the layer below and not its implementation. For example, we can represent an image using three matrices, one for each color channel. Any code that uses an image relies on the image interface, without needing to know that it is implemented over three matrices. Each matrix in turn can be represented using a single-dimensional array. Code that uses a matrix relies on the 2D abstraction provided by the interface without needing to know that it is implemented as a 1D array under the hood.

## Testing an ADT¶

As mentioned previously, code outside of an ADT’s implementation must interact with the ADT solely through its interface, including test code. Modifying an ADT’s implementation should not require modifying its test code – we should be able to immediately run our regression tests in order to determine whether or not the ADT still works.

Adhering to the interface often means that we can’t test each ADT
function individually. For instance, we cannot test
`Triangle_init()`

in isolation; instead, we can test it in
combination with the side accessors (e.g. `Triangle_side1()`

) to
determine whether or not the initialization works correctly. Instead
of testing individual functions, we test individual *behaviors*, such
as initialization.

As another example, let’s proceed to design and test an ADT to represent a coordinate in two-dimensional space, using the principle of test-driven development that we saw previously. We will use polar coordinates, which represent a coordinate by the radius from the origin and angle from the horizontal axis, and we reflect this in the name of the ADT and its interface.

We start by determining the interface of the ADT:

```
// A set of polar coordinates in 2D space.
struct Polar;
// REQUIRES: p points to a Polar object
// MODIFIES: *p
// EFFECTS: Initializes the coordinate to have the given radius and
// angle in degrees.
void Polar_init(Polar* p, double radius, double angle);
// REQUIRES: p points to a valid Polar object
// EFFECTS: Returns the radius portion of the coordinate as a
// nonnegative value.
double Polar_radius(const Polar* p);
// REQUIRES: p points to a valid Polar object
// EFFECTS: Returns the angle portion of the coordinate in degrees as
// a value in [0, 360).
double Polar_angle(const Polar* p);
```

We then proceed to write some test cases, following the principles of test-driven development:

```
// Basic test of initializing a Polar object.
TEST(test_init_basic) {
Polar p;
Polar_init(&p, 5, 45);
ASSERT_EQUAL(Polar_radius(&p), 5);
ASSERT_EQUAL(Polar_angle(&p), 45);
}
```

We can then proceed to define a data representation. As part of this
process, we should consider what representation invariants our ADT
should have. For our `Polar`

ADT, a reasonable set of invariants is
that the radius is nonnegative, and the angle is in the range
\([0, 360)\) (using degrees rather than radians) 2:

```
struct Polar {
double r;
double phi;
// INVARIANTS: r >= 0 && phi >= 0 && phi < 360
};
```

- 2
A complete set of invariants would likely also specify a canonical representation of the origin. For example, it may specify that the if the radius is 0, then so is the angle.

Now that we have a data representation, we can make an initial attempt at implementing the functions as well:

```
void Polar_init(Polar* p, double radius, double angle) {
p->r = radius;
p->phi = angle;
}
double Polar_radius(const Polar* p) {
return p->r;
}
double Polar_angle(const Polar* p) {
return p->phi;
}
```

We can run our existing test cases to get some confidence that our
code is working. In addition, the process of coming up with a data
representation, representation invariants, and function definitions
often suggests new test cases. For instance, the following test cases
check that the representation invariants are met when `Polar_init()`

is passed values that don’t directly meet the invariants:

```
// Tests initialization with a negative radius.
TEST(test_negative_radius) {
Polar p;
Polar_init(&p, -5, 225);
ASSERT_EQUAL(Polar_radius(&p), 5);
ASSERT_EQUAL(Polar_angle(&p), 45);
}
// Tests initialization with an angle >= 360.
TEST(test_big_angle) {
Polar p;
Polar_init(&p, 5, 405);
ASSERT_EQUAL(Polar_radius(&p), 5);
ASSERT_EQUAL(Polar_angle(&p), 45);
}
```

Given our initial implementation, these test cases will fail. We can attempt to fix the problem as follows:

```
void Polar_init(Polar* p, double radius, double angle) {
p->r = std::abs(radius); // set radius to its absolute value
p->phi = angle;
if (radius < 0) { // rotate angle by 180 degrees if radius
p->phi = p->phi + 180; // was negative
}
}
```

Running our test cases again, we find that both
`test_negative_radius`

and `test_big_angle`

still fail: the angle
value returned by `Polar_angle()`

is out of the expected range. We
can fix this as follows:

```
void Polar_init(Polar* p, double radius, double angle) {
p->r = std::abs(radius); // set radius to its absolute value
p->phi = angle;
if (radius < 0) { // rotate angle by 180 degrees if radius
p->phi = p->phi + 180; // was negative
}
p->phi = std::fmod(p->phi, 360); // mod angle by 360
}
```

Now both test cases succeed. However, we may have thought of another test case through this process:

```
// Tests initialization with a negative angle.
TEST(test_negative_angle) {
Polar p;
Polar_init(&p, 5, -45);
ASSERT_EQUAL(Polar_radius(&p), 5);
ASSERT_EQUAL(Polar_angle(&p), 315);
}
```

Unfortunately, this test case fails. We can try another fix:

```
void Polar_init(Polar* p, double radius, double angle) {
p->r = std::abs(radius); // set radius to its absolute value
p->phi = angle;
if (radius < 0) { // rotate angle by 180 degrees if radius
p->phi = p->phi + 180; // was negative
}
p->phi = std::fmod(p->phi, 360); // mod angle by 360
if (p->phi < 0) { // rotate negative angle by 360
p->phi += 360;
}
}
```

Our test cases now all pass.

# Streams in C++¶

Previously, we learned about the standard input and output streams, as well as file streams. We examine the relationship between streams more closely now, as well as how to write unit tests using string streams.

A *stream* is an abstraction over a *source* of input, from which we
can read data, or a *sink* of output, to which we can write data.
Streams support the abstraction of character-based input and output
over many underlying resources, including the console, files, the
network, strings, and so on.

In C++, input streams generally derive from `istream`

3. We will
see what this means specifically when we look at inheritance and polymorphism in
the future. For our purposes right now, this means that we can pass
different kinds of input-stream objects to a function that takes in a
reference to an `istream`

. Similarly, output streams generally
derive from `ostream`

, and we can pass different kinds of
output-stream objects to a function that takes in a reference to an
`ostream`

.

- 3
The

`istream`

type is actually an alias for`basic_istream<char>`

, which is an input stream that supports input using the`char`

type. The same goes for`ostream`

and`basic_ostream<char>`

.

To write data into an output stream, we use the *insertion operator*
`<<`

. The actual data written out depends on both the value itself
as well as its type. For instance, if we use a `string`

as the
right-hand-side operand, the insertion operation will write the
characters from the string into the stream:

```
int i = 123;
cout << i; // writes the characters 123
double d = 12.3;
cout << d; // writes the characters 12.3
char c = 'c';
cout << c; // writes the character c
string s = "Hello";
cout << s; // writes the characters Hello
```

Expressions that apply an operator generally evaluate to a value. In the case of stream insertion, the result is the actual stream object itself. This allows us to chain insertion operations:

```
cout << i << d << endl;
// equivalent to ((cout << i) << d) << endl;
// ^^^^^^^^^
// evaluates back to the cout object
```

To read data from an input stream, we use the *extraction operator*
`>>`

, with an object on the right-hand side. The characters are
interpreted according to the type of the object. For built-in types,
whitespace is generally skipped when extracting.

```
char c;
cin >> c; // reads a single character; does not skip whitespace
string s;
cout >> s; // reads in one "word", delimited by whitespace
int i;
cin >> i; // attempts to parse the next characters as an integer value
double d;
cin >> d; // attempts to parse the next characters as a floating-point value
```

As with the insertion operator, an expression that applies the extraction operator evaluates back to the stream itself, allowing extraction operations to be chained:

```
cin >> c >> s >> i >> d;
```

## String Streams¶

When writing unit tests, we often want the tests to be standalone
without requiring access to external data. For tests that work with
streams, we can use string streams rather than standard input/output
or file streams. To use a string stream, we `#include <sstream>`

. We
can then use an `istringstream`

as an input stream, and an
`ostringstream`

as an output stream.

The following is an example of using an `istringstream`

to represent
input data for testing a function that takes in an input stream:

```
TEST(test_image_basic) {
// A hardcoded PPM image
string input = "P3\n2 2\n255\n255 0 0 0 255 0 \n";
input += "0 0 255 255 255 255 \n";
// Use istringstream for simulated input
istringstream ss_input(input);
Image *img = new Image;
Image_init(img, ss_input);
ASSERT_EQUAL(Image_width(img), 2);
Pixel red = { 255, 0, 0 };
ASSERT_TRUE(Pixel_equal(Image_get_pixel(img, 0, 0), red));
delete img;
}
```

We start with a `string`

that contains the actual input data and
then construct an `istringstream`

from that. We can then pass that
`istringstream`

object to a function that has a parameter of type
`istream &`

. When that function extracts data, the result will be
the data from the string we used to construct the `istringstream`

.

We can similarly use an `ostringstream`

to test a function that
takes an output stream:

```
TEST(test_matrix_basic) {
Matrix *mat = new Matrix;
Matrix_init(mat, 3, 3);
Matrix_fill(mat, 0);
Matrix_fill_border(mat, 1);
// Hardcoded correct output
string output_correct = "3 3\n1 1 1 \n1 0 1 \n1 1 1 \n";
// Capture output in ostringstream
ostringstream ss_output;
Matrix_print(mat, ss_output);
ASSERT_EQUAL(ss_output.str(), output_correct);
delete mat;
}
```

We default construct an `ostringstream`

object and pass it to a
function with a parameter of type `ostream &`

. The `ostringstream`

will internally capture the data that the function inserts into it. We
can then call `.str()`

on the `ostringstream`

to obtain a
`string`

that contains that data, which we can then compare to
another `string`

that contains the expected output.