Introduction

Welcome to EECS 280: Programming and Introductory Data Structures! This course covers several fundamental concepts in programming, including basic principles such as procedural and data abstraction, resource management, and basic data structures. The following is the official course description:

Techniques and algorithm development and effective programming, top-down analysis, structured programming, testing, and program correctness. Program language syntax and static and runtime semantics. Scope, procedure instantiation, recursion, abstract data types, and parameter passing methods. Structured data types, pointers, linked data structures, stacks, queues, arrays, records, and trees.

This course, and Computer Science in general, is not about computers, as stated by Hal Abelson, author of the seminal textbook Structure and Interpretation of Computer Programs:

[Computer science] is not really about computers – and it’s not about computers in the same sense that physics is not really about particle accelerators, and biology is not about microscopes and Petri dishes…and geometry isn’t really about using surveying instruments.

Instead, the purpose of this course is to examine the generalizable concepts in programming. To understand what a generalizable concept is, let’s take a diversion from programming and consider a concept in baking. How many eggs do you put in a cake (assuming a single-layer, eight-inch cake)? Eggs serve several functions in a cake, including providing structure, influencing the texture and smoothness, providing moisture, and contributing to the taste of the cake. As an experiment, Summer Stone of The Cake Blog varied the number of eggs in a recipe between zero and four, finding the following:

0 eggs: The cake was short and dense, and it fell apart when cut. It also tasted very much like flour.
1 egg: The cake was dense and compact, but held together when cut and had a pleasant taste.
2 eggs: The cake had a greater height and lighter texture and also had a pleasant taste.
3 eggs: The cake was even taller and lighter, with a slightly spongy texture, and still had a pleasant taste.
4 eggs: The cake was short, dense, and rubbery and tasted very much like egg.

The generalizable concept here is that eggs provide structure to a cake, and that more eggs results in a lighter cake, up to a point. Thus, the structure and texture of a cake can be manipulated by varying the number of eggs, as long as there aren’t too few or too many.

The topics that we will cover in this course concern the generalizable concepts in programming, including procedural abstraction, data abstraction, dynamic resource management, object orientation, and many others. We will use C++ as our vehicle for learning these concepts, and we will implement several large programming projects to develop our experience and understanding of the concepts. However, learning C++ and implementing projects aren’t the end goal; rather, they are the mechanisms we use to explore the topics.

Machine Model

A computer program consists of source code that determines what the program does. The program itself is run on a machine, and the program directs the machine on what computation should be preferred. In order to understand a program, it is important to have a machine model that helps us reason about how the source code relates to what happens at runtime.

As an example, consider the following C++ program:

int main() {
  int x = 3;
  double y = 4.1;
  int z = x;
  x = 5;
}

When the program runs, execution starts at main(). For each of the variables declared within main(), the value of the variable is stored in some memory location. Consider a basic machine model where memory is represented as a big array, with each index in the array corresponding to a memory location that can hold a value, as shown in Figure 1.

_images/01_memory1.svg — Figure 1 Basic machine model, with memory as a linear structure with slots for different objects.

Here, the program is using memory index 6 for x, index 2 for y, and index 4 for z. (Later, we will see that a program uses a more systematic method for locating the local variables of a function.) The contents of memory illustrated above are after the initialization of x and y but before the initialization of z. The location for x has a representation of the int value 3, the location for y has a representation of the double value 4.1, and the location for z has some indeterminate value.

When the program proceeds to initialize z, it copies the value 3 from the memory for x into the memory for z, as demonstrated in Figure 2.

_images/01_memory2.svg — Figure 2 Initializing an object as a copy of another.

Finally, the assignment x = 5 modifies the value of x to be 5, as Figure 3 illustrates.

_images/01_memory3.svg — Figure 3 Modifying the value of an object.

In order to discuss the conceptual spaces for source code and runtime in more detail, we need some terminology. In source code, a name refers to some entity such as a variable, function, or type. A variable is a name that refers to an object in memory. A name has a scope, which determines what region of code can use that name to refer to an entity. For example, the scope of y in the code above begins at the declaration of y and ends at the end of the function definition for main(). Attempting to use y outside this region will result in a compiler error. A declaration is what introduces a name into the program and begins its scope.

At runtime, an object is a piece of data in memory, and it is located at some address in memory (corresponding to the index in our basic machine model above). An object has a lifetime during which it is legal to use that object. More specifically, an object is created at some point in time, and at some later point in time it is destroyed. The storage duration of an object determines its lifetime. There are three options that we will see in C++:

static: the lifetime is essentially the whole program
automatic (also called local): the lifetime is tied to a particular scope, such as a block of code
dynamic: the object is explicitly created and destroyed by the programmer

The former two durations are controlled by the compiler, while the latter is specified by the programmer. We will restrict ourselves to static and automatic storage duration until later in the course.

A variable is not the same thing as a memory object: a variable is a concept associated with source code, while an object is associated with the runtime. The same variable can refer to different objects at different times, such as a local variable in a function that is called more than once. An object that has dynamic storage duration is not associated with a variable at all.

An important consideration in the design of a language is the semantics of an initialization or assignment of the form x = y. Does this change which object x is referring to, or does it modify the value of the object that x is referring to? The first option is known as reference semantics, while the second is value semantics.

In C++, the default is value semantics. Consider the following program:

int x = 42;    // initialize value of x to 42
int y = 99;    // initialize value of y to 99
x = y;         // assign value of y to value of x

The assignment in the last line copies the value stored in the memory object associated with y into the memory object for x, as shown in Figure 4.

_images/01_value.svg — Figure 4 Assignment copies a value from the right-hand side into the left-hand-side object.

C++ supports reference semantics only when initializing a new variable. Placing an ampersand (&) to the left of the new variable name causes the name to be associated with an existing object. The following is an example with a local variable:

int x = 42;  // initialize value of x to 42
int z = 3;   // initialize value of z to 3
int &y = x;  // y and x are now names for the same object
x = 24;      // assigns 24 to object named x/y
y = z;       // Does NOT re-bind y to a different object
             // Value semantics used here.

The declaration int &y = x; introduces y as a new name for the object associated with x. Any subsequent modification to this object is reflected through both names, regardless of the name used to perform the modification. Figure 5 shows the effects of the assignments in the code above.

_images/01_reference.svg — Figure 5 A reference is another name for an existing object.

Since C++ only supports reference semantics in initialization, the association between a variable and a memory object can never be broken, except when the variable goes out of scope.