Machine Model II

Pointer Errors

A pointer is an atomic type, since it cannot be subdivided into smaller objects. As with other atomic types, a variable of pointer type that isn’t explicitly initialized is default initialized to an undefined value:

int x = 3;
int *ptr;  // undefined value
ptr = &x;  // well-defined value -- the address of x

Dereferencing a default-initialized pointer results in undefined behavior – the program may crash, or it may not; reading the dereferenced value can result in zero, or some other random value. Undefined behavior is notoriously difficult to debug, as the behavior can be different on different machines or in different runs of the program on the same machine. Tools like Valgrind or an address sanitizer can help detect undefined behavior.

The following is another example of default initializing pointers:

int main() {
  int *x;
  int *y;
  int a = -1;
  x = &a;
  cout << *x << endl;  // prints -1
  *x = 42;
  cout << a << endl;   // prints 42
  *y = 13;             // UNDEFINED BEHAVIOR
}

_images/03_pointers_uninitialized.svg — Figure 13 Dereferencing an uninitialized pointer results in undefined behavior.

Figure 13 illustrates the execution of this code. Both x and y are default initialized to indeterminate values. The code proceeds to assign the address of a into x, so x now has a well-defined value, and it is valid to dereference x. Assigning to the object it is pointing at changes the value of that object, the one named by a. Dereferencing y, on the other hand, produces undefined behavior, since its value is a junk address. The program may crash, or it may not. It may overwrite a memory location that is in use by something else, or some location that is not in use. With undefined behavior, anything is possible.

A null pointer is a pointer that holds an address value of 0x0. No object can be located at that address, making the null value a useful value for a pointer that does not point to a valid object. In C++, the nullptr literal represents the null value, and it can be used with any pointer type:

int *ptr1 = nullptr;
double *ptr2 = nullptr;
cout << (ptr1 == nullptr) << endl;  // prints 1 (i.e. true)

Dereferencing a null pointer also results in undefined behavior. However, in most implementations, doing so will crash the program, which is generally easier to debug than other forms of undefined behavior.

A null pointer is sometimes used to indicate the lack of a value, as in the following:

// EFFECTS: Returns a pointer to the first string in vec of the given
//          length, or a null pointer if no such string is in vec.
string * find_by_length(vector<string> &vec, int length) {
  for (size_t i = 0; i < vec.size(); ++i) {
    if (vec[i].size() == length) {
      return &vec[i];
    }
  }
  return nullptr; // no string of given length
}

int main() {
  vector<string> v = { "hello", "world", "from",
                       "EECS", "280" };
  string *found = find_by_length(v, 3);
  if (found) { // null pointer has false value
    cout << "found string: " << *found << endl;
  } else {
    cout << "no such string" << endl;
  }
}

In this example, the find_by_length() function returns a pointer to a string of the given length. However, if no such string exists, the function returns a null pointer to indicate this. The caller must check whether the return value is null before attempting to dereference it.

Since a pointer is an object in its own right, it can live past the lifetime of the object it is pointing at. The following is an example:

int * get_address(int x) {
  return &x;
}

void print(int val) {
  cout << val << endl;
}

int main() {
  int a = 3;
  int *ptr = get_address(a);
  print(42);
  cout << *ptr << endl;       // UNDEFINED BEHAVIOR
}

_images/03_pointers_address_of_local.svg — Figure 14 A pointer may refer to a dead object, in which case dereferencing it produces undefined behavior.

In this code, the parameter of the get_address() function is passed by value. So the x parameter is a new object in the activation record for get_address(), and it is initialized as a copy of a. The function returns the address of x, and that value is placed in the ptr object in main(). However, x dies along with the activation record of get_address(), so ptr is now pointing at a dead object. The code then calls print(), and it so happens that its activation record is placed in the same location previously used by get_address(), as shown in Figure 14. At this point, ptr happens to point at the val object, whose value is 42. When the print() function returns, its activation record is also reclaimed. Proceeding to dereference ptr produces undefined behavior. It so happens in this implementation that 42 is printed, but other implementations may have different behavior.

We can fix the code above by passing the parameter to get_address() by reference:

int * get_address(int &x) {
  return &x;
}

void print(int val) {
  cout << val << endl;
}

int main() {
  int a = 3;
  int *ptr = get_address(a);
  print(42);
  cout << *ptr << endl;       // prints 3
}

_images/03_address_of_reference_parameter.svg — Figure 15 Example of taking the address of a reference parameter.

Now x aliases the object a in main(), as shown in Figure 15. Thus, get_address() returns the address of a, which is still alive when *ptr is printed.

Function Calls and the Call Stack

Previously, we saw a basic machine model in which the program places each object at a different location in memory. We now examine a more structured model, stack-based memory management, that is used by many language implementations.

In most implementations, the data for a function call are collectively stored within an activation record, which contains space for each of the function’s parameters and local variables, temporary objects, the return address, and other items that are needed by the function. In this course, we will generally only consider the parameters and local variables in an activation record, ignoring the other pieces of data there.

In stack-based memory management, activation records are stored in a data structure called a stack. A stack works just like a stack of pancakes: when a new pancake is made, it is placed on top of the stack, and when a pancake is removed from the stack, it is the top pancake that is taken off. Thus, the last pancake to be made is the first to be eaten, resulting in last-in, first-out (LIFO) behavior. Activation records are similarly stored: when an activation record is created, it is placed on top of the stack, and the first activation record to be destroyed is the last one that was created. This gives rise to an equivalent term stack frame for an activation record.

As an example, consider the following program:

void bar() {
}

void foo() {
  bar();
}

int main() {
  foo();
}

When the program is run, the main() function is called, so an activation record is created and added to the top of the stack. Then main() calls foo(), which places an activation record for foo() on the top of the stack. Then bar() is called, so its activation record is put on the stack. When bar() returns, its activation record is removed from the stack. Then foo() completes, removing its activation record. Finally, the activation record for main() is destroyed when the function returns. Figure 16 shows the state of the stack after each call and return.

_images/02_stack.svg — Figure 16 A stack that stores activation records.

In many implementations, the stack is actually stored upside down in memory, so that it grows downward rather than upward as shown in Figure 17. However, it still has the same LIFO behavior as a right-side-up stack.

_images/02_stack_downward.svg — Figure 17 A stack that grows downward rather than upward.

As a more complex example, consider the following program:

int plus_one(int x) {
  return x + 1;
}

int plus_two(int x) {
  return plus_one(x + 1);
}

int main() {
  int result = 0;
  result = plus_one(0);
  result = plus_two(result);
  cout << result;             // prints 3
}

At program startup, the main() function is called, creating an activation record that holds the single local variable result. The declaration of result initializes its value to 0, and the program proceeds to call plus_one(0). This creates an activation record for plus_one() that holds the parameter x. The program initializes the value of x to the argument value 0 and runs the body of plus_one(). The body computes the value of x + 1 by obtaining the value of x and adding 1 to it, resulting in a value of 1, which the function then returns. The return value replaces the original call to plus_one(0), and the activation record for plus_one is discarded before main() proceeds. The code then assigns the return value of 1 to result. Figure 18 illustrates the activation records up to this point.

_images/02_call_stack_example_a.svg — Figure 18 Activation record for `plus_one()`.

The program then proceeds to call plus_two(result). First, result is evaluated to obtain the value 1. Then an activation record is created for plus_two(), with space for its parameter x. Observe that this activation record is located in memory where the previous activation record for plus_one() was – the latter is no longer in use, so the memory can be reused. After the new activation record is created, the parameter x is initialized with the argument value 1. Then the program runs the body of plus_two().

The body of plus_two() in turn calls plus_one(x + 1). This evaluates x + 1 to obtain the value 2, creates an activation record for plus_one(), initializes the value of x in the new activation record to be 2, and then runs the body of plus_one(). The state of memory at this point is shown in Figure 19.

_images/02_call_stack_example_b.svg — Figure 19 State of stack in second call to `plus_one()`.

Observe that the new activation record for plus_one() is distinct from the previous one – each invocation of a function gets its own activation record. In addition, there are now two variables x in the program. Within the scope of plus_one(), x refers to the object located in the activation record for plus_one(), and its value is 2. Within plus_two(), x refers to the object in the activation record for plus_two(), and its value is 1.

The invocation of plus_one() computes 3 as its return value, so that value replaces the call to plus_one(), and the activation record for plus_one() is discarded. plus_two() returns that same value 3, so the value 3 replaces the call to plus_two() in main(), and the activation record for plus_two() is discarded. Then main() proceeds to assign the value 3 to result and print it out. Finally, when main() returns, its activation record too is discarded.

Function-Call Process

To summarize, the following steps occur in a function call:

For pass-by-value parameters, the argument expressions are evaluated to obtain their values.

For a pass-by-reference parameter, the corresponding argument expression is evaluated to obtain an object [1] rather than its value.

[1]
C++ allows references to const to bind to values (i.e. rvalues in programming-language terms) rather than objects (lvalues). So a reference of type const int & can bind to just the value 3, as in const int &ref = 3;.

The order in which arguments are evaluated is unspecified in C++.
A new and unique activation record is created for the call, with space for the function’s parameters, local variables, and metadata. The activation record is pushed onto the stack.
The parameters are passed, using the corresponding arguments to initialize the parameters. For a pass-by-value parameter, the corresponding argument value is copied into the parameter. For a pass-by-reference parameter, the parameter is initialized as an alias of the argument object.
The body of the called function is run. This transfer of control is often called active flow, since the code actively tells the computer which function to run.
When the called function returns, if the function returns a value, that value replaces the function call in the caller.
The activation record for the called function is destroyed. In simple cases, implementations will generally just leave in memory what is already there, simply marking the memory as no longer in use.
Execution proceeds from the point of the function call in the caller. This transfer of control is often called passive flow, since the code does not explicitly tell the computer which function to run.

The following program is an example of pass by reference:

void swap(int &x, int &y) {
  int tmp = x;
  x = y;
  y = tmp;
}

int main() {
  int a = 3;
  int b = 7;
  cout << a << ", " << b << endl;  // prints 3, 7
  swap(a, b);
  cout << a << ", " << b << endl;  // prints 7, 3
}

The program starts by creating an activation record for main(), with space for the local variables a and b. It initializes a to 3 and b to 7 and then prints out their values. The program then calls swap(a, b), which evaluates the expressions a and b to obtain their objects, creates an activation record for swap(), and initializes the parameters from the argument objects. Since the two parameters are references, the activation record does not contain user-accessible memory for the parameters. (The metadata for the function, however, may include information about the parameters.) The activation record does contain explicit space for the local variable tmp, since it is not a reference. Figure 20 illustrates the activation record.

_images/02_pass_by_reference.svg — Figure 20 Activation record for a function that uses pass by reference.

Here, the reference parameters are depicted with a dotted line between the names and the objects they reference.

The program proceeds to run the body of swap(). First, tmp is initialized to the value of x – since x refers to the same object as a in main(), the value 3 of that object is copied into the memory for tmp. Then the assignment x = y copies the value 7 from the object associated with y (b in main()) to the object associated with x (a in main()). Finally, the assignment y = tmp copies the value 3 from tmp into the object associated with y (b in main()). When swap() returns, its activation record is destroyed, and execution continues in main(). The latter prints out the new values of 7 and 3 for a and b, respectively.

A function can also return an object by reference, though we need to ensure that the lifetime of the object extends beyond the function invocation. The following is an example:

int & larger(int &x, int &y) {
  if (y > x) {
    return y;
  }
  return x;
}

int main() {
  int a = 3;
  int b = 7;
  cout << a << ", " << b << endl;  // prints 3, 7
  larger(a, b) = -1;
  cout << a << ", " << b << endl;  // prints 3, -1
}

The call larger(a, b) returns the object a or b whose value is larger, and we then assign the new value -1 to this object. If larger() were to return by value instead, attempting to assign to the resulting value would result in a compiler error.

Returning by reference is most common in data structures, where we wish to access an element by reference. For instance, this is how the indexing (or subscript) operator works on a vector:

std::vector<int> data = /* ... */;
data[0] = -1;

This operator is implemented using operator overloading, which we will discuss later.