Arrays

C++ has several different categories of objects:

  • Atomic types are built into the language, and these types are atomic because their objects cannot be subdivided into smaller objects. Atomic types are also known as primitive types. Examples of atomic types include basic numeric types such as int, double, bool, and char, as well as pointer types (e.g. int *, string *).

  • Arrays are contiguous sequences of objects of the same type. They are therefore composed of homogeneous subobjects.

  • Class-type objects are objects composed of member subobjects, each which may be of a different type. Class-type objects are thus heterogeneous. We will discuss class-type objects later in the course.

An array is simple collection of objects, built into C++ and many other languages. An array has the following properties:

  • It has a fixed size, set when the array is created. This size never changes as long as the array is alive.

  • An array holds elements that are of the same type.

  • The elements of an array are stored in a specific order, with the index of the first element being 0.

  • The elements are stored contiguously in memory, one after another.

  • Accessing any element of an array takes constant time, regardless of whether the element is at the beginning, middle, or end of the array.

An array variable can be declared by placing square brackets to the right of the variable name, with a compile-time constant between the brackets, denoting the number of elements. For example, the following declares array to be an array of four int elements:

int array[4];

The following uses a named constant to declare array2 to be an array of four ints:

const int SIZE = 4;
int array2[SIZE];

In both cases, we did not provide an explicit initialization. Thus, array and array2 are default initialized by default initializing each of their elements. Since their elements are of atomic type int, they are default initialized to undefined values.

We can explicitly initialize an array with an initializer list, a list of values in curly braces:

int array[4] = { 1, 2, 3, 4 };

This initializes the element at index 0 to 1, the element at index 1 to 2, and so on.

If the initializer list contains fewer values than the size of the array, the remaining array elements are implicitly initialized. For atomic elements, these remaining elements are initialized to zero values 1. Thus, the following initializes the first two elements of array2 to 1 and 2, respectively, and the last two elements to 0:

int array2[4] = { 1, 2 };
1

This is referred to as value initialization, which is distinct from default initialization.

The following results in every element in array3 being initialized to 0:

int array3[4] = {};

Here, we have provided an empty initializer list, so that the first zero elements (i.e. none of them) are explicitly initialized while the remaining elements (i.e. all of them) are implicitly initialized to 0.

If the size of the array is the same as the size of the initializer list, we can elide the size of the array in its declaration:

int array[] = { 1, 2, 3, 4 };

Figure 20 illustrates the layout of array in memory.

_images/04_array_layout.svg

Figure 20 Layout of an array in memory.

This diagram assumes that an int takes up four bytes in memory, which is the case on most modern machines.

Individual array elements can be accessed with square brackets, with an index between the brackets. Indexing starts at 0, up through the size of the array minus one. For example, the following increments each element in array by one and prints out each resulting value:

for (int i = 0; i < 4; ++i) {
  ++array[i];
  cout << array[i] << endl;
}

Arrays and Pointers

Arrays in C++ are objects. However, in most contexts, there isn’t a value associated with an array as a whole 2. The individual elements (if they are not of array type), have values, but not the array as a whole. Instead, when we use an array in a context where a value is required, the compiler converts the array into a pointer to the first element in the array:

2

The system of values in C++ is very complicated and beyond the scope of this course. In the context of this course, we use the term value to mean something called an rvalue in programming-language terms. There are a handful of ways to construct an array rvalue in C++, but none that we will encounter in this course.

int array[] = { 1, 2, 3, 4 };
cout << &array[0] << endl; // prints 0x1000 assuming the figure above
cout << array << endl;     // prints 0x1000 assuming the figure above
*array = -1;
cout << array[0] << endl;  // prints -1

In this example, assuming the layout in Figure 20 where the first element is at address 0x1000, printing array to standard output just prints out the address 0x1000 – it converts array to a pointer to its first element, and it is the pointer’s value that is then printed. Similarly, dereferencing the array first turns it into a pointer to the first element, followed by the dereference that gives us the first element itself.

The tendency of arrays to decay into pointers results in significant limitations when using an array. For instance, we cannot assign one array to another – the right-hand side of an assignment requires a value, which in the case of an array will become a pointer, which is then incompatible with the left-hand side array:

int arr1[4] = { 1, 2, 3, 4 };
int arr2[4] = { 5, 6, 7, 8 };
arr2 = arr1;  // error: LHS is an array, RHS is a pointer

As discussed before, by default, C++ passes parameters by value. This is also true if the parameter is an array. Since an array decays to a pointer when its value is required, this implies that an array is passed by value as a pointer to its first element. Thus, an array parameter to a function is actually equivalent to a pointer parameter, regardless of whether or not the parameter includes a size:

void func1(int arr[4]); // parameter equivalent to int *arr

void func2(int arr[5]); // parameter equivalent to int *arr

void func3(int arr[]);  // parameter equivalent to int *arr

void func4(int *arr);

int main() {
  int arr1[4] = { 1, 2, 3, 4 };
  int arr2[5];
  int x = -3;

  func1(arr1);  // OK: arr1 turns into pointer, as does parameter of func1
  func2(arr1);  // OK: arr1 turns into pointer, as does parameter of func2
                //     compiler ignores size in func2 parameter
  func3(arr1);  // OK: arr1 turns into pointer, as does parameter of func3
  func4(arr1);  // OK: arr1 turns into pointer, matches parameter of func4

  func1(arr2);  // OK: arr2 turns into pointer, as does parameter of func1
                //     compiler ignores size in func1 parameter
  func2(arr2);  // OK: arr2 turns into pointer, as does parameter of func2
  func3(arr2);  // OK: arr2 turns into pointer, as does parameter of func3
  func4(arr2);  // OK: arr2 turns into pointer, matches parameter of func4

  func1(&x);    // OK: parameter of func1 turns into pointer
  func2(&x);    // OK: parameter of func2 turns into pointer
  func3(&x);    // OK: parameter of func3 turns into pointer
  func4(&x);    // OK: matches parameter of func4
}

This means that a function that takes an array as a parameter cannot guarantee that the argument value corresponds to an array of matching size, or even that it is a pointer into an array. Instead, we need another mechanism for passing size information to a function; we will come back to this momentarily.

Pointer Arithmetic

C++ supports certain arithmetic operations on pointers:

  • An integral value can be added to or subtracted from a pointer, resulting in a pointer that is offset from the original one.

  • Two pointers can be subtracted, resulting in an integral value that is the distance between the pointers.

Pointer arithmetic is in terms of number of elements rather than number of bytes. For instance, if an int takes up four bytes of memory, then adding 2 to an int * results in a pointer that is two ints forward in memory, or a total of eight bytes:

int array[] = { 4, 3, 2, 1 };
int *ptr1 = array;      // pointer to first element
int *ptr2 = &array[2];  // pointer to third element
int *ptr3 = ptr + 2;    // pointer to third element
int *ptr4 = array + 2;  // pointer to third element
++ptr1;                 // move pointer to second element

In initializing ptr4, array is converted to a pointer to its first element, since the + operator requires a value, and the result is two ints forward in memory, producing a pointer to the third element. The last line increments ptr1 to point to the next int in memory. The result is shown in Figure 21.

_images/04_pointer_arithmetic.svg

Figure 21 Pointer arithmetic is in terms of whole objects, not bytes.

The following demonstrates subtracting pointers:

cout << ptr2 - ptr1 << endl;  // prints 1

Since ptr2 is one int further in memory than ptr, the difference ptr2 - ptr is 1.

Pointer arithmetic is one reason why each C++ type has its own pointer type – in order to be able to do pointer arithmetic, the compiler needs to use the size of the pointed-to type, so it needs to know what that type is. For example, implementations generally represent double objects with eight bytes, so adding 2 to a double * moves 16 bytes forward in memory. In general, for a pointer of type T *, adding N to it moves N * sizeof(T) bytes forward in memory 3.

3

sizeof is an operator that can be applied to a type to obtain the number of bytes used to represent that type. When applied to a type, the parentheses are mandatory (e.g. sizeof(int)). The operator can also be applied to a value, in which case it results in the size of the compile-time type of that value. Parentheses are not required in this case (e.g. sizeof 4 or sizeof x).

Pointers can also be compared with the comparison operators, as in the following using the pointers declared above:

cout << (ptr1 == ptr2) << endl;        // false (prints as 0)
cout << (ptr2 == ptr3) << endl;        // true (prints as 1)
cout << (ptr1 < ptr2) << endl;         // true
cout << (*ptr1 < *ptr2) << endl;       // false (compares element values)
++ptr1;
cout << (ptr1 == ptr2) << endl;        // true
cout << (array == &array[0]) << endl;  // true (LHS turns into pointer)

Arithmetic is generally useful only on pointers to array elements, since only array elements are guaranteed to be stored contiguously in memory. Similarly, comparisons are generally only well-defined on pointers into the same array or on pointers constructed from arithmetic operations on the same pointer.

Array Indexing

Array indexing in C++ is actually implemented using pointer arithmetic. If one of the operands to the subscript ([]) operator is an array and the other is integral, then the operation is equivalent to pointer arithmetic followed by a dereference:

int arr[4] = { 1, 2, 3, 4 };
cout << *(arr + 2) << endl;  // prints 3: arr+2 is pointer to 3rd element
cout << arr[2] << endl;      // prints 3: equivalent to *(arr + 2)
cout << 2[arr] << endl;      // prints 3: equivalent to *(2 + arr);
                             //           but don't do this!

Thus, if arr is an array and i is integral, then arr[i] is equivalent to *(arr + i):

  1. The subscript operation requires the value of arr, so it turns into a pointer to its first element.

  2. Pointer arithmetic is done to produce a pointer i elements forward in memory.

  3. The resulting pointer is dereferenced, resulting in the element at index i.

Because the subscript operation is equivalent to pointer arithmetic, it can be applied to a pointer equally as well:

int arr[4] = { 1, 2, 3, 4 };
int *ptr = arr + 1;      // pointer to second element
cout << ptr[2] << endl;  // prints 4: constructs a pointer that is 2
                         //           elements forward in memory, then
                         //           dereferences that

There are several implications of the equivalence between array indexing and pointer arithmetic. First, it is what makes array access a constant time operation – no matter the index, accessing an element turns into a single pointer addition followed by a single dereference. The equivalence is also what makes passing arrays by value work – the result is a pointer, which we can still subscript into since it just does pointer arithmetic followed by a dereference. Finally, it allows us to work with subsets of an array. For instance, the following code prints out just the middle elements of an array:

void print_array(int array[], int size) {
  for (int i = 0; i < size; ++i) {
    cout << array[i] << " ";
  }
}

int main() {
  int array[4] = { 3, -1, 5, 2 };
  print_array(arr + 1, 2);      // prints out just -1 5
}
_images/04_array_pass_by_value.svg

Figure 22 Passing a subset of an array to a function.

The print_array() function receives a pointer to the array’s second element as well as a size of 2, as shown in Figure 22. Thus, it only prints out the second and third elements; as far as the function knows, it is working with an array of size 2 that starts at the address 0x1004.

More on Array Decay

An array only decays into a pointer when its value is required. When an array object’s value is not required, it does not decay into a pointer. For example, the address-of (&) operator requires an object but not its value – thus, applying & to an array produces a pointer to the whole array, not a pointer to an individual element nor a pointer to a pointer 4.

4

A pointer to an array of 4 ints can be declared using the syntax int (*ptr_to_arr)[4];. The address value stored in a pointer to an array is generally the same address as that of the array’s first element.

Another example is applying the sizeof operator to an array. The operator produces the size of the whole array in bytes 5, as opposed to applying it to a pointer, which just produces the size of a pointer (generally eight bytes on modern systems):

int x = 42;
int arr[5] = { 1, 2, 3, 4, 5 };
int *ptr = arr;
cout << sizeof x << endl;    // 4 (on most machines)
cout << sizeof arr << endl;  // 20
cout << sizeof ptr << endl;  // 8
5

Thus, the expression sizeof array / sizeof *array recovers the number of elements, as long as array is still an array.

Once an array has turned into a pointer, the resulting pointer loses all information about the size of the array, or even that it is a pointer into an array. Thus, we need another mechanism for keeping track of the size of an array, such as when we pass the array to a function (if it is passed by value, it turns into a pointer which retains no information about the array’s size).

The End of an Array

If a program dereferences a pointer that goes past the bounds of the array, the result is undefined behavior 6. If we are lucky, the program will crash, indicating we did something wrong and giving us an opportunity to debug it. In the worst case, the program may compute the right result when we run it on our own machine but misbehave when run on a different platform (e.g. the autograder).

6

Constructing a pointer that is out of bounds is not a problem; we often construct pointers that are just past the end of an array, as we will see in a moment. It is dereferencing such a pointer that results in undefined behavior.

There are two general strategies for keeping track of where an array ends.

  1. Keep track of the length separately from the array. This can be done with either an integer size or by constructing a pointer that is just past the end of an array (by just adding the size of the array to a pointer to the array’s first element).

  2. Store a special sentinel value at the end of the array, which allows an algorithm to detect that it has reached the end.

The first strategy is what we used in defining the print_array() function above. As demonstrated there, the stored size may be smaller than the size of the array, resulting in the function operating on a subset of the array.

We will see the second strategy next time when we look at strings.

Array Traversal

The print_array() function above also demonstrates how to traverse through an array using an index that starts at 0 up to the size of the array, exclusive. The following is another example:

int const SIZE = 3;                   // constant to represent array size
int array[SIZE] = { -1, 7, 3 };
for (int i = 0; i < SIZE; ++i) {
  cout << *(array + i) << endl;               // using pointer arithmetic
  cout << array[i] << endl;                   // using subscript (better)
}
_images/04_traversal_by_index.svg

Figure 23 Traversal by index uses an index to walk through an array.

This pattern of accessing elements is called traversal by index – we use an integer index in the range \([0, SIZE)\) where \(SIZE\) is the size of the array, and we use the index to obtain the corresponding element. We can use the subscript operator or do pointer arithmetic ourselves. (The former is generally considered better, since it is more familiar and clearer to most programmers. However, you will often see both arr[0] and *arr used to access the first element.) The actual syntax we use is irrelevant to the pattern – what makes this traversal by index is that we use an integer index to access the array, and we traverse through the elements by modifying the index.

Another pattern we can use is traversal by pointer, which walks a pointer across the elements of an array:

int const SIZE = 3;                   // constant to represent array size
int array[SIZE] = { -1, 7, 3 };
int *end = array + SIZE;              // pointer just past the end of arr
for (int *ptr = array; ptr < end; ++ptr) {     // walk pointer across arr
  cout << *ptr << endl;                  // dereference to obtain element
}
_images/04_traversal_by_pointer.svg

Figure 24 Traversal by pointer uses a pointer to walk through an array.

Here, we start by constructing a pointer that is just past the end of the array: the last element is at arr + SIZE - 1, so we need to end our traversal when the pointer we are using reaches arr + SIZE. We then use another pointer that starts at the first element, dereference it to obtain an element, and then increment it to move on to the next element. The syntax we use to dereference an element is irrelevant to the pattern (it can be *ptr or ptr[0]) – what makes this traversal by pointer is that we use a pointer to each element to access the array, and we traverse through the elements by modifying that pointer.

Traversal by index is the more common pattern when working with general arrays. However traversal by pointer is applicable to two important cases:

  • It is the dominant pattern when working with C-style strings, which we will see next time.

  • It is the basis for the more general traversal by iterator, which we will see later in the course.

Thus, both patterns are important to programming in C++.

Aside from providing us insight about memory and how objects are stored, arrays are a fundamental abstraction that can be used to build more complex abstractions. Later in the course, we will see how to use arrays to build data structures such as vectors and sets.