Introduction

Welcome to EECS 280: Programming and Introductory Data Structures! This course covers several fundamental concepts in programming, including basic principles such as procedural and data abstraction, resource management, and basic data structures. The following is the official course description:

Techniques and algorithm development and effective programming, top-down analysis, structured programming, testing, and program correctness. Program language syntax and static and runtime semantics. Scope, procedure instantiation, recursion, abstract data types, and parameter passing methods. Structured data types, pointers, linked data structures, stacks, queues, arrays, records, and trees.

This course, and Computer Science in general, is not about computers, as stated by Hal Abelson, author of the seminal textbook Structure and Interpretation of Computer Programs:

[Computer science] is not really about computers – and it’s not about computers in the same sense that physics is not really about particle accelerators, and biology is not about microscopes and Petri dishes…and geometry isn’t really about using surveying instruments.

Instead, the purpose of this course is to examine the generalizable concepts in programming. To understand what a generalizable concept is, let’s take a diversion from programming and consider a concept in baking. How many eggs do you put in a cake (assuming a single-layer, eight-inch cake)? Eggs serve several functions in a cake, including providing structure, influencing the texture and smoothness, providing moisture, and contributing to the taste of the cake. As an experiment, Summer Stone of The Cake Blog varied the number of eggs in a recipe between zero and four, finding the following:

  • 0 eggs: The cake was short and dense, and it fell apart when cut. It also tasted very much like flour.

  • 1 egg: The cake was dense and compact, but held together when cut and had a pleasant taste.

  • 2 eggs: The cake had a greater height and lighter texture and also had a pleasant taste.

  • 3 eggs: The cake was even taller and lighter, with a slightly spongy texture, and still had a pleasant taste.

  • 4 eggs: The cake was short, dense, and rubbery and tasted very much like egg.

The generalizable concept here is that eggs provide structure to a cake, and that more eggs results in a lighter cake, up to a point. Thus, the structure and texture of a cake can be manipulated by varying the number of eggs, as long as there aren’t too few or too many.

The topics that we will cover in this course concern the generalizable concepts in programming, including procedural abstraction, data abstraction, dynamic resource management, object orientation, and many others. We will use C++ as our vehicle for learning these concepts, and we will implement several large programming projects to develop our experience and understanding of the concepts. However, learning C++ and implementing projects aren’t the end goal; rather, they are the mechanisms we use to explore the topics.

C++ Fundamentals

C++ is a programming language, a language designed for expressing programs at a higher level than machine code, which is what directly runs on a computer’s central processing unit (CPU). For example, Intel and AMD processors (typically used in Windows machines) implement the x86_64 instruction set architecture (ISA), which results in machine code that looks like the following:

push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-4], 3
mov     DWORD PTR [rbp-8], 4
mov     edx, DWORD PTR [rbp-4]
mov     eax, DWORD PTR [rbp-8]
add     eax, edx

Apple CPUs, as well as most mobile processors, implement the AArch64 ISA, with code like:

sub     sp, sp, #16
mov     w0, 3
str     w0, [sp, 12]
mov     w0, 4
str     w0, [sp, 8]
ldr     w1, [sp, 12]
ldr     w0, [sp, 8]
add     w0, w1, w0

Machine code is extremely low level, so rather than writing programs in it directly, we prefer to use a higher-level programming language that allows us to write more readable code that can work on many different CPU architectures. We can then use a compiler to translate the program into the equivalent machine code to run directly on a CPU. Examples of compilers for C++ include g++ (the GNU C++ compiler), Clang, and MSVC (Microsoft Visual C++).

Alternatively, we can run a higher-level program through an interpreter, which rather than translating the program into machine code, directly emulates the behavior of the program. Lobster is an example of an interpreter for (a subset of) C++.

To illustrate the compilation process, let’s start with a small C++ program that prints to standard output (typically displaying in a console/terminal):

#include <iostream>

using namespace std;

int main() {
  // print a greeting
  cout << "Hello World!" << endl;

  // Declare some variables
  int x = 10 + 5;
  int y = 0;
  y = x;
  x = 20;

  // Print out the result
  cout << "x = " << x << ", y = " << y << endl;
}

We will momentarily discuss what each part of the program means, but first let’s compile and run the code. Assuming that the code is located in the file hello.cpp, we can compile it from the terminal (or command line) with g++:

g++ --std=c++17 hello.cpp -o hello.exe

The items following g++ are arguments (or command-line arguments to be precise) to the compiler:

  • --std=c++17 tells the compiler to use the C++17 version of the language

  • hello.cpp is our program

  • -o hello.exe tells the compiler to name the resulting machine-code file, or executable, as hello.exe

After running the above command, we can now run the resulting executable from the command line:

./hello.exe

Here, ./ tells the terminal that the executable is in the current directory (or folder). The terminal runs the program and displays the printed output:

Hello World!
x = 20, y = 15

Returning to the program source code, the first line (#include <iostream>) imports functionality from the C++ standard library, allowing us to write output to the screen and read input from the keyboard. In particular, it defines cout, which we can use with the insertion operator << to print values to standard output. The cout variable is actually defined inside the std (short for “standard”) namespace, which is a way to organize names to avoid conflicts with other libraries. There are several ways we can use a name defined within a namespace:

  • with a qualified name that includes the namespace, as in std::cout:

    #include <iostream>
    
    int main() {
      // print a greeting
      std::cout << "Hello World!" << std::endl;
    }
    
  • by first importing each name we want to use directly via using declarations, after which we can use the name without qualification:

    #include <iostream>
    
    using std::cout;
    using std::endl;
    
    int main() {
      // print a greeting
      cout << "Hello World!" << endl;
    }
    
  • by importing all the names from a namespace at once:

    #include <iostream>
    
    using namespace std;
    
    int main() {
      // print a greeting
      cout << "Hello World!" << endl;
    }
    

Be careful with the last option, as there can be many names defined within a namespace, and importing them all increases the likelihood of there being a conflict with names defined outside of the namespace. For simplicity, we will use it here.

After the initial two lines, we have a definition of a main() function, which is the entry point of a C++ program. The function returns an integer, represented by the int type, and that comes before the function name itself. For now, we define main() to not take any arguments. (If the function takes arguments, there would be function parameters between the parentheses; we will return to this later.) Then we have the body of the function, enclosed by a pair of matching curly braces. Within the body, we have a sequence of statements, which are executed in order. The first line of the body is actually a comment rather than a statement:

// print a statement

This does not do anything, but it serves as documentation for someone reading the code. We can write a comment by starting it with double slashes, in which case the rest of the line comprises the comment. Alternatively, we can use the sequence /* to open a comment and */ to close it, which allows a comment to span multiple lines or part of a line:

int main() /* a comment here */ {
  /* here is a
     multiline
     comment */
}

After the initial comment, we have a statement that prints to standard output:

cout << "Hello World!" << endl;

We use the insertion operator << to insert the string "Hello World!" – a string is a sequence of characters, and we can write one directly in our program by enclosing it in double quotes. (Be aware that unlike some other languages, C++ strings cannot span multiple lines in the source code.) After inserting the string, we chain another insertion of endl, which is defined by <iostream> in the std namespace. Inserting endl writes a newline, and it also flushes the output, meaning that it forces the output to appear immediately. (Without flushing, the output can be buffered until a later time, which can result in better performance at the expense of delaying output.) Finally, we end the statement with a semicolon, which is required at the end of simple statements in C++. Unlike Python and some other languages, whitespace is not generally significant in C++ programs, so the end of a line does not automatically end a statement. We can see what happens if we forget a semicolon:

#include <iostream>

using namespace std;

int main() {
  // print a greeting
  cout << "Hello World!" << endl // oops
}

Attempting to compile the program results in the following:

hello.cpp:7:33: error: expected ';' after expression
  cout << "Hello World!" << endl // oops
                                ^
                                ;
1 error generated.

This is a compile error, and it is the compiler informing us that we have a mistake in our code. The compiler does not produce an executable in such a case, and we have to fix our error and recompile.

Continuing where we left off in our program, we have the following line:

int x = 10 + 5;

This is a variable declaration, which introduces a new variable, and in C++, it has three components:

  • the type of the variable (int for an integer in this case)

  • the name of the variable (x here)

  • an optional initialization expression (10 + 5 above), which specifies the initial value of the variable

Unlike languages like Python, C++ is statically typed, meaning that the type of a variable must be known at compile time, generally by explicitly specifying the type in the variable’s declaration. And the type of the variable never changes – for as long as it exists, it will have the type specified in its declaration.

A variable’s initialization is an expression, which is a fragment of code that evaluates to some value. (An expression may also have a side effect, which modifies the state of the program in some way, such as by printing to the screen or changing the value of an existing variable.) Expressions may be composed of the following elements:

  • literals, which specify a value directly in source code, such as 42 or "Hello World!"

  • variables

  • functions calls, such as sqrt(x)

  • operators such as + and <<

In the case of the variable declaration above, the initialization expression 10 + 5 evaluates to the value 15, which then becomes the initial value of the variable x.

We will see later when initialization expressions are required. For now, we note that for a variable of type int, without an initialization expression, the variable would have an undefined value, which would result in undefined behavior if we were to use its value. Undefined behavior means that we cannot rely on the outcome – the program might work, or it might crash, or it might empty your bank account. Thus, undefined behavior should be avoided at all costs.

Examining the following lines,

int y = 0;
y = x;
x = 20;

we see that the variable y is introduced with initial value 0. Then we have an assignment, which copies the value from the right-hand side into the left – in this case, the value of x, which is 15, is copied into y. The next line then assigns the value 20 to x. This, when the last line

cout << "x = " << x << ", y = " << y << endl;

executes, we see that x has the value 20 and y the value 15.

A variable can only be used when it is in scope. A scope is a region of source code where names are meaningful, and such a region often corresponds to a block of code delimited by curly braces. A variable is in scope starting at its declaration until the end of the scope region in which it is defined. For example, consider the following code:

int main() {
  int x = 5; // x is now in scope

  cout << y << endl; // COMPILE ERROR -- y is not in scope

  int y = -3; // y is now in scope

  if (x > y) {
    int a = x - y; // a is now in scope
    cout << a << endl;
  } // a is no longer in scope

  cout << a << endl; // COMPILE ERROR -- a is not in scope
}

The variable y is not in scope until its declaration, so attempting to use it in the second statement is an error. Below, the curly braces associated with the conditional define a new scope region, in which the variable a is introduced. Outside of the curly braces, a is no longer in scope, so using it in the last statement is erroneous. The compiler checks for us whether or not variables are in scope, reporting errors if we try to use one that is not.

We have seen a simple program, how to compile and run it, and some fundamental C++ elements. We proceed to discuss more complex C++ features, such as type conversions, standard-library types, functions, and control flow.