Introduction
Welcome to EECS 280: Programming and Introductory Data Structures! This course covers several fundamental concepts in programming, including basic principles such as procedural and data abstraction, resource management, and basic data structures. The following is the official course description:
Techniques and algorithm development and effective programming, top-down analysis, structured programming, testing, and program correctness. Program language syntax and static and runtime semantics. Scope, procedure instantiation, recursion, abstract data types, and parameter passing methods. Structured data types, pointers, linked data structures, stacks, queues, arrays, records, and trees.
This course, and Computer Science in general, is not about computers, as stated by Hal Abelson, author of the seminal textbook Structure and Interpretation of Computer Programs:
[Computer science] is not really about computers – and it’s not about computers in the same sense that physics is not really about particle accelerators, and biology is not about microscopes and Petri dishes…and geometry isn’t really about using surveying instruments.
Instead, the purpose of this course is to examine the generalizable concepts in programming. To understand what a generalizable concept is, let’s take a diversion from programming and consider a concept in baking. How many eggs do you put in a cake (assuming a single-layer, eight-inch cake)? Eggs serve several functions in a cake, including providing structure, influencing the texture and smoothness, providing moisture, and contributing to the taste of the cake. As an experiment, Summer Stone of The Cake Blog varied the number of eggs in a recipe between zero and four, finding the following:
0 eggs: The cake was short and dense, and it fell apart when cut. It also tasted very much like flour.
1 egg: The cake was dense and compact, but held together when cut and had a pleasant taste.
2 eggs: The cake had a greater height and lighter texture and also had a pleasant taste.
3 eggs: The cake was even taller and lighter, with a slightly spongy texture, and still had a pleasant taste.
4 eggs: The cake was short, dense, and rubbery and tasted very much like egg.
The generalizable concept here is that eggs provide structure to a cake, and that more eggs results in a lighter cake, up to a point. Thus, the structure and texture of a cake can be manipulated by varying the number of eggs, as long as there aren’t too few or too many.
The topics that we will cover in this course concern the generalizable concepts in programming, including procedural abstraction, data abstraction, dynamic resource management, object orientation, and many others. We will use C++ as our vehicle for learning these concepts, and we will implement several large programming projects to develop our experience and understanding of the concepts. However, learning C++ and implementing projects aren’t the end goal; rather, they are the mechanisms we use to explore the topics.
C++ Fundamentals
C++ is a programming language, a language designed for expressing programs at a higher level than machine code, which is what directly runs on a computer’s central processing unit (CPU). For example, Intel and AMD processors (typically used in Windows machines) implement the x86_64 instruction set architecture (ISA), which results in machine code that looks like the following:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 3
mov DWORD PTR [rbp-8], 4
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
Apple CPUs, as well as most mobile processors, implement the AArch64 ISA, with code like:
sub sp, sp, #16
mov w0, 3
str w0, [sp, 12]
mov w0, 4
str w0, [sp, 8]
ldr w1, [sp, 12]
ldr w0, [sp, 8]
add w0, w1, w0
Machine code is extremely low level, so rather than writing programs
in it directly, we prefer to use a higher-level programming language
that allows us to write more readable code that can work on many
different CPU architectures. We can then use a compiler to translate
the program into the equivalent machine code to run directly on a CPU.
Examples of compilers for C++ include g++
(the GNU C++ compiler),
Clang, and MSVC (Microsoft Visual C++).
Alternatively, we can run a higher-level program through an interpreter, which rather than translating the program into machine code, directly emulates the behavior of the program. Lobster is an example of an interpreter for (a subset of) C++.
To illustrate the compilation process, let’s start with a small C++ program that prints to standard output (typically displaying in a console/terminal):
#include <iostream>
using namespace std;
int main() {
// print a greeting
cout << "Hello World!" << endl;
// Declare some variables
int x = 10 + 5;
int y = 0;
y = x;
x = 20;
// Print out the result
cout << "x = " << x << ", y = " << y << endl;
}
We will momentarily discuss what each part of the program means, but
first let’s compile and run the code. Assuming that the code is
located in the file hello.cpp
, we can compile it from the terminal
(or command line) with g++
:
g++ --std=c++17 hello.cpp -o hello.exe
The items following g++
are arguments (or command-line arguments to be precise) to the compiler:
--std=c++17
tells the compiler to use the C++17 version of the languagehello.cpp
is our program-o hello.exe
tells the compiler to name the resulting machine-code file, or executable, ashello.exe
After running the above command, we can now run the resulting executable from the command line:
./hello.exe
Here, ./
tells the terminal that the executable is in the current
directory (or folder). The terminal runs the program and displays
the printed output:
Hello World!
x = 20, y = 15
Returning to the program source code, the first line (#include
<iostream>
) imports functionality from the C++ standard library,
allowing us to write output to the screen and read input from the
keyboard. In particular, it defines cout
, which we can use with
the insertion operator <<
to print values to standard output.
The cout
variable is actually defined inside the std
(short
for “standard”) namespace, which is a way to organize names to avoid
conflicts with other libraries. There are several ways we can use a
name defined within a namespace:
with a qualified name that includes the namespace, as in
std::cout
:#include <iostream> int main() { // print a greeting std::cout << "Hello World!" << std::endl; }
by first importing each name we want to use directly via using declarations, after which we can use the name without qualification:
#include <iostream> using std::cout; using std::endl; int main() { // print a greeting cout << "Hello World!" << endl; }
by importing all the names from a namespace at once:
#include <iostream> using namespace std; int main() { // print a greeting cout << "Hello World!" << endl; }
Be careful with the last option, as there can be many names defined within a namespace, and importing them all increases the likelihood of there being a conflict with names defined outside of the namespace. For simplicity, we will use it here.
After the initial two lines, we have a definition of a main()
function, which is the entry point of a C++ program. The function
returns an integer, represented by the int
type, and that comes
before the function name itself. For now, we define main()
to not
take any arguments. (If the function takes arguments, there would be
function parameters between the parentheses; we will return to this
later.) Then we have the body of the function, enclosed by a pair of
matching curly braces. Within the body, we have a sequence of
statements, which are executed in order. The first line of the body
is actually a comment rather than a statement:
// print a statement
This does not do anything, but it serves as documentation for someone
reading the code. We can write a comment by starting it with double
slashes, in which case the rest of the line comprises the comment.
Alternatively, we can use the sequence /*
to open a comment and
*/
to close it, which allows a comment to span multiple lines or
part of a line:
int main() /* a comment here */ {
/* here is a
multiline
comment */
}
After the initial comment, we have a statement that prints to standard output:
cout << "Hello World!" << endl;
We use the insertion operator <<
to insert the string "Hello
World!"
– a string is a sequence of characters, and we can write
one directly in our program by enclosing it in double quotes. (Be
aware that unlike some other languages, C++ strings cannot span
multiple lines in the source code.) After inserting the string, we
chain another insertion of endl
, which is defined by
<iostream>
in the std
namespace. Inserting endl
writes a
newline, and it also flushes the output, meaning that it forces the
output to appear immediately. (Without flushing, the output can be
buffered until a later time, which can result in better performance
at the expense of delaying output.) Finally, we end the statement with
a semicolon, which is required at the end of simple statements in C++.
Unlike Python and some other languages, whitespace is not generally
significant in C++ programs, so the end of a line does not
automatically end a statement. We can see what happens if we forget a
semicolon:
#include <iostream>
using namespace std;
int main() {
// print a greeting
cout << "Hello World!" << endl // oops
}
Attempting to compile the program results in the following:
hello.cpp:7:33: error: expected ';' after expression
cout << "Hello World!" << endl // oops
^
;
1 error generated.
This is a compile error, and it is the compiler informing us that we have a mistake in our code. The compiler does not produce an executable in such a case, and we have to fix our error and recompile.
Continuing where we left off in our program, we have the following line:
int x = 10 + 5;
This is a variable declaration, which introduces a new variable, and in C++, it has three components:
the type of the variable (
int
for an integer in this case)the name of the variable (
x
here)an optional initialization expression (
10 + 5
above), which specifies the initial value of the variable
Unlike languages like Python, C++ is statically typed, meaning that the type of a variable must be known at compile time, generally by explicitly specifying the type in the variable’s declaration. And the type of the variable never changes – for as long as it exists, it will have the type specified in its declaration.
A variable’s initialization is an expression, which is a fragment of code that evaluates to some value. (An expression may also have a side effect, which modifies the state of the program in some way, such as by printing to the screen or changing the value of an existing variable.) Expressions may be composed of the following elements:
literals, which specify a value directly in source code, such as
42
or"Hello World!"
variables
functions calls, such as
sqrt(x)
operators such as
+
and<<
In the case of the variable declaration above, the initialization
expression 10 + 5
evaluates to the value 15, which then becomes
the initial value of the variable x
.
We will see later when initialization expressions are required. For
now, we note that for a variable of type int
, without an
initialization expression, the variable would have an undefined
value, which would result in undefined behavior if we were to use
its value. Undefined behavior means that we cannot rely on the outcome
– the program might work, or it might crash, or it might empty your
bank account. Thus, undefined behavior should be avoided at all costs.
Examining the following lines,
int y = 0;
y = x;
x = 20;
we see that the variable y
is introduced with initial value 0.
Then we have an assignment, which copies the value from the
right-hand side into the left – in this case, the value of x
,
which is 15, is copied into y
. The next line then assigns the
value 20 to x
. This, when the last line
cout << "x = " << x << ", y = " << y << endl;
executes, we see that x
has the value 20 and y
the value 15.
A variable can only be used when it is in scope. A scope is a region of source code where names are meaningful, and such a region often corresponds to a block of code delimited by curly braces. A variable is in scope starting at its declaration until the end of the scope region in which it is defined. For example, consider the following code:
int main() {
int x = 5; // x is now in scope
cout << y << endl; // COMPILE ERROR -- y is not in scope
int y = -3; // y is now in scope
if (x > y) {
int a = x - y; // a is now in scope
cout << a << endl;
} // a is no longer in scope
cout << a << endl; // COMPILE ERROR -- a is not in scope
}
The variable y
is not in scope until its declaration, so
attempting to use it in the second statement is an error. Below, the
curly braces associated with the conditional define a new scope
region, in which the variable a
is introduced. Outside of the
curly braces, a
is no longer in scope, so using it in the last
statement is erroneous. The compiler checks for us whether or not
variables are in scope, reporting errors if we try to use one that is
not.
We have seen a simple program, how to compile and run it, and some fundamental C++ elements. We proceed to discuss more complex C++ features, such as type conversions, standard-library types, functions, and control flow.