Hunter Liu's Website

17. Week 10 Tuesday: Pointers!


Suppose you are designing a struct that represents a human being. After populating it with a bunch of member variables, you decide that you want to keep track of an individual human’s spouse. So you write the following code:

struct Human {
    // some member variables...
    // ...

    Human spouse;
};

Conceptually, this code is extremely suspicious. It says to C++, “A Human is some data, plus another Human. That Human then contains more data and another Human. That Human then contains…” In other words, there is an infinite matryoshka of nested Humans created the moment one tries to declare a single Human. In fact, the compiler recognises this and gives a compilation error.

A more reasonable attempt would be to make the member variable spouse a reference to a Human instead:

struct Human {
    // some member variables...
    // ...

    Human& spouse;
};

No more infinite matryoshkas. But now there’s a problem: when a Human variable is declared, the spouse member variable must be initialised… as a reference to another Human. So how do we define the first Human?

Moreover, what happens if a Human is single? What if they want a divorce or want to marry someone else? What if they are polyamorous? None of these very realistic scenarios can be modeled by a reference.

Remark 1.

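In fact, the struct definition itself compiles, but declaring a variable with Human h; causes a build error: the reference member spouse must be bound to an existing Human the moment h is created, and there is nothing to bind it to.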

The fix is to use a pointer rather than a reference for the spouse member variable! (You can also declare vector<Human> spouses; instead, but this secretly uses pointers under the hood.)

This was a brief example to illustrate the necessity of the concept of a pointer. We’ll spend the remainder of this class diving into the mechanics of how pointers work.

Revisiting the Box Model

Let’s consider the very simple program

int i = 7;
int& r = i;

This says to C++, “create an integer box named i and set it equal to 7. Then add the label r to that box.” There’s a bit more that the compiler keeps track of under the hood. When it creates the box i, your operating system says to the program, “Okay, I gave you an integer-sized box at the address 123 Ram Street.” These addresses are actually huge numbers such as 0x7fff0e28eab4 which are way too cumbersome for humans to handle.

From then on, every time you refer to the box i, C++ implicitly understands that you’re referring to the box at 123 Ram Street, or whatever address it received from the operating system. Likewise, the variable name r is “bound” to the address 123 Ram Street when we say int& r = i. This is not so different from what we’ve been doing with the box model this whole quarter; we’re just giving an extra name to the locations of these boxes.

Definition 2.

A pointer is a box whose contents are a memory address.

Let’s consider the code

int i = 7;
int& r = i;
int* p = &i;

int* p says we’re creating a new box called p whose contents are the address of an integer rather than an int. The &i means “get the address of i”. Continuing as before, the box p may be located at 456 Hard Drive, and its contents would be 123 Ram Street. So the code

cout << p << endl;

actually prints out the address 123 Ram Street. (I’m lying, it prints out some garbage like 0x7fff0e28eab4.)

To actually make use of the address, we use the dereferencing operator:

cout << *p << endl;

This prints out 7, since the contents of the box at 123 Ram Street are 7.

We can also modify *p as if it were a variable:

int i = 7;
int* p = &i;
*p = 15;
cout << i << endl;

With addresses as before, the line *p = 15 says to go to 123 Ram Street, then place the number 15 in the box that’s there. Since that box is i, the last line prints 15.

Exercise 3.

Predict the output of the following code:

#include <iostream>

using namespace std;

int main() {
    char ch1 = '3';
    char ch2 = 'a';

    char* p1 = &ch1;
    char* p2 = &ch2;

    *p1 += 1;
    *p2 = '7';

    p2 = p1;
    *p1 = *p2 + 2;

    cout << ch1 << ' ' << ch2 << endl;
    cout << *p1 << ' ' << *p2 << endl;
    return 0;
}

Problem 4.

Predict the output of the following code:

#include <iostream>

using namespace std;

void foo(int* p, int& r) {
    *p += 5;
    p = &r;
    *p *= 2;
}

int main() {
    int a = -3;
    int b = 17;

    int* ptr = &b;

    foo(ptr, a);
    cout << a << endl;
    cout << b << endl;
    cout << *ptr << endl;

    return 0;
}

All The Things That Go Wrong

Let’s now consider some of the horrible things that can happen when we start using pointers.

We’ve already seen the first problem with references. Consider the following code:

int* p;

cout << "Enter an integer: " << endl;
int i;
cin >> i;

if (i < 0) {
    int a = -1;
    p = &a;
} else {
    int b = 1;
    p = &b;
}

cout << *p << endl;

Let’s trace through the program when the user enters 5.

  1. An uninitialised pointer p is created.
  2. The user enters 5, which is stored in i.
  3. The if block is skipped, and in the else block…
    • An integer b is created at 123 Ram Street, initialised to the value 1.
    • The address 123 Ram Street is stored in the pointer p.
    • The else block ends, so the box b is destroyed.
  4. The contents of the box at address 123 Ram Street are printed.

But there’s no longer a box at 123 Ram Street when the last step is run! This causes undefined behaviour.

In much the same way, trying to use an uninitialised pointer is undefined behaviour. Just as uninitialised integers contain unpredictable garbage, uninitialised pointers contain a practically random memory address, and trying to access that address can cause a crash or unpredictable output.

int* p;
cout << *p << endl;

Let’s revisit the motivating example: what do we do if our Human doesn’t have a spouse? Pointers, unlike references, allow us to make a Human* before the spouse exists. But how do we say “a pointer to a nonexistent human”? There is a special pointer value that represents “nowhere”, called nullptr.

int* p = nullptr;
cout << *p << endl;

This is undefined behaviour! A nullptr represents an address that doesn’t exist, so there is no box at “nowhere” to print. That makes it exactly the right value for the spouse pointer of a single Human, but it also means you should always check whether a pointer is nullptr before dereferencing it.