18. Week 10 Tuesday: Pointers and Arrays
Suppose you are designing a struct that represents a human being. After populating it with a bunch of member variables, you decide that you want to keep track of an individual human’s spouse. So you add one more member variable to the Human struct: Human spouse;
Conceptually, this code is extremely suspicious. It says to C++, “A Human is some data, plus another Human. That Human then contains more data and another Human. That Human then contains…” In other words, there is an infinite matryoshka of nested Humans created the moment one tries to declare a single Human. In fact, the compiler recognises this and gives a compilation error.
A more reasonable attempt would be to make the member variable spouse a reference to a Human instead, declaring it as Human& spouse;
No more infinite matryoshkas. But now there’s a problem: when a Human variable is declared, the spouse member variable must be initialised…as a reference to another Human. So how do we define the first Human?
Moreover, what happens if a Human is single? What if they want a divorce or want to marry someone else? What if they are polyamorous? None of these very realistic scenarios can be modeled by a reference.
Remark 1.
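In fact, the compiler enforces this: the struct definition itself compiles, but declaring a Human without initialising spouse (for instance, Human h;) is a build error, because a reference must be bound to something the moment it is created.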
The fix is to use a pointer rather than a reference for the spouse member variable! (You can also do vector<Human> spouses;, but this secretly uses pointers under the hood.)
This was a brief example to illustrate the necessity of the concept of a pointer. We’ll spend the remainder of this class diving into the mechanics of how pointers work.
Pointer Basics and Reference Review
Throughout my live discussions, I have been describing variables as “boxes” in computer memory. Each of these boxes has a location in your computer’s physical memory, which I’ll represent with a street address. For instance, consider the two lines int i = 7; and int& r = i;. The first line creates an integer-sized box holding the value 7. This is assigned to some address (say, 123 Ram Street), and C++ remembers that the variable name i is really a human-friendly name for 123 Ram Street. This address can change every time we run the program!
Remark 2.
Computer addresses are actually just numbers (think indices in a very, very large vector of bytes), but for pedagogical reasons I’ll stick with street addresses.
The second line then says that the variable name r is another human-friendly name for the same memory address 123 Ram Street. Whenever either i or r is used in the program, C++ remembers that the programmer is really referring to the integer-sized box at 123 Ram Street.
Definition 3.
A pointer is a variable that contains the memory address of another variable.
To obtain the address of a variable, use the & operator (also known as the “address-of” operator):
int i = 7;
int& r = i;
int* p = &i;
The third line says to create a box large enough to hold one memory address, and to store the address of i at that location. p is a variable itself, so it has a memory address as well, say 42 Hard Drive. The box at 42 Hard Drive contains the address 123 Ram Street!
We should read the line int* p = &i; as “create an int pointer called p and set it equal to the address of i.” We must specify that the address assigned to p is the address of an integer!
Contrast this with the behaviour of a reference. int& r = i; tells C++ to make another alias for the address 123 Ram Street, whereas int* p = &i; tells C++ to write down the address 123 Ram Street somewhere in computer memory.
To access the box “pointed to” by p, we need to use the dereferencing operator *. For example, suppose that after the three lines above we write *p = 15; followed by cout << i << endl;.
The statement *p = 15; says “go to the address stored in the variable p, then put a 15 in that box.” In this case, it will store 15 in the box at 123 Ram Street, which is exactly the same box that the variable name i refers to. Thus, printing i afterwards prints 15.
Null Pointers and Runtime Errors
Not every address will be “on the map”, so to speak.
Consider declaring int* p; with no initial value and then running cout << *p << endl;. This says to make an integer pointer called p, then print out whatever is at the memory address stored in p. This is undefined behaviour: p is uninitialised, so we don’t know what address is stored there. It may be in use by another program, or it may not even exist! Chances are, it’ll crash your program.
There are cases, however, when you want to say that “p points to nothing”, or store the memory location “nowhere”. (Think back to when a Human is single!) This is when you would use a nullptr, which stands for the address “nowhere”!
Dereferencing a null pointer results in undefined behaviour, usually a crash (i.e. runtime error).
More sinister is the undefined behaviour of a “dangling pointer”, which we’ve seen before in the form of references:
int* p;
if(5 < 7) {
    int a = 5;
    p = &a;
} else {
    int b = 7;
    p = &b;
}
cout << *p << endl;
Here, the if block will be run. C++ will create an integer called a at some location, say 123 Ram Street, and store this address in the variable p. But when the if block ends, the variable a goes out of scope! C++ then treats 123 Ram Street as no longer in use, so that box may be reused or demolished at any moment. When we print out the contents of the box at 123 Ram Street, it’s undefined behaviour: we don’t know who or what lives at 123 Ram Street anymore.
Problem 4.
Determine the output of the following code:
C-Style Arrays
The C-style array is just a glorified pointer. When declaring a C-style array, you need to specify the type, the name of the array, and in brackets the size of the array. When defining a C-style array, you can use a list of values in curly braces (similar to a vector!).
int arr[5] = {1, 3, 5, 7, 9};
The above code creates an array called arr with five values, listed in curly braces. You can then access these using the square bracket notation; elements are indexed just like strings and vectors:
cout << arr[0] << ' ' << arr[2] << endl;
This prints out 1 5.
Under the hood, C++ creates a small neighbourhood with five contiguous houses on the same street, say 120 Ram Street up through 124 Ram Street. These five addresses hold the numbers 1, 3, 5, 7, 9, respectively. arr remembers where the residential block begins: in expressions like these, it acts as an integer pointer containing the value 120 Ram Street. The notation arr[0] is actually the same thing as *arr, meaning “go to 120 Ram Street and access that box”. arr[2] says to “go two houses down the street from 120 Ram Street and access that box”.
One can do “pointer arithmetic” in a similar vein of thought:
int* p = arr + 4;
In this case, since arr is 120 Ram Street, adding four to the address gives us…124 Ram Street. So *p gets us the index 4 item in the array arr, and vice versa!
I strongly encourage you to write out street addresses when working through exam problems centred on pointers and C-style arrays! To demonstrate why this is a good idea, let’s tie things off with the following problem.
Problem 5.
Determine the output of the following code:
#include <iostream>

using namespace std;

int main() {
    char arr[8] =
        {'A', 'g', 'Q', 'g', 's', 'c', 'G', 'f'};
    char c = 'c';

    // ?
    char* p = arr + 4;
    for(int i = 0; i < 8; i++) {
        if(p != &c) {
            p++;
        }

        if(p == arr + 8) {
            p = &c;
        }

        *p += 2;
    }

    // prints out the contents of arr
    for(int i = 0; i < 8; i++) {
        cout << arr[i];
    }
    cout << endl;
    cout << c << endl;

    return 0;
}
Problem 6.
Predict the output of the following code: