Hunter Liu's Website

12. Week 6 Thursday: Vectors

≪ 11. Week 6 Tuesday: Passing by Reference | Table of Contents | 13. Week 7 Tuesday: Function Overloading and Default Arguments ≫

Consider the following problem:

Example 1. Standard Deviation

Write a C++ program that accepts a positive integer \(n\) as input from the user. Then, allow the user to input \(n\) more decimal numbers and determine the standard deviation of the user’s inputs.

Sample Run: 
INPUT   3
INPUT   5.1 5.1 5.1 
OUTPUT  0 

INPUT   6
INPUT   1 2 3 4 5 6
OUTPUT  1.70783

When determining just the average of a list of numbers, there’s a way to do this “in parallel” to reading the user’s input. Set up a running sum, then add the inputs to the running sum as you recieve them; there’s no need to “remember” all the inputs.

In this problem, however, there is a mathematical barrier. To compute the standard devation, one needs to know the mean, and you can’t know the mean until the user has inputted all of the data! We therefore need a way to store the user’s input.

This is what vectors are for: the represent “lists” of variables that can be added to or deleted from.

We should first write some pseudocode to describe, in moderate detail, how to compute the standard deviation.

  1. Ask the user for a positive integer input n.
  2. Create a vector of doubles called inputs.
  3. Repeat n times: ask the user for a number, then append it to inputs.
  4. Compute the mean as follows:
    • Initialise a running sum to 0
    • For each double i in inputs, add i to the running sum.
    • The mean is the running sum divided by n.
  5. Now compute the standard deviation as follows:
    • Initialise a running sum to 0
    • For each double i in inputs, add (i - mean) * (i - mean) to the running sum.
    • Divide the running sum by n, then take the squareroot to get the standard deviation.
  6. Print the standard deviation to the screen.

Wow, that’s a lot of steps! This is one of the longer pieces of pseudocode we’ve written this quarter; directly writing this in C++ may be daunting and challenging. Having have written out these steps, however, we can simply translate them into code, step by step:

 1#include <cmath>        // needed for sqrt 
 2#include <iostream>     // needed for cin, cout, endl 
 3#include <vector>       // needed for vector 
 4
 5using namespace std; 
 6
 7int main() {
 8    // 1. take n as user input 
 9    int n; 
10    cin >> n; 
11
12    // 2. create a vector of user inputs 
13    vector<double> inputs; 
14
15    // 3. obtain n inputs from the user
16    for(int i = 0; i < n; i++) {
17        double input; 
18        cin >> input; 
19        inputs.push_back(input); 
20    }
21
22    // 4. compute the mean
23    double running_sum = 0; 
24    for(int i = 0; i < n; i++) {
25        running_sum += inputs.at(i); 
26    }
27    double mean = running_sum / n; 
28
29    // 5. compute the standard deviation. We'll reset the
30    // running sum to 0 and reuse the variable 
31    running_sum = 0; 
32    for(int i = 0; i < n; i++) {
33        running_sum += pow(inputs.at(i) - mean, 2); 
34    }
35    double std_dev = sqrt(running_sum / n); 
36
37    // 6. print out the result 
38    cout << std_dev << endl; 
39
40    return 0; 
41}

The key idea here is that we used a vector to “remember” all the numbers the user inputted. Note that we only need one vector variable as opposed to declaring a new variable for each user input! We then looked back at the user inputs, which we can access after the user already inputted, and computed our statistics.

Remark 2. The Importance of Parallelisation

Parallelising code is sometimes unintuitive, and it can make code more confusing and hard to follow. However, it serves two main purposes:

  1. Time Optimisation — Passing over a list of data points several times to perform separate tasks is usually slower than passing over it just once and doing everything in parallel. Part of this is due to a phenomenon called cache locality.
  2. Space Optimisation — Storing all of your user’s inputs takes space, and your computer’s RAM is limited. Being able to process data as you read it instead of processing it after you’ve copied it all into memory can save an immense amount of memory.

These are completely unimportant for PIC 10A, but they’re of real practical significance. If and when you analyse datasets containing thousands or millions of data points, these two types of optimisation will make a significant difference. Sometimes, your dataset will be too large to fit in memory altogether, and this is not uncommon when performing image or video analysis.

Vectors, References, References to Vectors…

I have been using the word “remember” rather frequently when talking about vectors, and there are some very subtle interactions vectors have with memory that are important to think about.

Vectors occupy a contiguous chunk of memory in your computer. Consider the following snippet of code:

1vector<int> v = {1, 4, 7, 9}; 

The four integers in the vector are stored in neat boxes side-by-side in your computer, in order. Each of these boxes are treated as individual int variables (in this example), hence can be passed by value or passed by reference into functions. Consider the following snippet of code, for instance:

 1// squares a variable. 
 2void square(int& n) {
 3    n *= n; 
 4} 
 5
 6int main() {
 7    vector<int> v = {1, 3, 5}; 
 8    square(v.at(2)); 
 9    cout << v.at(2) << endl; 
10    return 0; 
11} 

This outputs 25: the changes made to n are reflected by v.at(2)!

Remark 3.

A string variable behaves exactly like a vector<char> variable.

As you add more and more elements to your vector, this block of memory it occupies gets longer and longer. However, your operating system will use pieces of memory from all over its RAM, and if you keep adding data to your vector ad infinitum, you’ll probably eventually run into a piece of memory that’s in use.

As long as you have enough memory on your computer, this does not cause any runtime errors. If you run out of memory, your computer will probably crash. This is classified as a “budget issue” in the professional world rather than a “runtime error”. This shifts the blame away from engineering teams.

The dudes at C++ thought of this long before we did: whenever a vector grows too long to fit in a contiguous block, C++ will automatically find a bigger, more open piece of memory and move the whole vector there, where it can grow and flourish with the other fauna and flora.

However, this very behaviour can cause unpredictable behaviour when mixed with references. Consider, for instance, the following code:

 1void foo(vector<int>& vec, int& n) {
 2    for(int i = 0; i < 1000000; i++) {
 3        vec.push_back(i); 
 4    }
 5
 6    cout << n << endl; 
 7} 
 8
 9int main() {
10    vector<int> v = { 1 }; 
11    foo(v, v.at(0)); 
12
13    return 0; 
14} 

The foo function adds a million numbers to the provided vector, then prints out the provided integer. On my computer, this usually prints out something that’s not 1.

This is because of this movement of memory phenomenon described earlier: the vector has moved, but the reference to its first element, n, has not. How the heck was n supposed to know that another variable has moved? In fact, every time I run this program, I get a different number. This is an example of undefined behaviour!

Note that your computer may still print out 1 — operating systems do not always “clean” reclaimed memory, and you may also just have enough memory to accommodate a million integers. Be very cautious when you’re using a reference to a vector’s contents.

Here’s a similar example:

 1void bar(vector<int>& vec, int& n) {
 2    vec.clear(); // empties the contents of vec. 
 3    cout << n << endl; 
 4} 
 5
 6int main() {
 7    vector<int> v = {1, 5, 8, 9}; 
 8    bar(v, v.at(3)); 
 9    return 0; 
10} 

Does this behave differently from the prior example? What does this suggest about removing elements from a vector?

Practise Problems

Here are some additional problems that you can use vectors to solve. Write a solution that does use vectors. Some of these problems can be solved without using vectors at all, usually by some kind of “parallelisation” like what we described above. Determine wich problems can be solved without vectors, then do so.

Problem 4. Seconds

Write a C++ program that takes a positive integer \(n\) as input from the user, followed by \(n\) additional numbers as input. Then, determine the second-largest number the user entered, ignoring repeats. If the user enters only one number, just print out that number.

Sample Run: 
INPUT   5
INPUT   1 2 3 4 5 
OUTPUT  4 

INPUT   5
INPUT   4 4 4 5 5 
OUTPUT  4 

INPUT   1
INPUT   0
OUTPUT  0

Problem 5. Gone Fishin'

You’re the chief of fishing at a local park, and you’re in charge of keeping the fish pond stocked with live fish, of course. At the end of each year, you remove all of the previous year’s fish and reintroduce a set of new, younger, healthier fish.

The fish grow at a rate of 0.5% daily. A fish that’s 100 pounds the first day will be 100.5 pounds the next; this compounds. Of course, the fish you get weigh different amounts, and you need to have a rough idea of the average weight for the fishing competition in March.

Write a C++ program that allows the user to input an integer \(n\), followed by \(n\) inputs that represent different fishes’ weights (in pounds). Determine the average weight of the fish three months (92 days) later.

Sample Run 
INPUT   1
INPUT   100
OUTPUT  158.22594

INPUT   3
INPUT   50 55 42
OUTPUT  77.53071

Problem 6. Histogram

Write a C++ program that accepts a string of digits from the user as input (i.e., a string containing only the characters 0123456789). Then, create a histogram that displays how many times each digit was used. For instance, if the user’s input was 11347777889, the histogram may look like

       *
       *
 *     **
 * **  ***
----------
0123456789
Hint
This is a challenging problem! Try to construct a vector of integers that represents the height of each “bar” in the histogram first, then determine how you would print out the histogram.