Hunter Liu's Website

13. Week 7 Tuesday: Vectors!

≪ 12. Week 6 Thursday: Function Scope and Passing by Reference | Table of Contents | 14. Week 7 Thursday: Midterm Practise ≫

Consider the following problem:

Example 1. Standard Deviation

Write a C++ program that accepts a positive integer \(n\) as input from the user. Then, allow the user to input \(n\) more decimal numbers and determine the standard deviation of the user’s inputs.

Sample Run: 
INPUT   3
INPUT   5.1 5.1 5.1 
OUTPUT  0 

INPUT   6
INPUT   1 2 3 4 5 6
OUTPUT  1.70783

About a week ago, we talked about finding the average of an arbitrary number of user-inputted numbers. We did this by setting up a running sum, then dividing by the number of inputs at the end. As such, we didn’t need to store any of the user’s inputs, as we did all of our computations “in parallel” to the user’s inputs.

In this problem, however, there is a mathematical barrier to doing so. To compute the standard devation, one needs to know the mean, and you can’t know the mean until the user has inputted all of the data! We therefore need a way to store the user’s input.

Vectors as Lists

The answer to this is the C++ vector library. A vector is a list of variables of a fixed type which can grow or shrink based on what you want to do to it. To declare a vector, you must first include the vector library, then use the following syntax:

1vector< [type] > my_vector; 

Replace the [type] with whatever variable type you want your vector to hold. For instance, vector<int> my_vector; will create a list of integer variables, vector<double> my_vector; will create a list of double variables, etc.

To give it some values to start with, we can use the following syntax:

1vector< [type] > my_vector = {value, ..., value}; 

This will create a list of variables whose values are provided. For instance, we may write

1vector<int> faces = {1, 2, 3, 4, 5, 6}; 

as a list of integer variables holding the 6 different faces of a die. Another example would be

1vector<string> names = {"John", "Bob", "Gill"}; 

In the same way that strings can “index” the characters within them, so too can vectors “index” the elements they contain. In other words, vectors keep track of which position each element is on the list, and we can use this to access specific elements of the list if we need to. In this last example, the vector names will contain

VALUE   "John"  "Bob"   "Gill"
INDEX   0       1       2

To access an element at a specific index, it’s the same syntax as with a string. names.at(1) has a value of “Bob”, since “Bob” is the string at the index 1.

We can add and remove values from our list using the push_back and pop_back functions. push_back adds a value to the end of the list while pop_back removes the value at the very end of the list. Finally, we can check how many elements are in the list by using the size function. For instance:

1vector<string> names = {"Alice", "Bob", "Clare"}; 
2                            // Vector contents
3                            // "Alice"  "Bob"   "Clare"
4names.pop_back();           // "Alice"  "Bob"
5names.push_back("Candice"); // "Alice"  "Bob"   "Candice"
6names.push_back("Dylan");   // "Alice"  "Bob"   "Candice"   "Dylan"
7
8// there are 4 names in the vector of names.
9cout << names.size() << endl; 

Common Mistakes 2.

As with strings, vectors are indexed by 0: the first element has index 0, the second has index 1, etc. In addition, one must be careful of using pop_back on an empty vector! Doing so will cause a runtime error, and your program will crash.

Finishing the Example

Let’s apply this tool to the example we mentioned at the beginning. We should write some pseudocode to describe, in moderate detail, how to compute the standard deviation.

  1. Ask the user for a positive integer input n.
  2. Create a vector of doubles called inputs.
  3. Repeat n times: ask the user for a number, then append it to inputs.
  4. Compute the mean as follows:
    • Initialise a running sum to 0
    • For each double i in inputs, add i to the running sum.
    • The mean is the running sum divided by n.
  5. Now compute the standard deviation as follows:
    • Initialise a running sum to 0
    • For each double i in inputs, add (i - mean) * (i - mean) to the running sum.
    • Divide the running sum by n, then take the squareroot to get the standard deviation.
  6. Print the standard deviation to the screen.

Wow, that’s a lot of steps! This is one of the longer pieces of pseudocode we’ve written this quarter; directly writing this in C++ may be daunting and challenging. Having have written out these steps, however, we can simply translate them into code, step by step:

 1#include <cmath>        // needed for sqrt 
 2#include <iostream>     // needed for cin, cout, endl 
 3#include <vector>       // needed for vector 
 4
 5using namespace std; 
 6
 7int main() {
 8    // 1. take n as user input 
 9    int n; 
10    cin >> n; 
11
12    // 2. create a vector of user inputs 
13    vector<double> inputs; 
14
15    // 3. obtain n inputs from the user
16    for(int i = 0; i < n; i++) {
17        double input; 
18        cin >> input; 
19        inputs.push_back(input); 
20    }
21
22    // 4. compute the mean
23    double running_sum = 0; 
24    for(int i = 0; i < n; i++) {
25        running_sum += inputs.at(i); 
26    }
27    double mean = running_sum / n; 
28
29    // 5. compute the standard deviation. We'll reset the
30    // running sum to 0 and reuse the variable 
31    running_sum = 0; 
32    for(int i = 0; i < n; i++) {
33        running_sum += pow(inputs.at(i) - mean, 2); 
34    }
35    double std_dev = sqrt(running_sum / n); 
36
37    // 6. print out the result 
38    cout << std_dev << endl; 
39
40    return 0; 
41}

The key idea here is that we used a vector to “remember” all the numbers the user inputted. Note that we only need one vector variable as opposed to declaring a new variable for each user input! We then looked back at the user inputs, which we can access after the user already inputted, and computed our statistics.

As a follow-up, we should really be able to compute the mean while the user is still inputting the numbers. Can you modify the code above so that the for loops on lines 16 and 24 are merged into one?

Remark 3. The Importance of Parallelisation

Parallelising code is sometimes unintuitive, and it can make code more confusing and hard to follow. However, it serves two main purposes:

  1. Time Optimisation — Passing over a list of data points several times to perform separate tasks is usually slower than passing over it just once and doing everything in parallel. Part of this is due to a phenomenon called cache locality.
  2. Space Optimisation — Storing all of your user’s inputs takes space, and your computer’s RAM is limited. Being able to process data as you read it instead of processing it after you’ve copied it all into memory can save an immense amount of memory.

These are completely unimportant for PIC 10A, but they’re of real practical significance. If and when you analyse datasets containing thousands or millions of data points, these two types of optimisation will make a significant difference. Sometimes, your dataset will be too large to fit in memory altogether, and this is not uncommon when performing image or video analysis.

Here are some problems that you can use vectors to solve. Write a solution that does use vectors. Some of these problems can be solved without using vectors at all, usually by some kind of “parallelisation” like what we described above. Determine wich problems can be solved without vectors, then do so.

Problem 4. Seconds

Write a C++ program that takes a positive integer \(n\) as input from the user, followed by \(n\) additional numbers as input. Then, determine the second-largest number the user entered, ignoring repeats. If the user enters only one number, just print out that number.

Sample Run: 
INPUT   5
INPUT   1 2 3 4 5 
OUTPUT  4 

INPUT   5
INPUT   4 4 4 5 5 
OUTPUT  4 

INPUT   1
INPUT   0
OUTPUT  0

Problem 5. Gone Fishin'

You’re the chief of fishing at a local park, and you’re in charge of keeping the fish pond stocked with live fish, of course. At the end of each year, you remove all of the previous year’s fish and reintroduce a set of new, younger, healthier fish.

The fish grow at a rate of 0.5% daily. A fish that’s 100 pounds the first day will be 100.5 pounds the next; this compounds. Of course, the fish you get weigh different amounts, and you need to have a rough idea of the average weight for the fishing competition in March.

Write a C++ program that allows the user to input an integer \(n\), followed by \(n\) inputs that represent different fishes’ weights (in pounds). Determine the average weight of the fish three months (92 days) later.

Sample Run 
INPUT   1
INPUT   100
OUTPUT  158.22594

INPUT   3
INPUT   50 55 42
OUTPUT  77.53071

Problem 6. Histogram

Write a C++ program that accepts a string of digits from the user as input (i.e., a string containing only the characters 0123456789). Then, create a histogram that displays how many times each digit was used. For instance, if the user’s input was 11347777889, the histogram may look like

       *
       *
 *     **
 * **  ***
----------
0123456789
Hint
This is a challenging problem! Try to construct a vector of integers that represents the height of each “bar” in the histogram first, then determine how you would print out the histogram.