13. Week 7 Tuesday: Vectors!
≪ 12. Week 6 Thursday: Function Scope and Passing by Reference | Table of Contents | 14. Week 7 Thursday: Midterm Practise ≫Consider the following problem:
Example 1. Standard Deviation
Write a C++ program that accepts a positive integer \(n\) as input from the user. Then, allow the user to input \(n\) more decimal numbers and determine the standard deviation of the user’s inputs.
Sample Run:
INPUT 3
INPUT 5.1 5.1 5.1
OUTPUT 0
INPUT 6
INPUT 1 2 3 4 5 6
OUTPUT 1.70783
About a week ago, we talked about finding the average of an arbitrary number of user-inputted numbers. We did this by setting up a running sum, then dividing by the number of inputs at the end. As such, we didn’t need to store any of the user’s inputs, as we did all of our computations “in parallel” to the user’s inputs.
In this problem, however, there is a mathematical barrier to doing so. To compute the standard devation, one needs to know the mean, and you can’t know the mean until the user has inputted all of the data! We therefore need a way to store the user’s input.
Vectors as Lists
The answer to this is the C++ vector library. A vector is a list of variables of a fixed type which can grow or shrink based on what you want to do to it. To declare a vector, you must first include the vector library, then use the following syntax:
1vector< [type] > my_vector;
Replace the [type]
with whatever variable type you want your vector to hold. For instance, vector<int> my_vector;
will create a list of integer variables, vector<double> my_vector;
will create a list of double variables, etc.
To give it some values to start with, we can use the following syntax:
1vector< [type] > my_vector = {value, ..., value};
This will create a list of variables whose values are provided. For instance, we may write
1vector<int> faces = {1, 2, 3, 4, 5, 6};
as a list of integer variables holding the 6 different faces of a die. Another example would be
1vector<string> names = {"John", "Bob", "Gill"};
In the same way that strings can “index” the characters within them, so too can vectors “index” the elements they contain. In other words, vectors keep track of which position each element is on the list, and we can use this to access specific elements of the list if we need to. In this last example, the vector names will contain
VALUE "John" "Bob" "Gill"
INDEX 0 1 2
To access an element at a specific index, it’s the same syntax as with a string. names.at(1)
has a value of “Bob”, since “Bob” is the string at the index 1.
We can add and remove values from our list using the push_back
and pop_back
functions. push_back
adds a value to the end of the list while pop_back
removes the value at the very end of the list. Finally, we can check how many elements are in the list by using the size
function. For instance:
1vector<string> names = {"Alice", "Bob", "Clare"};
2 // Vector contents
3 // "Alice" "Bob" "Clare"
4names.pop_back(); // "Alice" "Bob"
5names.push_back("Candice"); // "Alice" "Bob" "Candice"
6names.push_back("Dylan"); // "Alice" "Bob" "Candice" "Dylan"
7
8// there are 4 names in the vector of names.
9cout << names.size() << endl;
Common Mistakes 2.
As with strings, vectors are indexed by 0: the first element has index 0, the second has index 1, etc. In addition, one must be careful of using pop_back
on an empty vector! Doing so will cause a runtime error, and your program will crash.
Finishing the Example
Let’s apply this tool to the example we mentioned at the beginning. We should write some pseudocode to describe, in moderate detail, how to compute the standard deviation.
- Ask the user for a positive integer input
n
. - Create a vector of doubles called
inputs
. - Repeat
n
times: ask the user for a number, then append it toinputs
. - Compute the mean as follows:
- Initialise a running sum to 0
- For each double
i
ininputs
, addi
to the running sum. - The mean is the running sum divided by
n
.
- Now compute the standard deviation as follows:
- Initialise a running sum to 0
- For each double
i
ininputs
, add(i - mean) * (i - mean)
to the running sum. - Divide the running sum by
n
, then take the squareroot to get the standard deviation.
- Print the standard deviation to the screen.
Wow, that’s a lot of steps! This is one of the longer pieces of pseudocode we’ve written this quarter; directly writing this in C++ may be daunting and challenging. Having have written out these steps, however, we can simply translate them into code, step by step:
1#include <cmath> // needed for sqrt
2#include <iostream> // needed for cin, cout, endl
3#include <vector> // needed for vector
4
5using namespace std;
6
7int main() {
8 // 1. take n as user input
9 int n;
10 cin >> n;
11
12 // 2. create a vector of user inputs
13 vector<double> inputs;
14
15 // 3. obtain n inputs from the user
16 for(int i = 0; i < n; i++) {
17 double input;
18 cin >> input;
19 inputs.push_back(input);
20 }
21
22 // 4. compute the mean
23 double running_sum = 0;
24 for(int i = 0; i < n; i++) {
25 running_sum += inputs.at(i);
26 }
27 double mean = running_sum / n;
28
29 // 5. compute the standard deviation. We'll reset the
30 // running sum to 0 and reuse the variable
31 running_sum = 0;
32 for(int i = 0; i < n; i++) {
33 running_sum += pow(inputs.at(i) - mean, 2);
34 }
35 double std_dev = sqrt(running_sum / n);
36
37 // 6. print out the result
38 cout << std_dev << endl;
39
40 return 0;
41}
The key idea here is that we used a vector to “remember” all the numbers the user inputted. Note that we only need one vector variable as opposed to declaring a new variable for each user input! We then looked back at the user inputs, which we can access after the user already inputted, and computed our statistics.
As a follow-up, we should really be able to compute the mean while the user is still inputting the numbers. Can you modify the code above so that the for loops on lines 16 and 24 are merged into one?
Remark 3. The Importance of Parallelisation
Parallelising code is sometimes unintuitive, and it can make code more confusing and hard to follow. However, it serves two main purposes:
- Time Optimisation — Passing over a list of data points several times to perform separate tasks is usually slower than passing over it just once and doing everything in parallel. Part of this is due to a phenomenon called cache locality.
- Space Optimisation — Storing all of your user’s inputs takes space, and your computer’s RAM is limited. Being able to process data as you read it instead of processing it after you’ve copied it all into memory can save an immense amount of memory.
These are completely unimportant for PIC 10A, but they’re of real practical significance. If and when you analyse datasets containing thousands or millions of data points, these two types of optimisation will make a significant difference. Sometimes, your dataset will be too large to fit in memory altogether, and this is not uncommon when performing image or video analysis.
Here are some problems that you can use vectors to solve. Write a solution that does use vectors. Some of these problems can be solved without using vectors at all, usually by some kind of “parallelisation” like what we described above. Determine wich problems can be solved without vectors, then do so.
Problem 4. Seconds
Write a C++ program that takes a positive integer \(n\) as input from the user, followed by \(n\) additional numbers as input. Then, determine the second-largest number the user entered, ignoring repeats. If the user enters only one number, just print out that number.
Sample Run:
INPUT 5
INPUT 1 2 3 4 5
OUTPUT 4
INPUT 5
INPUT 4 4 4 5 5
OUTPUT 4
INPUT 1
INPUT 0
OUTPUT 0
Problem 5. Gone Fishin'
You’re the chief of fishing at a local park, and you’re in charge of keeping the fish pond stocked with live fish, of course. At the end of each year, you remove all of the previous year’s fish and reintroduce a set of new, younger, healthier fish.
The fish grow at a rate of 0.5% daily. A fish that’s 100 pounds the first day will be 100.5 pounds the next; this compounds. Of course, the fish you get weigh different amounts, and you need to have a rough idea of the average weight for the fishing competition in March.
Write a C++ program that allows the user to input an integer \(n\), followed by \(n\) inputs that represent different fishes’ weights (in pounds). Determine the average weight of the fish three months (92 days) later.
Sample Run
INPUT 1
INPUT 100
OUTPUT 158.22594
INPUT 3
INPUT 50 55 42
OUTPUT 77.53071
Problem 6. Histogram
Write a C++ program that accepts a string of digits from the user as input (i.e., a string containing only the characters 0123456789
). Then, create a histogram that displays how many times each digit was used. For instance, if the user’s input was 11347777889
, the histogram may look like
*
*
* **
* ** ***
----------
0123456789