9. Week 5 Tuesday: Practise with Loops and Strings
≪ 8. Week 4 Thursday: While Loops and For Loops | Table of Contents | 10. Week 5 Thursday: More Practise with Loops ≫Imagine the following scenario: you are a secretary at a very large public school that only recently got funding to digitise their physical paperwork. You would like to enter records of every student into a computer system, which would make sorting and searching for records much easier. Moreover, there are handwriting recognition systems that can automate the entry proces. How neat! However, you notice a few issues with this transition. Consider a student named John Old McDonald
, whose name presents the following problems:
- First, middle, and last names are hard to distinguish — the computer may recognise
John Old
as his entire first name rather than a separate first and middle name. Moreover, his name is sometimes entered asJohn Old McDonald
and other times asMcDonald, John Old
. - Capitalisation — on some forms, his name is entered as
JOHN OLD MCDONALD
, which the computer believes is different fromJohn Old McDonald
. - Misspellings — your less competent predecessors may have written his name as
John Old MacDonald
rather thanMcDonald
, and you’d like to rectify this issue.
In addition, future students may have formatting issues with their names, such as the apostrophe in Smith O'Niel
or accented characters in foreign names. These could wreak havoc on sorting and searching functionality in your computer system, and using your foresight, you would like to avoid these while from the getgo.
Between immunisation records, report cards, contact information, disciplinary records, and other forms, this is way too much for you to handle! Our goal for today is to get some practise applying loops and control flow to get a handle on text processing.
While the above example may seem somewhat outlandish and contrived, tasks of normalising capitalisation, splitting phrases into individual words, or blacking out sensitive information are all relatively common when handling swaths of textual data.
Let’s demonstrate how to approach string processing in general with an example.
Example 1.
Write a C++ program that allows the user to input their full name. Then, remove any characters that aren’t spaces or letters, and convert their name to all caps. Print out the fully capitalised name.
Sample Run:
INPUT John Smith
OUTPUT JOHN SMITH
INPUT John Old McDonald
OUTPUT JOHN OLD MCDONALD
INPUT John O'Niel McDonald, Sr.
OUTPUT JOHN ONIEL MCDONALD SR
As usual, we’ll begin by producing some pseudocode. The first approach we’ll demonstrate is one where we create a fresh empty string, then append alphabetic characters from the user’s input one at a time (capitalising if necessary) and skipping over the non-space non-alphabetic characters. More specifically,
- Take a string
full_name
as input, and letN
be the length of the input. - Create an empty string called
all_caps
. - For each index
i = 0, 1, ..., N-1
, perform the following:- Let
ch
be the character at indexi
in the user’s input. - If
ch
is a space or an upper-case letter, append it toall_caps
. - If
ch
is a lower-case letter, convert it to upper-case, then append it toall_caps
. - Otherwise, don’t do anything with
ch
and keep going.
- Let
- Print out
all_caps
to the screen.
Common Mistake 2. Indexing by 0 vs. Indexing by 1
Note that on step 3, we ensured that the indices started at 0 instead of 1, and that the indices ended at N-1
instead of N
. While it’s more natural in common language to describe the “first” letter rather than the “zeroth” letter, such off-by-one errors can cause logical and runtime errors.
Let’s now convert this to code. We’ll be reliant on ASCII math and writing loops to do so. First, to take a full name as an input (which may include spaces), we should use getline
rather than a direct cin
statement.
Next, we’ll create the empty string.
1string all_caps = "";
For the loop, we’ll elect to use a for loop: we know exactly how many times the loop should run (N
times). To append a character to a string, you can either use the push_back
function or the +=
operator.
1for(int i = 0; i < N; i++) {
2 // obtaining the character at index i
3 char ch = full_name.at(i);
4
5 // space or uppercase character: just add it to all_caps
6 if(ch == ' ' || (ch >= 'A' && ch <= 'Z')) {
7 all_caps += ch;
8 }
9
10 // lowercase character: convert to uppercase, then add.
11 if(ch >= 'a' && ch <= 'z') {
12 ch += 'A' - 'a';
13 all_caps += ch;
14 }
15
16 // remaining cases: just ignore them and don't do anything.
17}
The last step, of course, is to print out all_caps
to the screen. here’s the completed code:
Full Solution
1#include <iostream> // cin, cout
2#include <string> // getline, strings
3
4using namespace std;
5
6int main() {
7 // 1. read input, define N
8 string full_name;
9 getline(cin, full_name);
10 int N = full_name.length();
11
12 // 2. declare all_caps as empty string
13 string all_caps = "";
14
15 // 3. loop through the full_name string,
16 // one character at a time.
17 for(int i = 0; i < N; i++) {
18 // obtaining the character at index i
19 char ch = full_name.at(i);
20
21 // space or uppercase character: just add it to all_caps
22 if(ch == ' ' || (ch >= 'A' && ch <= 'Z')) {
23 all_caps += ch;
24 }
25
26 // lowercase character: convert to uppercase, then add.
27 if(ch >= 'a' && ch <= 'z') {
28 ch += 'A' - 'a';
29 all_caps += ch;
30 }
31
32 // remaining cases: just ignore them and don't do anything.
33 }
34
35 // 4. print out the result.
36 cout << all_caps << endl;
37
38 return 0;
39}
There is an alternative solution. It’s somewhat unnatural to define a new string and rebuild the user’s input one character at a time; I suspect more human pseudocode would be as follows:
- Take a string
full_name
as input and defineN
as its length. - For each index
i = 0, 1, ..., N - 1
, perform the following:- Define
ch
as the character at indexi
infull_name
. - If
ch
is a lowercase letter, convert it to uppercase. - If
ch
is neither a space nor a letter, remove it from the string.
- Define
- Print
full_name
to the screen.
To remove a character from a string, we can use the erase
function for strings. This pseudocode directly modifies the user’s input, deleting unwanted characters and capitalising lowercase letters as we go. Let’s code this solution up:
1// 1. read full name from user input
2string full_name;
3getline(cin, full_name);
4int N = full_name.length();
5
6// 2. for each index 0, 1, ...., N - 1
7for(int i = 0; i < N; i++) {
8 // obtain character at index i
9 char ch = full_name.at(i);
10
11 // if ch is lowercase, convert to uppercase
12 if(ch >= 'a' && ch <= 'z') {
13 // note: we should not modify ch, since it's a copy
14 // of the character in the string full_name.
15 full_name.at(i) += 'A' - 'a';
16 }
17
18 // if ch is NOT a space and NOT an uppercase letter,
19 // delete the character from the string
20 if(ch != ' ' && (ch < 'A' || ch > 'Z')) {
21 // erases one character, starting at index i.
22 full_name.erase(i, 1);
23 }
24
25 // in remaining cases, do nothing!
26}
27
28// 3. print out results.
29cout << full_name << endl;
Problem 3.
Identify the two runtime and/or logical errors in the above code.
The two problems in the code are as follows: first, the length of the string may decrease throughout the for loop. If we ever end up deleting a character, i = N - 1
would cease to be a valid index, resulting in a runtime error. The fix to this is to remove line 4, and replace the condition in the for loop with i < full_name.length()
. This tells the for loop to check the length of the string at every iteration, making it robust to changes in string length.
The second problem is more subtle, and it’s revealed when we input john o'neil
. The output becomes JOHN OnEIL
; the for loop somehow skips over the n
following the apostrophe. The reason is that when i = 6
, we delete the apostrophe and update the index to i = 7
. But when we delete the apostrophe, the n
moves to index i = 6
! This index shifting causes the for loop to skip over the n
and leave it as-is. The fix is to add i--
after line 22 to adjust the index according to this backwards shift. This way, none of the characters get skipped inadvertently.
Warning 4. Avoid Adding or Removing Characters from Strings During Loops
As the above example demonstrates, even though directly modifying strings is a very human process, it is surprisingly delicate and subtle in implementation when we are adding or removing characters. As mentioned, this changes which characters have which index, and it gives us programmers a larger cognitive load to deal with.
This is not to say that such operations are entirely forbidden; in fact, iterators are sometimes designed with exactly these indexing issues in mind.
Problem 5. Digits
Write a C++ program that allows a user to input any line of text. Then, sum the value of every digit present in the inputted text, ignoring negative signs, and finally print out this sum. Print a zero if no digits are present.
Sample run:
INPUT 1 2 3 4 5
OUTPUT 15
INPUT 1 23 ORUCGD4OEUCOH5
OUTPUT 15
INPUT drcgoekbocuhaoelucraoeutnamkbbjkhc
OUTPUT 0
Problem 6. Phone Numbers
A phone number, for the purposes of this problem, contains three components: a country code, a 3-digit area code, and a 7 or 8-digit phone number. The country code is sometimes preceeded by a ‘+’; the area code is sometimes surrounded by parentheses; the phone number itself is sometimes hyphenated. These may are may not be separated by spaces. The following are all examples of valid phone numbers:
+1 (800) 888-8888
86 372 2649371
1247736491
1(747)9914 416
Formatting aside, we consider two phone numbers to by the same if they consist of identical digits. So, +1 (800) 888-8888
and 18008888888
are the same phone number.
Write a C++ program that accepts two valid phone numbers as input (each on its own line). Determine if the two phone numbers are the same phone number or not.
Sample Run:
INPUT +1 (800) 888-8888
INPUT 18008888888
OUTPUT Same
INPUT +1(800) 888-88-88
INPUT 1(747)991 4416
OUTPUT Different