Hunter Liu's Website

9. Week 5 Tuesday: Practise with Loops and Strings

≪ 8. Week 4 Thursday: While Loops and For Loops | Table of Contents | 10. Week 5 Thursday: More Practise with Loops ≫

Imagine the following scenario: you are a secretary at a very large public school that only recently got funding to digitise their physical paperwork. You would like to enter records of every student into a computer system, which would make sorting and searching for records much easier. Moreover, there are handwriting recognition systems that can automate the entry proces. How neat! However, you notice a few issues with this transition. Consider a student named John Old McDonald, whose name presents the following problems:

In addition, future students may have formatting issues with their names, such as the apostrophe in Smith O'Niel or accented characters in foreign names. These could wreak havoc on sorting and searching functionality in your computer system, and using your foresight, you would like to avoid these while from the getgo.

Between immunisation records, report cards, contact information, disciplinary records, and other forms, this is way too much for you to handle! Our goal for today is to get some practise applying loops and control flow to get a handle on text processing.

While the above example may seem somewhat outlandish and contrived, tasks of normalising capitalisation, splitting phrases into individual words, or blacking out sensitive information are all relatively common when handling swaths of textual data.

Let’s demonstrate how to approach string processing in general with an example.

Example 1.

Write a C++ program that allows the user to input their full name. Then, remove any characters that aren’t spaces or letters, and convert their name to all caps. Print out the fully capitalised name.

Sample Run: 
INPUT   John Smith
OUTPUT  JOHN SMITH

INPUT   John Old McDonald
OUTPUT  JOHN OLD MCDONALD

INPUT   John O'Niel McDonald, Sr. 
OUTPUT  JOHN ONIEL MCDONALD SR

As usual, we’ll begin by producing some pseudocode. The first approach we’ll demonstrate is one where we create a fresh empty string, then append alphabetic characters from the user’s input one at a time (capitalising if necessary) and skipping over the non-space non-alphabetic characters. More specifically,

  1. Take a string full_name as input, and let N be the length of the input.
  2. Create an empty string called all_caps.
  3. For each index i = 0, 1, ..., N-1, perform the following:
    • Let ch be the character at index i in the user’s input.
    • If ch is a space or an upper-case letter, append it to all_caps.
    • If ch is a lower-case letter, convert it to upper-case, then append it to all_caps.
    • Otherwise, don’t do anything with ch and keep going.
  4. Print out all_caps to the screen.

Common Mistake 2. Indexing by 0 vs. Indexing by 1

Note that on step 3, we ensured that the indices started at 0 instead of 1, and that the indices ended at N-1 instead of N. While it’s more natural in common language to describe the “first” letter rather than the “zeroth” letter, such off-by-one errors can cause logical and runtime errors.

Let’s now convert this to code. We’ll be reliant on ASCII math and writing loops to do so. First, to take a full name as an input (which may include spaces), we should use getline rather than a direct cin statement.

1string full_name; 
2getline(cin, full_name); 
3int N = full_name.length(); 

Next, we’ll create the empty string.

1string all_caps = ""; 

For the loop, we’ll elect to use a for loop: we know exactly how many times the loop should run (N times). To append a character to a string, you can either use the push_back function or the += operator.

 1for(int i = 0; i < N; i++) {
 2    // obtaining the character at index i 
 3    char ch = full_name.at(i); 
 4
 5    // space or uppercase character: just add it to all_caps
 6    if(ch == ' ' || (ch >= 'A' && ch <= 'Z')) {
 7        all_caps += ch; 
 8    } 
 9
10    // lowercase character: convert to uppercase, then add. 
11    if(ch >= 'a' && ch <= 'z') {
12        ch += 'A' - 'a'; 
13        all_caps += ch; 
14    } 
15
16    // remaining cases: just ignore them and don't do anything. 
17}

The last step, of course, is to print out all_caps to the screen. here’s the completed code:

Full Solution
 1#include <iostream>     // cin, cout 
 2#include <string>       // getline, strings 
 3
 4using namespace std; 
 5
 6int main() {
 7    // 1. read input, define N 
 8    string full_name; 
 9    getline(cin, full_name); 
10    int N = full_name.length(); 
11
12    // 2. declare all_caps as empty string 
13    string all_caps = ""; 
14
15    // 3. loop through the full_name string, 
16    //    one character at a time. 
17    for(int i = 0; i < N; i++) {
18        // obtaining the character at index i 
19        char ch = full_name.at(i); 
20
21        // space or uppercase character: just add it to all_caps
22        if(ch == ' ' || (ch >= 'A' && ch <= 'Z')) {
23            all_caps += ch; 
24        } 
25
26        // lowercase character: convert to uppercase, then add. 
27        if(ch >= 'a' && ch <= 'z') {
28            ch += 'A' - 'a'; 
29            all_caps += ch; 
30        } 
31
32        // remaining cases: just ignore them and don't do anything. 
33    }
34
35    // 4. print out the result. 
36    cout << all_caps << endl; 
37
38    return 0; 
39} 

There is an alternative solution. It’s somewhat unnatural to define a new string and rebuild the user’s input one character at a time; I suspect more human pseudocode would be as follows:

  1. Take a string full_name as input and define N as its length.
  2. For each index i = 0, 1, ..., N - 1, perform the following:
    • Define ch as the character at index i in full_name.
    • If ch is a lowercase letter, convert it to uppercase.
    • If ch is neither a space nor a letter, remove it from the string.
  3. Print full_name to the screen.

To remove a character from a string, we can use the erase function for strings. This pseudocode directly modifies the user’s input, deleting unwanted characters and capitalising lowercase letters as we go. Let’s code this solution up:

 1// 1. read full name from user input 
 2string full_name; 
 3getline(cin, full_name); 
 4int N = full_name.length(); 
 5
 6// 2. for each index 0, 1, ...., N - 1
 7for(int i = 0; i < N; i++) {
 8    // obtain character at index i 
 9    char ch = full_name.at(i); 
10
11    // if ch is lowercase, convert to uppercase 
12    if(ch >= 'a' && ch <= 'z') {
13        // note: we should not modify ch, since it's a copy
14        // of the character in the string full_name. 
15        full_name.at(i) += 'A' - 'a'; 
16    } 
17
18    // if ch is NOT a space and NOT an uppercase letter,
19    // delete the character from the string 
20    if(ch != ' ' && (ch < 'A' || ch > 'Z')) {
21        // erases one character, starting at index i. 
22        full_name.erase(i, 1); 
23    }
24
25    // in remaining cases, do nothing! 
26} 
27
28// 3. print out results. 
29cout << full_name << endl; 

Problem 3.

Identify the two runtime and/or logical errors in the above code.

The two problems in the code are as follows: first, the length of the string may decrease throughout the for loop. If we ever end up deleting a character, i = N - 1 would cease to be a valid index, resulting in a runtime error. The fix to this is to remove line 4, and replace the condition in the for loop with i < full_name.length(). This tells the for loop to check the length of the string at every iteration, making it robust to changes in string length.

The second problem is more subtle, and it’s revealed when we input john o'neil. The output becomes JOHN OnEIL; the for loop somehow skips over the n following the apostrophe. The reason is that when i = 6, we delete the apostrophe and update the index to i = 7. But when we delete the apostrophe, the n moves to index i = 6! This index shifting causes the for loop to skip over the n and leave it as-is. The fix is to add i-- after line 22 to adjust the index according to this backwards shift. This way, none of the characters get skipped inadvertently.

Warning 4. Avoid Adding or Removing Characters from Strings During Loops

As the above example demonstrates, even though directly modifying strings is a very human process, it is surprisingly delicate and subtle in implementation when we are adding or removing characters. As mentioned, this changes which characters have which index, and it gives us programmers a larger cognitive load to deal with.

This is not to say that such operations are entirely forbidden; in fact, iterators are sometimes designed with exactly these indexing issues in mind.

Problem 5. Digits

Write a C++ program that allows a user to input any line of text. Then, sum the value of every digit present in the inputted text, ignoring negative signs, and finally print out this sum. Print a zero if no digits are present.

Sample run: 
INPUT   1 2 3 4 5
OUTPUT  15

INPUT   1 23 ORUCGD4OEUCOH5
OUTPUT  15

INPUT   drcgoekbocuhaoelucraoeutnamkbbjkhc
OUTPUT  0

Problem 6. Phone Numbers

A phone number, for the purposes of this problem, contains three components: a country code, a 3-digit area code, and a 7 or 8-digit phone number. The country code is sometimes preceeded by a ‘+’; the area code is sometimes surrounded by parentheses; the phone number itself is sometimes hyphenated. These may are may not be separated by spaces. The following are all examples of valid phone numbers:

+1 (800) 888-8888 
86 372 2649371 
1247736491
1(747)9914   416

Formatting aside, we consider two phone numbers to by the same if they consist of identical digits. So, +1 (800) 888-8888 and 18008888888 are the same phone number.

Write a C++ program that accepts two valid phone numbers as input (each on its own line). Determine if the two phone numbers are the same phone number or not.

Sample Run: 
INPUT   +1 (800) 888-8888
INPUT   18008888888
OUTPUT  Same 

INPUT   +1(800)  888-88-88
INPUT   1(747)991 4416
OUTPUT  Different