Hunter Liu's Website

4. Week 2 Thursday: Characters and Strings


Our focus today is on some of the finer details of handling text, namely manipulating character and string variables. You’ll inevitably have to do this at some point in your programming career, whether you’re collecting and analysing a lot of textual data (e.g., finding the 100 most common words in a book) or generating it yourself. Most of these tasks require control flow (if statements and loops), but at the very least, we can get familiar with the fundamentals and what’s possible.

Characters and Strings

Characters and the ASCII Table

A string variable represents a string of text, and that’s really just a bunch of char variables (i.e. characters) that have been grouped together. In order to learn how to manipulate strings, we ought to learn how to manipulate individual characters.

char values, unlike string values, are written using single quotation marks, such as char c = 'C';. Perhaps surprisingly, we can also initialise char variables using numbers. Consider the following code:

char c = 67;
int n = 'C';
cout << c << " " << n << endl;

What do you think the output will be? If you run this on your computer, you should get the output C 67. This illuminates two important facts:

  1. The character 'C' and the integer 67 are the same thing. More broadly, all characters are just numbers to the computer.
  2. C++ interprets the value 67 (or the character 'C') differently depending on what type of variable it’s stored in.

Naturally, we should ask how the computer knows which numbers correspond to which letter and vice-versa. About 3 billion years ago, the leading prokaryotic cells of the time congregated and designed the American Standard Code for Information Interchange, or ASCII. This is a table describing which characters correspond to which numbers, and every C++ compiler you are likely to use in this class adheres to this code. You may look up an ASCII table online through Google, and you will not need to memorise any portion of the ASCII table for this class.

As with all numbers, we may add, subtract, and compare characters to each other. This is useful for a variety of reasons:

  1. The digits 0 through 9 occupy a contiguous block on the ASCII table. To see if a char variable c is a digit, we may use the code c >= '0' && c <= '9', as opposed to c == '0' || ... || c == '9'. We can do a similar trick for checking if c is a lower case letter or an upper case letter.
  2. To convert a character from lower case to upper case, we can serendipitously notice that 'a' has a value of 97 while 'A' has a value of 65; they differ by exactly 32. The other 25 letters obey the same rule! If c is a char variable that holds a lower case letter, then to convert it to upper case, one may perform c -= 32;. If the constant 32 is too hard to remember (I think it is), you can also do c = c - 'a' + 'A';. Try to squint at the table and see why this works!

Accessing Characters within Strings

A string is a variable that holds a bunch of characters in a sequence. In order to create and work with string variables, you need to include the string library at the start of your program (i.e., #include <string>). Everything related to strings is part of the std namespace. To declare a string, you can just write the string within double quotes:

string s = "Johnald MacDonald";

The entries of the string are labeled with “indices”, starting from 0:

J  o  h  n  a  l  d  _  M  a  c  D  o  n  a  l  d
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16

In order to access the character at the index i, we can use s.at(i):

string s = "Johnald MacDonald";
cout << s.at(0) << endl;    // prints J
cout << s.at(7) << endl;    // prints a space
cout << s.at(16) << endl;   // prints a d
cout << s.at(30) << endl;   // crashes the program.

This last line where we tried accessing an index past the end of a string is an example of a runtime error! Such errors are most commonly caused by mislabelling the indices of your string, e.g. if you start labelling with 1 instead of with 0.

Remark 1.

An alternative way to access a string’s contents is s[0] instead of s.at(0). However, this is “unsafe”: the .at function will intentionally crash the program whenever you give it an invalid index, whereas s[i] performs no such check. It’s possible that s[3991] ends up working for reasons we will explain in a few weeks, and it can result in a program reading or changing memory it was never supposed to access. If you’re interested, learn about buffer overflow attacks.

Some Other String Operations

There are some other operations that are above the level of character-by-character manipulations, and here are the ways we would perform them:

There are some other operations one can perform with strings, but they’re far too numerous for me to list here. If you’re curious about how to search for a substring of a string, how to remove spaces from either end of a string, etc., you may check the C++ reference, which contains documentation of every single function available for use on strings! Some key functions to know are find, rfind, getline, push_back, and pop_back.

Note that although we call expressions like "text" string literals, they are not actually string variables! Under the hood, a string literal is a plain array of characters rather than a string. Thus, the manipulations that work on string variables may not actually work on string literals, as demonstrated in the following example:

Problem 2.

Determine and classify the error in the following C++ program:

#include <iostream>
#include <string>

using namespace std;

int main() {
    cout << "Hello" + " " + "world!" << endl;

    string s1 = "Hello";
    string s2 = "world!";
    cout << s1 + " " + s2 << endl;

    return 0;
}

Some Practice With Strings

Problem 3.

Predict the output of the following code. You may look at an ASCII table.

#include <iostream>
#include <string>

using namespace std;

int main() {
    string s = "Hello world!";
    char c = s.at(6);

    cout << ++c << endl;
    cout << s.at(6)++ << endl;
    cout << c << endl;
    cout << s << endl;

    return 0;
}
Solution
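The answer is x, w, x, and Hello xorld!, each on separate lines. The expression ++c increments the copy before printing it, while s.at(6)++ prints the old character and only then increments the one stored in s. The variable c is a distinct copy of the character at index 6 in the string s: changes to one are not reflected in the other.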

Problem 4.

Predict the output of the following code. You may use an ASCII table if necessary.

 1  #include <iostream>
 2  #include <string>
 3
 4  using namespace std;
 5
 6  int main() {
 7      string s1 = "Veni, vidi, vici.";
 8      int n = 9;
 9      ++s1.at(n++);
10
11      string s2 = s1.substr(12);
12      ++s1.at(--n);
13
14      cout << s1 << endl;
15      cout << s2 << endl;
16
17      return 0;
18  }

Note that this does not produce undefined behaviour: on each of lines 9 and 12, the two increment and decrement operators are applied to different objects (the integer n and a character inside s1)!

Problem 5.

Determine and classify the error in the following snippet of code.

 1  #include <iostream>
 2  #include <string>
 3
 4  using namespace std;
 5
 6  int main() {
 7      string s = "Hello world!”;
 8      cout << s << endl;
 9      return 0;
10  }
Hint
The colour highlighting on my own website is off on line 7…