4. Week 2 Thursday: Characters and Strings
Our focus today is on some of the finer details of handling text, namely in manipulating character and string variables. You’ll inevitably have to do this at some point in your programming career, whether you’re collecting and analysing a lot of textual data (e.g., what are the 100 most common words in a book) or if you’re generating it yourself. Most of these tasks require control flow (if statements and loops), but at the very least, we can get familiar with the fundamentals and what’s possible.
Characters and Strings
Characters and the ASCII Table
A string variable represents a string of text, and that’s really just a bunch of char variables (i.e. characters) that have been grouped together. In order to learn how to manipulate strings, we ought to learn how to manipulate individual characters.
char variables, unlike string variables, are declared using single quotation marks, such as char c = 'C';. We can also declare char variables using numbers? Consider the following code:
What do you think the output will be? If you run this on your computer, you should get the output C 67. This illuminates two important facts:
- The character 'C' and the integer 67 are the same thing. More broadly, all characters are just numbers to the computer.
- C++ interprets the value 67 (or the character 'C') differently depending on what type of variable it’s stored in.
Naturally, we should ask how the computer knows which numbers correspond to which letter and vice-versa. About 3 billion years ago, the leading prokaryotic cells of the time congregated and designed the American Standard Code for Information Interchange, or ASCII. This is a table describing which characters correspond to which numbers, and virtually every system you will run C++ on adheres to this code. You may look up an ASCII table online through Google, and you will not need to memorise any portion of the ASCII table for this class.
As with all numbers, we may add, subtract, and compare characters to each other. This is useful for a variety of reasons:
- The digits 0 through 9 occupy a contiguous block on the ASCII table. To see if a char variable c is a digit, we may use the code c >= '0' && c <= '9', as opposed to c == '0' || ... || c == '9'. We can do a similar trick for checking if c is a lower case letter or an upper case letter.
- To convert a character from lower case to upper case, we can serendipitously notice that 'a' has a value of 97 while 'A' has a value of 65; they differ by exactly 32. The other 25 letters obey the same rule! If c is a char variable that holds a lower case letter, then to convert it to upper case, one may perform c -= 32;. If the constant 32 is too hard to remember (I think it is), you can also do c = c - 'a' + 'A';. Try to squint at the table and see why this works!
Accessing Characters within Strings
A string is a variable that holds a bunch of characters in a sequence. In order to create and work with string variables, you need to include the string library at the start of your program (i.e., #include <string>). Everything related to strings is part of the std namespace. To declare a string, you can just write the string within double quotes:
string s = "Johnald MacDonald";
The entries of the string are labeled with “indices”, starting from 0:
J  o  h  n  a  l  d  _  M  a  c  D  o  n  a  l  d
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
In order to access the character at the index i, we can use s.at(i):
string s = "Johnald MacDonald";
cout << s.at(0) << endl;  // prints J
cout << s.at(7) << endl;  // prints a space
cout << s.at(16) << endl; // prints d
cout << s.at(30) << endl; // crashes the program: index 30 is past the end
This last line, where we tried accessing an index past the end of a string, is an example of a runtime error! Such errors are most commonly caused by mislabelling the indices of your string, e.g. if you start labelling with 1 instead of with 0.
Remark 1.
An alternative way to access a string’s contents is s[0] instead of s.at(0). However, this is “unsafe”: the .at function will intentionally crash the program whenever you enter an invalid index, whereas the square brackets perform no such check. It’s possible that s[3991] ends up working for reasons we will explain in a few weeks, and it can result in a program changing parts of your computer’s memory it was never supposed to access. If you’re interested, learn about buffer overflow attacks.
Some Other String Operations
There are some other operations that are above the level of character-by-character manipulations, and here are the ways we would perform them:
- Joining two strings together: If string first = "John"; and string last = "McDonald";, we may want to put them into a single variable holding the full name. To do this, we would type string full_name = first + last;, and this (almost) does what you would expect. full_name now holds the string JohnMcDonald. To add a space between the names, we may perform first + " " + last.
- Getting a substring: If string full_name = "John Old McDonald"; and we want a variable only holding the first name, we may use the substr function. The syntax is full_name.substr([start], [length]). Thus, we could type string first = full_name.substr(0, 4);, which keeps the first four characters starting from index 0. You may omit the length, in which case the substring will run to the end of the string; for instance, string last = full_name.substr(9); keeps everything from index 9 and onwards, so last will hold "McDonald".
There are some other operations one can perform with strings, but they’re far too numerous for me to list here. If you’re curious about how to search for a substring of a string, how to remove spaces from either end of a string, etc., you may check the C++ reference, which contains documentation of every single function available for use on strings! Some key functions to know are find, rfind, getline, push_back, and pop_back.
Note that although we call expressions like "text" string literals, they are not actually strings! Thus, the manipulations that work on string variables may not actually work on string literals, as demonstrated in the following example:
Problem 2.
Determine and classify the error in the following C++ program
Some Practice With Strings
Problem 3.
Predict the output of the following code. You may look at an ASCII table.
#include <iostream>
#include <string>

using namespace std;

int main() {
    string s = "Hello world!";
    char c = s.at(6);

    cout << ++c << endl;
    cout << s.at(6)++ << endl;
    cout << c << endl;
    cout << s << endl;

    return 0;
}
Solution
x, w, x, and Hello xorld!, each on separate lines. The variable c is a distinct copy of the character at index 6 in the string s: changes to one are not reflected by the other.
Problem 4.
Predict the output of the following code. You may use an ASCII table if necessary.
 1  #include <iostream>
 2  #include <string>
 3
 4  using namespace std;
 5
 6  int main() {
 7      string s1 = "Veni, vidi, vici.";
 8      int n = 9;
 9      ++s1.at(n++);
10
11      string s2 = s1.substr(12);
12      ++s1.at(--n);
13
14      cout << s1 << endl;
15      cout << s2 << endl;
16
17      return 0;
18  }
Note that this does not produce undefined behaviour, as the increment and decrement operators are applied to different objects on lines 9 and 12!