Strings, the bedrock of text manipulation, play a pivotal role in programming across various languages. In C++, the standard library offers a powerful solution – the string class. Let’s dive into the world of C++ strings and explore the tools C++ provides for efficient string processing.
Table of Contents
- The Evolution from Character Arrays
- Understanding the Basics
- What’s in a string?
- Creating and Initializing C++ Strings
- Appending, Inserting, and Concatenating Strings
- Replace String Characters
- Searching in Strings
- Removing Characters from Strings
- Comparing Strings
The Evolution from Character Arrays
In C programming, character arrays often served as the go-to for string processing. However, the constant juggling between static quoted strings, stack-based arrays, and heap-allocated arrays posed challenges. String manipulation with character arrays became a breeding ground for misunderstandings and bugs.
Enter the C++ string class. It elegantly addresses the complexities of character array manipulation, bringing an end to memory management woes during assignments and copy-constructions. With C++ strings, you can bid farewell to the intricacies of handling char* pointers and copying entire arrays – the string class takes care of it seamlessly.
Understanding the Basics
Handling text is perhaps one of the oldest of all programming applications, so it’s not surprising that the C++ string draws heavily on the ideas and terminology that have been used for this purpose. No matter which programming idiom you choose, there are really only about three things you want to do with a string:
- Create or modify the sequence of characters stored in the string.
- Detect the presence or absence of elements within the string.
- Translate between various schemes for representing string characters.
What’s in a string?
In C, a string is essentially an array of characters with a binary zero, known as the null terminator, marking its end. However, C++ strings redefine the game. They shield you from the practical details of array dimensions and null terminators. A C++ string object encapsulates vital information about its data, including its starting location, content, length, and buffer size for potential growth.
The memory layout specifics for the string class are intentionally left undefined by the C++ Standard. This flexibility allows different implementations by compiler vendors while ensuring consistent behavior for users. Whether employing reference counting or not, C++ strings mitigate common C programming pitfalls like array boundary overwrites and dangling pointers.
Creating and Initializing C++ Strings
Creating and initializing strings is straightforward and flexible. In the first example in this section, the first string, imBlank, is declared but contains no initial value. This string object has been initialized to hold ‘no characters’ and can properly report its zero length and absence of data elements through the use of class member functions. Put another way, this example illustrates that string objects let you do the following:
- Create an empty string and defer initializing it with character data.
- Initialize a string by passing a literal, quoted character array as an argument to the constructor.
- Initialize a string using the equal sign (=).
- Use one string to initialize another.
```cpp
#include <string>
using namespace std;

int main() {
    string imBlank;
    string heyMom("Where are my socks?");
    string standardReply = "Beamed into deep "
        "space on wide angle dispersion?";
    string useThisOneAgain(standardReply);
}
```
These are the simplest forms of string initialization, but variations offer more flexibility and control. You can do the following:
- Use a portion of either a C char array or a C++ string.
- Combine different sources of initialization data using operator+.
- Use the string object’s substr( ) member function to create a substring.
Here’s a program that illustrates these features.
```cpp
#include <iostream>
#include <string>

int main() {
    std::string s1("What is the sound of one clam napping?");
    std::string s2("Anything worth doing is worth overdoing.");
    std::string s3("I saw Elvis in a UFO");

    // Copy the first 8 chars
    std::string s4(s1, 0, 8);
    std::cout << s4 << std::endl;

    // Copy 6 chars from the middle of the source
    std::string s5(s2, 15, 6);
    std::cout << s5 << std::endl;

    // Copy from the middle to end
    std::string s6(s3, 6, 15);
    std::cout << s6 << std::endl;

    // Copy all sorts of stuff
    std::string quoteMe = s4 + "that" +
        // substr() copies 10 chars at element 20
        s1.substr(20, 10) + s5 +
        // substr() copies from element 5 to the
        // end of the string (npos means "no limit")
        "with" + s3.substr(5, std::string::npos) +
        // OK to copy a single char this way
        s1.substr(37, 1);
    std::cout << quoteMe << std::endl;

    return 0;
}
```
Here’s the output from the program:
What is
doing
Elvis in a UFO
What is that one clam doing with Elvis in a UFO?
C++ allows string initialization techniques to be mixed in a single statement, a flexible and convenient feature. Also notice that the last initializer copies just one character from the source string.
Another slightly more subtle initialization technique involves the use of the string iterators string::begin( ) and string::end( ). This technique treats a string like a container object which uses iterators to indicate the start and end of a sequence of characters. In this way you can hand a string constructor two iterators, and it copies from one to the other into the new string:
```cpp
#include <string>
#include <iostream>
#include <cassert>

using namespace std;

int main() {
    string source("demo");
    // Copy every character in the range [begin, end)
    string s(source.begin(), source.end());
    assert(s == source);
}
```
You can also increment, decrement, and add integer offsets to these iterators to extract a subset of characters from the source string.
Appending, Inserting, and Concatenating Strings
One of the most valuable and convenient aspects of C++ strings is that they grow as needed. Not only does this make string-handling code inherently more trustworthy, it also almost entirely eliminates a tedious chore: keeping track of the bounds of the storage in which strings live.
Appending, concatenating, and inserting strings are common operations in C++ programming. The string member functions append() and insert() play a crucial role in handling these scenarios. One remarkable feature of these functions is their ability to transparently reallocate storage when a string grows.
Append Operation on Strings
In the code snippet below, the append() function is used to concatenate the string “world!” to the existing string “Hello, ”. If the size of the resulting string exceeds the current allocated storage, the append() function transparently reallocates storage to accommodate the expanded string.
```cpp
#include <iostream>
#include <string>

int main() {
    std::string initialString = "Hello, ";

    // Appending to the string using append()
    initialString.append("world!");

    std::cout << initialString << std::endl;

    return 0;
}
```
Insert Operation on Strings
Here, the insert() function is employed to insert the substring “learning ” into the original string at position 7. Similar to append(), if the insertion causes the string to outgrow its allocated space, insert() takes care of reallocating storage seamlessly.
```cpp
#include <iostream>
#include <string>

int main() {
    std::string originalString = "I love programming!";

    // Inserting into the string using insert()
    originalString.insert(7, "learning ");

    std::cout << originalString << std::endl;

    return 0;
}
```
Concatenate Operation on Strings
Concatenation involves combining two or more strings to create a new string. In C++, you can perform concatenation using the + operator or the append() member function. Here’s an example using both methods:
```cpp
#include <iostream>
#include <string>

int main() {
    // Using the + operator for concatenation
    std::string firstName = "John";
    std::string lastName = "Doe";
    std::string fullName = firstName + " " + lastName;

    std::cout << "Full Name (using + operator): " << fullName << std::endl;

    // Using the append() member function for concatenation
    std::string greeting = "Hello, ";
    std::string target = "world!";
    greeting.append(target);

    std::cout << "Greeting (using append()): " << greeting << std::endl;

    return 0;
}
```
Both approaches produce the same combined text, but they differ in where it ends up: the + operator creates a new string and leaves its operands unchanged, while append() modifies the string it is called on. Choose the method that aligns with your coding style or the specific requirements of your program.
Benefits of Transparent Reallocation:
- Efficiency: The transparent reallocation ensures that you can focus on the logic of string manipulation without worrying about managing memory and storage explicitly.
- Simplicity: These functions simplify code by eliminating the need for manual memory management, which reduces the chances of memory-related errors.
- Flexibility: You can perform string operations without concerning yourself with the underlying details of memory allocation and deallocation.
The append() and insert() functions in C++ strings provide a powerful and convenient mechanism for manipulating strings while efficiently managing memory. Their ability to transparently reallocate storage when needed enhances the flexibility and simplicity of string handling in C++.
Replace String Characters
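To overwrite a range of characters in place, use the replace( ) function. The size of the string remains unchanged only when the replacement is the same length as the range it overwrites; otherwise the string grows or shrinks to fit. There are quite a number of overloaded versions of replace( ), but the simplest one takes three arguments: an integer indicating where to start in the string, an integer indicating how many characters to eliminate from the original string, and the replacement string. Here’s a simple example, in which the three characters of “C++” are replaced by the four characters of “Java”: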
```cpp
#include <iostream>
#include <string>

using namespace std;

int main() {
    // Replacing string characters
    string myString = "C++ is amazing!";
    // Overwrite 3 chars starting at position 0
    myString.replace(0, 3, "Java");

    cout << myString << endl;

    return 0;
}
```
When the text to replace is located with find( ), you should check that find( ) actually found something before you perform a replace( ). The previous example replaces with a char*, but there’s an overloaded version that replaces with a string.
If find( ) doesn’t locate the search string, it returns string::npos. The npos data member is a static constant member of the string class that represents a nonexistent character position.
The generic replace( ) algorithm from the <algorithm> header, unlike the replace( ) member function, only works with single objects (in this case, char objects) and will not replace quoted char arrays or string objects. Since a string behaves like an STL sequence, a number of other algorithms can be applied to it, which might solve other problems that are not directly addressed by the string member functions.
Searching in Strings
The find family of string member functions allows you to locate a character or group of characters within a given string.
The simplest use of find() involves searching for one or more characters in a string. This overloaded version of find() takes a parameter specifying the character(s) to search for and, optionally, a parameter indicating where in the string to begin the search. If no starting position is provided, the default is at the beginning (position 0). By incorporating find() within a loop, you can systematically navigate through a string, enabling the discovery of all occurrences of a particular character or substring.
The following program uses find( ), together with the Sieve of Eratosthenes, to find prime numbers less than 50:
```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

class SieveTest {
    string sieveChars;
public:
    // Create a 50 char string and set each
    // element to 'P' for Prime
    SieveTest() : sieveChars(50, 'P') {}
    void run() {
        findPrimes();
        testPrimes();
    }
    bool isPrime(int p) {
        if (p == 0 || p == 1) return false;
        int root = int(sqrt(double(p)));
        for (int i = 2; i <= root; ++i)
            if (p % i == 0) return false;
        return true;
    }
    void findPrimes() {
        // By definition neither 0 nor 1 is prime.
        // Change these elements to "N" for Not Prime
        sieveChars.replace(0, 2, "NN");
        // Walk through the array:
        size_t sieveSize = sieveChars.size();
        int root = int(sqrt(double(sieveSize)));
        for (int i = 2; i <= root; ++i)
            // Find all the multiples:
            for (size_t factor = 2; factor * i < sieveSize; ++factor)
                sieveChars[factor * i] = 'N';
    }
    void testPrimes() {
        // Every position still marked 'P' should be prime;
        // prints 1 for each position where isPrime() agrees
        size_t i = sieveChars.find('P');
        while (i != string::npos) {
            cout << isPrime(i++);
            i = sieveChars.find('P', i);
        }
        // Every other position should not be prime;
        // again prints 1 for each position where isPrime() agrees
        i = sieveChars.find_first_not_of('P');
        while (i != string::npos) {
            cout << !isPrime(i++);
            i = sieveChars.find_first_not_of('P', i);
        }
        cout << endl;
    }
};

int main() {
    SieveTest t;
    t.run();
}
```
The find( ) function allows you to walk forward through a string, detecting multiple occurrences of a character or a group of characters, and find_first_not_of( ) allows you to find other characters or substrings.
There are no functions in the string class to change the case of a string, but you can easily create these functions using the Standard C library functions toupper( ) and tolower( ), which change the case of one character at a time.
Both the upperCase( ) and lowerCase( ) functions follow the same form: they make a copy of the argument string and change the case.
Removing Characters from Strings
Removing characters from a string is straightforward with the erase() member function in C++. This function takes two arguments: the starting position to begin removing characters (defaulting to 0), and the count of characters to remove (defaulting to string::npos, indicating removal until the end of the string). If you specify more characters than the string contains, all remaining characters are erased. Omitting both arguments results in the removal of all characters from the string.
```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include "../require.h"  // Helper header supplying requireArgs() and assure()

using namespace std;

// Function to replace all occurrences of 'from' with 'to' in 'context'
// (only declared here; it must be defined elsewhere for the program to link)
string& replaceAll(string& context, const string& from, const string& to);

// Function to strip HTML tags and special characters from a string
string& stripHTMLTags(string& s) {
    // Remembers across calls whether a tag is still open
    static bool inTag = false;
    bool done = false;

    while (!done) {
        if (inTag) {
            // The previous line started an HTML tag
            // but didn't finish. Must search for '>'.
            size_t rightPos = s.find('>');
            if (rightPos != string::npos) {
                inTag = false;
                s.erase(0, rightPos + 1);
            } else {
                done = true;
                s.erase();
            }
        } else {
            // Look for start of tag:
            size_t leftPos = s.find('<');
            if (leftPos != string::npos) {
                // See if tag close is in this line. Searching from
                // leftPos ignores any stray '>' before the tag.
                size_t rightPos = s.find('>', leftPos);
                if (rightPos == string::npos) {
                    inTag = done = true;
                    s.erase(leftPos);
                } else
                    s.erase(leftPos, rightPos - leftPos + 1);
            } else
                done = true;
        }
    }

    // Replace HTML character entities with the characters they stand for
    replaceAll(s, "&lt;", "<");
    replaceAll(s, "&gt;", ">");
    replaceAll(s, "&amp;", "&");
    replaceAll(s, "&nbsp;", " ");
    // Additional replacements...

    return s;
}

int main(int argc, char* argv[]) {
    requireArgs(argc, 1, "usage: HTMLStripper InputFile");
    ifstream in(argv[1]);
    assure(in, argv[1]);

    string s;
    while (getline(in, s))
        if (!stripHTMLTags(s).empty())
            cout << s << endl;

    return 0;
}
```
This example effectively removes HTML tags, including those spanning multiple lines, using the erase() function. The inTag static flag keeps track of whether the start of a tag is found, but the corresponding tag end is not in the same line. Multiple erase() calls, along with replaceAll(), are employed to clean up the HTML content.
Additionally, the usage of getline() from the <string> header ensures easy handling of arbitrarily long lines, eliminating concerns about the dimension of a character array, as seen with istream::getline().
Comparing Strings
In C++, comparing strings introduces a nuanced approach different from the straightforward comparison of numbers. Unlike numbers with constant, universally meaningful values, comparing strings requires a lexical comparison. Lexical comparison involves evaluating the numeric representation of characters based on the collating sequence of the character set in use, typically the ASCII collating sequence.
In ASCII, characters are assigned numeric values, and a lexical comparison involves comparing these values. For example, in the ASCII sequence, the space character comes first, followed by common punctuation marks, uppercase letters, then lowercase letters, and so on. This means that when a lexical comparison reports that string s1 is ‘greater than’ string s2, it signifies that the first differing character in s1 comes later in the collating sequence than the character in the same position in s2.
Using Overloaded Operators for String Comparison
C++ provides a set of nonmember, overloaded operator functions for straightforward string comparison: operator ==, operator !=, operator >, operator <, operator >=, and operator <=. The following example showcases their usage:
```cpp
#include <string>
#include "../TestSuite/Test.h"  // Test framework header supplying test_() and report()
using namespace std;

class CompStrTest : public TestSuite::Test {
public:
    void run() {
        // Strings to compare
        string s1("This");
        string s2("That");
        test_(s1 == s1);
        test_(s1 != s2);
        test_(s1 > s2);
        test_(s1 >= s2);
        test_(s1 >= s1);
        test_(s2 < s1);
        test_(s2 <= s1);
        test_(s1 <= s1);
    }
};

int main() {
    CompStrTest t;
    t.run();
    return t.report();
}
```
These operators work seamlessly for comparing entire strings or individual characters within strings. Notably, the string class overloads these operators for the direct comparison of string objects, quoted literals, and pointers to C-style strings without creating temporary string objects.
You won’t find the logical not (!) or the logical comparison operators (&& and ||) among operators for a string. Neither will you find overloaded versions of the bitwise C operators &, |, ^, or ~. The overloaded nonmember comparison operators for the string class are limited to the subset that has clear, unambiguous application to single characters or groups of characters.
The compare() Member Function
For more sophisticated and precise comparisons, the compare() member function offers an extensive set of overloaded versions. This function allows you to compare complete strings, substrings, and subsets of two strings. Here’s an example comparing complete strings:
```cpp
#include <cassert>
#include <string>
using namespace std;

int main() {
    string first("This");
    string second("That");
    // A string always compares equal to itself
    assert(first.compare(first) == 0);
    assert(second.compare(second) == 0);
    // Lexical comparison
    assert(first.compare(second) > 0);
    assert(second.compare(first) < 0);
    // Swapping contents reverses the results
    first.swap(second);
    assert(first.compare(second) < 0);
    assert(second.compare(first) > 0);
    return 0;
}
```
Indexing Mechanisms for Strings
C++ strings offer two indexing mechanisms: [] and at(). Both produce the same result under normal circumstances. However, when accessing elements out of bounds, at() provides safer behavior by throwing an exception (out_of_range). Using at() allows you to gracefully recover from invalid references:
```cpp
#include <exception>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string s("1234");
    // at() throws an exception for out-of-bounds access
    try {
        s.at(5);
    } catch (const exception& e) {
        // The message text is implementation-defined,
        // for example "invalid string position"
        cerr << e.what() << endl;
    }
    return 0;
}
```
In contrast, using [] for subscripting leaves you to handle invalid references on your own. Embracing the safety of at() is recommended for responsible programming.