Programming: Principles and Practice Using C++ (2014)

Part III: Data and Algorithms

20. Containers and Iterators

“Write programs that do one thing
and do it well. Write programs
to work together.”

—Doug McIlroy

This chapter and the next present the STL, the containers and algorithms part of the C++ standard library. The STL is an extensible framework for dealing with data in a C++ program. After a first simple example, we present the general ideals and the fundamental concepts. We discussiteration, linked-list manipulation, and STL containers. The key notions of sequence and iterator are used to tie containers (data) together with algorithms (processing). This chapter lays the groundwork for the general, efficient, and useful algorithms presented in the next chapter. As an example, it also presents a framework for text editing as a sample application.


20.1 Storing and processing data

20.1.1 Working with data

20.1.2 Generalizing code

20.2 STL ideals

20.3 Sequences and iterators

20.3.1 Back to the example

20.4 Linked lists

20.4.1 List operations

20.4.2 Iteration

20.5 Generalizing vector yet again

20.5.1 Container traversal

20.5.2 auto

20.6 An example: a simple text editor

20.6.1 Lines

20.6.2 Iteration

20.7 vectorlist, and string

20.7.1 insert and erase

20.8 Adapting our vector to the STL

20.9 Adapting built-in arrays to the STL

20.10 Container overview

20.10.1 Iterator categories


20.1 Storing and processing data

Before looking into dealing with larger collections of data items, let’s consider a simple example that points to ways of handling a large class of data-processing problems. Jack and Jill are each measuring vehicle speeds, which they record as floating-point values. Jack was brought up as a C programmer and stores his values in an array, whereas Jill stores hers in a vector. Now we’d like to use their data in our program. How might we do this?

We could have Jack’s and Jill’s programs write out the values to a file and then read them back into our program. That way, we are completely insulated from their choices of data structures and interfaces. Often, such isolation is a good idea, and if that’s what we decide to do we can use the techniques from Chapters 1011 for input and a vector<double> for our calculations.

But, what if using files isn’t a good option for the task we want to do? Let’s say that the data-gathering code is designed to be invoked as a function call to deliver a new set of data every second. Once a second, we call Jack’s and Jill’s functions to deliver data for us to process:

double* get_from_jack(int* count);  // Jack puts doubles into an array and
                                                                       // returns the number of elements in *count
vector<double>* get_from_jill();      // Jill fills the vector

void fct()
{
          int jack_count = 0;
          double* jack_data = get_from_jack(&jack_count);
          vector<double>* jill_data = get_from_jill();
          // . . . process . . .
          delete[] jack_data;
          delete jill_data;
}

The assumption is that the data is stored on the free store and that we should delete it when we are finished using it. Another assumption is that we can’t rewrite Jack’s and Jill’s code, or wouldn’t want to.

20.1.1 Working with data

Clearly, this is a somewhat simplified example, but it is not dissimilar to a vast number of real-world problems. If we can handle this example elegantly, we can handle a huge number of common programming problems. The fundamental problem here is that we don’t control the way in which our “data suppliers” store the data they give us. It’s our job to either work with the data in the form in which we get it or to read it and store it the way we like better.

What do we want to do with that data? Sort it? Find the highest value? Find the average value? Find every value over 65? Compare Jill’s data with Jack’s? See how many readings there were? The possibilities are endless, and when writing a real program we will simply do the computation required. Here, we just want to do something to learn how to handle data and do computations involving lots of data. Let’s first do something really simple: find the element with the highest value in each data set. We can do that by inserting this code in place of the “. . . process . . .” comment in fct():

// . . .
double h = –1;
double* jack_high;  // jack_high will point to the element with the highest value
double* jill_high;    // jill_high will point to the element with the highest value
for (int i=0; i<jack_count; ++i)
          if (h<jack_data[i]) {
                    jack_high = &jack_data[i];   // save address of largest element
                    h = jack_data[i];                    // update “largest element”
     }

h = –1;
for (int i=0; i< jill_data –>size(); ++i)
          if (h<(*jill_data)[i]) {
                    jill_high = &(*jill_data)[i];   // save address of largest element
                    h = (*jill_data)[i];                 // update “largest element”
     }

cout << "Jill's max: " << *jill_high
          << "; Jack's max: " << *jack_high;

// . . .

Note the ugly notation we use to access Jill’s data: (*jill_data)[i]. The function get_from_jill() returns a pointer to a vector, a vector<double>*. To get to the data, we first have to dereference the pointer to get to the vector, *jill_data, then we can subscript that. However, *jill_data[i] isn’t what we want; that means *(jill_data[i]) because [ ] binds tighter than *, so we need the parentheses around *jill_data and get (*jill_data)[i].


Image Try This

If you were able to change Jill’s code, how would you redesign its interface to get rid of the ugliness?


20.1.2 Generalizing code

Image

What we would like is a uniform way of accessing and manipulating data so that we don’t have to write our code differently each time we get data presented to us in a slightly different way. Let’s look at Jack’s and Jill’s code as examples of how we can make our code more abstract and uniform.

Obviously, what we do for Jack’s data strongly resembles what we do for Jill’s. However, there are some annoying differences: jack_count vs. jill_data–>size() and jack_data[i] vs. (*jill_data)[i]. We could eliminate the latter difference by introducing a reference:

vector<double>& v = *jill_data;
for (int i=0; i<v.size(); ++i)
          if (h<v[i]) {
                    jill_high = &v[i];
                    h = v[i];
          }

This is tantalizingly close to the code for Jack’s data. What would it take to write a function that could do the calculation for Jill’s data as well as for Jack’s? We can think of several ways (see exercise 3), but for reasons of generality which will become clear over the next two chapters, we chose a solution based on pointers:

double* high(double* first, double* last)
// return a pointer to the element in [first,last) that has the highest value
{
          double h = –1;
          double* high;
          for(double* p = first; p!=last; ++p)
                    if (h<*p) { high = p; h = *p; }
          return high;
}

Given that, we can write

double* jack_high = high(jack_data,jack_data+jack_count);
vector<double>& v = *jill_data;
double* jill_high = high(&v[0],&v[0]+v.size());

This looks better. We don’t introduce so many variables and we write the loop and the loop body only once (in high()). If we want to know the highest values, we can look at *jack_high and *jill_high. For example:

cout << "Jill's max: " << *jill_high
          << "; Jack's max: " << *jack_high;

Note that high() relies on a vector storing its elements in an array, so that we can express our “find highest element” algorithm in terms of pointers into an array.


Image Try This

We left two potentially serious errors in this little program. One can cause a crash, and the other will give wrong answers if high() is used in many other programs where it might have been useful. The general techniques that we describe below will make them obvious and show how to systematically avoid them. For now, just find them and suggest remedies.


This high() function is limited in that it is a solution to a single specific problem:

• It works for arrays only. We rely on the elements of a vector being stored in an array, but there are many more ways of storing data, such as lists and maps (see §20.4 and §21.6.1).

• It can be used for vectors and arrays of doubles, but not for arrays or vectors with other element types, such as vector<double*> and char[10].

• It finds the element with the highest value, but there are many more simple calculations that we want to do on such data.

Let’s explore how we can support this kind of calculation on sets of data in far greater generality.

Please note that by deciding to express our “find highest element” algorithm in terms of pointers, we “accidentally” generalized it to do more than we required: we can — as desired — find the highest element of an array or a vector, but we can also find the highest element in part of an array or in part of a vector. For example:

// . . .
vector<double>& v = *jill_data;
double* middle = &v[0]+v.size()/2;
double* high1 = high(&v[0], middle);                      // max of first half
double* high2 = high(middle, &v[0]+v.size());       // max of second half
// . . .

Here high1 will point to the element with the largest value in the first half of the vector and high2 will point to the element with the largest value in the second half. Graphically, it will look something like this:

Image

We used pointer arguments for high(). That’s a bit low-level and can be error-prone. We suspect that for many programmers, the obvious function for finding the element with the largest value in a vector would look like this:

double* find_highest(vector<double>& v)
{
          double h = –1;
          double* high = 0;
          for (int i=0; i<v.size(); ++i)
                    if (h<v[i]) { high = &v[i]; h = v[i]; }
          return high;
}

However, that wouldn’t give us the flexibility we “accidentally” obtained from high() — we can’t use find_highest() to find the element with the highest value in part of a vector. We actually achieved a practical benefit from writing a function that could be used for both arrays and vectors by “messing with pointers.” We will remember that: generalization can lead to functions that are useful for more problems.

20.2 STL ideals

The C++ standard library provides a framework for dealing with data as sequences of elements, called the STL. STL is usually said to be an acronym for “standard template library.” The STL is the part of the ISO C++ standard library that provides containers (such as vector, list, and map) and generic algorithms (such as sort, find, and accumulate). Thus we can — and do — refer to facilities, such as vector, as being part of both “the STL” and “the standard library.” Other standard library features, such as ostream (Chapter 10) and C-style string functions (§B.11.3), are not part of the STL. To better appreciate and understand the STL, we will first consider the problems we must address when dealing with data and the ideals we have for a solution.

Image

There are two major aspects of computing: the computation and the data. Sometimes we focus on the computation and talk about if-statements, loops, functions, error handling, etc. At other times, we focus on the data and talk about arrays, vectors, strings, files, etc. However, to get useful work done we need both. A large amount of data is incomprehensible without analysis, visualization, and searching for “the interesting bits.” Conversely, we can compute as much as we like, but it’s going to be tedious and sterile unless we have some data to tie our computation to something real. Furthermore, the “computation part” of our program has to elegantly interact with the “data part.”

Image

Image

When we talk about data in this way, we think of lots of data: dozens of Shapes, hundreds of temperature readings, thousands of log records, millions of points, billions of web pages, etc.; that is, we talk about processing containers of data, streams of data, etc. In particular, this is not a discussion of how best to choose a couple of values to represent a small object, such as a complex number, a temperature reading, or a circle. For such types, see Chapters 911, and 14.

Consider some simple examples of something we’d like to do with “a lot of data”:

• Sort the words in dictionary order.

• Find a number in a phone book, given a name.

• Find the highest temperature.

• Find all values larger than 8800.

• Find the first occurrence of the value 17.

• Sort the telemetry records by unit number.

• Sort the telemetry records by time stamp.

• Find the first value larger than “Petersen.”

• Find the largest amount.

• Find the first difference between two sequences.

• Compute the pair-wise product of the elements of two sequences.

• Find the highest temperature for each day in a month.

• Find the top ten best sellers in the sales records.

• Count the number of occurrences of “Stroustrup” on the web.

• Compute the sum of the elements.

Note that we can describe each of these tasks without actually mentioning how the data is stored. Clearly, we must be dealing with something like lists, vectors, files, input streams, etc. for these tasks to make sense, but we don’t have to know the details about how the data is stored (or gathered) to talk about what to do with it. What is important is the type of the values or objects (the element type), how we access those values or objects, and what we want to do with them.

These kinds of tasks are very common. Naturally, we want to write code performing such tasks simply and efficiently. Conversely, the problems for us as programmers are:

• There is an infinite variation of data types (“kinds of data”).

• There is a bewildering number of ways to store collections of data elements.

• There is a huge variety of tasks we’d like to do with collections of data.

To minimize the effect of these problems, we’d like our code to take advantage of commonalities among types, among the ways of storing data, and among our processing tasks. In other words, we want to generalize our code to cope with these kinds of variations. We really don’t want to hand-craft each solution from scratch; that would be a tedious waste of time.

To get an idea of what support we would like for writing our code, consider a more abstract view of what we do with data:

• Collect data into containers

• Such as vector, list, and array

• Organize data

• For printing

• For fast access

• Retrieve data items

• By index (e.g., the 42nd element)

• By value (e.g., the first record with the “age field” 7)

• By properties (e.g., all records with the “temperature field” >32 and <100)

• Modify a container

• Add data

• Remove data

• Sort (according to some criteria)

• Perform simple numeric operations (e.g., multiply all elements by 1.7)

We’d like to do these things without getting sucked into a swamp of details about differences among containers, differences in ways of accessing elements, and differences among element types. If we can do that, we’ll have come a long way toward our goal of simple and efficient use of large amounts of data.

Looking back at the programming tools and techniques from the previous chapters, we note that we can (already) write programs that are similar independently of the data type used:

• Using an int isn’t all that different from using a double.

• Using a vector<int> isn’t all that different from using a vector<string>.

• Using an array of double isn’t all that different from using a vector<double>.

Image

We’d like to organize our code so that we have to write new code only when we want to do something really new and different. In particular, we’d like to provide code for common programming tasks so that we don’t have to rewrite our solution each time we find a new way of storing the data or find a slightly different way of interpreting the data.

• Finding a value in a vector isn’t all that different from finding a value in an array.

• Looking for a string ignoring case isn’t all that different from looking at a string considering uppercase letters different from lowercase ones.

• Graphing experimental data with exact values isn’t all that different from graphing data with rounded values.

• Copying a file isn’t all that different from copying a vector.

We want to build on these observations to write code that’s

• Easy to read

• Easy to modify

• Regular

• Short

• Fast

To minimize our programming work, we would like

Image

• Uniform access to data

• Independently of how it is stored

• Independently of its type

• Type-safe access to data

• Easy traversal of data

• Compact storage of data

• Fast

• Retrieval of data

• Addition of data

• Deletion of data

• Standard versions of the most common algorithms

• Such as copy, find, search, sort, sum, . . .

The STL provides that, and more. We will look at it not just as a very useful set of facilities, but also as an example of a library designed for maximal flexibility and performance. The STL was designed by Alex Stepanov to provide a framework for general, correct, and efficient algorithms operating on data structures. The ideal was the simplicity, generality, and elegance of mathematics.

Image

The alternative to dealing with data using a framework with clearly articulated ideals and principles is for each programmer to craft each program out of the basic language facilities using whatever ideas seem good at the time. That’s a lot of extra work. Furthermore, the result is often an unprincipled mess; rarely is the result a program that is easily understood by people other than its original designer, and only by chance is the result code that we can use in other contexts.

Having considered the motivation and the ideals, let’s look at the basic definitions of the STL, and then finally get to the examples that’ll show us how to approximate those ideals — to write better code for dealing with data and to do so with greater ease.

20.3 Sequences and iterators

Image

The central concept of the STL is the sequence. From the STL point of view, a collection of data is a sequence. A sequence has a beginning and an end. We can traverse a sequence from its beginning to its end, optionally reading or writing the value of each element. We identify the beginning and the end of a sequence by a pair of iterators. An iterator is an object that identifies an element of a sequence. We can think of a sequence like this:

Image

Here, begin and end are iterators; they identify the beginning and the end of the sequence. An STL sequence is what is usually called “half-open”; that is, the element identified by begin is part of the sequence, but the end iterator points one beyond the end of the sequence. The usual mathematical notation for such sequences (ranges) is [begin:end). The arrows from one element to the next indicate that if we have an iterator to one element we can get an iterator to the next.

What is an iterator? An iterator is a rather abstract notion:

Image

• An iterator points to (refers to) an element of a sequence (or one beyond the last element).

• You can compare two iterators using == and !=.

• You can refer to the value of the element pointed to by an iterator using the unary * operator (“dereference” or “contents of”).

• You can get an iterator to the next element by using ++.

For example, if p and q are iterators to elements of the same sequence:

Image

Clearly, the idea of an iterator is related to the idea of a pointer (§17.4). In fact, a pointer to an element of an array is an iterator. However, many iterators are not just pointers; for example, we could define a range-checked iterator that throws an exception if you try to make it point outside its [begin:end) sequence or dereference end. It turns out that we get enormous flexibility and generality from having iterator as an abstract notion rather than as a specific type. This chapter and the next will give several examples.


Image Try This

Write a function void copy(int* f1, int* e1, int* f2) that copies the elements of an array of ints defined by [f1:e1) into another [f2:f2+(e1–f1)). Use only the iterator operations mentioned above (not subscripting).


Iterators are used to connect our code (algorithms) to our data. The writer of the code knows about the iterators (and not about the details of how the iterators actually get to the data), and the data provider supplies iterators rather than exposing details about how the data is stored to all users. The result is pleasingly simple and offers an important degree of independence between algorithms and containers. To quote Alex Stepanov: “The reason STL algorithms and containers work so well together is that they don’t know anything about each other.” Instead, both understand about sequences defined by pairs of iterators.

Image

In other words, my algorithms no longer have to know about the bewildering variety of ways of storing and accessing data; they just have to know about iterators. Conversely, if I’m a data provider, I no longer have to write code to serve a bewildering variety of users; I just have to implement an iterator for my data. At the most basic level, an iterator is defined by just the *, ++, ==, and != operators. That makes them simple and fast.

The STL framework consists of about ten containers and about 60 algorithms connected by iterators (see Chapter 21). In addition, many organizations and individuals provide containers and algorithms in the style of the STL. The STL is probably the currently best-known and most widely used example of generic programming (§19.3.2). If you know the basic concepts and a few examples, you can use the rest.

20.3.1 Back to the example

Let’s see how we can express the “find the element with the largest value” problem using the STL notion of a sequence:

template<typename Iterator>
Iterator high(Iterator first, Iterator last)
          // return an iterator to the element in [first:last) that has the highest value
{
          Iterator high = first;
          for (Iterator p = first; p!=last; ++p)
                    if (*high<*p) high = p;
          return high;
}

Note that we eliminated the local variable h that we had used to hold the highest value seen so far. When we don’t know the name of the actual type of the elements of the sequence, the initialization by –1 seems completely arbitrary and odd. That’s because it was arbitrary and odd! It was also an error waiting to happen: in our example –1 worked only because we happened not to have any negative velocities. We knew that “magic constants,” such as –1, are bad for code maintenance (§4.3.1, §7.6.1, §10.11.1, etc.). Here, we see that they can also limit the utility of a function and can be a sign of incomplete thought about the solution; that is, “magic constants” can be — and often are — a sign of sloppy thinking.

Note that this “generic” high() can be used for any element type that can be compared using <. For example, we could use high() to find the lexicographically last string in a vector<string> (see exercise 7).

The high() template function can be used for any sequence defined by a pair of iterators. For example, we can exactly replicate our example program:

double* get_from_jack(int* count);        // Jack puts doubles into an array and
                                                                 // returns the number of elements in *count
vector<double>* get_from_jill();             // Jill fills the vector

void fct()
{
          int jack_count = 0;
          double* jack_data = get_from_jack(&jack_count);
          vector<double>* jill_data = get_from_jill();

          double* jack_high = high(jack_data,jack_data+jack_count);
          vector<double>& v = *jill_data;
          double* jill_high = high(&v[0],&v[0]+v.size());
          cout << "Jill's high " << *jill_high << "; Jack's high " << *jack_high;
          // . . .
          delete[] jack_data;
          delete jill_data;
}

For the two calls here, the Iterator template argument type for high() is double*. Apart from (finally) getting the code for high() correct, there is apparently no difference from our previous solution. To be precise, there is no difference in the code that is executed, but there is a most important difference in the generality of our code. The templated version of high() can be used for every kind of sequence that can be described by a pair of iterators. Before looking at the detailed conventions of the STL and the useful standard algorithms that it provides to save us from writing common tricky code, let’s consider a couple of more ways of storing collections of data elements.


Image Try This

We again left a serious error in that program. Find it, fix it, and suggest a general remedy for that kind of problem.


20.4 Linked lists

Image

Consider again the graphical representation of the notion of a sequence:

Image

Compare it to the way we visualize a vector in memory:

Image

Basically, the subscript 0 identifies the same element as does the iterator v.begin(), and the subscript v.size() identifies the one-beyond-the-last element also identified by the iterator v.end().

The elements of the vector are consecutive in memory. That’s not required by STL’s notion of a sequence, and it so happens that there are many algorithms where we would like to insert an element in between two existing elements without moving those existing elements. The graphical representation of the abstract notion suggests the possibility of inserting elements (and of deleting elements) without moving other elements. The STL notion of iterators supports that.

The data structure most directly suggested by the STL sequence diagram is called a linked list. The arrows in the abstract model are usually implemented as pointers. An element of a linked list is part of a “link” consisting of the element and one or more pointers. A linked list where a link has just one pointer (to the next link) is called a singly-linked list and a list where a link has pointers to both the previous and the next link is called a doubly-linked list. We will sketch the implementation of a doubly-linked list, which is what the C++ standard library provides under the name of list. Graphically, it can be represented like this:

Image

This can be represented in code as

template<typename Elem>
struct Link {
          Link* prev;               // previous link
          Link* succ;               // successor (next) link
          Elem val;                   // the value
};

template<typename Elem> struct list {
          Link<Elem>* first;
          Link<Elem>* last;    // one beyond the last link
};

The layout of a Link is

Image

There are many ways of implementing linked lists and presenting them to users. A description of the standard library version can be found in Appendix B. Here, we’ll just outline the key properties of a list — you can insert and delete elements without disturbing existing elements — show how we can iterate over a list, and give an example of list use.

When you try to think about lists, we strongly encourage you to draw little diagrams to visualize the operations you are considering. Linked-list manipulation really is a topic where a picture is worth 1K words.

20.4.1 List operations

What operations do we need for a list?

Image

• The operations we have for vector (constructors, size, etc.), except subscripting

• Insert (add an element) and erase (remove an element)

• Something that can be used to refer to elements and to traverse the list: an iterator

In the STL, that iterator type is a member of its class, so we’ll do the same:

template<typename Elem>
class list {
          // representation and implementation details
public:
          class iterator;                 // member type: iterator

          iterator begin();            // iterator to first element
          iterator end( );              // iterator to one beyond last element

          iterator insert(iterator p, const Elem& v);        // insert v into list after p
          iterator erase(iterator p);                                    // remove p from the list

          void push_back(const Elem& v);                      // insert v at end
          void push_front(const Elem& v);                     // insert v at front
          void pop_front();          // remove the first element
          void pop_back();          // remove the last element

          Elem& front();               // the first element
          Elem& back();               // the last element

          // . . .
};

Just as “our” vector is not the complete standard library vector, this list is not the complete definition of the standard library list. There is nothing wrong with this list; it simply isn’t complete. The purpose of “our” list is to convey an understanding of what linked lists are, how a list might be implemented, and how to use the key features. For more information see Appendix B or an expert-level C++ book.

The iterator is central to the definition of an STL list. Iterators are used to identify places for insertion and elements for removal (erasure). They are also used for “navigating” through a list rather than using subscripting. This use of iterators is very similar to the way we used pointers to traverse arrays and vectors in §20.1 and §20.3.1. This style of iterators is the key to the standard library algorithms (§21.13).

Image

Why not subscripting for list? We could subscript a list, but it would be a surprisingly slow operation: lst[1000] would involve starting from the first element and then visiting each link along the way until we reached element number 1000. If we want to do that, we can do it ourselves (or use advance(); see §20.6.2). Consequently, the standard library list doesn’t provide the innocuous-looking subscript syntax.

We made list’s iterator type a member (a nested class) because there was no reason for it to be global. It is used only with lists. Also, this allows us to name every container’s iterator type iterator. In the standard library, we have list<T>::iterator, vector<T>::iterator, map<K,V>::iterator, and so on.

20.4.2 Iteration

The list iterator must provide *, ++, ==, and !=. Since the standard library list is a doubly-linked list, it also provides –– for iterating “backward” toward the front of the list:

template<typename Elem>         // requires Element<Elem>() (§19.3.3)
class list<Elem>::iterator {
          Link<Elem>* curr;              // current link
public:
          iterator(Link<Elem>* p) :curr{p} { }

          iterator& operator++() {curr = curr–>succ; return *this; }   // forward
          iterator& operator––() { curr = curr–>prev; return *this; }   // backward
          Elem& operator*() { return curr–>val; }   // get value (dereference)

          bool operator==(const iterator& b) const { return curr==b.curr; }
          bool operator!= (const iterator& b) const { return curr!=b.curr; }
};

These functions are short and simple, and obviously efficient: there are no loops, no complicated expressions, and no “suspicious” function calls. If the implementation isn’t clear to you, just have a quick look at the diagrams above. This list iterator is just a pointer to a link with the required operations. Note that even though the implementation (the code) for a list<Elem>::iterator is very different from the simple pointer we have used as an iterator for vectors and arrays, the meaning (the semantics) of the operations is identical. Basically, the list iterator provides suitable ++, ––,*, ==, and != for a Link pointer.

Now look at high() again:

template<typename Iter>  // requires Input_iterator<Iter>() (§19.3.3)
Iterator high(Iter first, Iter last)
          // return an iterator to the element in [first,last) that has the highest value
{
          Iterator high = first;
          for (Iterator p = first; p!=last; ++p)
                    if (*high<*p) high = p;
          return high;
          }

We can use it for a list:

void f()
{
          list<int> lst; for (int x; cin >> x; ) lst.push_front(x);

          list<int>::iterator p = high(lst.begin(), lst.end());
          cout << "the highest value was " << *p << '\n';
}

Here, the “value” of the Iterator argument is list<int>::iterator, and the implementation of ++, *, and != has changed dramatically from the array case, but the meaning is still the same. The template function high() still traverses the data (here a list) and finds the highest value. We can insert an element anywhere in a list, so we used push_front() to add elements at the front just to show that we could. We could equally well have used push_back() as we do for vectors.


Image Try This

The standard library vector doesn’t provide push_front(). Why not? Implement push_front() for vector and compare it to push_back().


Now, finally, is the time to ask, “But what if the list is empty?” In other words, “What if lst.begin()==lst.end()?” In that case, *p will be an attempt to dereference the one-beyond-the-last element, lst.end(): disaster! Or — potentially worse — the result could be a random value that might be mistaken for a correct answer.

Image

The last formulation of the question strongly hints at the solution: we can test whether a list is empty by comparing begin() and end() — in fact, we can test whether any STL sequence is empty by comparing its beginning and end:

Image

That’s the deeper reason for having end point one beyond the last element rather than at the last element: the empty sequence is not a special case. We dislike special cases because — by definition — we have to remember to write special-case code for them.

In our example, we could use that like this:

list<int>::iterator p = high(lst.begin(), lst.end());
if (p==lst.end())             // did we reach the end?
          cout << "The list is empty";
else
          cout << "the highest value is " << *p << '\n';

We use testing the return value against end() — indicating “not found” — systematically with STL algorithms.

Because the standard library provides a list, we won’t go further into the implementation here. Instead, we’ll have a brief look at what lists are good for (see exercises 12–14 if you are interested in list implementation details).

20.5 Generalizing vector yet again

Obviously, from the examples in §20.34, the standard library vector has an iterator member type and begin() and end() member functions (just like std::list). However, we did not provide those for our vector in Chapter 19. What does it really take for different containers to be used more or less interchangeably in the STL generic programming style presented in §20.3? First, we’ll outline the solution (ignoring allocators to simplify) and then explain it:

template<typename T>   // requires Element<T>() (§19.3.3)
class vector {
public:
          using size_type = unsigned long;
          using value_type = T;
          using iterator = T*;
          using const_iterator = const T*;

          // . . .

          iterator begin();
          const_iterator begin() const;
          iterator end();
          const_iterator end() const;

          size_type size();

          // . . .
};

Image

A using declaration creates an alias for a type; that is, for our vector, iterator is a synonym, another name, for the type we chose to use as our iterator: T*. Now, for a vector called v, we can write

vector<int>::iterator p = find(v.begin(), v.end(),32);

and

for (vector<int>::size_type i = 0; i<v.size(); ++i) cout << v[i] << '\n';

The point is that to write that, we don’t actually have to know what types are named by iterator and size_type. In particular, the code above, because it is expressed in terms of iterator and size_type, will work with vectors where size_type is not an unsigned long (as it is not on many embedded systems processors) and where iterator is not a plain pointer, but a class (as it is on many popular C++ implementations).

The standard defines list and the other standard containers similarly. For example:

template<typename T>               // requires Element<T>()  (§19.3.3)
class list {
public:
          class Link;
          using size_type = unsigned long;
          using value_type = T;
          class iterator;                  // see §20.4.2
          class const_iterator;      // like iterator, but not allowing writes to elements

          // . . .

          iterator begin();
          const_iterator begin() const;
          iterator end();
          const_iterator end() const;

          size_type size();

          // . . .
};

That way, we can write code that does not care whether it uses a list or a vector. All the standard library algorithms are defined in terms of these member type names, such as iterator and size_type, so that they don’t unnecessarily depend on the implementations of containers or exactly which kind of container they operate on (see Chapter 21).

As an alternative to saying C::iterator for some container C, we often prefer Iterator<C>. This can be achieved through a simple template alias:

template<typename C>
using Iterator = typename C::iterator;    // Iterator<C> means typename
                                                                              // C::iterator

The fact that for language-technical reasons we need to prefix C::iterator with typename to say that iterator is a type is part of the reason we prefer Iterator<C>. Similarly, we define

template<typename C>
using Value_type = typename C::value_type;

That way, we can write Value_type<C>. These type aliases are not in the standard library, but you can find them in std_lib_facilities.h.

A using declaration is a C++11 notation for and a generalization of what was known in C and C++ as a typedef (§A.16).

20.5.1 Container traversal

Using size(), we can traverse one of our vectors from its first element to its last. For example:

void print1(const vector<double>& v)
{
          for (int i = 0; i<v.size(); ++i)
                    cout << v[i] << '\n';
}

This doesn’t work for lists because list does not provide subscripting. However, we can traverse a standard library vector and list using the simpler range-for-loop (§4.6.1). For example:

void print2(const vector<double>& v, const list<double>& lst)
{
          for (double x : v)
                    cout << x << '\n';

          for (double x : lst)
                    cout << x << '\n';
}

Image

This works for both the standard library containers and for “our” vector and list. How? The “trick” is that the range-for-loop is defined in terms of begin() and end() functions returning iterators to the first and one beyond the end of our vector elements. The range-for-loop is simply “syntactic sugar” for a loop over a sequence using iterators. When we defined begin() and end() for our vector and list we “accidentally” provided what the range-for needed.

20.5.2 auto

When we have to write out loops over a generic structure, naming the iterators can be a real nuisance. Consider:

template<typename T>           // requires Element<T>()
void user(vector<T>& v, list<T>& lst)
{
          for (vector<T>::iterator p = v.begin(); p!=v.end(); ++p) cout << *p << '\n';

          list<T>::iterator q = find(lst.begin(), lst.end(),T{42});
}

The most annoying aspect of this is that the compiler obviously already knows the iterator type for the list and the size_type for the vector. Why should we have to tell the compiler what it already knows? Doing so just annoys the poor typists among us and opens opportunities for mistakes. Fortunately, we don’t have to: we can declare a variable auto, meaning use the type of the iterator as the type of the variable:

template<typename T>           // requires Element<T>()
void user(vector<T>& v, list<T>& lst)
{
          for (auto p = v.begin(); p!=v.end(); ++p) cout << *p << '\n';

          auto q = find(lst.begin(), lst.end(),T{42});
}

Here, p is a vector<T>::iterator and q is a list<T>::iterator. We can use auto in just about every definition that includes an initializer. For example:

auto x = 123;    // x is an int
auto c = 'y';      // c is a char
auto& r = x;     // r is an int&
auto y = r;        // y is an int (references are implicitly dereferenced)

Note that a string literal has the type const char*, so using auto for string literals might lead to an unpleasant surprise:

auto s1 = "San Antonio";            // s1 is a const char* (Surprise!?)
string s2 = "Fredericksburg";     // s2 is a string

When we know exactly which type we want, we can often say so as easily as we can use auto.

One common use of auto is to specify the loop variable in a range-for-loop. Consider:

template<typename C>              // requires Container<T>
void print3(const C& cont)
{
          for (const auto& x : cont)
                    cout << x << '\n';
}

Here, we use auto because it is not all that easy to name the element type of the container cont. We use const because we are not writing to the container elements, and we use & (for reference) in case the elements are so large that copying them would be costly.

20.6 An example: a simple text editor

Image

The essential feature of a list is that you can add and remove elements without moving other elements of the list. Let’s try a simple example that illustrates that. Consider how to represent the characters of a text document in a simple text editor. The representation should make operations on the document simple and reasonably efficient.

Which operations? Let’s assume that a document will fit in your computer’s main memory. That way, we can choose any representation that suits us and simply convert it to a stream of bytes when we want to store it in a file. Similarly, we can read a stream of bytes from a file and convert those to our in-memory representation. That decided, we can concentrate on choosing a convenient in-memory representation. Basically, there are five things that our representation must support well:

• Constructing it from a stream of bytes from input

• Inserting one or more characters

• Deleting one or more characters

• Searching for a string

• Generating a stream of bytes for output to a file or a screen

The simplest representation would be a vector<char>. However, to add or delete a character we would have to move every following character in the document. Consider:

This is he start of a very long document.
There are lots of . . .

We could add the t needed to get

This is the start of a very long document.
There are lots of . . .

However, if those characters were stored in a single vector<char>, we’d have to move every character from h onward one position to the right. That could be a lot of copying. In fact for a 70,000-character-long document (such as this chapter, counting spaces), we would, on average, have to move 35,000 characters to insert or delete a character. The resulting real-time delay is likely to be noticeable and annoying to users. Consequently, we “break down” our representation into “chunks” so that we can change part of the document without moving a lot of characters around. We represent a document as a list of “lines,” list<Line>, where a Line is a vector<char>. For example:

Image

Now, when we inserted that t, we only had to move the rest of the characters on that line. Furthermore, when we need to, we can add a new line without moving any characters. For example, we could insert This is a new line. after document. to get

This is the start of a very long document.
This is a new line.
There are lots of . . .

All we needed to do was to insert a new “line” in the middle:

Image

The logical reason that it is important to be able to insert new links in a list without moving existing links is that we might have iterators pointing to those links or pointers (and references) pointing to the objects in those links. Such iterators and pointers are unaffected by insertions or deletions of lines. For example, a word processor may keep a vector<list<Line>::iterator> holding iterators to the beginning of every title and subtitle in the current Document:

Image

We can add lines to “paragraph 20.2” without invalidating the iterator to “paragraph 20.3.”

Image

In conclusion, we use a list of lines rather than a vector of lines or a vector of all the characters for both logical and performance reasons. Please note that situations where these reasons apply are rather rare so that the “by default, use vector” rule of thumb still holds. You need a specific reason to prefer a list over a vector — even if you think of your data as a list of elements! (See §20.7.) A list is a logical concept that you can represent in your program as a (linked) list or as a vector. The closest STL analog to our everyday concept of a list (e.g., a to-do list, a list of groceries, or a schedule) is a sequence, and most sequences are best represented as vectors.

20.6.1 Lines

How do we decide what’s a “line” in our document? There are three obvious choices:

1. Rely on newline indicators (e.g., '\n') in user input.

2. Somehow parse the document and use some “natural” punctuation (e.g., .).

3. Split any line that grows beyond a given length (e.g., 50 characters) into two.

There are undoubtedly also some less obvious choices. For simplicity, we use alternative 1 here.

We will represent a document in our editor as an object of class Document. Stripped of all refinements, our document type looks like this:

using Line = vector<char>;             // a line is a vector of characters

struct Document {
          list<Line> line;                        // a document is a list of lines
          Document() { line.push_back(Line{}); }
};

Every Document starts out with a single empty line: Document’s constructor makes an empty line and pushes it into the list of lines.

Reading and splitting into lines can be done like this:

istream& operator>>(istream& is, Document& d)
{
          for (char ch; is.get(ch); ) {
                    d.line.back().push_back(ch);         // add the character
          if (ch=='\n')
                              d.line.push_back(Line{});     // add another line
          }
          if (d.line.back().size()) d.line.push_back(Line{});    // add final empty line
          return is;
}

Both vector and list have a member function back() that returns a reference to the last element. To use it, you have to be sure that there really is a last element for back() to refer to — don’t use it on an empty container. That’s why we defined a Document to end with an empty Line. Note that we store every character from input, even the newline characters ('\n'). Storing those newline characters greatly simplifies output, but you have to be careful how you define a character count (just counting characters will give a number that includes space and newline characters).

20.6.2 Iteration

If the document was just a vector<char> it would be simple to iterate over it. How do we iterate over a list of lines? Obviously, we can iterate over the list using list<Line>::iterator. However, what if we wanted to visit the characters one after another without any fuss about line breaks? We could provide an iterator specifically designed for our Document:

class Text_iterator {      // keep track of line and character position within a line
          list<Line>::iterator ln;
          Line::iterator pos;
public:
          // start the iterator at line ll’s character position pp:
          Text_iterator(list<Line>::iterator ll, Line::iterator pp)
                    :ln{ll}, pos{pp} { }

          char& operator*() { return *pos; }
          Text_iterator& operator++();

          bool operator==(const Text_iterator& other) const
                    { return ln==other.ln && pos==other.pos; }
          bool operator!=(const Text_iterator& other) const
                    { return !(*this==other); }
};

Text_iterator& Text_iterator::operator++()
{
          ++pos;                                       // proceed to next character
          if (pos==(*ln).end()) {
                    ++ln;                                // proceed to next line
                    pos = (*ln).begin();       // bad if ln==line.end(); so make sure it isn’t
          }
          return *this;
}

To make Text_iterator useful, we need to equip class Document with conventional begin() and end() functions:

struct Document {
          list<Line> line;

          Text_iterator begin()               // first character of first line
                    { return Text_iterator(line.begin(), (*line.begin()).begin()); }
          Text_iterator end()                 // one beyond the last character of the last line
          {
                    auto last = line.end();
                    ––last;          // we know that the document is not empty
                    return Text_iterator(last, (*last).end());
          }
};

We need the curious (*line.begin()).begin() notation because we want the beginning of what line.begin() points to; we could alternatively have used line.begin()–> begin() because the standard library iterators support –>.

We can now iterate over the characters of a document like this:

void print(Document& d)
{
          for (auto p : d) cout << *p;
}

print(my_doc);

Presenting the document as a sequence of characters is useful for many things, but usually we traverse a document looking for something more specific than a character. For example, here is a piece of code to delete line n:

void erase_line(Document& d, int n)
{
          if (n<0 || d.line.size()–1<=n) return;
          auto  p = d.line.begin();
          advance(p,n);
          d.line.erase(p);
}

A call advance(p,n) moves an iterator p n elements forward; advance() is a standard library function, but we could have implemented it ourselves like this:

template<typename Iter>     // requires Forward_iterator<Iter>
void advance(Iter& p, int n)
{
          while (0<n) { ++p; ––n; }
}

Note that advance() can be used to simulate subscripting. In fact, for a vector called v, p=v.begin; advance(p,n); *p=x is roughly equivalent to v[n]=x. Note that “roughly” means that advance() laboriously moves past the first n–1 elements one by one, whereas the subscript goes straight to thenth element. For a list, we have to use the laborious method. It’s a price we have to pay for the more flexible layout of the elements of a list.

Image

For an iterator that can move both forward and backward, such as the iterator for list, a negative argument to the standard library advance() will move the iterator backward. For an iterator that can handle subscripting, such as the iterator for a vector, the standard library advance() will go directly to the right element rather than slowly moving along using ++. Clearly, the standard library advance() is a bit smarter than ours. That’s worth noticing: typically, the standard library facilities have had more care and time spent on them than we could afford, so prefer the standard facilities to “home brew.”


Image Try This

Rewrite advance() so that it will “go backward” when you give it a negative argument.


Probably, a search is the kind of iteration that is most obvious to a user. We search for individual words (such as milkshake or Gavin), for sequences of letters that can’t easily be considered words (such as secret\nhomestead — i.e., a line ending with secret followed by a line starting withhomestead), for regular expressions (e.g., [bB]\w*ne — i.e., an upper- or lowercase B followed by 0 or more letters followed by ne; see Chapter 23), etc. Let’s show how to handle the second case, finding a string, using our Document layout. We use a simple — non-optimal — algorithm:

• Find the first character of our search string in the document.

• See if that character and the following characters match our search string.

• If so, we are finished; if not, we look for the next occurrence of that first character.

For generality, we adopt the STL convention of defining the text in which to search as a sequence defined by a pair of iterators. That way we can use our search function for any part of a document as well as a complete document. If we find an occurrence of our string in the document, we return an iterator to its first character; if we don’t find an occurrence, we return an iterator to the end of the sequence:

Text_iterator find_txt(Text_iterator first, Text_iterator last, const string& s)
{
          if (s.size()==0) return last;      // can’t find an empty string
          char first_char = s[0];
          while (true) {
                    auto p = find(first,last,first_char);
                    if (p==last || match(p,last,s)) return p;
                    first = ++p;                     // look at the next character
          }
}

Returning the end of the sequence to indicate “not found” is an important STL convention. The match() function is trivial; it just compares two sequences of characters. Try writing it yourself. The find() used to look for a character in the sequence of characters is arguably the simplest standard library algorithm (§21.2). We can use our find_txt() like this:

auto p = find_txt(my_doc.begin(), my_doc.end(), "secret\nhomestead");
if (p==my_doc.end())
          cout << "not found";
else {
          // do something
}

Our “text processor” and its operations are very simple. Obviously, we are aiming for simplicity and reasonable efficiency, rather than at providing a “feature-rich” editor. Don’t be fooled into thinking that providing efficient insertion, deletion, and search for arbitrary character sequences is trivial, though. We chose this example to illustrate the power and generality of the STL concepts sequence, iterator, and container (such as list and vector) together with some STL programming conventions (techniques), such as returning the end of a sequence to indicate failure. Note that if we wanted to, we could develop Document into an STL container — by providing Text_iterator we have done the key part of representing a Document as a sequence of values.

20.7 vectorlist, and string

Why did we use a list for the lines and a vector for the characters? More precisely, why did we use a list for the sequence of lines and a vector for the sequence of characters? Furthermore, why didn’t we use a string to hold a line?

We can ask a slightly more general variant of this question. We have now seen four ways to store a sequence of characters:

• char[] (array of characters)

• vector<char>

• string

• list<char>

Image

How do we choose among them for a given problem? For really simple tasks, they are interchangeable; that is, they have very similar interfaces. For example, given an iterator, we can walk through each using ++ and use * to access the characters. If we look at the code examples related toDocument, we can actually replace our vector<char> with list<char> or string without any logical problems. Such interchangeability is fundamentally good because it allows us to choose based on performance. However, before we consider performance, we should look at logical properties of these types: what can each do that the others can’t?

Image

• Elem[]: Doesn’t know its own size. Doesn’t have begin(), end(), or any of the other useful container member functions. Can’t be systematically range checked. Can be passed to functions written in C and C-style functions. The elements are allocated contiguously in memory. The size of the array is fixed at compile time. Comparison (== and !=) and output (<<) use the pointer to the first element of the array, not the elements.

• vector<Elem>: Can do just about everything, including insert() and erase(). Provides subscripting. List operations, such as insert() and erase(), typically involve moving elements (that can be inefficient for large elements and large numbers of elements). Can be range checked. The elements are allocated contiguously in memory. A vector is expandable (e.g., use push_back()). Elements of a vector are stored (contiguously) in an array. Comparison operators (==, !=, <, <=, >, and >=) compare elements.

• string: Provides all the common and useful operations plus specific text manipulation operations, such as concatenation (+ and +=). The elements are guaranteed to be contiguous in memory. A string is expandable. Comparison operators (==, !=, <, <=, >, and >=) compare elements.

• list<Elem>: Provides all the common and usual operations, except subscripting. We can insert() and erase() without moving other elements. Needs two words extra (for link pointers) for each element. A list is expandable. Comparison operators (==, !=, <, <=, >, and >=) compare elements.

Image

As we have seen (§17.2, §18.6), arrays are useful and necessary for dealing with memory at the lowest possible level and for interfacing with code written in C (§27.1.2, §27.5). Apart from that, vector is preferred because it is easier to use, more flexible, and safer.


Image Try This

What does that list of differences mean in real code? For each array of char, vector<char>, list<char>, and string, define one with the value "Hello", pass it to a function as an argument, write out the number of characters in the string passed, try to compare it to "Hello" in that function (to see if you really did pass "Hello"), and compare the argument to "Howdy" to see which would come first in a dictionary. Copy the argument into another variable of the same type.



Image Try This

Do the previous Try this for an array of int, vector<int>, and list<int> each with the value { 1, 2, 3, 4, 5 }.


20.7.1 insert and erase

Image

The standard library vector is our default choice for a container. It has most of the desired features, so we use alternatives only if we have to. Its main problem is its habit of moving elements when we do list operations (insert() and erase()); that can be costly when we deal with vectors with many elements or vectors of large elements. Don’t be too worried about that, though. We have been quite happy reading half a million floating-point values into a vector using push_back() — measurements confirmed that pre-allocation didn’t make a noticeable difference. Always measure before making significant changes in the interest of performance; even for experts, guessing about performance is very hard.

Image

As pointed out in §20.6, moving elements also implies a logical constraint: don’t hold iterators or pointers to elements of a vector when you do list operations (such as insert(), erase(), and push_back()): if an element moves, your iterator or pointer will point to the wrong element or to no element at all. This is the principal advantage of lists (and maps; see §21.6) over vectors. If you need a collection of large objects or of objects that you point to from many places in a program, consider using a list.

Let’s compare insert() and erase() for a vector and a list. First we take an example designed only to illustrate the key points:

Image

Note that q is now invalid. The elements may have been reallocated as the size of the vector grew. If v had spare capacity, so that it grew in place, q most likely points to the element with the value 3 rather than the element with the value 4, but don’t try to take advantage of that.

Image

That is, an insert() followed by an erase() of the inserted element leaves us back where we started, but with q invalidated. However, in between, we moved all the elements after the insertion point, and maybe all elements were relocated as v grew.

To compare, we’ll do exactly the same with a list:

Image

Note that q still points to the element with the value 4.

Image

Again we find ourselves back where we started. However, for list as opposed to for vector, we didn’t move any elements and q was valid at all times.

A list<char> takes up at least three times as much memory as the other three alternatives — on a PC a list<char> uses 12 bytes per element; a vector<char> uses 1 byte per element. For large numbers of characters, that can be significant.

Image

In what way is a vector superior to a string? Looking at the lists of their properties, it seems that a string can do all that a vector can, and more. That’s part of the problem: since string has to do more things, it is harder to optimize. In fact, vector tends to be optimized for “memory operations” such as push_back(), whereas string tends not to be. Instead, string tends to be optimized for handling of copying, for dealing with short strings, and for interaction with C-style strings. In the text editor example, we chose vector because we were using insert() and delete(). That is a performance reason, though. The major logical difference is that you can have a vector of just about any element type. We have a choice only when we are thinking about characters. In conclusion, prefer vector to string unless you need string operations, such as concatenation or reading whitespace-separated words.

20.8 Adapting our vector to the STL

After adding begin(), end(), and the type aliases in §20.5, vector now just lacks insert() and erase() to be as close an approximation of std::vector as we need it to be:

template<typename T, typename A = allocator<T>>
          // requires Element<T>() && Allocator<A>() (§19.3.3)
class vector {
          int sz;              // the size
          T* elem;         // a pointer to the elements
          int space;       // number of elements plus number of free space “slots”
          A alloc;           // use allocate to handle memory for elements
public:
          // . . . all the other stuff from Chapter 19 and §20.5 . . .
          using iterator = T*;      // T* is the simplest possible iterator

          iterator insert(iterator p, const T& val);
          iterator erase(iterator p);
};

We again used a pointer to the element type, T*, as the iterator type. That’s the simplest possible solution. We left providing a range-checked iterator as an exercise (exercise 18).

Image

Typically, people don’t provide list operations, such as insert() and erase(), for data types that keep their elements in contiguous storage, such as vector. However, list operations, such as insert() and erase(), are immensely useful and surprisingly efficient for short vectors or small numbers of elements. We have repeatedly seen the usefulness of push_back(), which is another operation traditionally associated with lists.

Basically, we implement vector<T,A>::erase() by copying all elements after the element we erase (remove, delete). Using the definition of vector from §19.3.7 with the additions above, we get

template<typename T, typename A>            // requires Element<T>() &&
                                                                              // Allocator<A>() (§19.3.3)
vector<T,A>::iterator vector<T,A>::erase(iterator p)
{
          if (p==end()) return p;
          for (auto pos = p+1; pos!=end(); ++pos)
                    *(pos–1) = *pos;                 // copy element “one position to the left”
          alloc.destroy(&*(end()–1));        // destroy surplus copy of last element
          ––sz;
          return p;
}

It is easier to understand such code if you look at a graphical representation:

Image

The code for erase() is quite simple, but it may be a good idea to try out a couple of examples by drawing them on paper. Is the empty vector correctly handled? Why do we need the p==end() test? What if we erased the last element of a vector? Would this code have been easier to read if we had used the subscript notation?

Implementing vector<T,A>::insert() is a bit more complicated:

template<typename T, typename A>             // requires Element<T>() &&
                                                                                // Allocator<A>() (§19.3.3)
vector<T,A>::iterator vector<T,A>::insert(iterator p, const T& val)
{
          int index = p–begin();
          if (size()==capacity())
                    reserve(size()==0?8:2*size());   // make sure we have space

          // first copy last element into uninitialized space:
          alloc.construct(elem+sz,*back());

          ++sz;
          iterator pp = begin()+index;      // the place to put val
          for (auto pos = end()–1; pos!=pp; ––pos)
                    *pos = *(pos–1);                // copy elements one position to the right
          *(begin()+index) = val;               // “insert” val
          return pp;
}

Please note:

• An iterator may not point outside its sequence, so we use pointers, such as elem+sz, for that. That’s one reason that allocators are defined in terms of pointers and not iterators.

• When we use reserve(), the elements may be moved to a new area of memory. Therefore, we must remember the index at which the element is to be inserted, rather than the iterator to it. When vector reallocates its elements, iterators into that vector become invalid — you can think of them as pointing to the old memory.

• Our use of the allocator argument, A, is intuitive, but inaccurate. If you should ever need to implement a container, you’ll have to do some careful reading of the standard.

• It is subtleties like these that make us avoid dealing with low-level memory issues whenever we can. Naturally, the standard library vector — and all other standard library containers — get that kind of important semantic detail right. That’s one reason to prefer the standard library over “home brew.”

For performance reasons, you wouldn’t use insert() and erase() in the middle of a 100,000-element vector; for that, lists (and maps; see §21.6) are better. However, the insert() and erase() operations are available for all vectors, and their performance is unbeatable when you are just moving a few words of data — or even a few dozen words — because modern computers are really good at this kind of copying; see exercise 20. Avoid (linked) lists for representing a list of a few small elements.

20.9 Adapting built-in arrays to the STL

We have repeatedly pointed out the weaknesses of the built-in arrays: they implicitly convert to pointers at the slightest provocation, they can’t be copied using assignment, they don’t know their own size (§18.6.2), etc. We have also pointed out their main strength: they model physical memory almost perfectly.

To get the best of both worlds, we can build an array container that provides the benefits of arrays without the weaknesses. A version of array was introduced into the standard as part of a Technical Report. Since a feature from a TR is not required to be part of every implementation, arraymay not be part of the implementation you use. However, the idea is simple and useful:

Image

template <typename T, int N>                        // requires Element<T>()
struct array {                                                        // not quite the standard array
          using value_type = T;
          using iterator = T*;
          using const_iterator = const T*;
          using size_type = unsigned int;           // the type of a subscript

          T elems[N];
          // no explicit construct/copy/destroy needed

          iterator begin() { return elems; }
          const_iterator begin() const { return elems; }
          iterator end() { return elems+N; }
          const_iterator end() const { return elems+N; }

          size_type size() const;

          T& operator[](int n) { return elems[n]; }
          const T& operator[](int n) const { return elems[n]; }

          const T& at(int n) const;                       // range-checked access
          T& at(int n);                                             // range-checked access

          T * data() { return elems; }
          const T * data() const { return elems; }
};

This definition isn’t complete or completely standards-conforming, but it will give you the idea. It will also give you something to use if your implementation doesn’t yet provide the standard array. If available, it is in <array>. Note that because array<T,N> “knows” that its size is N, we can (and do) provide assignment, ==, !=, etc. just as for vector.

As an example, let’s use an array with the STL version of high() from §20.4.2:

void f()
{
          array<double,6> a = { 0.0, 1.1, 2.2, 3.3, 4.4, 5.5 };
          array<double,6>::iterator p = high(a.begin(), a.end());
          cout << "the highest value was " << *p << '\n';
}

Note that we did not think of array when we wrote high(). Being able to use high() for an array is a simple consequence of following standard conventions for both.

20.10 Container overview

The STL provides quite a few containers:

Image

Image

You can look up incredible amounts of additional information on these containers and their use in books and online documentation. Here are a few quality information sources:

Josuttis, Nicholai M. The C++ Standard Library: A Tutorial and Reference. Addison-Wesley, 2012. ISBN 978-0321623218. Use only the 2nd edition.

Lippman, Stanley B., Jose Lajoie, and Barbara E. Moo. The C++ Primer. Addison-Wesley, 2005. ISBN 0201721481. Use only the 5th edition.

Stroustrup, Bjarne. The C++ Programming Language. Addison-Wesley, 2012. ISBN 978-0321714114. Use only the 4th edition.

The documentation for the SGI implementation of the STL and the iostream library: www.sgi.com/tech/stl. Note that they also provide complete code.

Image

Do you feel cheated? Do you think we should explain all about containers and their use to you? That’s just not possible. There are too many standard facilities, too many useful techniques, and too many useful libraries for you to absorb them all at once. Programming is too rich a field for anyone to know all facilities and techniques — it can also be a noble art. As a programmer, you must acquire the habit of seeking out new information about language facilities, libraries, and techniques. Programming is a dynamic and rapidly developing field, so just being content with what you know and are comfortable with is a recipe for being left behind. “Look it up” is a perfectly reasonable answer to many problems, and as your skills grow and mature, it will more and more often be the answer.

On the other hand, you will find that once you understand vector, list, and map and the standard algorithms presented in Chapter 21, you’ll find other STL and STL-style containers easy to use. You’ll also find that you have the basic knowledge to understand non-STL containers and code using them.

Image

What is a container? You can find the definition of an STL container in all of the sources above. Here we will just give an informal definition. An STL container

• Is a sequence of elements [begin():end()).

• Provides copy operations that copy elements. Copying can be done with assignment or a copy constructor.

• Names its element type value_type.

• Has iterator types called iterator and const_iterator. Iterators provide *, ++ (both prefix and postfix), ==, and != with the appropriate semantics. The iterators for list also provide –– for moving backward in the sequence; that’s called a bidirectional iterator. The iterators for vector also provide ––, [ ], +, and — and are called random-access iterators. (See §20.10.1.)

• Provides insert() and erase(), front() and back(), push_back() and pop_back(), size(), etc.; vector and map also provide subscripting (e.g., operator [ ]).

• Provides comparison operators (==, !=, <, <=, >, and >=) that compare the elements. Containers use lexicographical ordering for <, <=, >, and >=; that is, they compare the elements in order starting with the first.

The aim of this list is to give you an overview. For more detail see Appendix B. For a more precise specification and complete list, see The C++ Programming Language or the standard.

Some data types provide much of what is required from a standard container, but not all. We sometimes refer to those as “almost containers.” The most interesting of those are:

Image

In addition, many people and many organizations have produced containers that meet the standard container requirements, or almost do so.

Image

If in doubt, use vector. Unless you have a solid reason not to, use vector.

20.10.1 Iterator categories

We have talked about iterators as if all iterators are interchangeable. They are interchangeable if you do only the simplest operations, such as traversing a sequence once reading each value once. If you want to do more, such as iterating backward or subscripting, you need one of the more advanced iterators.

Image

From the operations offered, we can see that wherever we can use an output iterator or an input iterator, we can use a forward iterator. A bidirectional iterator is also a forward iterator, and a random-access iterator is also a bidirectional iterator. Graphically, we can represent the iterator categories like this:

Image

Note that since the iterator categories are not classes, this hierarchy is not a class hierarchy implemented using derivation.

Image Drill

1. Define an array of ints with the ten elements { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }.

2. Define a vector<int> with those ten elements.

3. Define a list<int> with those ten elements.

4. Define a second array, vector, and list, each initialized as a copy of the first array, vector, and list, respectively.

5. Increase the value of each element in the array by 2; increase the value of each element in the vector by 3; increase the value of each element in the list by 5.

6. Write a simple copy() operation,

template<typename Iter1, typename Iter2>
          // requires Input_iterator<Iter1>() && Output_iterator<Iter2>()
Iter2 copy(Iter1 f1, Iter1 e1, Iter2 f2);

that copies [f1,e1) to [f2,f2+(e1–f1)) and returns f2+(e1–f1) just like the standard library copy function. Note that if f1==e1 the sequence is empty, so that there is nothing to copy.

7. Use your copy() to copy the array into the vector and to copy the list into the array.

8. Use the standard library find() to see if the vector contains the value 3 and print out its position if it does; use find() to see if the list contains the value 27 and print out its position if it does. The “position” of the first element is 0, the position of the second element is 1, etc. Note that iffind() returns the end of the sequence, the value wasn’t found.

Remember to test after each step.

Review

1. Why does code written by different people look different? Give examples.

2. What are simple questions we ask of data?

3. What are a few different ways of storing data?

4. What basic operations can we do to a collection of data items?

5. What are some ideals for the way we store our data?

6. What is an STL sequence?

7. What is an STL iterator? What operations does it support?

8. How do you move an iterator to the next element?

9. How do you move an iterator to the previous element?

10. What happens if you try to move an iterator past the end of a sequence?

11. What kinds of iterators can you move to the previous element?

12. Why is it useful to separate data from algorithms?

13. What is the STL?

14. What is a linked list? How does it fundamentally differ from a vector?

15. What is a link (in a linked list)?

16. What does insert() do? What does erase() do?

17. How do you know if a sequence is empty?

18. What operations does an iterator for a list provide?

19. How do you iterate over a container using the STL?

20. When would you use a string rather than a vector?

21. When would you use a list rather than a vector?

22. What is a container?

23. What should begin() and end() do for a container?

24. What containers does the STL provide?

25. What is an iterator category? What kinds of iterators does the STL offer?

26. What operations are provided by a random-access iterator, but not a bidirectional iterator?

Terms

algorithm

array container

auto

begin()

container

contiguous

doubly-linked list

element

empty sequence

end()

erase()

insert()

iteration

iterator

linked list

sequence

singly-linked list

size_type

STL

traversal

using

type alias

value_type

Exercises

1. If you haven’t already, do all Try this exercises in the chapter.

2. Get the Jack-and-Jill example from §20.1.2 to work. Use input from a couple of small files to test it.

3. Look at the palindrome examples (§18.7); redo the Jack-and-Jill example from §20.1.2 using that variety of techniques.

4. Find and fix the errors in the Jack-and-Jill example from §20.3.1 by using STL techniques throughout.

5. Define an input and an output operator (>> and <<) for vector.

6. Write a find-and-replace operation for Documents based on §20.6.2.

7. Find the lexicographical last string in an unsorted vector<string>.

8. Define a function that counts the number of characters in a Document.

9. Define a program that counts the number of words in a Document. Provide two versions: one that defines word as “a whitespace-separated sequence of characters” and one that defines word as “a sequence of consecutive alphabetic characters.” For example, with the former definition,alpha.numeric and as12b are both single words, whereas with the second definition they are both two words.

10. Define a version of the word-counting program where the user can specify the set of whitespace characters.

11. Given a list<int> as a (by-reference) parameter, make a vector<double> and copy the elements of the list into it. Verify that the copy was complete and correct. Then print the elements sorted in order of increasing value.

12. Complete the definition of list from §20.4.12 and get the high() example to run. Allocate a Link to represent one past the end.

13. We don’t really need a “real” one-past-the-end Link for a list. Modify your solution to the previous exercise to use 0 to represent a pointer to the (nonexistent) one-past-the-end Link (list<Elem>::end()); that way, the size of an empty list can be equal to the size of a single pointer.

14. Define a singly-linked list, slist, in the style of std::list. Which operations from list could you reasonably eliminate from slist because it doesn’t have back pointers?

15. Define a pvector to be like a vector of pointers except that it contains pointers to objects and its destructor deletes each object.

16. Define an ovector that is like pvector except that the [ ] and * operators return a reference to the object pointed to by an element rather than the pointer.

17. Define an ownership_vector that hold pointers to objects like pvector but provides a mechanism for the user to decide which objects are owned by the vector (i.e., which objects are deleted by the destructor). Hint: This exercise is simple if you were awake for Chapter 13.

18. Define a range-checked iterator for vector (a random-access iterator).

19. Define a range-checked iterator for list (a bidirectional iterator).

20. Run a small timing experiment to compare the cost of using vector and list. You can find an explanation of how to time a program in §26.6.1. Generate N random int values in the range [0:N). As each int is generated, insert it into a vector<int> (which grows by one element each time). Keep the vector sorted; that is, a value is inserted after every previous value that is less than or equal to the new value and before every previous value that is larger than the new value. Now do the same experiment using a list<int> to hold the ints. For which N is the list faster than thevector? Try to explain your result. This experiment was first suggested by John Bentley.

Postscript

Image

If we have N kinds of containers of data and M things we’d like to do with them, we can easily end up writing N*M pieces of code. If the data is of K different types, we could even end up with N*M*K pieces of code. The STL addresses this proliferation by having the element type as a parameter (taking care of the K factor) and by separating access to data from algorithms. By using iterators to access data in any kind of container from any algorithm, we can make do with N+M algorithms. This is a huge simplification. For example, if we have 12 containers and 60 algorithms, the brute-force approach would require 720 functions, whereas the STL strategy requires only 60 functions and 12 definitions of iterators: we just saved ourselves 90% of the work. In fact, this underestimates the saved effort because many algorithms take two pairs of iterators and the pairs need not be of the same type (e.g., see exercise 6). In addition, the STL provides conventions for defining algorithms that simplify writing correct code and composable code, so the saving is greater still.