Template functions

In the previous lecture I constructed code for a sorting function that used pointers to sort data in an array.

// Sort the items between begin and end.
// end should point just past the end of the
// range of values being sorted.
void sort(int *begin,int *end)
{
  // Sort the array using insertion sort
  int *mover = begin; // mover points to the element being inserted.
  mover++;
  while(mover != end) {
    int value = *mover; // The value of the item being inserted
    // The current pointer will scan through the part of the array
    // that appears before the mover location.
    int *current = mover;
    while(current != begin) {
      // previous points to the location before current
      int *previous = current;
      previous--;
      // As long as we keep seeing numbers that are greater than
      // the value we are inserting, we will keep moving things
      // aside to make room for the value we want to insert.
      if(*previous > value) {
        *current = *previous;
        current--;
      } else
        break;
    }
    // Drop the value into place and move on to the next
    // item to insert.
    *current = value;
    mover++;
  }
}

One limitation of this code is that all of the pointers are declared to be pointers to int. This means that if we wanted to use this code to sort an array of double instead we would be out of luck. To make the sorting code more flexible we can turn this function into a template function.

// Sort the items between begin and end.
// end should point just past the end of the
// range of values being sorted.
template <typename T>
void sort(T *begin,T *end)
{
  // Sort the array using insertion sort
  T *mover = begin; // mover points to the element being inserted.
  mover++;
  while(mover != end) {
    T value = *mover; // The value of the item being inserted
    // The current pointer will scan through the part of the array
    // that appears before the mover location.
    T *current = mover;
    while(current != begin) {
      // previous points to the location before current
      T *previous = current;
      previous--;
      // As long as we keep seeing numbers that are greater than
      // the value we are inserting, we will keep moving things
      // aside to make room for the value we want to insert.
      if(*previous > value) {
        *current = *previous;
        current--;
      } else
        break;
    }
    // Drop the value into place and move on to the next
    // item to insert.
    *current = value;
    mover++;
  }
}

The declaration

template <typename T>

marks this function as a template function. In a template function, we replace one or more types with a placeholder. In this case, the replace the int type with a placeholder, T. When we subsequently use the function in a program, the compiler will examine the type of the pointers we are passing to the sort function to determine what type to use for the placeholder T.

For example, if we have an array B that has been declared

double *B = new double[100];

and we call

sort(B,B+100);

the compiler can determine automatically that it should replace the T with double everywhere that T appears in the code for the templated sort function.

The template mechanism makes it possible to sort several different types of arrays in the same program. For example, if we are writing a program with arrays

int *A = new int[100];
double *B = new double[100];

it will be perfectly legal to later do

sort(A,A+100);
sort(B,B+100);

In the first case, the compiler will automatically generate a version of the sort function to sort an array of ints. In the second case, it will compile a version that can sort an array of doubles.

Pointers and Iterators

We saw in the previous lecture that it is possible to use pointers to implement an algorithm on an array. To make this possible, pointers have to support a set of key operations: moving with ++ or --, comparison, and dereferencing. Once we recognize that pointers allow all of these operations, we can recast almost any algorithm that uses traditional index notation such as A[n] into an algorithm that does the same thing with pointers.

This idea of using pointers to work with arrays can potentially be extended to work with data structures that are not arrays. Since we are going to be studying a variety of different data structures in this course, it will be worth our while to extend this pointer concept to other, non-array data structures. The main impediment to doing this is that the ++ and -- operators which move a pointer forward or backward through the elements of an array may not have a natural equivalent in other data structures. In particular, if we are dealing with a nonlinear data structure, one in which the elements are not arranged in a linear sequence, the ++ and -- operations may no longer make sense.

In order to bring the pointer concept to a wider variety of data structures, C++ uses a generalization of the pointer concept called an iterator. An iterator is typically a class which is closely associated with a particular data structure that acts a generalization of the pointer concept for that data structure. At any given instant in time, an iterator "points to" a specific element of the underlying data structure. A valid iterator has to support a core set of operations:

  1. The data structure has to provide some mechanism to produce iterators. Typically, a data structure will offer one function that returns an iterator that "points to" the start of the data structure and a second function that returns an iterator that "points" just past the end of the data structure.
  2. Dereferencing an iterator with * should return the element that the iterator currently points to.
  3. The iterator can be advanced forward one item in the underlying data structure with ++, and moved backward one item with --.
  4. Iterators can be compared with == and !=. The == operator returns true only if the two iterators we apply it to both point to the same element of the same data structure.

Rewriting the sorting algorithm for iterators

Here now is a version of the sort algorithm from above that is designed to work with iterators instead of pointers.

template <typename iter>
void sort(iter begin,iter end)
{
  // Sort the array using insertion sort
  iter mover = begin; // mover points to the element being inserted.
  mover++;
  while(mover != end) {
    // The value of the item being inserted
    auto value = *mover; 
    // The current pointer will scan through the part of the array
    // that appears before the mover location.
    iter current = mover;
    while(current != begin) {
      // previous points to the location before current
      iter previous = current;
      previous--;
      // As long as we keep seeing numbers that are greater than
      // the value we are inserting, we will keep moving things
      // aside to make room for the value we want to insert.
      if(*previous > value) {
        *current = *previous;
        current--;
      } else
        break;
    }
    // Drop the value into place and move on to the next
    // item to insert.
    *current = value;
    mover++;
  }
}

For the most part, this code looks very much like the original. This is natural, because iterators are a generalization of pointers, and almost everything we can do with a pointer we can also do with an iterator.

There is only one significant difference between these two versions. At the beginning of the while loop in the earlier version we see a statement

T value = *mover; // The value of the item being inserted

In this statement we are dereferencing the pointer mover to gain access to the item that the pointer is pointing to. We then store that value in a variable named value for later use. Notice that the type of the variable is T, which is the type that the pointers in question are designed to point to. (Notice that begin, end, and mover are all declared to have type T*, which means "pointer to T".)

The equivalent statement in the second version is

auto value = *mover;

Once again we are dereferencing the iterator mover in an effort to gain access to the value that the iterator points to. Once we obtain that value, we want to store in a variable named value. The complication this time around is that we don't have immediate access to the correct type to use for the value variable. Because a pointer will be declared as T*, we can see directly from the pointer's type declaration that the pointer points to an item of type T. In the case of an iterator all we know is that we have an iterator, but we have no immediate information to tell us what type of thing the iterator points to. Fortunately, this is something we can get the compiler to figure out for us. Starting with C++11, the C++ language now features the auto variable type declaration. In an auto type declaration you essentially say to the compiler "I don't know what the type of this thing is, so please deduce its type for me."

All modern C++ compilers support the C++11 standard, and should be able to compile this code successfully. In case you are using a compiler that does not support this standard, you can use a more elaborate mechanism to deduce that type, the iterator_traits class. Given an iterator type, iterator_traits can deduce useful information about the iterator, including the value type that the iterator points to. Here is how to use that approach to declare the value variable:

typename std::iterator_traits<iter>::value_type value = *mover;

Beyond that one statement, the rest of the code in the iterator version of insertion sort looks exactly the same as the pointer version it is based on.

Since iterators are a generalization of pointers, the first and easiest way to test a piece of code that was written to work with iterators is to use it with pointers. Here is a complete program that shows the iterator version of insertion sort being used in combination with an array and pointers into that array.

#include <iostream>
#include <fstream>

// Sort the items between begin and end.
// end should point just past the end of the
// range of values being sorted.
template <typename iter>
void sort(iter begin,iter end)
{
  // Sort the array using insertion sort
  iter mover = begin; // mover points to the element being inserted.
  mover++;
  while(mover != end) {
    auto value = *mover; // The value of the item being inserted
    // The current pointer will scan through the part of the array
    // that appears before the mover location.
    iter current = mover;
    while(current != begin) {
      // previous points to the location before current
      iter previous = current;
      previous--;
      // As long as we keep seeing numbers that are greater than
      // the value we are inserting, we will keep moving things
      // aside to make room for the value we want to insert.
      if(*previous > value) {
        *current = *previous;
        current--;
      } else
        break;
    }
    // Drop the value into place and move on to the next
    // item to insert.
    *current = value;
    mover++;
  }
}

int main(int argc,const char* argv[])
{
  std::ifstream in;
  std::ofstream out;

  int N = 100;
  int n = 0;
  int k;
  int* A = new int[N];

  in.open("numbers.txt");

  int x;
  while(in >> x) {
    // Resize the A array if needed.
    if(n == N) {
      int* B = new int[2*N];
      for(k = 0;k < n;k++)
        B[k] = A[k];
      delete[] A;
      A = B;
      N *= 2;
    }
    A[n] = x;
    n++;
  }
  in.close();

  std::cout << "Read " << n << " integers from the file." << std::endl;

  sort(A,A+n);

  out.open("sorted.txt");
  for(k = 0;k < n;k++)
    out << A[k] << std::endl;
  out.close();

  delete[] A;

  std::cout << "Done sorting and saving." << std::endl;

  return 0;
}

The key line in main is

sort(A,A+n);

in this statement A and A+n are both pointers. A points to the beginning of the array in question, while A+n points just past the end of the array. Since these two pointers are also interators, it is perfectly legal to pass them as parameters to our iterator sort function.

The vector class

In Java we worked with the ArrayList class. The ArrayList class is essentially a fancy array that can resize itself automatically. The C++ equivalent of the Java ArrayList is the vector class. A vector is essentially a flexible array that can resize itself automatically to accomodate new items.

Here are some of the key features of the C++ vector class.

  1. To use the vector class we use the include statement
  2. #include <vector>

  3. To declare an empty vector of integers named v we say
  4. std::vector<int> v;

  5. To declare a vector of integers with room for 100 ints we say
  6. std::vector<int> v(100);

  7. To declare a vector of integers with room for 100 ints and also initialize each entry to have value 0 we say
  8. std::vector<int> v(100,0);

  9. To access an entry in a vector we use the traditional array bracket notation.
  10. v[0] = 12;

  11. To determine the size of a vector we use the size() method.
  12. for(n = 0;n < v.size();n++) std::cout << v[n] << std::endl;

  13. To push a new entry onto the back of a vector (and automatically resize the vector to be one larger than before) we use the push_back method.
  14. v.push_back(20);

  15. To obtain an iterator that points to the first entry in a vector we use the begin() method.
  16. To obtain an iterator that points just past the last entry in a vector we use the end() method.

Here now is a version of our sorting program that uses a vector to store the data.

#include <iostream>
#include <fstream>
#include <vector>

// Sort the items between begin and end.
// end should point just past the end of the
// range of values being sorted.
template <typename iter>
void sort(iter begin,iter end)
{
  // Sort the array using insertion sort
  iter mover = begin; // mover points to the element being inserted.
  mover++;
  while(mover != end) {
    auto value = *mover; // The value  of the item being inserted
    // The current pointer will scan through the part of the array
    // that appears before the mover location.
    iter current = mover;
    while(current != begin) {
      // previous points to the location before current
      iter previous = current;
      previous--;
      // As long as we keep seeing numbers that are greater than
      // the value we are inserting, we will keep moving things
      // aside to make room for the value we want to insert.
      if(*previous > value) {
        *current = *previous;
        current--;
      } else
        break;
    }
    // Drop the value into place and move on to the next
    // item to insert.
    *current = value;
    mover++;
  }
}

int main(int argc,const char* argv[])
{
  std::ifstream in;
  std::ofstream out;

  std::vector<int> v;

  in.open("numbers.txt");

  int x;
  while(in >> x) {
    v.push_back(x);
  }
  in.close();

  std::cout << "Read " << v.size() 
            << " integers from the file." << std::endl;

  sort(v.begin(),v.end());

  out.open("sorted.txt");
  for(unsigned int k = 0;k < v.size();k++)
    out << v[k] << std::endl;
  out.close();

  std::cout << "Done sorting and saving." << std::endl;

  return 0;
}

The program creates an initially empty vector of ints and uses the push_back method to push each new int we read from the input file onto the back of the v vector. The v vector will automatically resize to have exactly the size needed to accomodate all the numbers.

The sort function used here is exactly the same as in the last program. Since that sort function was engineered to work with iterators, it can interface seamlessly with the vector class:

sort(v.begin(),v.end());

This statement obtains a pair of iterators, one pointing to the start of the vector and the second pointing just past the end, and passes those iterators to the sort function, which is expecting to receive a pair of iterators.

Using the STL sort function

The C++ vector class is a part of the C++ Standard Template Library, or STL. The STL contains a number of classes that implement useful data structures. We will be studying many of these data structure classes as we progress through the course.

In addition to data structures, the STL also offers a collection of standard algorithms that are designed to work with the STL data structures. For example, the STL algorithms collection has a sort function that can be used to sort data stored in STL data structures. The STL version of the sort function works exactly the same as the sort function I demonstrated above. Given a pair of iterators, one pointing to the start of a range and the other pointing just past the end of the range, the STL sort algorithm can quickly and correctly sort the data items.

Here is our program rewritten to use the vector class in combination with the STL sort algorithm.

#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>

int main(int argc,const char* argv[])
{
  std::ifstream in;
  std::ofstream out;

  std::vector<int> v;

  in.open("numbers.txt");

  int x;
  while(in >> x) {
    v.push_back(x);
  }
  in.close();

  std::cout << "Read " << v.size() 
            << " integers from the file." << std::endl;

  std::sort(v.begin(),v.end());

  out.open("sorted.txt");
  for(auto itr = v.begin();itr != v.end();itr++)
    out << *itr << std::endl;
  out.close();

  std::cout << "Done sorting and saving." << std::endl;

  return 0;
}

The only additional requirement to use std::sort is that we have to include the <algorithm> header file.

std::sort will work with anything that looks like an iterator. In particular, you can also use std::sort to sort data stored in a conventional array. For example, if A points to an array containing n ints we can sort A by doing

std::sort(A,A+n);

An added benefit of using std::sort is that std::sort uses a sorting algorithm that is much more sophisticated and significantly faster than the simple insertion sort algorithm I used for the earlier versions of our example program.