Introduction to Functional Programming

Functional programming

The C++ language is a very rich language that seeks to support many different programming paradigms. We have already seen that C++ supports two of the most popular programming paradigms:

In procedural programming we decompose large problems into smaller pieces and then write functions to handle those smaller pieces of the problem. This was the predominant style of programming in the original C language.
In object oriented programming we solve even larger problems by constructing classes that provide services to a program. Programs are then mostly composed of collections of objects that cooperate to perform some larger task. C++ is an object-oriented extension of the C language that supports this style of programming.

In these lecture notes I am going to introduce you to a third major programming paradigm that C++ supports, functional programming. First, a formal definition:

In functional programming functions become first class language elements that can be stored, passed as parameters to other functions, and returned as results from other functions.

A first example

Here is a fairly natural example that features one of the key ideas in functional programming, passing a function as a parameter to another function.

This example begins with our friend quicksort,

void swap(int &a, int &b){
  int temp = a;
  a = b;
  b = temp;
}

int median(vector<int> & a,int first,int last) {
    int mid=(first+last)/2;

    if (a[first] < a[mid]){
      if (a[mid] < a[last])
        return mid;
      else if (a[first] < a[last])
        return last;
      else
        return first;
    } else {
      if(a[first] < a[last])
        return first;
      else if(a[last] < a[mid])
        return mid;
      else
        return last;
    }
}

int partition(vector<int> & a,int first,int last) {
    int p=first;
    
    swap(a[first],a[median(a,first,last-1)]);  
    int piv = a[first];
    
    for(int k=first+1;k<last;k++) {
      if (a[k] <= piv) {
        p++;            
        swap(a[k],a[p]);
      }
    }
    swap(a[p],a[first]);
    return p;
}

void quickSort(vector<int> & a,int first,int last) {
  if (last - first > 20) {
    int piv = partition(a,first,last);
    quickSort(a,first,piv);
    quickSort(a,piv+1,last);
  } else {
    selectionSort(a,first,last);
  }
}

void quickSort(vector<int>& a) {
  quickSort(a,0,a.size());
}

One important thing to note about this code is that it sorts a list of integers into ascending order. That aspect of the algorithm is coded into the implementation at a very deep level. This portion of the partition function makes decisions about which half of the partition a number should go into:

for(int k=first+1;k<last;k++) {
  if (a[k] <= piv) {
    p++;            
    swap(a[k],a[p]);
  }
}

This code uses a comparison in the if statement to make the decision about where number should go. If a[k] is smaller than the pivot it goes to the left, otherwise it goes to the right.

If we wanted to make a version of quicksort that sorted a vector of ints into descending order, we would have to replace the <= in the if statement with >=. We would also have to rename several of the functions:

int partitionDescending(vector<int> &a,int first,int last) {
    int p=first;
    
    swap(a[first],a[median(a,first,last-1)]);  
    int piv = a[first];
    
    for(int k=first+1;k<last;k++) {
      if (a[k] >= piv) {
        p++;            
        swap(a[k],a[p]);
      }
    }
    swap(a[p],a[first]);
    return p;
}

void quickSortDescending(vector<int> &a,int first,int last) {
  if (last - first > 20) {
    int piv = partitionDescending(a,first,last);
    quickSort(a,first,piv);
    quickSort(a,piv+1,last);
  } else {
    selectionSortDescending(a,first,last);
  }
}

void quickSortDescending(vector<int>& a) {
  quickSortDescending(a,0,a.size());
}

Note also that since quickSort calls selectionSort we would have to prepare a separate version of selectionSort that sorts data into descending order.

A slightly less annoying way to make quicksort more flexible is to apply abstraction to the problem. In this approach we isolate the portion of the code that is likely to change from one version to the next in a function.

bool in_order(int a,int b) {
  return a < b;
}

int partition(vector<int> & a,int first,int last) {
    int p=first;
    
    swap(a[first],a[median(a,first,last-1)]);  
    int piv = a[first];
    
    for(int k=first+1;k<last;k++) {
      if (!in_order(piv,a[k])) {
        p++;            
        swap(a[k],a[p]);
      }
    }
    swap(a[p],a[first]);
    return p;
}

void quickSort(vector<int> & a,int first,int last) {
  if (last - first > 20) {
    int piv = partition(a,first,last);
    quickSort(a,first,piv);
    quickSort(a,piv+1,last);
  } else {
    selectionSort(a,first,last);
  }
}

void quickSort(vector<int>& a) {
  quickSort(a,0,a.size());
}

In this version we isolate the comparison logic in a function, in_order, that both partition and selectionSort can use to do their work. To change the sorting code from ascending to descending we simply have to change the code for the in_order function.

The obvious downside to this approach is that once we write the code for in_order we lock ourselves into a sorting order. A better approach would be to pass the comparison function to the algorithm as a parameter. That way we could pass in one version of the in_order function to sort in ascending order and a different version to sort in descending order.

The original C language actually made it possible to pass functions as parameters to other functions, but the mechanism for doing this was somewhat clunky. If you want to pass a function as a parameter to another function in C, you pass in a pointer to the function you want to use. The code below shows how to do this.

bool ascending(int a,int b) {
  return a < b;
}

bool descending(int a,int b) {
  return a > b;
}

int partition(vector<int> &a, int first,
        int last, bool (*in_order)(int,int))
{
  int p=first;
    
  swap(a[first],a[median(a,first,last-1)]);  
  int piv = a[first];
    
  for(int k=first+1;k<last;k++) {
    if(!in_order(piv,a[k])) {
      p++;            
      swap(a[k],a[p]);
    }
  }
  swap(a[p],a[first]);
  return p;
}

void quickSort(vector<int> &a, int first,
       int last, bool (*in_order)(int,int))
{
  if (last - first > 20) {
    int piv = partition(a,first,last,in_order);
    quickSort(a,first,piv);
    quickSort(a,piv+1,last);
  } else {
    selectionSort(a,first,last,in_order);
  }
}

void quickSort(vector<int>& a,bool (*in_order)(int,int)) {
  quickSort(a,0,a.size(),in_order);
}

At various points in this code you will see a parameter that takes the form

bool (*in_order)(int,int)

This is read "in_order is a pointer to a function that takes two ints as its parameters and then returns a bool." A pointer to a function passed as a parameter to one function can be passed along as a parameter to another function. Eventually, we call the function we passed in as a parameter in the partition function when we need to use it to compare two items.

With this arrangement we would call

quickSort(V,ascending);

to sort a vector V of ints into ascending order. Here I have passed the ascending function as a parameter to quickSort. In both C and C++ the name of a function is interpreted as a pointer to that function. Since the ascending function is a function that takes two ints as its parameters and then returns a bool, ascending is a valid function pointer to pass as the second parameter to quickSort.

C++ offers another method to simplify passing functions as parameters to other functions. This method is based on the use of C++ template functions to simplify the task of setting up function parameters. Here is the example above rewritten using template functions.

template <typename T>
bool ascending(const T& a,const T& b) {
  return a > b;
}

template <typename T>
bool descending(const T& a,const T& b) {
  return a < b;
}

template <typename T,typename F>
int partition(vector<T> &a, int first, int last, F in_order)
{
  int p=first;
    
  swap(a[first],a[median(a,first,last-1)]);  
  int piv = a[first];
    
  for(int k=first+1;k<last;k++) {
    if(!in_order(piv,a[k])) {
      p++;            
      swap(a[k],a[p]);
    }
  }
  swap(a[p],a[first]);
  return p;
}

template <typename T,typename F>
void quickSort(vector<T> &a, int first, int last, F in_order)
{
  if (last - first > 20) {
    int piv = partition(a,first,last,in_order);
    quickSort(a,first,piv);
    quickSort(a,piv+1,last);
  } else {
    selectionSort(a,first,last,in_order);
  }
}

template <typename T,typename F>
void quickSort(vector<T>& a,F in_order) {
  quickSort(a,0,a.size(),in_order);
}

In this version of the code, the in_order parameter looks like just another parameter whose type F is replaced with a template placeholder. As an added bonus, we get to construct a fully templated sorting algorithm where the type of the things being sorted is changed to a template parameter.

Just as in the previous example, we specify which way we want the items to be sorted by passing an appropriate comparison function to quicksort. To sort in ascending order we do

quickSort(V,ascending);

At this point the template mechanism kicks in and deduces the types T and F for us. If V is declared to be a vector of ints, the compiler will deduce that T is int and F is a pointer to a function that takes two ints as its parameters and returns a bool.

Parameterizing a data structure

Last week we studied the heap data structure. Heaps come in two different flavors, max-heaps and min-heaps. To implement these two different flavors we could either define two separate classes, or we could try to use some of the ideas from the previous example to implement a single heap class that can function either as a min-heap or a max-heap based on the comparison function we pass to it.

Here is an implementation of the heap class that does just that using the template mechanism I developed in the last example.

template <typename T,typename F>
class heap
{
public:
  heap(F func): in_order(func) { }

  bool isEmpty() { return cells.empty(); }
  T top() { return cells.front(); }

  void add(const T& item) {
    cells.push_back(item);
    int node = last();
    while(node != 0 && !in_order(cells[node],cells[parent(node)])) {
      swapNodes(node,parent(node));
      node = parent(node);
    }
  }

  void remove() {
    cells[0] = cells.back();
    cells.pop_back();
    int node = 0;
    while(true) {
      int winner = node;
      if(left(node)<(int)cells.size()&& !in_order(cells[left(node)],cells[winner]))
        winner = left(node);
      if(right(node)<(int) cells.size() && !in_order(cells[right(node)],cells[winner]))
        winner = right(node);
      if(winner == node)
        break;
      else {
        swapNodes(node,winner);
        node = winner;
      }
    }
  }
private:
  std::vector<T> cells;
  F in_order;
  
  int parent(int n) { return (n-1)/2; }
  int left(int n) { return 2*n+1; }
  int right(int n) { return 2*n+2; }

  int last() { return cells.size()-1; }

  void swapNodes(int a,int b) {
    T temp = cells[a];
    cells[a] = cells[b];
    cells[b] = temp;
  }
};

There is a subtle but significant difference between this example and the last. In the quickSort example we were dealing with procedural code, where all we had to worry about was passing our in_order function from one function to another as a parameter. In this example the in_order function actually gets stored as a member variable for use later. The constructor for the heap class takes as its sole parameter a comparison function and then stores that function in a member variable, in_order. Later on the add() and remove() methods will invoke that stored comparison function to do their job.

Storing a function as a member variable and then calling it later is only a small change from what we saw in the first example. Unfortunately, there is a much bigger problem lurking beneath the surface in this example. That problem shows itself when we get around to using the heap class. The heap class is a template class - when you use a template class you have to declare the actual types that will replace the placeholders in the template. Suppose now that we wanted to set up a min-heap holding ints:

template <typename T>
bool descending(const T& a,const T& b) {
  return a < b;
}

heap<int,???> H(descending);

The template heap class has a second template parameter, F, which stands for the type of the comparison function we want to use in the heap. To properly declare a template heap object we need to provide concrete types for both of the placeholders. The T is easy - we use int for that. The F is somewhat harder to manage.

One way to manage F is to define a type that is a pointer to a function with the right structure. There are three ways to achieve this. The first method is based on the C language typedef statement.

template <typename T>
bool descending(const T& a,const T& b) {
  return a < b;
}

typedef bool (*comp)(int,int);

heap<int,comp> H(descending);

The typedef statement introduces a new type, comp, that is declared to be a pointer to a function taking two ints as its parameters and returning a bool.

typedef is a C language construct, and does not play nicely with modern C++ concepts such as templates. C++11 introduced a more modern replacement for the ancient typedef, the alias declaration.

template <typename T>
bool descending(const T& a,const T& b) {
  return a < b;
}

template <typename T>
using comp = bool (*)(const T&,const T&);

heap<int,comp<int> > H(descending<int>);

The alias declaration here, which starts with using, says that the comp type is equivalent to a pointer to a function taking two T references as its parameters and returning a bool. As an added bonus, we can make the alias declaration a template to make the definition of the comp type more generic.

The final option to handle the type of C is to use one of the new type deduction tools introduced in C++11, the decltype construct:

bool descending(int a,int b) {
  return a < b;
}
heap<int,decltype(descending) > H(descending);

The decltype construct is a way to instruct the compiler to figure out the type of something for us. decltype takes as its parameter something that has the desired type, and forces the compiler to deduce and use that type.

Function objects

Using the techniques I demonstrated above we are getting closer to being able to fully implement the functional programming paradigm in C++. We can now pass functions as parameters to other functions, and we can store functions as member variables in objects for use later.

To delve even deeper into functional programming, we next need to consider the concept of function objects in C++.

The key to getting an object to act like is function is to make it possible for an object to replicate the one thing that makes a function a function, which is the ability to call the function. In all of the examples above we eventually got around to using the function that we had been passing around by calling that function. For example, in the quickSort example we saw the in_order function being used to implement a key step in the partition function.

template <typename T,typename F>
int partition(vector<T> &a, int first, int last, F in_order)
{
  int p=first;
    
  swap(a[first],a[median(a,first,last-1)]);  
  int piv = a[first];
    
  for(int k=first+1;k<last;k++) {
    if(!in_order(piv,a[k])) { // This is where we call in_order
      p++;            
      swap(a[k],a[p]);
    }
  }
  swap(a[p],a[first]);
  return p;
}

The key to getting something that is not really a function to act like a function is to use the magic of C++ operator overloading. C++ treats function calls as an operation, and makes it possible for you to overload the function call operator, whose name is operator().

Here now is an example of a class that implements operator().

class descending {
  public:
    bool operator()(int a,int b) {
      return a < b; }
};

descending d; // Declare an object of type descending
std::vector<int> V;
quickSort(V,d); // Pass the descending object to V

This does not look like much of an improvement on the previous code we had for quickSort. Where this technique really comes into its own is in the heap example. If you recall, the central difficulty in the heap example was declaring the type of the thing that was going to replace the F placeholder. Since a class is by definition a type, we can use the name of a class anywhere that a type name is required. This now leads to the best implementation of the min-heap:

// Declare a function object type
template <typename T>
class descending {
  public:
    bool operator()(const T& a,const T& b) {
      return a < b; }
};

descending<int> d; // Make a function object
// Pass it to the constructor for the heap 
heap<int,descending<int> > H(d);

Finally, we can take advantage of one other interesting advantage that objects provide in C++: objects can initialize themselves. If you declare a member variable in a class of object type you can sometimes skip the initialization step for the member variable and rely on the ability of an object member variable to initialize itself using its default constructor. Here is the heap class rewritten to make use of this trick:

template <typename T,class F>
class heap
{
public:
  heap() { } // No need to initialize in_order here...

  bool isEmpty() { return cells.empty(); }
  T top() { return cells.front(); }

  void add(const T& item) {
    cells.push_back(item);
    int node = last();
    while(node != 0 && !in_order(cells[node],cells[parent(node)])) {
      swapNodes(node,parent(node));
      node = parent(node);
    }
  }

  void remove() {
    cells[0] = cells.back();
    cells.pop_back();
    int node = 0;
    while(true) {
      int winner = node;
      if(left(node) < (int) cells.size() && !in_order(cells[left(node)],cells[winner]))
        winner = left(node);
      if(right(node) < (int) cells.size() && !in_order(cells[right(node)],cells[winner]))
        winner = right(node);
      if(winner == node)
        break;
      else {
        swapNodes(node,winner);
        node = winner;
      }
    }
  }
private:
  std::vector<T> cells;
  F in_order; // ...we rely on in_order to initialize itself.
  
  int parent(int n) { return (n-1)/2; }
  int left(int n) { return 2*n+1; }
  int right(int n) { return 2*n+2; }

  int last() { return cells.size()-1; }

  void swapNodes(int a,int b) {
    T temp = cells[a];
    cells[a] = cells[b];
    cells[b] = temp;
  }
};

This also simplifies the use of the heap class:

// Declare a function object type
template <typename T>
class descending {
  public:
    bool operator()(const T& a,const T& b) {
      return a < b; }
};

// Use it in the heap class
heap<int,descending<int> > H;

The one downside to this approach is that it locks us into using a function object type for the type C. To help document this fact, I declared the F template parameter in the heap class to be of type class instead of the more generic typename.

Function objects in the standard template library

Function objects are an important part of modern C++ programming practice. As such, the STL defines a number of useful function object classes. These classes are all defined in the <functional> header file. For example, <functional> contains a function object named std::less that essentially replicates the descending class I defined above. Using std::less we can now make a min-heap by doing

heap<int,std::less<int> > H;

and a max-heap by doing

heap<int,std::greater<int> > H;