Memory management with malloc and free

The C++ is backward compatible with the C language, which means that we can use any of the facilities offered by C in a C++ program. One specific facility we are going to make use of to implement our custom memory manager is the C memory management functions malloc and free. To use these functions in a C++ program you will need to use the include statement

#include <stdlib.h>

The C malloc (or memory allocation) function is designed to allocate a block of memory and return a pointer to that block. The sole parameter to malloc is the size of the block in bytes. To help us compute how many bytes are needed for a particular memory block, malloc is often used in combination with the sizeof() operation, which returns the size in bytes of a particular data type.

For example, suppose we are writing a program that uses objects of type node. To use malloc to allocate a single object of type node we would use this code:

node *ptr = (node*) ::malloc(sizeof(node));

To make an array of 1000 nodes we would instead do

node *ptr = (node*) ::malloc(1000*sizeof(node));

Because malloc returns a generic pointer to a memory location (called a void* pointer), we have to type cast the pointer returned from malloc to the type we have intended to create.

Important warning Even though the code above suggests that we can use malloc to allocate memory for objects in a C++ program, this approach by-passes one important aspect of object creation in C++. Allocating objects in this way does not result in the constructor for the object being called. The only way to guarantee that the object's constructor is invoked is to create the object with new. Below I will show how to combine the use of new with a custom allocation scheme.

When you are done using a block of memory that was originally allocated with malloc, you can free up that memory by calling the free function.

::free(ptr);

Overriding operator new and operator delete

When you create an object with new in C++ you are using the general purpose memory management system provided by the C++ standard library. C++ allows you to replace this general purpose memory manager with your own memory management system. The key to doing this is the ability to override operator new and operator delete in any class. In these lecture notes I will demonstrate how to do this by implementing a custom memory manager that works in combination with the BTree implementation I wrote for the BTree assignment.

I will be using the custom memory manager to manage the allocation of BTreeNode objects for my BTree. Since all BTreeNode objects have the same size, and we will only need to manage the allocation and deallocation for objects of a single type, we can design a simple and very efficient memory management system for this application.

The first step in setting up the custom memory manager is to set up a memory pool. This is simply a large array of BTreeNodes that we allocated with malloc. To do this, we add a pair of member variables

static BTreeNode<T, M> *pool;
static int next;

to the BTreeNode class. The first of these variables is a pointer to the start of the memory pool, and the second is an index we will use to track the location of the next available unallocated BTreeNode in the memory pool.

In addition to these two new member variables we also add a pair of static member functions to the BTreeNode class. These functions are responsible for setting up and tearing down the memory allocation system for BTreeNodes:

template <typename T, int M>
void BTreeNode<T, M>::initMemory(int maxNodes) {
  pool = (BTreeNode<T,M>*) ::malloc(maxNodes*sizeof(BTreeNode<T, M>));
  next = 0;
}

template <typename T, int M>
void BTreeNode<T, M>::releaseMemory() {
  ::free(pool);
}

These two static methods will have to be called before and after we plan to do anything with BTreeNodes in our application. The initMemory method takes a single parameter, which is the number of nodes we would like to have in our pool. We have to be careful to set this value to be large enough so that we can provide as many nodes as our application will need.

The final step needed to set up the memory manage is to override operator new and operator delete for our class. Here is the initial code for our class's operator new implementation:

template <typename T, int M>
void* BTreeNode<T, M>::operator new(std::size_t size) {
  void* result = &(pool[next]);
  next++;
  return result;
}

This code simply returns the address of the next available BTreeNode in the pool and increments the next index.

The purpose of operator new is to allocate a block of memory of the desired size and return a pointer to that block. The code here does that very quickly and efficiently by simply returning a pointer to one of the blocks in the pool. The great advantage to overriding operator new is that we get to replace the generic memory manager with our own, faster memory manager while still guaranteeing that the constructor of the object we are allocating will get called.

Implementing memory recycling via a free list

The other function of a memory allocation system is to handle the reuse of memory from objects that have been deleted. To manage this aspect of memory allocation we will add one additional feature to our custom memory allocator, a free list. The free list is a singly linked list of memory blocks that have been freed by a call to delete. To implement this system, we add another static member variable

static void* free;

to our BTreeNode class and modify the initMemory function to also initialize this member variable.

template <typename T, int M>
void BTreeNode<T, M>::initMemory(int maxNodes) {
  pool = (BTreeNode<T,M>*) ::malloc(maxNodes*sizeof(BTreeNode<T, M>));
  next = 0;
  free = nullptr;
}

We then provide an implementation of operator delete that takes any BTreeNode that has been deleted and adds it to the free list.

template <typename T, int M>
void BTreeNode<T, M>::operator delete(void* ptr) {
  void* *handle = (void**)ptr;
  *handle = free;
  free = ptr;
}

The code here is somewhat subtle. The parameter ptr that gets passed to operator delete is a generic pointer that points to the start of the memory block occupied by the BTreeNode that is being deleted. The first thing we do is to type cast that pointer to be a pointer to a pointer. Once we have done that, we can assign the current value of the free pointer (which points to the start of our free list) to that location. Finally, we update the free pointer to point to the block that we have just added to the start of the free list.

Once we have set up a free list, we also have to go back and update our code for operator new so that it will return blocks from the free list whenever those are available:

template <typename T, int M>
void* BTreeNode<T, M>::operator new(std::size_t size) {
  if (free != nullptr) {
    void* result = free;
    void* *handle = (void**)free;
    free = *handle;
    return result;
  }

  void* result = &(pool[next]);
  next++;
  return result;
}

The code we have added here will follow the free pointer to the block it points to. At that location there is a pointer that points to the second block in the free list. By updating the free pointer to point to that second block, we have effectively unhooked the first block from the free list, and we can then return a pointer to that block. If there is currently nothing available in the free list, operator new will instead allocate a block from the pool in the normal way.

One important feature that is missing here is any additional logic to check for an out-of-memory condition. To make a more complete and correct system, we should also add logic here that checks for no remaining blocks in the system and throws an exception when that happens.

Performance improvements

Because the general purpose memory manager in C++ is a general purpose system that is designed to allocate memory for all different kinds of objects, it tends to be slower than a special purpose system that only has to allocate memory for a single kind of object.

To test the performance of my custom memory allocation system, I wrote a test program that allocates and frees a large number of BTreeNodes:

int main() {
  BTreeNode<int, 4>::initMemory(100000);
  {
    BTree<int, 4> tree;

    // Read the first list of numbers
    std::ifstream one("numbers.txt");
    std::vector<int> numbers;
    int x;
    while (one >> x) {
      numbers.push_back(x);
    }
    one.close();

    // Add all of the numbers in the first list to the tree
    clock_t start = std::clock();
    for (auto itr = numbers.begin(); itr != numbers.end(); itr++)
      if (!tree.contains(*itr))
        tree.insert(*itr);
    std::cout << ((double)((std::clock() - start) * 1000)) / CLOCKS_PER_SEC << " milliseconds to insert." << std::endl;

    // Read the second list of numbers
    std::ifstream two("remove.txt");
    numbers.clear();
    while (two >> x) {
      numbers.push_back(x);
    }
    two.close();

    // Remove the numbers in the second list
    start = std::clock();
    for (auto itr = numbers.begin(); itr != numbers.end(); itr++)
      if (tree.contains(*itr))
        tree.remove(*itr);
    std::cout << ((double)((std::clock() - start) * 1000)) / CLOCKS_PER_SEC << " milliseconds to remove." << std::endl;

    // Add the numbers in the second list back in again
    start = std::clock();
    for (auto itr = numbers.begin(); itr != numbers.end(); itr++)
      if (!tree.contains(*itr))
        tree.insert(*itr);
    std::cout << ((double)((clock() - start) * 1000)) / CLOCKS_PER_SEC << " milliseconds to insert." << std::endl;
  }
  BTreeNode<int, 4>::releaseMemory();

  return 0;
}

I also wrote a second version of the program that uses the built-in memory manager for all objects, including the BTreeNode class.

The code above reads 100,000 random integers from a text file and stores in them in a BTree. It then reads a second list of 50,000 random integers, removes them from the BTree, and then adds them back in. This will test the performance of the free list, since removing a large number of items from the BTree will result in a large number of BTreeNodes being deleted and returned to the free list.

To measure the performance of the insert and remove operations I used the std::clock() function to measure the amount of time needed to perform these operations. To use this function you need to include <ctime>.Since reading the numbers from the two text files actually takes much longer than the insertions or the removals, I first read the numbers into a vector and then timed how long it took to do the insertions of the numbers from the vector.

Using the built-in memory manager, the program reports these times:

100.354 milliseconds to insert.
35.284 milliseconds to remove.
62.58 milliseconds to insert.

With our own custom memory manager the program reports:

86.817 milliseconds to insert.
29.14 milliseconds to remove.
52.784 milliseconds to insert.

This is overall about a 15% improvement in speed.