In my solution for the BTree assignment I made a very strong effort to make sure that I was following best practices for C++ programming. One such best practice that students routinely ignore is the need for careful memory management. The cardinal rule in C++ memory management is
If you created it with new
, you must later delete
it.
Most beginning programmers routinely ignore this rule, but experienced programmers go to great lengths to ensure that they are always faithfully following this rule.
In these lecture notes I will show you two approaches to following this rule. The first approach relies on careful consideration of memory management, while the second approach uses some modern tools that appeared in C++11 to help automate memory management tasks.
In the traditional approach we pay careful attention to situations where we are creating objects with new
. In any situation where we create an object with new
, we need to make sure that the object eventually gets deleted.
The first step in making sure that things get deleted is to supply any class that holds pointers to objects with a destructor. The destructor is the obvious place to delete
any objects that the owning object points to. To that end, I created destructors for the both the BTree and BTreeNode classes in my original solution:
template <typename T, int M> BTreeNode<T, M>::~BTreeNode() { if (!leaf) { for (int n = 0; n <= keyTally; n++) delete pointers[n]; } } template <typename T, int M> BTree<T, M>::~BTree() { if (root != nullptr) delete root; }
This gets us most of the way to the goal of deleting everything we created with new. Unfortunately, it does not cover every case. Most data structures also have methods for removing items, and item removal very often results in objects being dropped from the data structure. The BTree class is no exception. Most of the time when we remove a key from a BTree we end up removing a key from a node, but the node itself does not get removed from the BTree. On rare occasions we will find ourselves finally removing a node from the BTree: when that happens, we have to be careful to also delete the node. The one event in the life of the BTree that will involve the removal of a node is the consolidation algorithm. In that algorithm we consolidate two nodes into one, and one of the two nodes has to be destroyed in the process. Here is the relevant code:
template <typename T, int M> void BTree<T, M>::consolidate(const BTreeLink<T, M>& left, const BTreeLink<T, M>& right) { BTreeNode<T, M> *parent = left.parent; BTreeNode<T, M> *leftChild = left.ptr(); BTreeNode<T, M> *rightChild = right.ptr(); leftChild->keys[leftChild->keyTally] = parent->keys[left.pos]; leftChild->keyTally++; for (int n = 0; n < rightChild->keyTally; n++) { leftChild->keys[leftChild->keyTally] = rightChild->keys[n]; if (!rightChild->leaf) { leftChild->pointers[leftChild->keyTally] = rightChild->pointers[n]; } leftChild->keyTally++; } if (!rightChild->leaf) leftChild->pointers[leftChild->keyTally] = rightChild->pointers[rightChild->keyTally]; for (int n = left.pos; n < parent->keyTally - 1; n++) { parent->keys[n] = parent->keys[n + 1]; parent->pointers[n + 1] = parent->pointers[n + 2]; } rightChild->keyTally = -1; delete rightChild; parent->keyTally--; // Since consolidation removes a key from a parent node, we need to // immediately check whether or not the parent has entered an // underflow state. Also, we have to check for an important special // case. If the parent we just removed a key from is actually the // root, we can not handle that parent via the normal underflow // strategy. Since the root never has siblings it can not participate // in rebalancing or consolidation. An important special case to check // for is the case where the root underflows all the way down to // having no keys left. In that case, the root will be left with a // single child pointer, so that child will have to be promoted to // being the new root for the tree. if (parent->keyTally < M / 2 && parent != root) { BTreeLink<T, M> link = pointerToLink(parent); handleUnderflow(link); } else if (parent == root && parent->keyTally == 0) { if (parent->leaf) root = nullptr; else root = parent->pointers[0]; parent->keyTally = -1; delete parent; } }
The consolidation takes place in the block of code in the top half of the method. Note that one of the last events in the consolidation process is the explict deletion of the rightChild
node. A very subtle consideration that comes into play at that point is making sure that if rightChild
had any children that we don't end up accidentally deleting them along with the node itself. The pointers to rightChild
's children will have been transferred over to leftChild
as part of the consolidation process, but rightChild
may still hold copies of those pointers. To keep those grandchildren from accidentally being deleted by the BTreeNode destructor, we have to reduce the keyTally
for rightChild
down to -1. This ensures that the loop in the BTreeNode destructor, which normally tries to delete child nodes, won't run. This is a very subtle detail that is easy to overlook, and can lead to subtle and hard-to-find bugs involving the double deletion of nodes.
C++11 introduced some very significant new tools for memory management, smart pointers. A smart pointer is an object that holds a pointer to some other object. The first thing a smart pointer does is that it can pretend to be a pointer itself. Smart pointers do this by overloading the dereferencing operators, operator*()
and operator->()
.
To make use of smart pointers in a program, you will need to include the <memory>
header file.
Smart pointers perform automatic memory management by tracking references to the underlying object and then automatically deleting that object when the last smart pointer that refers to that object goes away. Here is an example to demonstrate how this works. This example makes use of the C++ shared_ptr
class, which is one of two smart pointer classes available in C++.
// A simple class class A { public: A(); ~A(); void foo(); private: // details hidden }; std::shared_ptr<A> f() { std::shared_ptr<A> ptr(new A()); ptr->foo(); return ptr; } int main() { std::shared_ptr<A> other = f(); other->foo(); return 0; }
The important thing to note in this example is that we never call delete
on the object created in the function f()
. We don't need to do this, because we hand the pointer returned by new A()
over to a shared_ptr
as soon as we create the A
object in f()
. f()
is then free to pass along a copy of that pointer to its caller in main
. As soon as we exit f()
the local variable ptr
will go away, but by that time a copy of the pointer held by ptr
will have been passed out of f()
. ptr
will sense this, and then be smart enough not to immediately delete
the object it points to. Instead, the system will wait until the last shared_ptr
that holds a copy of the object pointer goes away. This happens when we reach the end of main()
. At that point, other
will realized that it is the last shared_ptr
object holding a reference to the shared object, and other
will then delete
that object automatically for us.
C++11 offers two smart pointer classes, shared_ptr
and unique_ptr
. The unique_ptr
class implements exclusive ownership semantics. This means that a unique_ptr
object will claim exclusive ownership of the object it points to. When the unique_ptr
object goes away, it automatically deletes the owned object for you. In addition, the unique_ptr
class features a move constructor and a move assignment operator - both of these transfer an owned pointer from one unique_ptr
object to another. In any such move operation the unique_ptr
that the pointer moves from has its internal pointer value reset to nullptr
, thus reliquishing any claim to the object that it used to own.
unique_ptr
is appropriate for use in an application like the BTree class. In a BTree, every node can be said to have a single, unique owner. For most nodes that owner is the parent of the node. For the root, which has no parent, the BTree itself acts as the unique owner. This immediately suggests the follow change to the structure of the BTree and BTreeNode classes:
template <typename T, int M> class BTreeNode { friend class BTree<T, M>; friend class BTreeLink<T, M>; public: BTreeNode(); BTreeNode(const T& el); ~BTreeNode(); bool isFull(); bool containsKey(const T& key); void insertKey(const T& key); // Used to insert a new key in a leaf void insertChild(const T& key, std::unique_ptr<BTreeNode<T, M>> child, int pos); void removeKey(const T& key); // Used to remove a key from a leaf BTreeLink<T, M> findSuccessor(const T& key); void swapWithSuccessor(const T& key, BTreeLink<T, M> leaf); void traverse(std::ostream& out); void snapShot(std::ostream& out); private: bool leaf; int keyTally; T keys[M - 1]; std::unique_ptr<BTreeNode<T,M>> pointers[M]; int findKeyPos(const T& key); }; template <typename T, int M> class BTree { public: BTree(); ~BTree(); bool contains(const T& key); void insert(const T& key); void remove(const T& key); void traverse(std::ostream& out); void snapShot(std::ostream& out); private: std::unique_ptr<BTreeNode<T, M>> root; // Methods to assist in the presplitting strategy void splitRoot(); void splitNode(const BTreeLink<T, M>& link); BTreeLink<T, M> pointerToLink(BTreeNode<T, M> *ptr); void handleUnderflow(const BTreeLink<T, M>& link); void rebalance(const BTreeLink<T, M>& left, const BTreeLink<T, M>& right); void consolidate(const BTreeLink<T, M>& left, const BTreeLink<T, M>& right); };
The root
pointer in the BTree class is now a unique_ptr
, and the array pointers
in the BTreeNode class is now an array of unique_ptr
objects.
Another immediate change that this brings about is in the code for the destructors for the two classes:
template <typename T, int M> BTreeNode<T, M>::~BTreeNode() { } template <typename T, int M> BTree<T, M>::~BTree() { }
These destructors can now both be completely empty, because the unique_ptr
objects embedded in these two classes manage the pointers they hold for us automatically. When a BTree or BTreeNode object goes away, the destructors for any contained objects will run automatically. The destructor for the unique_ptr
class checks to see if the unique_ptr
holds a pointer - if it does the destructor automatically calls delete
on that owned pointer.
In applications where we use unique_ptrs to store pointers to objects, we will often want the unique_ptrs to claim ownership of those objects the instant we create the object. For example, in the BTree application we will have to create a root node for the BTree at some point. The recommended procedure for doing this is to replace the traditional code
root = new BTreeNode(key);
with
root = std::make_unique<BTreeNode<T,M>>(key);
This creates a new BTreeNode initialized with key
and moves ownership of the node over to root
.
On occasion, it will be appropriate to transfer ownership of a pointer from one unique_ptr
to another. An example of this in the BTree is the transfer of child pointers in splitting, consolidation or rebalancing operations. For example, here is the code for one of the node splitting methods:
template <typename T, int M> void BTree<T, M>::splitNode(const BTreeLink<T, M>& link) { BTreeNode<T, M> *toSplit = link.ptr(); std::unique_ptr<BTreeNode<T, M>> right = std::make_unique<BTreeNode<T, M>>(); for (int i = 0; i < (M - 2) / 2; i++) right->keys[i] = toSplit->keys[i + (M - 1) / 2 + 1]; if (!toSplit->leaf) { for (int i = 0; i < (M - 2) / 2 + 1; i++) right->pointers[i] = std::move(toSplit->pointers[i + (M - 1) / 2 + 1]); right->leaf = false; } right->keyTally = (M - 2) / 2; toSplit->keyTally = (M - 1) / 2; link.parent->insertChild(toSplit->keys[(M - 1) / 2], std::move(right), link.pos); }
The key statement here is
right->pointers[i] = std::move(toSplit->pointers[i + (M - 1) / 2 + 1]);
In this statement one of the children of the node being split is being transferred over to the new node, right
. Because unique_ptrs
strictly regulate the transfer of ownership from one unique_ptr
to another, we can not simply say
right->pointers[i] = toSplit->pointers[i + (M - 1) / 2 + 1];
This would trigger the use of the copy assignment operator, which is actually declared = delete
in the unique_ptr
class. To make it absolutely clear that we intend a move, we have to wrap the unique_ptr
on the right hand side in std::move()
. This triggers the use of the move assignment operator instead of the copy assignment operator. The move assignment operator moves the owned pointer from the unique_ptr
on the right hand side to the unique_ptr
on the left hand side.
A little further down in the method we have to transfer ownership of the new node to the parent node. This takes place in the last statement of the method, where we pass a new key and a new child up to the parent to insert. The second parameter to insertChild
is a unique_ptr
, so we have to wrap right
in std::move
to move ownership of the new node over to the parameter. Once right
has relinquished ownership of that node it resets its internal pointer to nullptr
. When we exit the method the destructor for right
will run, and will see that it no longer owns a pointer. Thus the new node object will have been transferred safely to its new home in the parent.
In many applications where we use unique_ptrs to manage ownership of objects, we will also often have need of plain pointers to those same objects. The BTree program is no exception. An important helper class in the BTree program is the BTreeLink class, which stores a pointer to the parent of a node:
template <typename T, int M> class BTreeLink { friend class BTree<T, M>; public: BTreeLink(); BTreeLink(BTreeNode<T, M>* par, int p); BTreeNode<T, M>* ptr() const { return parent->pointers[pos].get(); } BTreeLink<T, M> searchStep(const T& key) const; bool hasLeftSibling() const; BTreeLink<T, M> leftSibling() const; bool hasRightSibling() const; BTreeLink<T, M> rightSibling() const; private: BTreeNode<T, M>* parent; int pos; };
This class does not use a unique_ptr
, instead it stores a raw pointer to the parent node. This is appropriate, because the BTreeLink in no sense can be said to own the parent node it refers to. It is actually quite common to have raw pointers in use in an application that also uses unique_ptrs
. The first and most important convention to follow in cases like this is to make it clear that the raw pointers never own the objects they point to. More specifically, we should never call delete
on any raw pointer we work with. This very likely would lead to a double deletion error when the unique_ptr
that actually owns the object tries to delete it later. Another danger in using raw pointers is that the unique_ptr
that owns the object may decide at any point to delete that object. As soon as that happens, the pointer stored in the raw pointer becomes a dangling pointer, and attempts to use it may trigger an exception.
The final technical hurdle we need to get over if we plan to use raw pointers alongside unique_ptrs
is figuring out how to get the unique_ptr
to give us a copy of its owned pointer. The unique_ptr
class contains a get()
method that will return the owned pointer to you as a raw pointer. Here is an example.
BTreeLink<T, M> link(root.get(), pos);
In this code we are setting up a BTreeLink that will refer to one of the children of root
. The first parameter of the constructor for the BTreeLink class requires a raw pointer to a BTreeNode, so we call get()
on root
to obtain that pointer.
Both the unique_ptr
class and the shared_ptr
class implement objects that contain and manage pointers. Both of these classes take advantage of a key aspect of C++ objects: when the object is created the constructor runs, and when the object goes away the destructor runs. Both of these classes use their destructor to decide whether or not to delete the object that the pointer points to.
This intelligent use of constructors and destructors to acheive a special effect is an application of a broader programming idiom, the RAII idiom.
RAII stands for "Resource Acquisition is Initialization." A class that uses this idiom will use its constructor to acquire a resource, and will then use its destructor to release the resource.
Here is another concrete example of a simple class that uses the RAII idiom. The resource in this case is a file stream used to read numbers from the file. The class's constructor is designed to open the file, and its destructor is designed to close the file.
class FileReader { private: ifstream input; public: FileReader(string fileName) { input.open(fileName); } ~FileReader() { input.close(); } int read() { int x; input >> x; return x; } };
Here is some example code to show how to use this class.
int main() { FileReader in("numbers.txt"); std::vector<int> A; for(int n = 0;n < 1000;n++) A.push_back(in.read()); }
The FileReader class takes responsibility for managing an input stream, setting it up for us at start, and then automating the process of closing that input stream when we reach the end of main and the FileReader's destructor runs.