The ArrayList Class Revisited

Now that we have the ability to write our own versions of various standard operators, we can create a more convenient version of the ArrayList class. The big problem with the original version of the ArrayList class is that it forced us to write clumsy code like this:

A.set(n,B.get(n));

We would prefer to write code like this instead:

A[n] = B[n];

With the ability to define operator overloads, we can now make the latter form work. The trick we need here is to define a version of the bracket operator for the ArrayList class. Here is what that looks like.

template <class T>
class ArrayList
{
private:
  T *A;
  int N;
public:
  ArrayList(); // Default constructor
  ArrayList(int size); // Standard constructor
  ArrayList(const ArrayList<T> &other); // Copy constructor
  ~ArrayList(); // Destructor
  
  ArrayList<T>& operator=(const ArrayList<T>& other);

  int size() const;

  void resize(int newSize);

  T& operator[](int n) { return A[n]; }
  T operator[](int n) const {return A[n]; }
};

There are two different versions of the operator corresponding to the two different usages of the bracket operator. The first version gets called in cases in which we want to assign a value to the ArrayList, while the second will get called when we want to retrieve a value from an ArrayList. In the statement

A[n] = B[n];

the first version of operator[] gets used for the A[n] part, while the second version gets used for the B[n] part.

The vector class

Now that we have make our ArrayList class fully functional and easy to use, I should point out that the job has already been done for us. There is a class in the C++ Standard Template Library that replicates all of the features of the ArrayList class and adds many more besides. To use that class in your programs, simply place the #include statement

#include <vector>

at the top of your source files.

Introduction to Iterators

Making a generic algorithm

We have already seen how template functions make it possible to make more generic functions. Today we are going to continue the drive to make functions even more generic by judicious use of abstraction and operator overloading.

Here is a function that conducts a linear search in an array and returns the index of the location where an item is first found or -1 if the item is not found.

template <class T>
int find(T item,T A[],int length) {
  for(int n = 0;n < length;n++)
    if(A[n] == item)
      return n;
  return -1;
  }

Suppose we wanted to make this function even more generic by allowing it to search in a data structure other than an array. As a start, we might do something like this.

template <class T,class DS>
int find(const T& item,const DS& A,int length) {
  for(int n = 0;n < length;n++)
    if(A[n] == item)
      return n;
  return -1;
  }

This will actually work, provided that the data structure we pass in understands the bracket operation. When the compiler encounters the expression A[n] and A is not an array it will try to make this work by automatically converting A[n] to its function equivalent A.operator[](n). This will succeed if the data structure we pass in as the A parameter contains a member function named operator[].

Here is an example of how to write such a function. If we were to add an operator[] to the DoublyLinkedList class from chapter 3 it would look like this.

// Precondition: The DoublyLinkedList is not empty and 0 <= n < length
// Postcondition: A reference to the item in the nth node has been
// returned
T& operator[](int n)
  {
  DLLNode<T> *temp = head;
  while(n > 0) {
    temp = temp->next;
    n--;
    }
  return temp->info;
  }

Once the DoublyLinkedList is equipped with an operator[] the find function works correctly on both arrays and DoublyLinkedList structures.

int main()
{
int X[] = { 5, 4, 9, 3 };
cout << "9 is in position " << find(9,X,4) << " of X." << endl;
  
DoublyLinkedList<int> Y;
Y.addToDLLTail(3);
Y.addToDLLTail(9);
Y.addToDLLTail(4);
Y.addToDLLTail(5);
cout << "9 is in position " << find(9,Y,4) << " of Y." << endl;
return 0;
}

The operator[] we have just created works very well. In fact, you can use it just as you would with an array. The only thing you have to watch out for is to make sure that the given location in the structure exists before you try to access it.

DoublyLinkedList<int> Z;
// Make Z have a length of 10 by pushing 10 zeros.
for(int n = 0;n < 10;n++)
  Z.addToDLLTail(0);

Z[2] = 4;
cout << Z[2] << endl; // Will output 4.

This example also shows the power of returning a reference from a function. When we say 'Z[2]' we are asking for a reference to the item in position 2 in the Linked structure Z. When you assign something to a reference, you are effectively assigning something to the thing the reference refers to. Thus, when we say 'Z[2] = 4' we are setting the item in position 2 of Z to 4.

What is wrong with this approach

Writing an operator[] appears to be a slick way to solve the problem of making a generic find algorithm. Ultimately, this turns out not to be the best approach. There are two problems. The first is that not every data structure we are going to study in this course is a linear data structure. Accessing items in a data structure by their index only makes sense if the data items can be lined up and assigned integer indices. This will not be the case in many of the data structures we study in this course.

A second and more immediate problem is that operator[] for the Linked class is extremely inefficient. Arrays are an example of a random access data structure. A random access structure allows you to access any given item in the data structure at the same low cost in time. The Linked data structure is an example of a sequential access data structure. To get to a given location in a Linked object you have to jump over nodes until you reach the location. This takes time, and in large Linked objects can take a considerable amount of time. For example, if you build a Linked object that contains 1000 nodes and try to access the last node you will have to jump over 999 nodes to get to that last node.

The problems caused by sequential access really grow when you use operator[] in the context of a loop. Consider this example:

for(int n = 0;n < A.size();n++)
  cout << A[n] << endl;

The cost of each array access A[n] is proportional to n. If you total this cost over all different values of n the total cost is

This is approximately equal to the square of the size of the data structure, and starts to become prohibitively expensive in time as the structure A grows beyond a few hundred items.

An alternative approach

The only way out of the problems caused by using operator[] is to abandon its use altogether. This may seem impossible to do, because we still want our find function to work with arrays and it is hard to imagine using arrays without making use of operator[]. Fortunately, there is an alternative way to interact with an array.

The alternative method makes use of the fact that you can also use pointers to access elements in an array. We have already seen that the name of an array is essentially a pointer to the first element in the array. It also turns out that pointers can be incremented. If you take a pointer that points to the first item in an array and increment the pointer you get a pointer that points to the second item in the array. This makes it possible to write a loop that uses pointers to iterate over an array. For example, here is a piece of code that initializes every item in an array A of N ints to 0.

int* ptr = A; // Get a pointer to the first item
for(int n = 0;n < N;n++) //Iterate across all N items
  {
  *ptr = 0; // Set the current item to 0
  ptr++; // Move to the next item
  }

Note that this example still needs to use an index n. We don't need the n to access the items in the array, but we do need the n to keep track of how many items we have processed. One final trick will eliminate the n completely. Besides incrementing pointers you can also do pointer arithmetic with them. Adding an integer value to a pointer that points to the beginning of an array causes the pointer to move over by that many spaces. In particular, if A points to the beginning of the array A of size N, (A + N) points to the location just beyond the end of the array. This makes it possible to do this:

int* ptr = A; // Point at the beginning of the array
int* end = ptr + N; // Point just past the end
while(ptr != end) { // while we haven't run off the end
  *ptr = 0; // Set current item to 0
  ptr++; // Move to the next item
  }

Here is what the find function looks like when we rewrite it in the pointer idiom.

template <class T,class P>
P find(const T& what,P from,P to) {
  P current = from;
  while(current != to) {
    if(*current == what)
      return current;
    else
      current++;
    }
  return to;
  }

This template function uses two template parameters. T is the type of the data items stored in the structure we are searching. P is the type of the pointer we are using to access those data items and move through the structure. Here is how you would use this find function with an array.

int X[] = { 5, 4, 9, 3 };
int* begin = X;
int* end = X + 4;
int* ptr = find(9,begin,end);
if(ptr != end)
  cout << "The value 9 is in the array X." << endl;
else
  cout << "The value 9 is not in the array X." << endl;

Modifying the DoublyLinkedList class to work with the new find function

If we are going to apply this same idiom to the DoublyLinkedList class, we will need to create a data type that can function as a sort of generalized pointer into a DoublyLinkedList structure. Specifically, we need a data type that supports these basic operations:

  1. An operator++ which moves the 'pointer' from one item to the next.
  2. A way to dereference the 'pointer' to get at the thing it 'points' to.
  3. A comparison operator!= that we can use to check whether or not we have run off the end of the structure.
  4. An assignment operator= that we can use to copy one 'pointer' to another 'pointer'.
  5. Some way to get a 'pointer' to the beginning of the DoublyLinkedList structure and a 'pointer' that points just past the end of the structure.

To satisfy requirements 1, 2, 3, and 4 one has to create an iterator class for the DoublyLinkedList structure that supports operator++, operator*, operator=, and operator!=. To satisfy requirement 5 one needs to add begin() and end() member functions to the Linked class itself. Those functions will return iterators pointing to the beginning and just past the end of the structure.

Here is the necessary source code needed to accomplish these things.

  class Iterator {
    friend class DoublyLinkedList;
  private:
    DLLNode<T> *nodePtr;
    // The constructor is private, so only our friends
    // can create instances of Iterators.
    // Postcondition: The iterator has been constructed
    //                from newPtr.
    Iterator (DLLNode<T> *newPtr)
    {
      nodePtr = newPtr;
    } // constructor

  public:
    // Postcondition: The iterator has been constructed.
    Iterator()
    {
    } // default constructor

    // Postcondition: true has been returned if the
    //                iterator is not equal to itr;
    //                otherwise, false has been returned.
    bool operator!= (const Iterator& itr) const
    {
      return nodePtr != itr.nodePtr;
    } // overloading !=

    // Postcondition: A reference to the item the iterator
    //                is positioned at has been returned.
    T& operator*() const
    {
      return nodePtr -> info;
    } // overloading *

    // Precondition: The iterator is positioned at an item.
    // Postcondition: The iterator has been positioned at
    //                the next item in the Linked object.
    Iterator operator++(int)
    {
      Iterator temp = *this;
      nodePtr = nodePtr -> next;
      return temp;
    } // post-increment ++
  }; // class Iterator

  // Postcondition: An iterator positioned at the front of
  //                the DoublyLinkedList has been returned.
  Iterator begin()
  {
    return Iterator(head);
  } // begin

  // Postcondition: An iterator positioned just after the
  //                back of the DoublyLinkedList has been
  //                returned.
  Iterator end()
  {
    return Iterator (0);
  } // end

Once these changes are made, we can use the 'pointer' based find function with the DoublyLinkedList class:

DoublyLinkedList<int> Y;
Y.addToDLLTail(3);
Y.addToDLLTail(9);
Y.addToDLLTail(4);
Y.addToDLLTail(5);
DoublyLinkedList<int>::Iterator begin = Y.begin();
DoublyLinkedList<int>::Iterator end = Y.end();
DoublyLinkedList<int>::Iterator ptr = find(9,begin,end);
if(ptr != end)
  cout << "The value 9 is in the DoublyLinkedList Y." << endl;
else
  cout << "The value 9 is not in the DoublyLinkedList Y." << endl;

One further benefit of this arrangement is that operator* returns a reference. This means that we can use Iterators to change values as well. For example, if we wanted to write a function that finds every occurance of some value in a data structure and replaces it with another value we can do this.

template <class T>
void findAndReplace(const T& item,const T& replacement,DoublyLinkedList<T>& A) {
  DoublyLinkedList<T>::Iterator where = find(item,A.begin(),A.end());
  while(where != A.end()) {
    *where = replacement; // Replace the old item
    where = find(item,where,A.end()); // Search again starting from where
    }
  }