Example project

A merging method

Here is the code for a method that can merge two sorted arrays into a third array. The method assumes that arrays one and two have been sorted into ascending order, and that array destination has a size equal to the combined lengths of the two source arrays.

public static void merge(int[] one, int[] two, int[] destination) {

    int i = 0, j = 0, k = 0;
    // While there are still numbers left to process in both of the
    // original arrays...
    while (i < one.length && j < two.length) {
        // ...move the smaller of the two numbers at one[i], two[j]
        // over to merged[k]...
        if (one[i] <= two[j]) {
            destination[k] = one[i];
            k++;
            i++;
        } else {
            destination[k] = two[j];
            k++;
            j++;
        }
    }
    //...then move any remaining numbers over.
    while (i < one.length) {
        destination[k] = one[i];
        k++;
        i++;
    }
    while (j < two.length) {
        destination[k] = two[j];
        k++;
        j++;
    }
}

The method works by using a set of three indices, i, j, and k. Index i tracks our progress through the first source array and j tracks our progress through the second source array. Index k tracks our progress through the destination array. On each round through the main loop this method selects the smaller of the two numbers at one[i] or two[j] to move over to the destination array. After copying the number over, we increment the index k to move to the next location in the destination array, and we increment the index for the array we copied the number from.

Typically when the main loop finishes one of the two source arrays will still have numbers left in it. The two smaller loops following the main loop check to see if any numbers remain to be copied over from the two source arrays.

Merge Sort

Perhaps the most useful application of the merge method is a sorting algorithm known as merge sort. This algorithm breaks an array to be sorted into two halves, sorts the halves, and then merges them back together again to form a full sorted list.

public static void mergeSort(int[] list) {
    if (list.length < 20) {
        selectionSort(list);
    } else {
        // Copy the first half of list into a new array
        int firstLength = list.length / 2;
        int[] one = new int[firstLength];
        for (int n = 0; n < firstLength; n++) {
            one[n] = list[n];
        }
        // Copy the second half of list into another array
        int secondLength = list.length - firstLength;
        int[] two = new int[secondLength];
        for (int n = firstLength; n < list.length; n++) {
            two[n - firstLength] = list[n];
        }
        // Sort the two smaller arrays
        mergeSort(one);
        mergeSort(two);
        // Merge the sorted arrays back into the original array
        merge(one, two, list);
    }
}

mergeSort is an example of a recursive algorithm. It calls itself to help sort the two smaller arrays copied from the original. In Java it is legal for a method to call itself - the only thing you have to watch for is the possibility of an infinite regress as the method calls itself over and over again. What prevents that problem here is the fact that the two arrays created by mergeSort are smaller than the original. There is a limit to the number of times an array can be subdivided: in fact, when mergeSort produces a smaller array of size less than 20, it will switch to sorting that small array by selectionSort, since selectionSort is relatively efficient for arrays of this size. This eventual switch to selectionSort ends the process of mergeSort calling itself, and prevents an infinite regress.

Using mergeSort to speed up other algorithms

In the lecture on Monday I showed two ways to find the most common number in a list of numbers. The second method was to sort the list and then use a simple algorithm to find the longest run of consecutive numbers in the sorted array. I used selection sort to sort the numbers in the example on Monday. This is not ideal, because selection sort has a run time that is proportional to N2 for a list of length N. Merge sort is a much more efficient sorting algorithm, with a run time proportional to N log N, so switching to merge sort and then searching for runs produces a much faster result.

An even better sorting algorithm

There are even better sorting algorithms than merge sort. One such sorting algorithm is available in the java.util.Arrays class. This method, Arrays.sort() is even faster than merge sort and also does not require a temporary array to do its work. This makes the algorithm both more time and space efficient.

Intersection by merging

The merge method shown above is useful in its own right. What makes the method even more interesting is that it can be used as the basis for other important algorithms, and can also be modified to solve a wider range of problems.

Here is an example of a program that can be solved by a modified version of the merging process. We seek to write a program that can compute the intersection of two lists of integers stored in files named "one.txt" and "two.txt". The following program reads the two lists into arrays, sorts them, and then uses a modified version of the merging algorithm to construct the intersection of the two lists.

public class IntersectionByMerging {
    public static int[] computeIntersectionByMerging(int[] one,int[] two)
    {
        int[] temp = new int[one.length]; // The intersection will be no bigger than this
        int n = 0; // Where the next number in temp goes

        int i = 0, j = 0; // Indices that walk across one and two
        while(i < one.length && j < two.length) {
            if(one[i] == two[j]) {
                temp[n] = one[i];
                n++;
                i++;
                j++;
            } else if(one[i] < two[j])
                i++;
            else
                j++;
        }

        // Make a new array with the right size...
        int result[] = new int[n];
        // ...and copy the numbers over to it.
        for(i = 0;i < n;i++)
          result[i] = temp[i];
        return result;
    }

    public static void main(String[] args) {
        int[] A = readNumbers("one.txt");
        int[] B = readNumbers("two.txt");
        Arrays.sort(A);
        Arrays.sort(B);
        int[] C = computeIntersectionByMerging(A,B);
        for(int n = 0;n < C.length;n++)
          System.out.println(C[n]);
    }
}