In an earlier lecture I discussed sorting in Python. In today's lecture we are going to take a more in-depth look at this topic.
In that earlier lecture I showed some code for the simplest sorting algorithm, selection sort.
Here is an outline of the main ideas in selection sort.
Below is an implementation of this algorithm. For convenicnce, I have structured the algorithm in two functions. The function sort()
implements the full sorting algorithm, while the function findSmallest()
acts as a helper function to help sort()
do its work.
def findSmallest(aList,start,end): smallest = aList[start] whereSmallest = start for i in range(start+1,end): if aList[i] < smallest: smallest = aList[i] whereSmallest = i return whereSmallest def sort(aList): for i in range(0,len(aList)): where = findSmallest(aList,i,len(aList)) temp = aList[i] aList[i] = aList[where] aList[where] = temp data = [12, 23, 51, 14, 25, 67, 33, 46, 60, 89] sort(data) for x in data: print(x)
To give us more practice in writing more complex loops, here is a second sorting algorithm, insertion sort.
Here is an outline of the main ideas in insertion sort.
Here now is code for insertion sort.
def sort(aList): for i in range(1,len(aList)): j = i while j > 0 and aList[j] < aList[j-1]: temp = aList[j] aList[j] = aList[j-1] aList[j-1] = temp j -= 1 data = [12, 23, 51, 14, 25, 67, 33, 46, 60, 89] sort(data) for x in data: print(x)
More powerful and efficient sorting algorithms make use of a strategy called recursion. This strategy handles a problem like sorting by breaking the list into parts, sorting each part separately, and then combining the parts together.
Here is the code for merge sort.
def merge(aList,i,j,k): first = [aList[n] for n in range(i,j+1)] second = [aList[n] for n in range(j+1,k)] n = 0 m = 0 p = i while n < len(first) and m < len(second): if first[n] < second[m]: aList[p] = first[n] n += 1 else: aList[p] = second[m] m += 1 p += 1 while n < len(first): aList[p] = first[n] n += 1 p += 1 while m < len(second): aList[p] = second[m] m += 1 p += 1 def sort(aList,i,j): if j > i: mid = (i + j)//2 sort(aList,i,mid) sort(aList,mid+1,j) merge(aList,i,mid,j) data = [12, 23, 51, 14, 25, 67, 33, 46, 60, 89] sort(data,0,len(data)) for x in data: print(x)
The QuickSort algorithm also uses recursion to do its work. This algorithm starts with a partition step that selects one of the items in the list to be the pivot value and then reorganizes the list into items less than the pivot, the pivot, and items greater than the pivot. We then recursively go to work sorting the parts of the list on either side of the pivot.
def partition(aList, i, j): k = i - 1 pivot = aList[j-1] for n in range(i, j-1): if aList[n] <= pivot: k += 1 temp = aList[k] aList[k] = aList[n] aList[n] = temp aList[j-1] = aList[k+1] aList[k+1] = pivot return k+1 def sort(aList,i,j): if j > i: k = partition(aList,i,j) sort(aList,i,k) sort(aList,k+1,j) data = [12, 23, 51, 14, 25, 67, 33, 46, 60, 89] sort(data,0,len(data)) for x in data: print(x)
There are a number of sites on the Internet where you can find animations that illustrate the operation of basic algorithms, such as these four sorting algorithms. One such site is www.hackerearth.com, which provides nice visualizations of many sorting algorithms, including the four I showed here.
The four sorting algorithms that I showed above fall into two broad categories: elementary algorithms and recursive algorithms. The elementary algorithms are elementary because they rely on ideas that are easy to understand and code. The recursive algorithms are somewhat less obvious, but end up having much better performance than the elementary algorithms.
To illustrate the performance difference between the algorithms I am going to set up a demonstration. In this demonstration we are going to have sorting algorithms compete to sort the same list of numbers and time each algorithm. This will give us immediate feedback on the relative efficiency of the algorithms.
To give the competing sorting algorithms a common list of numbers to sort, I am going to start by writing a simple program that generates a long list of random integers. The program will save these integers to a file for the sorting program to read from.
import pickle import random nums = [random.randint(1,100000) for _ in range(20000)] pickle.dump(nums, open('nums.p','wb'))
The program to save the numbers makes use of the Python pickle library, which provides a simple and convenient way to dump binary data from a Python program into a binary file.
Here now is the program to read the numbers from the file and sort them twice, once with insertion sort and a second time with QuickSort.
import pickle import time def insertion(aList): """Sort aList into ascending order with insertion sort.""" for i in range(1,len(aList)): j = i while j > 0 and aList[j] < aList[j-1]: temp = aList[j] aList[j] = aList[j-1] aList[j-1] = temp j -= 1 def partition(aList, i, j): k = i - 1 pivot = aList[j-1] for n in range(i, j-1): if aList[n] <= pivot: k += 1 temp = aList[k] aList[k] = aList[n] aList[n] = temp aList[j-1] = aList[k+1] aList[k+1] = pivot return k+1 def quicksort(aList,i,j): if j > i: k = partition(aList,i,j) quicksort(aList,i,k) quicksort(aList,k+1,j) nums = pickle.load( open( 'nums.p', 'rb' ) ) now = time.time() insertion(nums) later = time.time() print('Insertion sort took ',later - now,' seconds.') nums = pickle.load( open( 'nums.p', 'rb' ) ) now = time.time() quicksort(nums,0,len(nums)) later = time.time() print('QuickSort took ',later - now,' seconds.')
This program shows a dramatic difference between the speed of the two algorithms. With a list of 20000 random integers, insertion sort takes 19.8 seconds to sort the list, while QuickSort takes only 0.04 seconds. This difference gets even more pronounced as you increase the length of the list. Insertion sort takes time proportional to the square of the length of the list to sort, so doubling the length of the list increases the sorting time by a factor of 4. QuickSort, on the other hand, takes time that is proportional to n log n to sort a list of n items. Doubling the length of the list only increases the sorting time by a little over a factor of 2.
In this assignment we are going to combine the insertion sort and QuickSort algorithms to make QuickSort even more efficient. One aspect of QuickSort that is less than ideal is the fact that QuickSort subdivides the list to be sorted into lists of size one. An improvement on this is to use a hybrid strategy. In the hybrid strategy we modify QuickSort so that when it encounters a sublist whose size is smaller than some preset cutoff it uses insertion sort to sort the sublist rather than continue recursively subdividing the sublist into smaller and smaller lists.
Implement this version of QuickSort, and have it transition to using insertion sort whenever it comes across a sublist of size smaller than 10. Compare this modified version of QuickSort to the original on a large list of numbers, and have your program demonstrate that the modified version is faster.