Application: sorting a list of integers

A common step in data analysis task involves putting a list of data values into ascending order. This step is frequently used as the first step in a more complex data analysis task. In this example I will develop a simple algorithm that can sort a list of integers into ascending order.

The algorithm I will use to do this is one of the simplest algorithms for sorting, the selection sort algorithm.

Here is an outline of the main ideas in selection sort.

  1. The algorithm divides the list into two parts, a sorted part and an unsorted part.
  2. At start, the sorted part of the list is empty and the unsorted part covers the entire list.
  3. The algorithm proceeds as a series of rounds. On each round we do the following:
    1. Search the unsorted portion of the list for the smallest number we can find.
    2. Make the smallest number trade places with the number that sits at the start of the unsorted portion.
    3. Expand the sorted portion of the list by one to take in the new number, and shrink the unsorted portion by one
  4. The algorithm continues until the unsorted portion is empty and the sorted portion covers the entire list. At that point the list has been sorted.

Below is an implementation of this algorithm. For convenicnce, I have structured the algorithm in two functions. The function sort() implements the full sorting algorithm, while the function findSmallest() acts as a helper function to help sort() do its work.

def findSmallest(aList,start,end):
    """Find and return the location of the smallest
       number found between locations start and
       end in the list aList."""
    smallest = aList[start]
    whereSmallest = start
    for i in range(start+1,end):
        if aList[i] < smallest:
            smallest = aList[i]
            whereSmallest = i
    return whereSmallest

def sort(aList):
    """Sort aList into ascending order with
       selection sort."""
    for i in range(0,len(aList)):
        where = findSmallest(aList,i,len(aList))
        temp = aList[i]
        aList[i] = aList[where]
        aList[where] = temp


data = [12, 23, 51, 14, 25, 67, 33, 46, 60, 89]
sort(data)
for x in data:
    print(x)

Sorting in Python

The sorting algorithm I showed above makes a nice example, but the selection sort algorithm simply isn't powerful enough to be used for sorting longer lists.

For longer lists you should use the list sort() method:

data = [12, 23, 51, 14, 25, 67, 33, 46, 60, 89]
data.sort()

This method does what is called in-place sorting, which means that the elements in the original list get rearranged to put them in ascending order.

If you want to retain the original list for some reason, Python provides an alternative method to via the sorted() function. This function takes a list as its parameter and returns a new list that has the original elements sorted in ascending order:

data = [12, 23, 51, 14, 25, 67, 33, 46, 60, 89]
newList = sorted(data)

Another issue that comes up from time to time is sorting lists of tuples. Here is an example. Suppose you have a list of tuples where each tuple includes a student's name and a score they made on an exam.

examScores = [('Kim',87),('Anna',92),('Leon',89),('Shawn',85)]

If you use the sort() or sorted() method on this list of tuples, Python will automatically sort the data items using the first entry in the tuple. That is, the data will be sorted in ascending order of the names.

byName = sorted(examScores)

Python provides a way to sort the tuples based on some other member of the tuple. The first step in doing this is to construct a special function called a key function. This function takes a data tuple as its parameter and returns the member of the tuple that you would like to use as the basis for sorting.

def scoreKey(score):
    return score[1]

To sort the scores by exam score instead of name, we pass the key function as an additional parameter to the sorted() function.

byScore = sorted(examScores,key=scoreKey)