knerr cs21 notes...

back to schedule

WEEK09: sorting (and more analysis of algorithms)
---------------------------------------------------------------
 M: selection sort vs mylist.sort()

FROM LAST TIME:

 - we fixed up anagrams.py to use a binary search
 - how do anagrams.py and binarysearchanagrams.py compare
   when given a 7-letter word (like parsley)? how about
   and 8-letter word (like predator)?

   (see /home/jk/inclass/binarysearchanagrams.py for details)


SORTING:

 - python has a built-in sort function for lists:

>>> L = range(10)
>>> print L
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> from random import *
>>> shuffle(L) 
>>> print L
[3, 1, 7, 4, 5, 9, 0, 6, 2, 8]
>>> L.sort()
>>> print L
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

 How does L.sort() work, and is it a *good* sort to use???

 - how would *you* sort numbers in a list? if you had to
   write a function to sort a given list, how would you do it?

   (try it out using some playing cards -- see if you can come
    up with a sorting algorithm)

 - here's one idea, called selection sort:

     * assume we have a list called L
     * find the smallest number in the list, swap it with L[0]
     * find the smallest number in the list, except for L[0],
       and swap it with L[1]
     * find the smallest number in the list, except for L[0] 
       and L[1], and swap it with L[2]
     * and so on....

   it's called selection sort because each time you are selecting
   the smallest number from the remaining unsorted elements

 - here's one way to code selection sort:

def selectionSort(L):
  """
  given a list, sort from lowest to highest
  """
  for i in range(len(L)):
    ismall = i  
    # find index of smallest element of rest of list 
    for j in range(i+1,len(L),1):
      if L[j] < L[ismall]:
        ismall = j
    # swap smallest element with L[i]
    L[i],L[ismall] = L[ismall],L[i]


PYTHON:
 
 - this is python's own shorthand notation for swapping 
   two elements in a list:

        L[i],L[j] = L[j],L[i]

   what it is really doing is something like this:

        tmp = L[i]
        L[i] = L[j]
        L[j] = tmp
 

YOUR TURN:

 - copy sorts.py and add the selection sort
 - try selection sort vs L.sort() for N=5000, 1000, 2000, 4000:

    * which is faster?
    * how do the times change as N doubles?
    * how does selection sort's time depend on N?

list size (n): 10
[219, 908, 21, 837, 417, 852, 409, 739, 385, 96]
L.sort time: 0.0000
[21, 96, 219, 385, 409, 417, 739, 837, 852, 908]
-----------------------------------
[385, 908, 837, 417, 219, 96, 409, 852, 21, 739]
selectionSort(L) time: 0.0001
[21, 96, 219, 385, 409, 417, 739, 837, 852, 908]

list size (n): 5000
L.sort time: 0.0053
-----------------------------------
selectionSort(L) time: 3.1940

list size (n): 10000
L.sort time: 0.0114
-----------------------------------
selectionSort(L) time: 12.2937

list size (n): 20000
L.sort time: 0.0246
-----------------------------------
selectionSort(L) time: 50.9708