knerr cs21 notes...

back to schedule

WEEK08: searching, analysis of algorithms
---------------------------------------------------------------
 M: linear and binary search

ANNOUNCEMENTS:
 - quiz 4 this Friday...while loops, file I/O!
 - Lab 7 due *next* Tuesday night (Nov 3)
 - if you want to use Andy Danner's lab 6 functions in your
    lab 7, do this:

from lab06 import *

   then use his GetAConsonant, GetMenuOption, IsAVowel, PrintMenu
   functions. For the function help info, do this:

>>> import lab06
>>> help(lab06)


NOTE: we're mostly done with teaching you python at this point.
There's lots more you could learn about python, but you've got
the basics now, and enough to write some interesting programs.
From here on out we'll focus more on Computer Science, rather
than computer programming...

SEARCHING:

 - basic search question...is x in y, where x is some single
   value, and y is a group of values

 - for us, x is a simple variable, and y is a list

 - if x = 12, and y = [7,30,-29,-4,88,49,-29,100,92], is x in y??

 - you can actually use the "in" operator to answer this:

>>> x = 12
>>> y = [7,30,-29,-4,88,49,-29,100,92]
>>> x in y
False
>>> x = 88
>>> x in y
True

 - how does the "in" operator work?? how would you search through
   the list and see if a certain value is there or not??

LINEAR SEARCH: go one at a time, through the whole list (a for loop!)

def linSearch(x, nums):
    for i in range(len(nums)):
        if nums[i] == x:       # item found, return the index value
            return i
    return -1                  # loop finished, item was not in list

 - is linSearch any good? how long does it take? does it 
   matter if len(nums) is 10 or 10,000,000,000???

 - in the best case, the item you're looking for is first in the
   list. In the worst case, the item is last or not in the list:

        best case: takes 1 "step"
        worst case: takes len(nums) "steps"

 - if we double the size of the list (nums), the worst case also
   doubles in size. This means the time it takes linSearch to work
   is proportional to the size of the list

BINARY SEARCH:

 - *if* the list is sorted, there's a better way!

 - start in the middle of the list -- if that's the one you're looking
   for, then you're done. If not, what you're looking for must either
   be greater or less than the middle value. Either way, you can ignore
   one half of the list and start the process over with the other half
   of the list. Here's an example:

>>> y.sort()
>>> print y
[-29, -29, -4, 7, 30, 49, 88, 92, 100]

   suppose we're looking for x = 12 in y.

   start with the 4th element (30). That's not it, and we know the
   list is sorted, and we know 12 is less than 30, so let's refocus
   on the left half of the list [-29, -29, -4, 7]. 

   Pick the middle of the left-half -- either the 1st or 2nd element,
   and check for x. Let's say we picked the 1st element (-29), which
   is not what we are looking for, and x=12 is greater than -29, so
   now we can refocus on just [-4, 7] (and so on....)

 - the point is, each time we *halve* the list.

 - if the list has 8 elements in it, we can only halve that 3 times
   until we've either found x or it's not in the list

 - if the list has 16 elements in it, we can only halve that 4 times.

 - so each time we double the size of the list, it's only one more
   step we need to make!!! (unlike the linear search, where each time
   we double the size of the list, we double the number of steps needed)

 - so the time for binary search to do it's work is equal to "how many
   times can we halve the size of the list"

 - the question "how many times can we halve the list" can be answered
   with logarithms: log(base 2) of N = how many times we can halve N.
   try this in python:

>>> import math
>>> math.log(8,2)
3.0
>>> math.log(16,2)
4.0
>>> math.log(32,2)
5.0

   each time we double N, that's only one more step...
   and this pays off big when N is large:

>>> math.log(1000000,2)
19.93

   so even if the list had 1 million elements, it would still only
   take about 20 steps to find what we're looking for (linear search
   would take 1 million steps in the worst case).

YOUR TURN:

 - copy and try these programs for N=100000, 200000, 400000, 800000:

/home/jk/inclass/linearsearch.py
/home/jk/inclass/binarysearch.py

$ python linearsearch.py 
enter N: 100000
linear search: 
List built
1 2 3 4 5 6 7 8 9 10 
average: 0.0199360847473

 - note how linear search's time approximately doubles each time
   you double N (try it!!! run the code for different N)

 - also note how much faster binary search is, and how it's time
   changes as you double N