Week 9: Searching and Analysis of Algorithms

Lectures:

You must be logged on to your swarthmore.edu account for the youtube lecture videos to be viewable

Logistics

  • We have further delayed both quizzes

    • Quiz 4 will be available April 6

    • Quiz 5 will be available April 20

Search is the process of looking for a particular value in a collection of values.

Search is ubiquitous! You probably use google search multiple times every day. It is a key topic in Computer Science. We want search to be as efficient as possible.

This week we will look at several ways to implement search. We want all of our search algorithms to have an interface as shown below. The search takes two parameters, an item and a list, and returns an index of where the item was found or a special index (-1) if the item was not found.

def search(item, ls):
    """
    Purpose: find an item in a list of items
    Inputs:  item to search for and a list to search in
    Returns: index of the item if found, otherwise -1
    """
    # How should we implement this?

We need a systematic way of searching through a list.

The linear search approach uses a for loop to start at the beginning of the list and look at each item of the list in turn until it finds the item we want, or gets to the end of the list.

def linearSearch(item, ls):
   for i in range(len(ls)):
      if item == ls[i]:
         return i
   return -1

Notice that the only way to return -1 is to have checked every item in the list and not found the item we are looking for.

Analysis of algorithms

Analysis of algorithms is a sub-discipline of Computer Science that studies how to quantify the amount of work each algorithm requires.

Typically when we analyze an algorithm we use the variable N to represent the size of the problem that we are trying to solve.

Question: In searching a list for an item, what would N represent?

Answer: It would represent the size of the list to be searched.

So, if N is the size of the list to be searched, in the worst case it would take linear search N steps to find out that the item we want to find is NOT in the list.

For a list containing 10,000 items it would take 10,000 steps to determine that some item is NOT in the list.

For a list containing 20,000 items it would take 20,000 steps to determine that some item is NOT in the list.

As you can see, the amount of work that linear search requires grows linearly with the size of the list, thus the name linear search.

Linear search is the default method to use when we know nothing about the ordering of the items we are searching. However, if we know that the items are in sorted order, then we can come up with more efficient ways to search.

Play a guessing game with someone. Have them think of a number between 1 and 100. Your goal is to guess that number in the fewest number of tries. For each guess you try, they will tell you whether the actual number is lower or higher.

Given this scenario, the best strategy is to always try the number halfway between the lowest possible number and the highest possible number remaining. So your first guess should be 50.

If they tell you the number is lower, then your next guess should be halfway between 1 and 49. If they tell you that the number is higher, then your next guess should be halfway between 51 and 100.

By guessing in this way you will continually eliminate half of the remaining numbers each time and quickly home in on the correct number.

The binary search strategy takes a similar approach to searching a sorted list, except it uses the indices of the list to be searched. It keeps track of the low_index, high_index, and mid_index.

You must be on campus or using the Swarthmore VPN to view these slides.

Friday: Recap of search and Big-O

Examples of different running times

See Friday’s video for details.