Week 10: Sorting

Lectures:

You must be logged on to your swarthmore.edu account for the youtube lecture videos to be viewable

Searching/Sorting animations:

Preview of Lab 8

In Lab 8 you will be searching a real-world data set of restaurants from Yelp. This data set is stored in sorted order by restaurant name. You will read in each line of data and create a list of Restaurant objects to store this data. Next you will create a program that allows to you to explore this data through various searches. In some cases a binary search will work and in other cases you’ll need to use a linear search.

Here’s an example of how to open the small file containing information about 12 restaurants, to read in one line from this file, to make a Restaurant object, and to access methods of that object.

python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from restaurant import *
>>> r = Restaurant()
>>> file = open("/home/alinen/public/cs21/restaurants-small.json", "r")
>>> line = file.readline()
>>> r.loadFromLine(line)
>>> r.getName()
'1 Pot'
>>> r.getStars()
3.5
>>> r.getState()
'AB'
>>> r.getCity()
'Calgary'
>>> r.getReviewCount()
31
>>> r.getCategories()
['Buffets', ' Chinese', ' Hot Pot', ' Restaurants', ' Dim Sum', ' Asian Fusion', ' Fondue']

Monday: Sorting

We talked last week about the fact that search is ubiquitous, and we saw that it is much more efficient to search a sorted list than an unsorted one. Therefore, finding good sorting algorithms is also a key problem in Computer Science.

We will focus on sorting algorithms that sort a list in-place. In other words, we will update the list such that when we are done sorting the items in the list have been moved into their correct sorted order.

In order to sort a list in-place there are two primary operations we will focus on:

  • comparing two items in the list

  • swapping two items in the list

Bubble Sort

Bubble sort works by repeatedly looping through the list and comparing adjacent items. If two adjacent items are out of order, it swaps them, incrementally bubbling up the largest items to the end of the list.

Suppose we had the following list:

[5, 3, 8, 0, 4, 1, 7]
 ^  ^

Bubble sort compares 5 with 3, and swaps them:

[3, 5, 8, 0, 4, 1, 7]
    ^  ^

Next it compares 5 with 8, and does nothing since they are already in order.

[3, 5, 8, 0, 4, 1, 7]
       ^  ^

Then it compares 8 with 0 and swaps them:

[3, 5, 0, 8, 4, 1, 7]
          ^  ^

Compares 8 with 4 and swaps them:

[3, 5, 0, 4, 8, 1, 7]
             ^  ^

Compares 8 with 1 and swaps them:

[3, 5, 0, 4, 1, 8, 7]
                ^  ^

Compares 8 with 7 and swaps them:

[3, 5, 0, 4, 1, 7, ||| 8]
 unsorted              sorted

Notice that the largest element in this list, 8, has bubbled up to the end of the list. It is in it’s final sorted position and doesn’t need to be checked again.

Each iteration of Bubble sort moves one more item into its final sorted position.

Bubble sort would begin this whole process again, starting at the front of the list, but this time it would stop when it got to the second to the last element. Here’s what the list would look like after this iteration:

[3, 0, 4, 1, 5, ||| 7, 8]
 unsorted           sorted

What would the list look like after the next iteration?

[0, 3, 1, 4, ||| 5, 7, 8]
 unsorted        sorted

Wednesday: More sorting

On Monday we focused on Bubble Sort, and determined that it is an O(n2) algorithm. Unfortunately quadratic algorithms are quite slow. Let’s explore other sorting methods to see if we can find something more efficient.

Selection Sort

The key idea of selection sort is that it repeatedly selects the minimum item remaining in the unsorted portion of the list and then swaps it into its final sorted location.

Consider how selection sort would operate on following list:

[5, 3, 8, 0, 4, 1, 7]
 0  1  2  3  4  5  6

When it begins, the entire list is unsorted. Therefore, the goal is to get the minimum item in the list into position 0. To do this it keeps track of the indexOfMin. It always starts this variable at the first index of the unsorted portion of the list, and updates it whenever it finds a smaller item.

The indexOfMin starts at 0. Since 3 < 5, indexOfMin is updated to location 1. Since 8 is not < 3, indexOfMin is unchanged. Since 0 < 3, indexOfMin is updated to location 3. All other locations in the list will be checked to see if their contents are < 0, and they aren’t.

After the entire unsorted portion of the list has been checked, one swap is done between locations indexOfMin and the location at the beginning of the unsorted portion. The current ordering in the list will become:

[0, 3, 8, 5, 4, 1, 7]
 0  1  2  3  4  5  6

I will redraw this to make it clear where the division between the sorted and unsorted portions of the list occurs.

[0,     ||| 3, 8, 5, 4, 1, 7]
 0          1  2  3  4  5  6
 sorted     unsorted

Next, selection sort will repeat the process described above. It will find that the indexOfMin is now 5, and it will swap the items at locations 1 and 5. Giving us this new list ordering:

[0, 1, ||| 8, 5, 4, 3, 7]
 0  1      2  3  4  5  6
 sorted    unsorted

Notice that the sorted portion of the list is growing by one more item each iteration through selection sort.

What would the list look like after the next iteration?

[0, 1, 3, ||| 5, 4, 8, 7]
 0  1  2      3  4  5  6
 sorted       unsorted

Like bubble sort, selection sort is implemented as a loop within a loop, where the amount of work can be described as the summation (n-1)(n-2) …​ + 1 which is O(n2). So unfortunately, selection sort does not improve on the worst case running time of bubble sort.

Wednesday: Sorting continued

We continue our quest to find a faster sorting algorithm.

Insertion sort

Insertion sort creates a sorted portion at the front of the list, and incrementally inserts one new item into this portion on every iteration. Each new item is compared to the preceding item. If the new item is smaller, a swap is done. This process continues either until the new item is no longer smaller than the preceding item or the front of the list is reached.

Here’s an example:

[5, 1, 7, 3, 4, 8]

Let’s consider the first item of the list to be the start of the sorted portion.

[5,    ||| 1, 7, 3, 4, 8]
sorted     unsorted

Now, we insert 1 into this sorted portion:

1 works its way down
[5, 1]
[1, 5]

[1, 5 ||| 7, 3, 4, 8]
sorted    unsorted

Next, we insert 7 into this sorted portion:

7 works its way down, but no need to move it at all!
[1, 5, 7]

[1, 5, 7 ||| 3, 4, 8]
sorted       unsorted

We continue to add in one item at a time, working it down until it is in sorted order:

3 works its way down
[1, 5, 7, 3]
[1, 5, 3, 7]
[1, 3, 5, 7]

[1, 3, 5, 7, ||| 4, 8]
sorted           unsorted

We do this for the rest of the items in the unsorted portion of the list:

4 works its way down
[1, 3, 5, 7, 4]
[1, 3, 5, 4, 7]
[1, 3, 4, 5, 7]

[1, 3, 4, 5, 7, ||| 8]
 sorted             unsorted

And the final item:

8 works its way down, but no need to move at all!
[1, 3, 4, 5, 7, 8]

Like both bubble and selection sort, insertion sort is implemented as a loop within a loop, which in the worst case also has O(n2) running time.

Comparing sorting algorithms

Although all three of these algorithms have the same big O in the worst case, in practice they may have different performance characteristics. Let’s measure how many swaps and how many comparisons each does on the same randomly ordered list of size 100.

Table 1. Summary of sorting algorithms
Algorithm Worst case Num swaps Num comps

Bubble

O(n2)

2225

4950

Selection

O(n2)

100

4950

Insertion

O(n2)

2225

2225

Based on our experiments, its clear that Bubble sort is not a great option—​it does a lot of swaps and makes a lot of comparisons. Both Selection and Insertion will be faster in practice than Bubble. Selection makes many fewer swaps, and Insertion makes fewer comparisons.