Are people still discovering new sorting/searching algorithms, or does the field seem to be pretty exhausted? At least in terms of finding the most efficient algorithms for general lists?

That is an excellent question. We do know that for the general (comparison-based) sorting problem, the lower bound — the best any such algorithm can do in the worst case — is N log N. There are many known algorithms that achieve this (merge sort and quick sort being the most popular). However, in certain scenarios we can do better. For example, if you have multiple computers that can communicate and run separately, we can devise parallel sorting algorithms. If you know that the items you are sorting come from a finite range (e.g., I have 500 items and the possible values are integers between 1 and 1000), we can take advantage of that finite range to come up with quicker algorithms. Another hard problem is how to sort data that is so massive it can’t all fit on one computer at once (e.g., Google’s index of all web pages in the world). The traditional algorithms we came up with fail miserably in this scenario.

On top of that, while we initially care about the worst-case run-time category (e.g., log N vs N*N), actual software engineers try to eke out every possible second of savings. There is an annual competition for who can come up with the fastest sorting algorithm. Check out the 2016 winners! The winners sorted 100 Terabytes of data in 134 seconds! That’s about 1000 times more data than your laptop can hold.

Can we make selection sort so that the first iteration gets put to the end of the list instead of the front?

Yes! We can flip the guarantee so that the maximum i items are in the last i locations.
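A minimal sketch of that flipped variant (names and details are my own, not from lecture): each pass finds the maximum of the still-unsorted front portion and swaps it into the last unsorted slot.

```python
def selection_sort_max(lst):
    # Invariant: after each pass, the largest items occupy the end slots.
    for end in range(len(lst) - 1, 0, -1):
        max_index = 0
        for j in range(1, end + 1):
            if lst[j] > lst[max_index]:
                max_index = j
        # Swap the maximum of lst[0..end] into position end.
        lst[max_index], lst[end] = lst[end], lst[max_index]
    return lst
```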

Is log(n) faster than n?

It is much faster. See the table from last week comparing binary to linear search. log N counts orders of magnitude: each time N doubles, log N only increases by 1. So going from, e.g., log N = 20 to 21 corresponds to a doubling of the input size, while the work grows by just one step.
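To make the gap concrete, here is a small sketch of worst-case comparison counts for the two searches (the binary step count here is an approximation, not an exact derivation from lecture):

```python
import math

def worst_case_comparisons(n):
    # Linear search may look at every item; binary search halves the
    # remaining range each step, so it needs roughly log2(n) looks.
    linear = n
    binary = math.ceil(math.log2(n)) + 1
    return linear, binary

# For a million items: linear may need 1,000,000 looks, binary about 21.
```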

Why would insertion sort be used over different sorting methods?

It probably wouldn’t be used in most scenarios. In fact, none of the three we covered is what Python uses for its built-in sorting algorithm. Python uses Timsort, a hybrid of merge sort and insertion sort.

That being said, insertion sort is often handy when you want to add new items to a sorted list. Let’s say you sorted a list of animals at the zoo by name. Then the zoo procured 100 new animals. You could start from scratch and re-sort all items, or you could treat the already-sorted animals as the left side of the list (sorted) and the new animals as the right side (unsorted) and start insertion sort from there.
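A sketch of that idea (function name is mine, not from lecture): the new items are appended and then each one is walked leftward until it lands in place, which is exactly insertion sort’s inner loop.

```python
def insert_sorted(sorted_list, new_items):
    # Treat sorted_list as the sorted left side and new_items as the
    # unsorted right side, then run insertion sort's inner loop on
    # each new item only.
    result = sorted_list + new_items
    for i in range(len(sorted_list), len(result)):
        j = i
        while j > 0 and result[j - 1] > result[j]:
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result
```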

How is the insertion sort different from the opposite of bubble sort?

The guarantees (invariants) are different. The internal mechanism for moving items is similar, but the way they march down the list differs.

How can you sort lists with various types within them?

You have to choose a way to compare items. You can actually redefine the relational operators (>, <, ==, etc.) for your specific data. We’ll see in two weeks how we can define new types; we can also create a method for an object that compares it to another object of the same type.
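As a preview of that class material, here is a tiny sketch using Python’s special comparison method (the `Animal` class and its fields are made up for illustration):

```python
class Animal:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    # Redefining < for Animal objects: compare by name.
    # Python's built-in sort only needs < to order objects.
    def __lt__(self, other):
        return self.name < other.name

zoo = [Animal("zebra", 400), Animal("ape", 80)]
zoo.sort()  # uses __lt__ under the hood
```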

How can you ask the computer to decide which sorting method is better based on the type of data it is sorting?

That’s a little bit trickier. You could try to estimate the number of out-of-place pairs (called inversions). If there aren’t many, bubble sort would be a good choice. If there are a lot, then selection sort.
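Here is a sketch of counting inversions directly (my own helper, not from lecture). Note that checking every pair is itself O(N^2), so in practice you would sample or estimate rather than count exactly:

```python
def count_inversions(lst):
    # An inversion is a pair (i, j) with i < j but lst[i] > lst[j].
    # A sorted list has 0 inversions; a reversed list has the maximum.
    count = 0
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):
            if lst[i] > lst[j]:
                count += 1
    return count
```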

How could you sort a list of strings from the back of each word (last letter alphabetical)?

You’d have to define your own comparison function (well beyond the scope of CS21), e.g.,

def reverseGreater(strA, strB):
  revA = reverseString(strA)
  revB = reverseString(strB)
  return revA > revB

def reverseString(s):
  accum = ""
  for ch in s:
    accum = ch+accum
  return accum

In your sorting algorithm, instead of asking:

if strA > strB:
  #swap

you can write:

if reverseGreater(strA, strB):
  #swap

What is the advantage of running insertion/bubble/selection if they give around the same run time?

Bubble: can detect an early stop if the list is already sorted.
Selection: many fewer swaps in the worst (and average) case.
Insertion: handy if adding unsorted items, or if the list is already partially sorted.

Selection sort, for random elements, tends to be faster than the other two because it performs far fewer swaps.
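Bubble sort’s early-stop advantage looks like this sketch: if a full pass makes no swaps, the list must already be sorted and we can quit.

```python
def bubble_sort(lst):
    for n in range(len(lst) - 1, 0, -1):
        swapped = False
        for i in range(n):
            if lst[i] > lst[i + 1]:
                lst[i], lst[i + 1] = lst[i + 1], lst[i]
                swapped = True
        if not swapped:  # a clean pass means the list is sorted
            break
    return lst
```

On an already-sorted list, this version stops after a single pass (N comparisons) instead of doing all N^2.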

The ‘O’ just means worst case scenario, right?

Not quite: big-O is an upper bound on growth, and we can give a big-O for the best case and average case as well, not just the worst case. For CS21 purposes, you can treat big-O as asymptotic growth. That is, what major mathematical function most closely mirrors the growth of run time relative to input size? If you were to do the exact calculation of run time in terms of steps, big-O would drop all constants and keep only the largest term.

Is there anything faster than selection sort?

Yes! Quicksort and merge sort are N log N algorithms.

The Big O notation for bubble and selection were still kind of hard to understand today…

Sure. The key idea is that we have nested for loops that both look similar to this:

for each element in list
  for each element in list
    #do comparisons

The line #do comparisons is the most important step since it happens most often. If there are N items in our list, we know the inner loop will run N times with 1 comparison per iteration. Therefore, the inner loop, by itself, does N comparisons. The outer loop repeats this for each item in the list, doing the inner loop (which does N comparisons) for each of N items. Rolling that out, we do N work for iteration i=0, N work for iteration i=1, etc. N+N+N+…+N (N times) is the same as N*N, which is N^2.

It is actually the same as this algorithm for getting everyone in the room to shake hands:

for each person in the room:
  for every other person in the room:
    shake hands

If everyone in the class shook hands with every other person, we’d have roughly N^2 handshakes (each of N people shakes about N hands).
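You can count the "handshakes" directly and watch the N^2 pattern appear (a toy counter of my own, mirroring the nested loops above):

```python
def count_comparisons(n):
    # Mirrors the nested loops: the inner body runs n times for each
    # of the n outer iterations, so n * n times total.
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count

# Doubling n quadruples the count: 10 -> 100, 20 -> 400, 40 -> 1600.
```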

What helps decide whether to use bubble or selection sort, and binary or linear search?

See above for bubble vs selection. For binary vs linear, we choose binary if we can, and linear if we cannot. Binary is only available if the list is sorted.

How many types of sorts and searches are there?

I’m not sure anyone has enumerated all of the sorting algorithms, since some end up being equivalent to previously invented ones. This page lists the most common ones. There are so many variations because of the different scenarios we can come up with (see the first question).

Search is a little more limited, since all variations generally fall under linear, binary, or hash functions.

Why can you disregard N in the big O run time? Is it just very small compared to N^2?

That is exactly correct! If you have a billion items, your problem is the N^2 part, not the N part.

How each of the three sorting algorithms deal with the list that has the same values twice? For example, [4,7,11,8,11,9]

You can break ties arbitrarily. In some circumstances, people prefer stable sorting algorithms, which preserve the original ordering if there is a tie (that is, the 11 at index 2 will always be to the left of the 11 at index 4). But this rarely matters.
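Python’s built-in sort happens to be stable, so you can see this with items that tie on the sort key (the labels here are just to track the original positions):

```python
# Each tuple is (value, label); sort only by the value.
data = [(11, "a"), (4, "b"), (11, "c")]
data.sort(key=lambda pair: pair[0])
# The two 11s keep their original left-to-right order: "a" before "c".
```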

Will we always be instructed which type of sorting method to use?

If we want you to show work, probably. You should know how to show a couple iterations of bubble and selection sort, and you should be able to describe the run times of each in terms of comparison and swaps. On the lab, we’ll let you use whichever you like most.

How can we decrease the big O time, or is that inherent in the structure of the type of sort such as bubble sort.

Our second course in the curriculum is called Data Structures and Algorithms. Run time can be decreased by coming up with a better algorithm (e.g., merge sort and quicksort are better than the ones we discussed) or by organizing the data in a different way than a list (data structure).

I heard someone mention that a heap sort is useful, and would like to know what kind of sort that is.

Heapsort is also N log N. It structures the data as if it were in a tree and processes the resulting tree in an efficient way. It is also covered in the next course because it requires a little more math and understanding of how to structure data.

What are other ways to sort?

Have fun!

How does python actually execute a swap

It has to move tiny little elephants :)

This requires a lot more understanding of how memory is physically stored and accessed by the CPU, and it varies based on architecture. But it closely mirrors the first style of swap I showed, where we use a temporary variable.
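That temporary-variable style, spelled out (Python programmers usually write the one-line `lst[0], lst[1] = lst[1], lst[0]` form, which does the same bookkeeping for you):

```python
lst = [4, 7]
# Classic three-step swap with a temporary variable:
temp = lst[0]      # save the first value so it isn't lost
lst[0] = lst[1]    # overwrite the first slot with the second value
lst[1] = temp      # put the saved value into the second slot
```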