w09-3

Which programs are best suited to use linear search?

Linear search is still the most common type of search. We have used it many different labs and examples so far this semester, including Lab 5 (handman). The reason is simple: binary search is only useful if the items we are searching are in sorted order (e.g., smallest to largest). This is rarely the case in real-world examples and so linear search is very popular. In real-world problems, if search is a common need, we go through the effort of first sorting the values (next week’s topic!)

I’m confused about how the logarithm was derived.

Yes, this is not the most intuitive concept. First, let us think about what logarithms are. A logarithm is the complementary function to exponents. That is, 10^6 takes 10 and multiplies it by itself 6 times. The logarithm does the opposite - it asks how many times would have to multiple 10 by itself to get a certain result.

Here is another way to think about it: the logarithm you learned about was probably log base 10. It counts how many 0s are in a number. E.g., 1 million is 6 0s and therefore log(1000000) is 6. Counting how many 0s there is the same as asking “how many times can I divide by 10 before the number becomes 1”. 1000000 divide by 10 6 times (1,000,000/10/10/10/10/10/10) is 1.

Change the 10 above to 2 and you are asking log base 2 - how many times do I divide a number by 2 before it becomes 1. This fits binary search, which divides the size of a list by 2 until there is only 1 element left. So, N/2/2/2/2/…2 = 1. The number of times we divide by 2 is the log(N)

What is a case where the there would be a function with a log base besides two?

log base 2 is common in CS because we deal in binary. Physicists tend to use natural log, which is log base euler’s constant. log base 10 is also popular. We could write functions that do any of these. In the database course I teach, we learn about algorithms that divid a problem into as many as 5000 pieces at once giving us log base 5000!

What if we consider the average case scenario instead of the worse case, would the binary search is still better than the linear?

Definitely. It turns out that the average case is still O(N) for linear and O(log N) for binary search. To see why, think of the average scenario for linear search - you have to search half way through the list to find the value. That is N/2 steps. Since we drop constants, the big-O is still N!

What do we do when we don’t have a sorted list?

We attend Week 10 of CS21.

Is there another way to analyze run time other than binary and linear search?

I assume you mean analyze run time other than logarithmic or linear? Yes, we will see different categories of run time this week. If you mean are there other search algorithms, the answer is yes but not for lists (we may see a different way to store items that makes searching even faster).

Can linear searching be faster than sorting a list so that we can use binary search?

Yep! That is why linear search is still popular (notice that Python has linear search baked in to the system with the in operator, but does not have binary search). Sorting first to do one search is way more work. Sorting once so you can search over and over again, on the other hand, is useful.

What do you do when N is very large but not sorted?

Linear search and a fidget spinner to pass the time.

How do search engines implement their searches?

It varies, but hash tables are at the core of most algorithms. They are O(1) in the average case.

How is this going to be incorporated into a lab?

See Lab 8!

How could a function have a constant for a runtime?

The power of hashing. Create a function that gives a value a unique location that can be recovered by a simple equation. This is related to encrypting messages and passwords! See the hash table link above for more details; it seems complex, but it is a week long topic in CS35 that is really easy to pick up.

Are we always going to be instructed on whether to use binary or linear search?

No - you should be sure to learn the assumptions for each model and detect which search is best to use. Remember, linear search is the optimal choice if the list is not sorted.

Are there other types of searches besides binary, linear, and the one you mentioned with O(1)?

Dozens. Hundreds? It all depends on how you store the data. In a list, there are primarily the two we discussed.

Can binary search be used for lists of lists, if each individual list is sorted?

Sure, but that is tough to manage. Usually you can only sort by one part of the list at a time. For example, imagine you were storing a list of products and each item in the list has a [name, price, location, shipping_weight]. If you want to sort by price, you probably can’t also have it sorted by name.

Why is it called binary search?

That is a great question! I’m not sure why that name stuck. Half-interval or logarithmic search are less-common names. Dichotomous search also sounds better, doesn’t it? It’s probably because each step in the search serves up a binary choice (i.e., two options): search on the left half, or search on the right half.

How do we apply this search technique in a very general way to all algorithmic programs?

If binary search solved our problems, our field would be a relatively simple one! It turns out there are many problems that a binary “divide and conquer” approach is useful, but it depends on our objectives and our data.