Data Structures and Algorithms

Test 2 Study Guide

The tests in this course may contain any material covered in lecture up until this point, although much stronger emphasis will be placed on material for which you have received graded feedback. This page contains a collection of exercises that you may wish to complete to prepare yourself. We also encourage you to review the lecture notes from topics presented since the previous test.

Sorting

You should make sure to review the sorting algorithms we covered in class and be ready to discuss them.

What is the worst-case time complexity of Insertion Sort? QuickSort? MergeSort?

Insertion Sort and QuickSort both have worst-case complexity of \(O(n^2)\). MergeSort has a worst-case complexity of \(O(n \log n)\).
Although QuickSort’s worst-case time complexity is worse than MergeSort’s, we use QuickSort as a default in practice. Why?

Randomized QuickSort’s expected worst-case complexity is \(O(n \log n)\). Also, QuickSort is an in-place sort, meaning that it performs less copying than MergeSort.
What is an in-place sorting algorithm?

An in-place sorting algorithm is one in which the contents of the array are sorted while remaining within the array.
What is wrong with the following QuickSort pseudocode? What would happen if we used this pseudocode to sort an array? How could we fix it?
```
Function brokenQuickSort(array, left, right)
    pivot <- partition(array, left, right)
    brokenQuickSort(array, left, pivot-1)
    brokenQuickSort(array, pivot+1, right)
EndFunction
```
There is no recursive base case in the provided pseudocode. If this code were executed, it would break the list down into paritions of size 0 and keep trying to split those size 0 partitions indefinitely. We need to add a condition that does not try to sort if the array is of size less than two.

Write pseudocode for a partition function which can be used in the QuickSort algorithm.

Any partition function which correctly organizes the array into a “low section” and a “high section”, returning the index of the pivot between them, is acceptable. Here’s the Lomuto algorithm which was discussed in class:

Function partition(array, start, end)
 pivot <- array[end]
 frontier <- start
 fence <- start
 While frontier < end
   If array[frontier] < pivot
     swap array[frontier] with array[fence]
     fence <- fence + 1
   EndIf
   frontier <- frontier + 1
 EndWhile
 swap array[end] with array[fence]
 Return fence
EndFunction

Write pseudocode for the merge function of the MergeSort algorithm.

Function merge(a1, a2, am)
 i <- 0
 j <- 0
 k <- 0
 While i < length(a1) And j < length(a2)
   If a1[i] < a2[j]
     am[k] <- a1[i]
     i <- i + 1
     k <- k + 1
   Else
     am[k] <- a2[j]
     j <- j + 1
     k <- k + 1
   EndIf
 EndWhile
 While i < length(a1)
   am[k] <- a1[i]
   i <- i + 1
   k <- k + 1
 EndWhile
 While j < length(a2)
   am[k] <- a2[j]
   j <- j + 1
   k <- k + 1
 EndWhile
EndFunction

Big-O Proofs

Recall the definition of big-\(O\) notation we used in class: \(f(n)\) is \(O(g(n))\) if there are some positive \(c\) and \(n_0\) such that, for every \(n\geq n_0\), \(f(n) \leq c \cdot g(n)\). You should be prepared, given this definition, to prove complexity bounds on functions.

Prove that \(f(n) = \frac{n^2+1}{n}\) is \(O(n)\).

We want to show that \(\frac{n^2+1}{n}\) is \(O(n)\). Let \(n_0=1\) and let \(c=2\). For all \(n\geq n_0\), we have

\(\frac{n^2+1}{n}\)	\(=\)	\(n + \frac{1}{n}\)	(because \(n \neq 0\))
	\(\leq\)	\(n + n\)	(because \(\frac{1}{n} \leq n\) when \(n \geq 1\))
	\(=\)	\(n + n\)
	\(=\)	\(2n\)
	\(=\)	\(c\cdot n\)

This shows, when \(c=2\) and \(n_0=1\), we have for every \(n\geq n_0\) that \(f(n) \leq c\cdot g(n)\), so \(f(n)\) is \(O(g(n))\).

Prove that \(f(n) = 3n^3 + 5n^2 + 8\) is \(O(n^3)\).

We want to show that \(3n^3 + 5n^2 + 8\) is \(O(n^3)\). Let \(n_0=1\) and let \(c=16\). For all \(n\geq n_0\), we have

\(3n^3 + 5n^2 + 8\)	\(\leq\)	\(3n^3 + 5n^3 + 8\)	(because \(5n^2 \leq 5n^3\) when \(n\geq1\))
	\(\leq\)	\(3n^3 + 5n^3 + 8n^3\)	(because \(8 \leq 8n^3\) when \(n\geq1\))
	\(=\)	\(16n^3\)
	\(=\)	\(c\cdot n^3\)

This shows, when \(c=16\) and \(n_0=1\), we have for every \(n\geq n_0\) that \(f(n) \leq c\cdot g(n)\), so \(f(n)\) is \(O(g(n))\).

Prove that \(f(n) = (2n+4)(n-1)\) is \(O(n^2)\).

We want to show that \((2n+4)(n-1)\) is \(O(n^2)\). Let \(n_0=2\) and let \(c=3\). For all \(n\geq n_0\), we have

\((2n+4)(n-1)\)	\(=\)	\(2n^2+2n-4\)	(by distribution)
	\(<\)	\(2n^2+2n\)	(because \(-4 < 0\))
	\(\leq\)	\(3n^2\)	(because \(2n \leq n^2\) when \(n\geq2\))
	\(=\)	\(3n^2\)
	\(=\)	\(c\cdot n^2\)

This shows, when \(c=3\) and \(n_0=2\), we have for every \(n\geq n_0\) that \(f(n) \leq c\cdot g(n)\), so \(f(n)\) is \(O(g(n))\).

Data Structures and Invariants

You should be prepared to discuss the data structures we have created in class, the fields in their implementations, and the invariants we rely upon in order to write algorithms for those data structures.

What is an ADT? What is the difference between an ADT and a data structure? Give two examples of each.

An ADT is a description of how data can be accessed and manipulated. In other words, an ADT is an interface allowing the caller to store and retrieve data. A data structure is a concrete realization of an ADT; data structures give specific algorithms to implement the interface that the ADT describes.

Two examples of ADTs are lists and stacks. Lists and stacks are both defined by a collection of routines that allow a user to store and manipulate data (e.g. insertFirst, getLast, etc. for list; pop, push, etc. for stack). Two examples of data structures are linked lists and array lists. Both linked lists and array lists are concrete implementations of the list ADT; each of them give implementations for the interface that the list ADT describes.
What is the difference between a List and a Queue? When might we use a Queue instead of a List?

List and Queue are two different ADTs. The List ADT supports a variety of operations (such as get by index) that a Queue does not. A Queue may be implemented using a List but is not a kind of List.

We might use a Queue instead of a List if the operations provided by Queue are enough for our algorithm. By using a “weaker” data structure, the algorithm that uses it is easier to verify as correct since it’s easier to understand how the data structure behaves.

What are the big-\(O\) complexities of the following operations? Answer for each of (a) LinkedList with a tail pointer, (b) LinkedList without a tail pointer, and (c) ArrayList. Briefly explain each answer.

Insert to the front of the list.
Insert to the back of the list.
Retrieve the first element of the list.
Retrieve the middle element of the list.
Remove an element from the back of the list.

	`LinkedList` with `tail`	`LinkedList` without `tail`	`ArrayList`
`insertFirst`	\(O(1)\) A new head is created and linked to the old head.	\(O(1)\) Identical to `LinkedList` with `tail`.	\(O(n)\) Every element must shift down one space.
`insertLast`	\(O(1)\) A new tail is created and linked to the old tail.	\(O(n)\) Without a `tail` pointer, we must walk to the end of the list to find the tail.	amortized \(O(1)\) We add to the end of the array; this may require us to grow the array.
`getFirst`	\(O(1)\) The first node of the list is immediately available.	\(O(1)\) Identical to `LinkedList` with `tail`.	\(O(1)\) Accessing an array is constant time.
`get`	\(O(n)\) We have to find the appropriate node.	\(O(n)\) Identical to `LinkedList` with `tail`.	\(O(1)\) Accessing an array is constant time.
`removeLast`	\(O(n)\) The `tail` pointer lets us find the last element, but we need to find the penultimate element (second to last) so we can make it the new last element. This requires us to walk the list.	\(O(n)\) We must walk to the end of the list to find the last two nodes.	\(O(1)\) We can simply decrease the `size` of the `ArrayList` by one; the array doesn’t need to change.

What is amortized complexity?

Amortized complexity is the average complexity of an algorithm over a sequence of calls to it. For instance, the amortized worst-case complexity of ArrayList’s insertLast is \(O(1)\) even though the worst case complexity is \(O(n)\): on any given call to insertLast, \(O(n)\) is the worst possible outcome. However, a sequence of \(n\) calls to insertLast on the same list has time complexity \(O(n)\) so, on average once the sequence is finished, each call can be assigned \(O(1)\) time.