Data Structures and Algorithms

Test 2 Study Guide

The tests in this course may contain any material covered in lecture up until this point, although much stronger emphasis will be placed on material for which you have received graded feedback. This page contains a collection of exercises that you may wish to complete to prepare yourself. We also encourage you to review the lecture notes from topics presented since the previous test.

Sorting

You should make sure to review the sorting algorithms we covered in class and be ready to discuss them.

  1. What is the worst-case time complexity of bubble sort? QuickSort? MergeSort?

    Bubble sort and QuickSort both have worst-case complexity of \(O(n^2)\). MergeSort has a worst-case complexity of \(O(n \log n)\).

  2. Although QuickSort’s worst-case time complexity is worse than MergeSort’s, we use QuickSort as a default in practice. Why?

    Randomized QuickSort’s expected time complexity is \(O(n \log n)\), even on worst-case inputs. Also, QuickSort is an in-place sort, meaning that it performs less copying and uses less auxiliary memory than MergeSort.
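
    As a rough illustration of "QuickSort in practice": most standard libraries ship a QuickSort-based hybrid. For example, C++'s std::sort is typically implemented as introsort, which partitions like QuickSort but falls back to HeapSort on deep recursion to avoid the \(O(n^2)\) worst case. (A hedged sketch, not lecture material.)

        #include <algorithm>
        #include <iostream>
        #include <vector>

        int main() {
            std::vector<int> v = {5, 2, 8, 1, 9, 3};
            // std::sort is typically a QuickSort-based hybrid (introsort):
            // it partitions like QuickSort but switches to HeapSort when
            // recursion gets too deep, avoiding the O(n^2) worst case.
            std::sort(v.begin(), v.end());
            for (int x : v) std::cout << x << ' ';  // prints: 1 2 3 5 8 9
            std::cout << '\n';
        }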

  3. What is an in-place sorting algorithm?

    An in-place sorting algorithm is one that sorts the contents of the array within the array itself, rather than copying elements into auxiliary storage. QuickSort, for example, sorts by swapping elements within the array, whereas MergeSort’s merge step copies elements into a separate array.

  4. What is wrong with the following QuickSort pseudocode? What would happen if we used this pseudocode to sort an array? How could we fix it?

    Function brokenQuickSort(array, left, right)
        pivot <- partition(array, left, right)
        brokenQuickSort(array, left, pivot-1)
        brokenQuickSort(array, pivot+1, right)
    EndFunction
    

    There is no recursive base case in the provided pseudocode. If this code were executed, it would break the array down into partitions of size 0 and then keep trying to split those size-0 partitions indefinitely (eventually overflowing the call stack). We need to add a condition that skips sorting when the subarray contains fewer than two elements, as in the sketch below.
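
    As a concrete illustration, here is a minimal C++ sketch of the repaired algorithm (a sketch, not the lecture's code; it uses the Lomuto partition scheme from question 5 below):

        #include <iostream>
        #include <utility>
        #include <vector>

        // Lomuto partition: the pivot is the last element; returns its final index.
        int partition(std::vector<int>& a, int left, int right) {
            int pivot = a[right];
            int fence = left;  // boundary between the "low" and "high" sections
            for (int frontier = left; frontier < right; frontier++) {
                if (a[frontier] < pivot) {
                    std::swap(a[frontier], a[fence]);
                    fence++;
                }
            }
            std::swap(a[right], a[fence]);  // place the pivot between the sections
            return fence;
        }

        void quickSort(std::vector<int>& a, int left, int right) {
            // The missing base case: a subarray with fewer than two elements
            // is already sorted, so stop recursing.
            if (left >= right) return;
            int pivot = partition(a, left, right);
            quickSort(a, left, pivot - 1);
            quickSort(a, pivot + 1, right);
        }

        int main() {
            std::vector<int> v = {5, 2, 8, 1, 9, 3};
            quickSort(v, 0, static_cast<int>(v.size()) - 1);
            for (int x : v) std::cout << x << ' ';  // prints: 1 2 3 5 8 9
            std::cout << '\n';
        }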

  5. Write pseudocode for a partition function which can be used in the QuickSort algorithm.

    Any partition function which correctly organizes the array into a “low section” and a “high section”, returning the index of the pivot between them, is acceptable. Here’s the Lomuto algorithm which was discussed in class:

    Function partition(array, start, end)
     pivot <- array[end]
     frontier <- start
     fence <- start
     While frontier < end
       If array[frontier] < pivot
         swap array[frontier] with array[fence]
         fence <- fence + 1
       EndIf
       frontier <- frontier + 1
     EndWhile
     swap array[end] with array[fence]
     Return fence
    EndFunction
    
  6. Write pseudocode for the merge function of the MergeSort algorithm.

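    Here, a1 and a2 are the sorted input arrays and am is the output array, assumed to have length length(a1) + length(a2):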
    Function merge(a1, a2, am)
     i <- 0
     j <- 0
     k <- 0
     While i < length(a1) And j < length(a2)
       If a1[i] < a2[j]
         am[k] <- a1[i]
         i <- i + 1
         k <- k + 1
       Else
         am[k] <- a2[j]
         j <- j + 1
         k <- k + 1
       EndIf
     EndWhile
     While i < length(a1)
       am[k] <- a1[i]
       i <- i + 1
       k <- k + 1
     EndWhile
     While j < length(a2)
       am[k] <- a2[j]
       j <- j + 1
       k <- k + 1
     EndWhile
    EndFunction
    
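    For context, here is a hedged C++ sketch of how a merge routine like this is driven by the recursive half of MergeSort; copying each half into a temporary vector is one common presentation, not necessarily the lecture's:

        #include <iostream>
        #include <vector>

        // Merge sorted vectors a1 and a2 into am, where
        // am.size() == a1.size() + a2.size().
        void merge(const std::vector<int>& a1, const std::vector<int>& a2,
                   std::vector<int>& am) {
            std::size_t i = 0, j = 0, k = 0;
            while (i < a1.size() && j < a2.size()) {
                if (a1[i] < a2[j]) am[k++] = a1[i++];
                else               am[k++] = a2[j++];
            }
            while (i < a1.size()) am[k++] = a1[i++];  // copy leftovers from a1
            while (j < a2.size()) am[k++] = a2[j++];  // copy leftovers from a2
        }

        void mergeSort(std::vector<int>& a) {
            if (a.size() < 2) return;  // base case: already sorted
            auto mid = a.size() / 2;
            std::vector<int> left(a.begin(), a.begin() + mid);
            std::vector<int> right(a.begin() + mid, a.end());
            mergeSort(left);
            mergeSort(right);
            merge(left, right, a);  // this copying is why MergeSort is not in-place
        }

        int main() {
            std::vector<int> v = {5, 2, 8, 1, 9, 3};
            mergeSort(v);
            for (int x : v) std::cout << x << ' ';  // prints: 1 2 3 5 8 9
            std::cout << '\n';
        }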

Big-O Proofs

Recall the definition of big-\(O\) notation we used in class: \(f(x)\) is \(O(g(x))\) if \(\exists c>0, k\geq 1.\ \forall n \geq k.\ f(n) \leq cg(n)\). You should be prepared, given this definition, to prove complexity bounds on functions.

  1. Prove that \(f(n) = \frac{n^2+1}{n}\) is \(O(n)\).

    (This is a “reverse” proof.)

    We want to show that \(f(n)\) is \(O(n)\). By definition, this is to claim that \(\exists c>0, k\geq 1.\ \forall n \geq k.\ \frac{n^2+1}{n} \leq cn\).

    Let \(k=1\) and \(c=2\). Then it suffices to show that \(\forall n \geq 1.\ \frac{n^2+1}{n} \leq 2n\). We can rewrite the left side to read \(\forall n \geq 1.\ n + \frac{1}{n} \leq 2n\). We have from algebra that, if \(a \leq b\) and \(c \leq d\), then \(a + c \leq b + d\); therefore, it suffices to show both that \(\forall n \geq 1.\ n \leq n\) and also that \(\forall n \geq 1.\ \frac{1}{n} \leq n\). The former is true by definition of equality. For the latter, we multiply both sides by \(n\) (a positive number) to get \(\forall n \geq 1.\ 1 \leq n^2\); we take the square root of both sides to get \(\forall n \geq 1.\ 1 \leq n\), which is immediately true. As these two points are sufficient to show the original statement, we conclude that \(f(n)\) is \(O(n)\).

  2. Prove that \(f(n) = 3n^3 + 5n^2 + 8\) is \(O(n^3)\).

    (This is a “forward” proof.)

    \begin{align} &\text{1.}& \forall n \geq 1.\ 1 &\leq n & \text{by definition} \\ &\text{2.}& \forall n \geq 1.\ 1 &\leq n^2 & \text{(1), square both (positive) sides} \\ &\text{3.}& \forall n \geq 1.\ 1 &\leq n^3 & \text{(1), cube both (positive) sides} \\ &\text{4.}& \forall n \geq 1.\ 8 &\leq 8n^3 & \text{(3), multiply by 8} \\ &\text{5.}& \forall n \geq 1.\ 5n^2 &\leq 5n^3 & \text{(1), multiply by \(5n^2\) (positive by (2))} \\ &\text{6.}& \forall n \geq 1.\ 3n^3 &\leq 3n^3 & \text{by definition of equality} \\ &\text{7.}& \forall n \geq 1.\ 3n^3 + 5n^2 &\leq 3n^3 + 5n^3 & \text{by adding (5) and (6)} \\ &\text{8.}& \forall n \geq 1.\ 3n^3 + 5n^2 + 8 &\leq 3n^3 + 5n^3 + 8n^3 & \text{by adding (4) and (7)} \\ &\text{9.}& \forall n \geq 1.\ 3n^3 + 5n^2 + 8 &\leq 16n^3 & \text{by (8), simplification} \\ &\text{10.}& \exists c>0.\ \forall n \geq 1.\ 3n^3 + 5n^2 + 8 &\leq cn^3 & \text{by (9), introduction of existential (\(c = 16\))} \\ &\text{11.}& \exists c>0, k \geq 1.\ \forall n \geq k.\ 3n^3 + 5n^2 + 8 &\leq cn^3 & \text{by (10), introduction of existential (\(k = 1\))} \\ &\text{12.}& 3n^3 + 5n^2 + 8 &\text{ is } O(n^3) & \text{by (11), definition of big-\(O\)} \end{align}
  3. Prove that \(f(n) = (n+1)(n-1)\) is \(O(n^2)\).

    (This is a limit-based “reverse” proof.)

    We aim to prove that \(f(n) = (n+1)(n-1)\) is \(O(n^2)\). By definition, this is to say that \(\lim\limits_{n\to\infty} \dfrac{(n+1)(n-1)}{n^2} = c\) for some constant \(c\). Multiplying out the numerator, this is to say that \(\lim\limits_{n\to\infty} \dfrac{n^2-1}{n^2} = c\), and dividing each term of the numerator by \(n^2\) gives \(\lim\limits_{n\to\infty} \left(1 - \dfrac{1}{n^2}\right) = c\).

    By the difference law for limits, this is equivalent to the statement \(\left(\lim\limits_{n\to\infty} 1\right) - \left(\lim\limits_{n\to\infty} \dfrac{1}{n^2}\right) = c\). The limit of a constant is the constant itself. The second term has \(n\) only in the denominator, so its limit is zero. It therefore suffices to show \(1 - 0 = c\) for some constant \(c\), which is true with \(c = 1\).

Data Structures and Invariants

You should be prepared to discuss the data structures we have created in class, the fields in their implementations, and the invariants we rely upon in order to write algorithms for those data structures.

  1. What is an ADT? What is the difference between an ADT and a data structure? Give two examples of each.

    An ADT is a description of how data can be accessed and manipulated. In other words, an ADT is an interface allowing the caller to store and retrieve data. A data structure is a concrete realization of an ADT; data structures give specific algorithms to implement the interface that the ADT describes.

    Two examples of ADTs are lists and stacks. Lists and stacks are both defined by a collection of routines that allow a user to store and manipulate data (e.g. insertFirst, getLast, etc. for lists; push, pop, etc. for stacks). Two examples of data structures are linked lists and array lists. Both linked lists and array lists are concrete implementations of the list ADT; each of them gives an implementation of the interface that the list ADT describes.
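
    One way to see the distinction in code: a minimal C++ sketch (the class layout is illustrative, not necessarily the lecture's) in which Stack is an ADT expressed as an abstract interface and LinkedStack is a data structure implementing it:

        #include <stdexcept>

        // The ADT: an interface describing *what* a stack can do.
        template <typename T>
        class Stack {
        public:
            virtual ~Stack() = default;
            virtual void push(const T& value) = 0;
            virtual T pop() = 0;
            virtual bool isEmpty() const = 0;
        };

        // A data structure: *how* those operations are realized, here
        // with singly-linked nodes.
        template <typename T>
        class LinkedStack : public Stack<T> {
            struct Node { T value; Node* next; };
            Node* head = nullptr;
        public:
            ~LinkedStack() { while (!isEmpty()) pop(); }
            void push(const T& value) override { head = new Node{value, head}; }
            T pop() override {
                if (isEmpty()) throw std::out_of_range("pop from empty stack");
                Node* old = head;
                T result = old->value;
                head = old->next;
                delete old;
                return result;
            }
            bool isEmpty() const override { return head == nullptr; }
        };

        int main() {
            Stack<int>* stack = new LinkedStack<int>();  // caller sees only the ADT
            stack->push(4);
            stack->push(5);
            delete stack;  // the virtual destructor makes this safe
        }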

  2. What is the difference between a List and a Stack? When might we use a Stack instead of a List?

    List and Stack are two different ADTs. The List ADT supports a variety of operations (such as get by index) that a Stack does not. A Stack may be implemented using a List but is not a kind of List.

    We might use a Stack instead of a List if the operations provided by Stack are enough for our algorithm. By choosing the “weaker” ADT, we make the algorithm easier to verify as correct, since there are fewer ways the data can be manipulated and it is therefore easier to understand how the structure behaves.
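
    For instance, a balanced-parentheses checker needs nothing beyond push and pop, so declaring a stack documents exactly the access pattern the algorithm relies on. A small sketch using the standard library's std::stack:

        #include <iostream>
        #include <stack>
        #include <string>

        // Returns true if every '(' has a matching ')'.
        bool balanced(const std::string& s) {
            std::stack<char> opens;
            for (char c : s) {
                if (c == '(') {
                    opens.push(c);
                } else if (c == ')') {
                    if (opens.empty()) return false;  // ')' with no matching '('
                    opens.pop();
                }
            }
            return opens.empty();  // any leftover '(' is unmatched
        }

        int main() {
            std::cout << balanced("(()())") << ' '   // 1 (true)
                      << balanced("(()") << '\n';    // 0 (false)
        }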

  3. What are the big-\(O\) complexities of the following operations? Answer for each of (a) LinkedList with a tail pointer, (b) LinkedList without a tail pointer, and (c) ArrayList. Briefly explain each answer.
    1. Insert to the front of the list.
    2. Insert to the back of the list.
    3. Retrieve the first element of the list.
    4. Retrieve the middle element of the list.
    5. Remove an element from the back of the list.

    insertFirst
      - LinkedList with tail: \(O(1)\). A new head is created and linked to the old head.
      - LinkedList without tail: \(O(1)\). Identical to LinkedList with tail.
      - ArrayList: \(O(n)\). Every element must shift down one space.

    insertLast
      - LinkedList with tail: \(O(1)\). A new tail is created and linked to the old tail.
      - LinkedList without tail: \(O(n)\). Without a tail pointer, we must walk to the end of the list to find the tail.
      - ArrayList: amortized \(O(1)\). We add to the end of the array; this may occasionally require us to grow the array.

    getFirst
      - LinkedList with tail: \(O(1)\). The first node of the list is immediately available.
      - LinkedList without tail: \(O(1)\). Identical to LinkedList with tail.
      - ArrayList: \(O(1)\). Accessing an array index is constant time.

    get (middle element)
      - LinkedList with tail: \(O(n)\). We have to walk the list to find the appropriate node.
      - LinkedList without tail: \(O(n)\). Identical to LinkedList with tail.
      - ArrayList: \(O(1)\). Accessing an array index is constant time.

    removeLast
      - LinkedList with tail: \(O(n)\). The tail pointer lets us find the last element, but we also need to find the penultimate (second-to-last) element so we can make it the new tail; this requires us to walk the list.
      - LinkedList without tail: \(O(n)\). We must walk to the end of the list to find the last two nodes.
      - ArrayList: \(O(1)\). We can simply decrease the size of the ArrayList by one; the underlying array doesn’t need to change.

  4. What is amortized complexity?

    Amortized complexity is the average complexity of an algorithm over a sequence of calls to it, rather than the complexity of a single call. For instance, the amortized worst-case complexity of ArrayList’s insertLast is \(O(1)\) even though the worst-case complexity of a single call is \(O(n)\): any one call may have to grow the array. However, a sequence of \(n\) calls to insertLast on the same list has total time complexity \(O(n)\), so, averaged over the whole sequence, each call is charged \(O(1)\) time.
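
    A minimal sketch of why this works, assuming a capacity-doubling growth policy (the standard choice; the lecture's growth factor may differ). Across \(n\) calls, the total copying is at most \(1 + 2 + 4 + \dots + n < 2n\), so the average cost per call is constant:

        #include <iostream>

        // Simplified int-only ArrayList, just enough to show amortized
        // O(1) insertLast.
        class ArrayList {
            int* data;
            int size = 0;
            int capacity = 1;
        public:
            ArrayList() : data(new int[1]) {}
            ~ArrayList() { delete[] data; }
            void insertLast(int value) {
                if (size == capacity) {
                    // Rare O(n) case: double the capacity and copy everything.
                    capacity *= 2;
                    int* bigger = new int[capacity];
                    for (int i = 0; i < size; i++) bigger[i] = data[i];
                    delete[] data;
                    data = bigger;
                }
                data[size++] = value;  // common O(1) case
            }
            int getSize() const { return size; }
        };

        int main() {
            ArrayList list;
            for (int i = 0; i < 1000; i++) list.insertLast(i);
            // The 1000 appends triggered only 10 resizes (at sizes 1, 2, 4,
            // ..., 512), copying 1023 elements in total: amortized O(1) each.
            std::cout << list.getSize() << '\n';  // 1000
        }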

  5. Draw a stack diagram of a stack frame in which the following code has been run:
    Stack<int>* stack = new LinkedStack<int>();
    stack->push(4);
    stack->push(5);
    

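    In text form, the diagram should show roughly the following: the local variable stack points to a heap-allocated LinkedStack, and since 5 was pushed last, it sits at the head:

        stack --> LinkedStack
                    head --> [ 5 ] --> [ 4 ] --> null
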
    Note that this stack diagram does not include a tail pointer, although it wouldn’t be wrong to include one.