This is a repo that helps you learn fundamental algorithms as well as their runtime analysis. The algorithms covered include divide and conquer (binary search, Fibonacci, etc.), sorting algorithms (quick sort, heap sort, radix sort, etc.), BSTs, dynamic programming, and more.
Note that the material is partially based on CSCI-GA-1170 Fundamental Algorithms from 🎓 NYU Courant.
Enjoy your journey learning these fun algorithms! 🥳
Table of contents
- Fundamental Algorithms
- Appendix 🖇
- Reference 🗞
Sorting | Time |
---|---|
Generic sort | O(n^2) |
Insertion sort | O(n^2) |
Selection sort | O(n^2) |
Merge sort | Θ(n log n) |
Quick sort (randomized) | Θ(n log n) expected |
Quick sort (deterministic, 1st/last element pivot) | Θ(n^2) |
Quick sort (deterministic, median pivot) | Θ(n log n) |
Heap sort | O(n log n) |
Counting sort | O(n+k) |
Radix sort | O(d(n+b)) |
- O(n^2): generic sort, insertion sort, deterministic quick sort
- O(n log n): merge sort, heap sort, randomized quick sort
- O(n): radix sort, counting sort
Questions to think through before quicksort

To maintain consistency, we will pick this strategy:
- Always move the pivot to the beginning of the segment.
- Initialize `i` at `low` and `j` at `low + 1`.
- At the end, swap `arr[i]` with `arr[low]` / `arr[pivot]`.
- If `def quick_sort_ran(arr, low, high)` chooses to include `high`, then we should:
  - use `len(arr) - 1` as the initial input,
  - pass `pivot - 1` and `high` to the recursive calls (since `len(arr) - 1` is used),
  - and, in `while j <= high`, include the case `j == high`.
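Below is a minimal sketch of this strategy, assuming the function name `quick_sort_ran(arr, low, high)` from above; the details may differ from the repo's own implementation.

```python
import random

def quick_sort_ran(arr, low, high):
    """Randomized quicksort on arr[low..high] (inclusive), in place."""
    if low >= high:
        return
    # Pick a random pivot and move it to the beginning of the segment.
    pivot = random.randint(low, high)
    arr[low], arr[pivot] = arr[pivot], arr[low]
    # i is the last index whose element is <= the pivot value.
    i = low
    for j in range(low + 1, high + 1):      # j runs up to and including high
        if arr[j] <= arr[low]:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Swap the pivot into its final position.
    arr[low], arr[i] = arr[i], arr[low]
    quick_sort_ran(arr, low, i - 1)
    quick_sort_ran(arr, i + 1, high)

# Usage: quick_sort_ran(a, 0, len(a) - 1)
```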
Some questions to be cleared up before heap sort 👈🏼 Heap sort is a comparison-based sorting algorithm that uses a binary heap. It works by building a heap from the input data, then repeatedly removing/extracting the largest (or smallest) element from the heap and restoring the heap property, until the heap is empty. This process sorts the data in ascending order when a max-heap is used (or descending order if a min-heap is used).
Heapify Heapify is used when removing or inserting an element into a heap, ensuring that the heap remains a complete binary tree and maintains its order properties.
Insertion: Deletion:
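Here is a minimal max-heap sketch of heapify and heap sort, assuming 0-based array indexing; it is not taken from the repo's implementation.

```python
def heapify(arr, n, i):
    """Sift arr[i] down so the subtree rooted at i satisfies the max-heap property.
    n is the size of the heap region within arr."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def heap_sort(arr):
    n = len(arr)
    # Build a max-heap, starting from the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Repeatedly move the max to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        heapify(arr, end, 0)
    return arr
```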
CS Dojo's YT video on counting sort Counting sort is a stable sorting algorithm, which keeps the original order of elements when ties occur. Counting sort works best when the range of values in the array is small (e.g., only 0, 1, and 3 appear in the array). The steps of counting sort are:
- Count the occurrences of every number in an auxiliary array (this will become the starting index of every number).
- Calculate the prefix sum of the auxiliary array.
- Shift it to the right by one element, so each entry now holds the starting index of that number in the output.
- Fill the new array based on the values of the auxiliary array.
- Each time a number is placed, increment its entry; the value in the auxiliary array is the index where the next occurrence of that number should go.
Runtime analysis ⏱ T(n) = O(n+k), where n is the length of the original array (total number of elements) and k is the range of values in the array ([smallest, greatest]).
Space complexity: An array of length n, and another array of length k. Thus, O(n+k).
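Here is a hedged Python sketch of those steps, assuming non-negative integer keys; the names are illustrative rather than the repo's.

```python
def counting_sort(arr):
    """Stable counting sort for non-negative integers."""
    if not arr:
        return []
    k = max(arr)
    counts = [0] * (k + 1)
    for x in arr:                      # count occurrences
        counts[x] += 1
    # Prefix sums, shifted right by one: start[x] = starting index of value x.
    start = [0] * (k + 1)
    for x in range(1, k + 1):
        start[x] = start[x - 1] + counts[x - 1]
    out = [None] * len(arr)
    for x in arr:                      # place each element, then bump its index
        out[start[x]] = x
        start[x] += 1
    return out
```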
CS Dojo's YT video on radix sort Radix sort sorts the numbers in the array digit by digit. Because it is built on counting sort, radix sort is also stable.
Runtime analysis ⏱ Let's say:
- `n`: the length of the array
- `d`: the maximum number of digits needed to represent all the elements
- `b`: the base, i.e. the range of digit values sorted at each level; in base 10 the range is [0, 9]. It is also the length of the auxiliary array.

So `d` is the number of sorting rounds we need to perform. In every round, we carry out a counting sort in O(n+b). So the overall runtime is O(d(n+b)), which can be faster than merge sort or quick sort. Choosing `b` and `d`
can be a trade-off between time and space.
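A small sketch of LSD radix sort for non-negative integers in base b = 10, using a stable per-digit bucket pass in place of counting sort; the function name is illustrative, not the repo's.

```python
def radix_sort(arr, base=10):
    """LSD radix sort for non-negative integers; each round is a stable digit sort."""
    if not arr:
        return []
    out = list(arr)
    exp = 1
    while max(out) // exp > 0:                     # one round per digit (d rounds total)
        buckets = [[] for _ in range(base)]
        for x in out:
            buckets[(x // exp) % base].append(x)   # stable: keeps arrival order
        out = [x for bucket in buckets for x in bucket]
        exp *= base
    return out
```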
Dynamic programming is a wonderful way of solving recurrences faster at the cost of additional space, either by using tabulation or memoization. The primary purpose of dynamic programming (DP) is to store and reuse the results of subproblems to avoid redundant calculations, thereby reducing the overall time complexity.
Definition:
- fibonacci(0) = 0, fibonacci(1) = 1
- fibonacci(n) = fibonacci(n-1) + fibonacci(n-2), when n >= 2. See fibonacci.py.
- Brute force
def fibonacci_bf(self, n):
    '''Brute force | Recursive'''
    if n == 0: return 0
    if n == 1: return 1
    return self.fibonacci_bf(n - 1) + self.fibonacci_bf(n - 2)
Runtime analysis
Brute force has an exponential runtime.
- Tabulation
def fibonacci_table(self) -> int:
    '''Tabulation approach | Bottom-Up | Iterative'''
    n = self.n
    if n == 0:
        return 0
    table = [None] * (n + 1)
    table[0] = 0
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
In the tabulation approach, recursion is avoided; the value of table[n] is built up using a loop from 2 to n. Since it accumulates from 2 to 3 to 4 and so on up to n, it is a bottom-up approach.
Runtime analysis
The DP table-filling approach has a runtime of O(n).
- Memoization
def fibonacci_memo(n, memo=None):
    '''Memoization approach | Top-Down | Recursive'''
    if memo is None:
        memo = {0: 0, 1: 1}
    if n not in memo:
        memo[n] = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
    return memo[n]
Runtime analysis
Claim: The number of recursive calls is no greater than 2(n+1).
Proof: We write into the memo at most n+1 times, and each time we write into the memo, we make 2 recursive calls. So there are at most 2(n+1) calls.
Theorem: DP Memoization approach runtime: O(n)
See cutting_rod.py
See longest_common_subsequence(lcs).py
Activity Selection
We are given a set of activities
Greedy algorithm:
- Pick the "best" first activity.
- Remove all activities that conflict with it.
- If any activities remain, keep applying these steps.
In this case, we should always pick the interval with the earliest end time, as sketched below.
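A short sketch of this greedy rule; the interval format (start, end) tuples is an assumption, not necessarily the repo's representation.

```python
def activity_selection(intervals):
    """Greedy: repeatedly pick the compatible activity with the earliest end time."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):  # sort by end time
        if start >= last_end:        # no conflict with the last chosen activity
            chosen.append((start, end))
            last_end = end
    return chosen

# Example: activity_selection([(1, 4), (3, 5), (0, 6), (5, 7), (8, 9)])
# -> [(1, 4), (5, 7), (8, 9)]
```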
GASA means that, at each step, a greedy algorithm makes the locally optimal choice in the hope that these choices will lead to a globally optimal solution. (However, it's important to clarify that this doesn't mean greedy algorithms always produce the optimal solution for all problems.)
Progress measure: Write down a "progress measure" and show that greedy does at least as well as any other algorithm at every point of the algorithm. Useful tip: clearly write down the progress measure first!
Theorem
- Let G be the greedy solution
- Let Z be any other solution.
- Then for all $i \geq 0$, $F_i(G) \leq F_i(Z)$.
Prove GASA by induction. Here $F_i$ represents the end time of the $i$-th chosen interval.
- Base case: $F_0(G) = F_0(Z) = -\infty$
- Inductive step: suppose $F_{i-1}(G) \leq F_{i-1}(Z)$.
- Prove that the $i$-th element chosen by G ends no later than the $i$-th element chosen by Z.
Define a sequence of solutions:
Huffman coding is a popular algorithm used for lossless data compression. It was developed by David A. Huffman in 1952 while he was a student at MIT. Huffman coding is widely used in various applications such as file compression (e.g., ZIP files), image compression (e.g., JPEG), and communication protocols.
Using a greedy algorithm to build a Huffman tree We want a prefix-free encoding (PFE), so that F(i) is not a prefix of F(j) for any i != j, and we want to minimize the average depth of the PFE.
Define an Optimal Prefix Tree:
- No vertex has a single child.
- Deeper nodes have lower probability.
- The deepest leaf (say x) has a sibling (due to property 1), which must also be a leaf (say y).
Theorem If x and y have least probability values, then there is an optimal tree where x and y are siblings.
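A minimal sketch of the greedy Huffman construction using Python's heapq; the symbol/frequency input format is illustrative, not the repo's.

```python
import heapq

def huffman_codes(freq):
    """Greedily build a Huffman tree and return a {symbol: bitstring} mapping.
    freq: dict mapping symbol -> probability (or count)."""
    # Heap entries: (frequency, tiebreaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two least-probable subtrees
        f2, _, t2 = heapq.heappop(heap)      # become siblings, as in the theorem
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Example: huffman_codes({"a": 0.5, "b": 0.3, "c": 0.2})
```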
Terms:
- `V`: collection of vertices/nodes, usually `|V| = n`
- `E`: collection of edges, usually `|E| = m`
- `e = (c, h)`: directed edge from `c` to `h`
- `e = {c, h}`: undirected edge between `c` and `h`

Note: Often we write undirected edges also as `(c, h)` for simplicity.
Assume:
- Simple graphs, so no parallel edges or self-loops.
- Edges may have weights/costs/lengths.
Number of edges:
- Undirected graph: $0 \leq m \leq \frac{n(n-1)}{2}$ (pick 2 out of n)
- Directed graph: $0 \leq m \leq n(n-1)$, since every edge in the undirected graph can be bi-directional.
Handshaking lemma:
- In an undirected graph, the sum of the degrees of all vertices is equal to twice the number of edges.
- Directed graph: $$ \sum_{v \in V} indegree(v) = \sum_{v \in V} outdegree(v) = m $$
- Undirected graph: $$ \sum_{v \in V} degree(v) = 2m $$
Cycle Lemma:
- Graph G is a tree if any two of the following 3 conditions are satisfied (and then they imply the third condition):
- G is connected
- G has n-1 edges
- G has no cycles
The following discussion is about directed graphs and trees. Image from Shichao's Notes
1. Adjacency list
- Space complexity: $Θ(m+n)$
- Lookup time: $Θ(\text{out-deg}(v))$

2. Adjacency matrix
- Keep in mind that every row is the source vertex, and every column is the destination vertex.
- Space complexity: $Θ(n^2)$ (independent of m!)
- Lookup time: $Θ(1)$
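A tiny sketch contrasting the two representations for a 3-vertex directed graph; the edge list here is illustrative.

```python
n = 3
edges = [(0, 1), (0, 2), (1, 2)]   # directed edges (source, destination)

# Adjacency list: Θ(m + n) space, Θ(out-deg(v)) to scan v's neighbors.
adj_list = [[] for _ in range(n)]
for u, v in edges:
    adj_list[u].append(v)          # adj_list == [[1, 2], [2], []]

# Adjacency matrix: Θ(n^2) space, Θ(1) lookup of "is (u, v) an edge?".
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[u][v] = 1           # row = source, column = destination
```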
BFS and DFS visualization website
Input: Directed/undirected graph G = (V, E), source vertex s in V. Output: For each vertex v in V, the length of the shortest path from s to v.
- `v.d`: distance from s to v (# of edges).
- `v.π`: predecessor of v on the shortest path from s to v.
Ideas: Find all vertices at distance t = 0,1,2,... from s, before moving to t+1.
Question: How to check if a vertex is visited?
Method I: Distance.
Set `v.d = ∞` and `v.π = NIL` for all v in V, and update `v.d` when v is first seen. If `v.d = ∞`, then v is unvisited; otherwise it has been visited.
Pseudo code for method I
BFS(G, s):
for each vertex u in G.V - {s}:
u.d = ∞
u.π = NIL
s.d = 0
s.π = NIL
Q = ∅
ENQUEUE(Q, s)
while Q ≠ ∅:
u = DEQUEUE(Q)
for each v in G.Adj[u]:
if v.d == ∞:
v.d = u.d + 1
v.π = u
ENQUEUE(Q, v)
Runtime analysis: Each node enters Q at most once, then we spend O(out-degree(node)) time to process it. Thus, the total runtime is O(n+m), think of the adjacency list representation.
Input: Directed/undirected graph G = (V, E). (No start node needed)
Output: For each vertex v in V, the discovery time `v.d` and finish time `v.f`; `v.π` is the predecessor of v in the DFS tree.
Ideas: Think of a global clock, at each time, either (1) A new node is discovered, or (2) An old node is finished.
Question: How to check if a vertex is visited? Method II: Coloring the vertices: White, Gray, Black.
Pseudo code for coloring
DFS-VISIT(G, u):
    time = time + 1
    u.d = time
    u.color = GRAY
    for each v in G.Adj[u]:
        if v.color == WHITE:
            v.π = u
            DFS-VISIT(G, v)
    u.color = BLACK
    time = time + 1
    u.f = time

DFS-Driver(G):
    for each vertex u in G.V:
        u.color = WHITE
        u.π = NIL
    time = 0
    for each vertex u in G.V:
        if u.color == WHITE:
            DFS-VISIT(G, u)
Runtime analysis: The runtime is O(n+m), since we visit each node once, and spend O(out-degree(node)+1) time to process it.
Theorem 1: Parenthesis Theorem
- `u.d` < `v.d` < `v.f` < `u.f` if `u` is an ancestor of `v`. | `([])` or `[()]`
- `u.d` < `u.f` < `v.d` < `v.f` if `u` and `v` are in different subtrees of the DFS tree. | `()[]` or `[]()`
Theorem 2: White-path Theorem. Node v is a descendant of u in T if and only if, at time u.d, there exists a path from u to v consisting entirely of white vertices. Proof sketch: if v is indeed a descendant of u, then the recursive calls on each intermediate vertex u_i are made after u.d, so all of them are still white at that time.
- Tree edges: point to white nodes, to a child.
- Back edges: point to gray nodes, to an ancestor.
- Forward edges: point to black nodes, to a descendant.
- Cross edges: point to black nodes, to a node in a sibling's subtree (not a descendant).
Theorem: Cycle and back edges G has no cycles(acyclic) <=> DFS(G) has no back edges
Definition: A topological sort of a directed acyclic graph (DAG) is a linear ordering of its vertices such that for every directed edge `u -> v`, `u` comes before `v` in the ordering.
Input: Directed Acyclic Graph G = (V,E)
Output: Nodes in an order that all edges go from left to right.
How to run topo sort on a DAG?
- Run DFS
- Output in reverse order of v.f numbers.
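A sketch of that DFS-based procedure, assuming the DAG is given as an adjacency-list dict; the helper names are illustrative.

```python
def topological_sort(adj):
    """adj: dict mapping vertex -> list of out-neighbors (a DAG).
    Returns the vertices in topological order."""
    visited = set()
    order = []                     # vertices in order of increasing finish time
    def dfs(u):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)            # u finishes after all of its descendants
    for u in adj:
        if u not in visited:
            dfs(u)
    return order[::-1]             # reverse order of finish times

# Example: topological_sort({"a": ["b", "c"], "b": ["c"], "c": []}) -> ["a", "b", "c"]
```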
Input: Connected, undirected graph G = (V, E) with edge weights. Output: A subset of edges E' ⊆ E such that G' = (V, E') is a tree and the sum of the weights of the edges in E' is minimized.
Demo from Wikipedia
Repeatedly pick the cheapest edge until the graph is connected.
Steps:
- Sort the edges so that `w(e1) <= w(e2) <= ... <= w(em)`
- Create an empty set `S`
- for i = 1 to m:
  - if adding `e_i` to S creates no cycle:
    - add `e_i` to S
- return T = (V, S)
def kruskals(G, W):
    # Sort edges by weight: O(m log m)
    sorted_edges = sorted(G.E, key=lambda e: W[e])
    S = set()
    for e in sorted_edges:              # O(m) iterations
        if not creates_cycle(S, e):     # cycle check, e.g. with union-find (below)
            S.add(e)
            if len(S) == len(G.V) - 1:  # spanning tree complete
                break
    return (G.V, S)
How do we prove the correctness and efficiency of Kruskal's algorithm?
- GASA
- Swap Argument
Union Find
- `Find-Set(x)`: return the representative of the set containing x.
- `Union(x, y)`: merge the sets containing x and y.
- `Make-Set(x)`: create a new set with a single element x.
- Union by rank: merge the smaller tree into the larger tree.
- Path compression: make the path from x to the root shorter.
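A compact union-find sketch with union by rank and path compression, as it might be used for Kruskal's cycle check; the class layout is my own, not the repo's.

```python
class UnionFind:
    def __init__(self, n):
        # Make-Set for elements 0..n-1: each element starts as its own representative.
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Find-Set with path compression: point x directly at its root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        # Union by rank: attach the shorter tree under the taller one.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False           # same set already; adding this edge would create a cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```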
Runtime analysis: O(m log m + m log n), where m is the number of edges and n is the number of vertices.
The image's from stackoverflow.
Prims(G, s):
    S = {s}                      # vertices already in the tree
    T = ∅                        # edges of the spanning tree
    n = |G.V|
    for i = 1 to n-1:
        find a minimum-weight edge e = (u, v) with u in S and v in V - S
        add v to S
        add e to T
    return T
Prim's and Kruskal's algorithms are used to connect all vertices in a graph in the cheapest way possible. Dijkstra's algorithm is used to find the cheapest route between two vertices.
Two scenarios we will discuss:
- SSSP: Single-source shortest paths
- APSP: All-pairs shortest paths
Type | Algorithm | Runtime |
---|---|---|
SSSP | Dijkstra | O((m+n) log n) with a binary heap |
SSSP | Bellman-Ford | O(mn) |
APSP | Floyd-Warshall | O(n^3) |
APSP | Johnson's | O(mn log n) |
Shortest path metric Define the shortest-path distance as $$ \delta(u,v) = \min_{P: u \to v} \omega(P) $$ with δ(u,v) = +∞ if there is no path from u to v (unreachable, infinite distance). This is the quantity we try to compute; v.d holds estimates of it.
Fact 1: Optimal substructure If P is a shortest path from 1 to k, then any subpath (i, j) of P is a shortest path connecting i and j.
Fact 2: Triangle inequality δ(u,v) ≤ δ(u,x) + δ(x,v) for any vertex x.
Fact 3: Negative cycle If weights can be negative, then there may not be a well-defined shortest path, because a negative-weight cycle can be traversed forever, driving the path weight toward -∞.
See Dijkstra.py
Relaxing
- Maintain estimates of the shortest-path distance to each node: v.d
- Initially: s.d = 0, v.d = ∞
- Take the node u with the smallest estimate that has not been output yet
- Look at its outgoing edges and update (relax) the estimates
- Output u (u.d == δ(s,u))
Dijkstra(G, s):
    Set = {}
    s.d = 0, v.d = ∞ for all v != s
    s.π = nil, v.π = nil for all v
    while |Set| < |V|:
        u <- extract-min(V \ Set)        # using a priority queue
        Set.add(u)
        for v in Adj[u]:
            if u.d + w(u, v) < v.d:      # relax edge (u, v)
                v.d <- u.d + w(u, v)
                v.π <- u
Lemma1:
Bellman-Ford algorithm features:
- Deal with negative weights
- Detect negative weight cycle
- Incorporate Dynamic Programming in relaxing
- Has a runtime of O(mn)
Pseudocode:
Bellman-Ford(G, s)
for each vertex v in G:
v.d = ∞
s.d = 0
for i from 1 to |V| - 1:
for each edge (u, v) in G:
if u.d + w(u, v) < v.d:
v.d = u.d + w(u, v)
for each edge (u, v) in G:
if u.d + w(u, v) < v.d:
error("Graph contains a negative-weight cycle")
return
See bellman_ford.py
Recursion and DP
Bellman-Ford keeps a table `DP = []` to store the shortest-distance estimates, and updates each entry based on its neighbors.
In relaxation:
Now, let's prove the recursive relation.
Define: `T^k(v)`: the shortest-path distance from s to v using at most k edges.
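The recursive relation being referenced is presumably the standard Bellman-Ford recurrence, written here under the assumption that $T^k(v)$ is defined as above:

$$T^k(v) = \min\Big(T^{k-1}(v),\ \min_{(u,v) \in E}\big(T^{k-1}(u) + w(u,v)\big)\Big), \qquad T^0(s) = 0,\quad T^0(v) = \infty \text{ for } v \neq s$$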
- O(m): compute $T^k$ from $T^{k-1}$
- n-1 rounds: $T^{n-1}$ is enough, since each shortest path has at most n-1 edges
- Runtime = O(mn)
Implementation:
- Run Bellman-Ford from each of the n sources: $n \cdot O(mn) = O(mn^2) = O(n^4)$ in the worst case.
- Floyd-Warshall runs in $O(n^3)$; Johnson's algorithm achieves $O(mn + n^2 \log n)$, which is better for sparse graphs (where m is much smaller than $n^2$).
Define:
Base case:
F is a 2D array of size (1+m)(1+n)
Recursion
DP implementation
Floyd-Warshall(G)
n = number of vertices in G
T = array of size [n][n]
for i from 1 to n:
for j from 1 to n:
if i == j:
T[i][j] = 0
else if there is an edge from i to j:
T[i][j] = w(i, j)
else:
T[i][j] = ∞
for k from 1 to n:
for i from 1 to n:
for j from 1 to n:
T[i][j] = min(T[i][j], T[i][k] + T[k][j])
return T
Ideas in short: shift all the edge weights to be non-negative, then run Dijkstra's n times to cover every pair. We get $O(mn \log n)$.
Johnson's Algo
- Use 1 run of Bellman-Ford to get the "shifts". $O(mn)$
- "Shift" all the edges to non-negative. $O(m)$ (the shortest paths stay the same)
- Run Dijkstra's n times. $O(mn \log n)$
Overall runtime: $O(mn \log n)$
Shifting
- We have a height function $h: V \to \mathbb{R}$.
- We define $$\hat{w}(u,v) = h(u) - h(v) + w(u,v)$$
Why do the shortest paths stay the same?
For any path $P = s = v_1 \to v_2 \to \dots \to v_k = t$:
$$\hat{\omega}(P) = \hat{w}(v_1,v_2) + \hat{w}(v_2,v_3) + \dots + \hat{w}(v_{k-1},v_k) = h(s) - h(t) + \omega(P)$$
The sum telescopes, so every path from s to t is shifted by the same constant $h(s) - h(t)$, and the shortest path is unchanged.
How to shift?
Or, how do we make $\hat{w}(u,v) \geq 0$ for all $(u,v) \in E$?
- Find a vertex $v^*$ that can reach every other node (for example, a new virtual source with 0-weight edges to every vertex), compute every node's distance from $v^*$ with Bellman-Ford, and define the height as $h(v) = \delta(v^*, v)$. Then $\hat{w}(u,v) = h(u) - h(v) + w(u,v) \geq 0$ by the triangle inequality.
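A self-contained sketch of the whole pipeline under these ideas; the structure (inline Bellman-Ford and heapq-based Dijkstra) is my own and the repo may organize it differently.

```python
import heapq

def johnsons(n, edges):
    """All-pairs shortest paths via Johnson's algorithm.
    n: number of vertices (0..n-1); edges: list of (u, v, w), w may be negative.
    Returns dist[u][v], or None if a negative cycle exists."""
    INF = float("inf")

    # 1) One Bellman-Ford run from a virtual source n, connected to every vertex with weight 0.
    h = [0] * (n + 1)                       # h[v] plays the role of the height function
    aug = edges + [(n, v, 0) for v in range(n)]
    for _ in range(n):                      # |V'| - 1 = n relaxation rounds
        for u, v, w in aug:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    for u, v, w in aug:
        if h[u] + w < h[v]:
            return None                     # negative cycle detected

    # 2) Shift: w_hat(u, v) = w(u, v) + h(u) - h(v) >= 0; shortest paths are unchanged.
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))

    # 3) n runs of Dijkstra on the reweighted graph, then undo the shift.
    dist = [[INF] * n for _ in range(n)]
    for s in range(n):
        d = [INF] * n
        d[s] = 0
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:
                continue                    # stale priority-queue entry
            for v, w in adj[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(pq, (d[v], v))
        for v in range(n):
            if d[v] < INF:
                dist[s][v] = d[v] - h[s] + h[v]   # undo the shift
    return dist
```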
We need to distinguish between `priority queue`, `heap`, `binary tree`, and `array` before discussing the implementation of heap sort, because these concepts were always intertwined and confusing to me while learning this sorting algorithm.

- `Priority queue`: an abstract data type (not a data structure) that operates similarly to a regular queue or stack but with a "priority" feature. A priority queue can be implemented using various data structures, including heaps (most commonly), binary trees, or even arrays.
- `Heap`: a specialized tree-based data structure where every node conforms to the same rule (max-heap or min-heap). A binary heap can also be implemented as an array for efficiency.
- `Binary tree`: a tree data structure in which each node has at most two children. It's a fundamental structure that can be specialized into more specific types, such as BSTs, AVL trees, and heaps.
- `Array`: arrays can be used to implement binary heaps efficiently due to the predictable structure of a complete binary tree, where the relationships between parent and child nodes can be calculated using their indices.
In conclusion, a binary heap is a more specific type of binary tree, tailored for efficiently managing hierarchical data according to certain rules (heap property).
A priority queue is an abstract concept that specifies behavior (an ADT) rather than a concrete data structure. A binary heap is a specific data structure that efficiently implements the priority queue ADT.
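A tiny illustration of the array-index arithmetic that lets a complete binary tree live in an array; 0-based indexing is assumed here.

```python
# For a binary heap stored in a 0-based array:
def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

# Example: for heap = [9, 7, 8, 3, 5], the children of index 0 are indices 1 and 2
# (values 7 and 8), and parent(3) == parent(4) == 1.
```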
- Why do we need to put `arr[pivot]` somewhere "safe"? And where should we put it, the first or the last position?
- Where should we initialize `i` and `j`?
- In the last step of swapping `arr[pivot]` back into the middle, which element should we swap it with?
- Where on earth does `arr[i]` point to?
  - The last element of the "smaller" section? The first element of the "greater" section? Or the first element of the "uncompared" section? `i` points to the last index where an element is less than or equal to the pivot.
- "Two pointers" in quick sort.
  - Quicksort also uses the "two pointers" technique, where i is the slow pointer and j is the fast pointer. `i` only moves when `arr[j]` is less than or equal to `arr[pivot]`, and it increments right before the swap.