# Week 0: Introduction and Basic Data Structures

This 10-week data structures and algorithms series is a core project on my blog. I am preparing for a Google interview, which is the main reason I am writing it. Taking notes lets me record my journey and review what I learn along the way, and I hope these notes also help readers who need resources to prepare for coding interviews.

A little background about myself: I am studying at UCLA, majoring in pure mathematics with a research focus on cryptography. I am a Python person, so Python is the official programming language of this website. I haven't received a formal CS education beyond theoretical cryptography, which means I went through all this coding interview preparation by self-study. However, I don't consider that a disadvantage. In my opinion, self-study can be even more effective than taking courses in college. So don't worry if you have little background in CS: as long as you follow my plan, I guarantee you can at least reach my level.

While reading books, if you find LeetCode hard to follow and the problems a little too difficult, I recommend checking out this program on educative.io. The first two courses (a data structures refresher and a coding-interview course) give you practice to go along with the reading. The system design course in it is also very helpful, so make sure you do that too. Spending at least 20 hours on system design is a good investment even if you are a new grad (Google has started asking system design questions in internship interviews). Knowing the basic concepts of system design will make you more confident in an interview.

After these prerequisites, you can start reading my posts and doing LeetCode. The main focus of this series is the "medium" problems on LeetCode, which cover around 80% of a coding interview. Most people can handle medium problems with a proper study plan, so you should spend most of your time on them. Hard problems are also likely to appear if you are interviewing with big companies such as Google or Facebook. If you are aiming for a higher goal, it is always necessary to put in greater effort.

Enough talking; let's get to the point. Here is a long list of practice problems you should do on LeetCode before moving on to Week 1. These problems will give you a big picture of what to expect from my posts. I will only write solutions for the important problems, that is, the ones that appear in interviews more frequently or whose concepts matter for future study.

## Time Complexity:

Make sure you know and understand the time complexity of operations on basic data structures, such as `append`, `pop`, and `remove`. The table can be found here:

Big-O Cheat Sheet

Why is this important? Suppose you want the code inside a loop to run in $O(1)$. If you use the `pop(0)` method to remove the first element of a list and assume the operation is $O(1)$, your whole analysis falls apart. When you call `pop(0)`, the method deletes the first element and then shifts every remaining element one position to the left, so it is $O(n)$. These subtle details can be crucial for the efficiency of your algorithm design.
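To see the difference on real data, here is a quick timing sketch (the exact numbers depend on your machine, but the gap is dramatic):

```
import timeit

n = 20_000

# list.pop(0) shifts every remaining element left by one slot: O(n) per call.
list_time = timeit.timeit(
    "while xs: xs.pop(0)",
    setup=f"xs = list(range({n}))",
    number=1,
)

# deque.popleft() just unlinks the leftmost element: O(1) per call.
deque_time = timeit.timeit(
    "while xs: xs.popleft()",
    setup=f"from collections import deque; xs = deque(range({n}))",
    number=1,
)

print(list_time > deque_time)  # True: draining the list is far slower
```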

A Python list is a dynamic array that can hold different data types, so it is quite different from a C array. This flexibility has a trade-off: a Python list takes more memory than a C array. So if you have a huge data set of a single type and only need simple operations on it, you should go with a C-style array, available as the `array` module in Python. Insertion and deletion in a standard list take $O(n)$ time in the worst case, since every element after the insertion/deletion point must be shifted to keep the elements contiguous, which is exactly what makes $O(1)$ random access possible. Do not confuse accessing with searching: when accessing, you pass in an index; when searching, you pass in a target value. Thus the best you can hope for when searching is $O(\log n)$ using binary search (on a sorted list), not $O(1)$. In contrast, insertion and deletion in a linked list take $O(1)$ time once you hold a reference to the node, since the only real work is updating pointers. Random access in a linked list takes $O(n)$ time in the worst case, since we must traverse from the head node one node at a time. Keep this trade-off in mind when choosing between a list and a linked list as the primary data structure for your task.
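A small sketch of the `array` module in action (the exact per-element byte size depends on your platform):

```
from array import array
import sys

# 'i' = signed C int: every element is stored as a raw machine integer,
# typically 4 bytes each.
nums = array('i', [1, 2, 3, 4])
nums.append(5)            # O(1) amortized, just like list.append

print(nums.itemsize)      # bytes per element (usually 4 for 'i')
print(sys.getsizeof(7))   # a full Python int object is much larger (~28 bytes)

# Mixed types are rejected, unlike with a plain list:
try:
    nums.append("x")
except TypeError:
    print("array holds a single C type only")
```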

As a big picture, you can loosely sort data structures into these two families: a data structure is usually either a special array-backed list or a special linked structure. For example, a hash table is typically built on top of an array, and a tree generalizes a linked list. You will understand this better after learning all the data structures.

LeetCode #206. Reverse Linked List (easy) (memorize it)

LeetCode #24. Swap Nodes in Pairs (medium)

LeetCode #141. Linked List Cycle (easy) (memorize it)

LeetCode #142. Linked List Cycle II (medium)

LeetCode #25. Reverse Nodes in k-Group (hard)

More Problems:

## Stack and Queue:

Stack:

In Python, a stack is represented by a list; that is, we can initialize a stack with `stack = []`. Lists support `append` and `pop` at the right end, which perfectly matches the "Last In, First Out" (LIFO) principle of a stack. As a fun fact, `collections.deque()` can be used as a stack too: if you only use its `append()` and `pop()` methods, the resulting behavior is LIFO. I don't recommend using a deque as a stack, though, since it also supports methods like `popleft()`, and accidentally calling the wrong method can cause subtle bugs.
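In code, the whole stack interface looks like this:

```
stack = []           # a plain list works as a stack
stack.append(1)      # push
stack.append(2)
stack.append(3)
top = stack[-1]      # peek at the top without removing it
item = stack.pop()   # pop the most recently pushed element
print(item, stack)   # 3 [1, 2]
```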

Queue:

In contrast, a queue is "First In, First Out" (FIFO), a mirror image of a stack. In Python, we always want to use `queue = collections.deque()` to get queue behavior. This may seem like more machinery than necessary, but the alternatives have real drawbacks. If you use a regular list, dequeuing from the left end takes $O(n)$ time, which is far from ideal. If you use `queue.Queue()`, you get a structure designed for thread-safe communication: `get()` blocks on an empty queue by default (and `get_nowait()` raises `queue.Empty`), and every operation pays locking overhead. A deque avoids both problems. A deque is implemented as a doubly linked list at a low level, so randomly accessing an element in the middle takes $O(n)$ traversal time; hence we want to use only appends and pops at the ends, which take $O(1)$ time. Let's adopt the convention of using `append()` and `popleft()` to represent a queue: visually, elements enter the queue from the right end and leave from the left end. In later posts you will see that queues are heavily linked with BFS.
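A minimal sketch following that convention:

```
from collections import deque

queue = deque()
queue.append("a")        # enqueue at the right end: O(1)
queue.append("b")
queue.append("c")
first = queue.popleft()  # dequeue from the left end: O(1)
print(first)             # a  (first in, first out)

# Unlike queue.Queue().get(), an empty deque fails fast with IndexError:
try:
    deque().popleft()
except IndexError:
    print("empty queue")
```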

Sample Problems:

LeetCode #20. Valid Parentheses (easy) (memorize it)

LeetCode #225. Implement Stack using Queues (easy)

LeetCode #232. Implement Queue using Stacks (easy)

More Problems:

## Priority Queue:

Sample Problems:

LeetCode #703. Kth Largest Element in a Stream (easy)

LeetCode #239. Sliding Window Maximum (hard)

## Hash Table and Set:

In Python, a hash table is represented by a dictionary. It supports search, insertion, and deletion in $O(1)$ average time, which sounds almost too good to be true. The trade-off is that a dictionary does not keep its keys in sorted order. If you want to store elements in a relatively ordered data structure, you can use a binary search tree, which supports search, insertion, and deletion in $O(\log n)$ time when balanced. All operations on both data structures decay to $O(n)$ time in the worst case, and both have $O(n)$ space complexity.

Also, make sure you understand the theory behind hash tables: why lookup is fast (a hash function computes the index), how to design a proper hash function (it should be fast and produce few collisions), and how to deal with hash collisions when they occur (open addressing and chaining).
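To make chaining concrete, here is a toy hash table sketch (the class and its fixed bucket count are made up for illustration; this is not how CPython's dict works internally, since CPython uses open addressing):

```
class ChainingHashTable:
    """Toy hash table resolving collisions by chaining (a list per bucket)."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() maps a key into a fixed number of buckets; two different
        # keys can land in the same bucket -- that is a collision.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or collision): chain it

    def get(self, key):
        for k, v in self._bucket(key):   # scan only this key's chain
            if k == key:
                return v
        raise KeyError(key)

table = ChainingHashTable()
table.put("apple", 1)
table.put("banana", 2)
table.put("apple", 3)        # overwrite
print(table.get("apple"))    # 3
```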

A set is an unordered collection of unique elements, which means it contains no duplicates. It is often used to record the elements that have been visited, for example the set `seen` in Two Sum and BFS.
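For example, a minimal Two Sum sketch; here `seen` is a dictionary rather than a plain set, so it can hand back indices as well as membership:

```
def two_sum(nums, target):
    """Return indices of the two numbers that add up to target."""
    seen = {}                      # value -> index, for elements visited so far
    for i, num in enumerate(nums):
        if target - num in seen:   # O(1) average-time hash lookup
            return [seen[target - num], i]
        seen[num] = i
    return []

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```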

Sample Problems:

LeetCode #242. Valid Anagram (easy)

LeetCode #1. Two Sum (easy) (memorize it)

LeetCode #15. 3Sum (medium) (memorize it)

LeetCode #18. 4Sum (medium)

More Problems:

## Binary Tree and Binary Search Tree (BST)

Binary trees have a natural divide-and-conquer flavor, as a problem on a binary tree usually reduces to subproblems on its left subtree and right subtree. Thus, most questions on binary trees have clean recursive solutions. However, you should memorize the iterative solutions too, in case a follow-up question asks for them.

The performance of a BST was mentioned in the last section, in contrast with the hash table. One important thing to note: the definition of a BST requires every node in the left subtree to have a smaller value than the root, and every node in the right subtree to have a larger value than the root. A very common mistake is to compare only the left child and right child with the root, which misreads the definition of a BST.
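One standard way to get the check right is to pass value bounds down the tree; here is a sketch (the `TreeNode` shape mirrors LeetCode's definition):

```
class TreeNode:
    def __init__(self, x):
        self.val = x
        self.left = None
        self.right = None

def is_valid_bst(root, low=float("-inf"), high=float("inf")):
    """Every node must lie strictly inside (low, high), not just beat its parent."""
    if not root:
        return True
    if not (low < root.val < high):
        return False
    # The left subtree inherits the current value as an upper bound,
    # the right subtree inherits it as a lower bound.
    return (is_valid_bst(root.left, low, root.val)
            and is_valid_bst(root.right, root.val, high))

# The classic trap: every child beats its own parent, yet the 3 in the
# right subtree violates the root value 5.
root = TreeNode(5)
root.left = TreeNode(1)
root.right = TreeNode(6)
root.right.left = TreeNode(3)
root.right.right = TreeNode(7)
print(is_valid_bst(root))  # False
```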

Sample Problems:

LeetCode #98. Validate Binary Search Tree (medium)

LeetCode #236. Lowest Common Ancestor of a Binary Tree (medium)

More problems:

# Week 2: BFS and Topological Sorting

Breadth-first search (BFS) appears in interviews A LOT. It is simple and versatile, so it is worth spending some time to nail it down. One cool thing about BFS is that most BFS problems can be solved with two templates, which I will show you later. If you study hard, it is possible to master BFS in a single day (expect about 8 hours of effort). BFS uses a queue to keep track of the elements waiting to be searched; in Python, we use `collections.deque()` for this. We also need a set to record the nodes that have already been visited. If we don't do this, the algorithm may fall into an infinite loop.

When do we use BFS? There are three major scenarios:

1. Graph Traversal (level order traversal, connected components, topological sorting)
2. Finding shortest path in simple graph (every edge has equal weight)
3. Iteration solution for all possible results (will appear in later posts)

There are two types of BFS: without level order and with level order. We will begin with the simpler case, without level order.

## BFS without Level Order:

If we are not asked to record the level order, we simply visit the neighbors of the start node, then the neighbors of those neighbors, and so on, one by one. If we are asked to record the level order, we insert an additional `for` loop inside the `while` loop to traverse all nodes of the current level before moving on to the next, level by level. That is the only difference, a simple modification of the code. Let's take a look at the simpler case:

```
"""
BFS without level order
"""
from collections import deque

seen = set()
queue = deque()

seen.add(start)
queue.append(start)
while queue:
    node = queue.popleft()
    for neighbor in node.neighbors:
        if neighbor not in seen:
            seen.add(neighbor)
            queue.append(neighbor)
```

One thing to note: for a graph BFS, `seen` and `queue` always come in pairs. We initialize them together, and whenever one of them is updated, the other must be updated too. You can use a silly phrase like "have you seen the queue today?" to help memorize this. If we run BFS on a binary tree, we don't need `seen`, because trees contain no cycles, so there is no danger of an infinite loop. Also, note that in a binary tree the neighbors of a node are its left child and right child (if any), so the `for` loop is replaced by two `if` statements.

Sample Problems:

LeetCode #133. Clone Graph (medium)

Given a reference of a node in a connected undirected graph, return a deep copy (clone) of the graph. Each node in the graph contains a val (`int`) and a list (`List[Node]`) of its neighbors.

Example:

```
Input:
{"$id":"1","neighbors":[{"$id":"2","neighbors":[{"$ref":"1"},{"$id":"3","neighbors":[{"$ref":"2"},{"$id":"4","neighbors":[{"$ref":"3"},{"$ref":"1"}],"val":4}],"val":3}],"val":2},{"$ref":"4"}],"val":1}

Explanation:
Node 1's value is 1, and it has two neighbors: Node 2 and 4.
Node 2's value is 2, and it has two neighbors: Node 1 and 3.
Node 3's value is 3, and it has two neighbors: Node 2 and 4.
Node 4's value is 4, and it has two neighbors: Node 1 and 3.
```

Note:

1. The number of nodes will be between 1 and 100.
2. The undirected graph is a simple graph, which means no repeated edges and no self-loops in the graph.
3. Since the graph is undirected, if node p has node q as neighbor, then node q must have node p as neighbor too.
4. You must return the copy of the given node as a reference to the cloned graph.

Solution:
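A sketch of the BFS approach: the dictionary mapping each original node to its copy doubles as the `seen` set (the `Node` class mirrors LeetCode's definition):

```
from collections import deque

class Node:
    def __init__(self, val=0, neighbors=None):
        self.val = val
        self.neighbors = neighbors if neighbors is not None else []

def clone_graph(node):
    """BFS clone of a connected undirected graph."""
    if not node:
        return None
    clones = {node: Node(node.val)}  # original -> its copy; also the seen set
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for neighbor in cur.neighbors:
            if neighbor not in clones:          # first visit: copy and enqueue
                clones[neighbor] = Node(neighbor.val)
                queue.append(neighbor)
            clones[cur].neighbors.append(clones[neighbor])
    return clones[node]
```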

## BFS with level order:

As mentioned, we now insert an additional `for` loop into the template. Here is how it works:

```
"""
BFS with level order
"""
from collections import deque

seen = set()
queue = deque()

seen.add(start)
queue.append(start)
while queue:
    size = len(queue)
    for _ in range(size):
        node = queue.popleft()
        for neighbor in node.neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
```

Sample Problems:

LeetCode #102. Binary Tree Level Order Traversal (medium)

Given a binary tree, return the level order traversal of its nodes’ values. (ie, from left to right, level by level).

For example:
Given binary tree `[3,9,20,null,null,15,7]`,

```
    3
   / \
  9  20
    /  \
   15   7
```

return its level order traversal as:

```
[
  [3],
  [9,20],
  [15,7]
]
```

Solution:

```
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

import collections
from typing import List

class Solution:
    def levelOrder(self, root: TreeNode) -> List[List[int]]:
        if not root:
            return []

        queue = collections.deque([root])
        level_order = []

        while queue:
            level = []
            size = len(queue)
            for _ in range(size):
                node = queue.popleft()
                level.append(node.val)
                if node.left:
                    queue.append(node.left)
                if node.right:
                    queue.append(node.right)
            level_order.append(level)

        return level_order
```

Next, we are going to introduce topological sorting. Topological sorting is a hidden gem that many people forget to prepare for before coding interviews. The topic takes only a few pages in standard algorithm textbooks, but it is an important one that most tech companies expect you to know.

## Time Complexity:

In short, the time complexity of BFS is O(|V|+|E|), where |V| is the number of nodes and |E| is the number of edges. Why? The body of the while loop executes at most once per vertex, since we use the set `seen` to avoid revisiting any node. This gives O(|V|). The for loop inside the while loop executes at most once per edge, since each node is popped out of the queue at most once, and we visit its neighbors through its edges only when the node is popped. This gives O(|E|). Combining the two loops gives us O(|V|+|E|).

## Topological Sorting:

Topological sorting works for directed acyclic graphs (DAGs) only, and the goal is to find a topological order of the given graph. The graph may have more than one topological order, and we only need to find one of them. By topological order, we mean an ordering of the nodes such that for every directed edge from node u to node v, u appears before v in the ordering.

Algorithm:

1. Record the in-degrees of all nodes, and initialize an empty list `topological_order`.
2. Append all nodes with in-degree 0 to the queue.
3. Pop nodes off the queue and feed them to `topological_order` one by one. Each time we release a node, we visit all its neighbors and subtract 1 from each neighbor's in-degree.
4. If the in-degree of any node becomes 0 after this subtraction, append that node to the queue.
5. When the queue becomes empty, stop the loop and return `topological_order`.

Template:

```
"""
Definition for a Directed graph node
class DirectedGraphNode:
    def __init__(self, x):
        self.label = x
        self.neighbors = []
"""

import collections

class Solution:
    """
    @param graph: A list of Directed graph node
    @return: A list of integer
    """
    def topSort(self, graph):
        node_to_indegree = self.get_indegree(graph)

        # BFS
        topological_order = []
        start_nodes = [n for n in graph if node_to_indegree[n] == 0]
        queue = collections.deque(start_nodes)

        while queue:
            node = queue.popleft()
            topological_order.append(node)
            for neighbor in node.neighbors:
                node_to_indegree[neighbor] -= 1
                if node_to_indegree[neighbor] == 0:
                    queue.append(neighbor)

        return topological_order

    # Record indegree of all nodes
    def get_indegree(self, graph):
        node_to_indegree = {x: 0 for x in graph}

        for node in graph:
            for neighbor in node.neighbors:
                node_to_indegree[neighbor] += 1

        return node_to_indegree
```

# Week 1: Basic Algorithms

Continuing from last week, we are going to go over the basic algorithms that appear frequently in coding interviews. Note that the content below is just an overview; we will discuss each topic in full detail later. This week, just learn what they are and roughly how they work, in BFS manner.

## Recursion and Divide and Conquer:

LeetCode #50. Pow(x, n) (medium)

LeetCode #169. Majority Element (easy)

More problems can be found here:

## Greedy Algorithm:

Greedy algorithms don't appear in coding interviews very often. In fact, "greedy algorithm" is not a single algorithm; perhaps we should just call it the "greedy approach". If you design a perfect greedy algorithm for one problem, it most likely won't apply to another, even if the two look similar. That is, the greedy approach is not universal, and applying it requires quite a bit of cleverness, which is part of why this category is less common in interviews. That means you don't have to study this topic especially hard. Your task this week is simply to read through and memorize the following solutions, and you are good to go.

Given an array of size n, find the majority element. The majority element is the element that appears more than `⌊ n/2 ⌋` times.

You may assume that the array is non-empty and the majority element always exist in the array.

Example 1:

```
Input: [3,2,3]
Output: 3
```

Example 2:

```
Input: [2,2,1,1,1,2,2]
Output: 2
```

Algorithm:

The main idea is that if we eliminate every pair consisting of one majority element and one non-majority element, the majority element is still the majority of the remaining list, since it appears more than `⌊ n/2 ⌋` times. The algorithm is the following: suppose the first element in `nums` is the majority element, and initialize `count` to 1. Loop over the rest of `nums`. If the next element matches the current candidate, increment `count`; otherwise, decrement `count`. If `count` drops to 0, suppose the next element is the majority element, and keep looping. The code is here:

Solution:

```
from typing import List

class Solution:
    def majorityElement(self, nums: List[int]) -> int:
        majority = nums[0]
        count = 1
        for i in range(1, len(nums)):
            if count == 0:
                # The previous candidate has been fully canceled out;
                # suppose the current element is the majority.
                majority = nums[i]
                count = 1
            elif majority == nums[i]:
                count += 1
            else:
                count -= 1
        return majority
```

Analysis:

This algorithm uses a for loop that runs $n - 1$ times, and every operation inside the loop takes $O(1)$ time, so the overall running time is $O(n)$. It is also an in-place algorithm, so it uses $O(1)$ extra space.

LeetCode #122. Best Time to Buy and Sell Stock II (easy)
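As a preview of the greedy flavor, here is a sketch for this problem: every upward price step can be captured by buying the day before and selling the day after, so the total profit is just the sum of all positive day-to-day differences (the function name is my own):

```
def max_profit(prices):
    """Greedy: collect every positive day-to-day price increase."""
    profit = 0
    for i in range(1, len(prices)):
        if prices[i] > prices[i - 1]:
            profit += prices[i] - prices[i - 1]
    return profit

print(max_profit([7, 1, 5, 3, 6, 4]))  # 7  (buy at 1 sell at 5, buy at 3 sell at 6)
```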

## Rabin-Karp Algorithm:

LeetCode #28. Implement strStr() (easy)
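Here is a sketch of Rabin-Karp applied to strStr: hash the needle once, maintain a rolling hash of each haystack window, and on a hash match verify the actual characters, since different strings can share a hash (the base 256 and modulus $10^9 + 7$ are common choices, not required ones):

```
def str_str(haystack, needle):
    """Rabin-Karp: roll a window hash across haystack, verify on hash hits."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if n < m:
        return -1

    base, mod = 256, 10**9 + 7
    high = pow(base, m - 1, mod)       # weight of the leftmost window character

    target = window = 0
    for i in range(m):                 # hashes of needle and the first window
        target = (target * base + ord(needle[i])) % mod
        window = (window * base + ord(haystack[i])) % mod

    for i in range(n - m + 1):
        # On a hash hit, confirm with a real comparison: hashes can collide.
        if window == target and haystack[i:i + m] == needle:
            return i
        if i < n - m:
            # Roll the window: drop haystack[i], shift, add haystack[i + m].
            window = (window - ord(haystack[i]) * high) % mod
            window = (window * base + ord(haystack[i + m])) % mod
    return -1

print(str_str("hello", "ll"))  # 2
```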