This 10-week data structures and algorithms series is going to be an essential project in my blog. I am preparing for a Google interview, which is the main reason why I am writing this. I feel it would be valuable if I take notes to record my journey, for reviewing the knowledge that I learn along the way, also for you guys who need sources to prepare for future coding interviews.
A little background about myself: I am studying at UCLA, major in pure mathematics, with a research orientation on cryptography. I am a Python person, so that Python is the official programming language on my website. I haven’t received formal CS education besides theoretical cryptography in school, which means I have gone through all these coding interview preparations by self-studying. However, I don’t consider it a disadvantage. Self-studying is even more effective than taking courses in college, in my opinion. Therefore, don’t worry if you have little background in CS. As long as you follow my plan, I guarantee you can at least reach my level.
At the very beginning, you need an easy read to get started. Your first data structure and algorithm should be entertaining and straightforward, just for the sake of your mental health. Please don’t choose books like CLRS as your first book, unless you think you are von Neumann. As a reference, I suggest you go with Grokking Algorithms. This book covers basic data structures too, so the title is a bit misleading. The code examples are either in Python or pseudocode. After reading it, I highly recommend you to buy this book without hesitation: Problem Solving with Algorithms and Data Structures Using Python. By the time I am writing this line, the price on Amazon is $40. It’s definitely worth the money. It is comprehensive, easy to read, and simply enjoyable. More importantly, it uses Python. Writing Python code in a coding interview will save you tons of time and energy, and you don’t need to worry about integer overflow and memory allocation kind of stuff. However, I don’t recommend to use the codes listed in this book directly in practice. For each topic, please refer my posts and use the templates in them.
While reading books, if you find LeetCode is hard to follow and problems are a little difficult for you, I recommend you to check out this program on educative.io. The first two courses (data structure refresher and hacking the coding interview) can help you to get some practice along with the reading. Note that the system design course in it is very helpful, so make sure you do that too. Spend at least 20 hours on system design is a good investment, even if you are new grad (Google starts to ask system design questions in internship interviews). Knowing basic concepts about system design can make you more confident in an interview.
After these prerequisites, you can start reading my posts and doing LeetCode. The main focus of this series is the “medium” problems on LeetCode, which covers around 80% of a coding interview. Most people are able to handle these medium problems with a proper study plan, so you should spend most time on them. Hard problems are also likely to appear if you are interviewing with big companies, such as Google or Facebook. Anyway, if you are aiming for a higher goal, it is always necessary to put in greater effort.
Enough talking, let’s get to the point. Here is a long list of practice problems that you should do on LeetCode, before moving to Week 1. These problems will show you a big picture of what you would expect to learn in my posts. I will only write solutions on those important problems. That is, either they appear in interviews more frequently than others, or the concept is important for future study.
Make sure you know and understand the time complexity of operations on basic data structures, such as
remove and etc. The table can be found here:
Why is it important? Suppose you want to optimize the code inside a loop to . If you use
pop(0) method on a list to pop out the first element and you assume the operation is , your whole algorithm will be messed up. When you call
pop(0), the method will delete the first element and move all other elements to the left by index 1, so it’s . These subtle things could be crucial for the correctness on algorithm design.
List vs Linked List:
Python list is a dynamic array that can hold different data types in it, so it’s very different from C array. This flexibility has a trade-off: Python list takes more space in memory than C array. So if you have a huge data set with only one data type, and you only want to use simple operations on this data set, you should go with C array, also known as the
array module in Python. Insertion and deletion in standard list takes time in worst case, since the elements after the insertion/deletion point need to be moved to the correct position to maintain the ability of random access, which is the key for allowing time random access for list. Do not confuse accessing with searching. During accessing, you pass into an index; during searching, you pass into a target value. Thus the best you can hope for searching is using binary search, not . In contrast, insertion and deletion in linked list takes time, since the only real operation is modifying pointers. Random access in linked list takes time in worst case, since we need to traverse from the head node and search each node one by one. Consider this trade-off when you need to pick list or linked list as the primary data structure for your task.
In a big picture, all data structures fall into these two categories. A data structure is either a special list or a special linked list. For example, hash table is a special list and tree is a special linked list. You will understand this part better after learning all data structures.
LeetCode #206. Reverse Linked List (easy) (memorize it)
LeetCode #141. Linked List Cycle (easy) (memorize it)
Stack and Queue:
In Python, stack is represented by a list, that is, we can initialize a stack using
stack = . The List ADT supports
pop at the right end of a list, which perfectly follows “First In Last Out” (FILO) principle of a stack. Just as a fun fact, note that
collections.deque() can be used as a stack, too. If you only use the
pop() methods, the resulting data structure is FILO. I don’t recommend to use deque as stack though, as it supports other methods like
popleft(), and that can be problematic if you use the wrong methods accidentally.
In contrast, queue is “First in First Out” (FIFO), a mirror data structure of stack. In Python, we always want to use
queue = collections.deque() to realize the functionality of a queue, which seems more than necessary, but using other modules can be problematic, and here is why. If you use a regular list, the insertion and deletion will be time, which is far from ideal. If you use
queue = queue.Queue(), when you pop element from empty queue, it raises an error. Using deque would circumvent both these two disadvantages. Deque is implemented using doubly linked list at low level, so randomly accessing an element in the middle takes time from traversal. Hence we want to use append and pop only, for that they only take time. We can make a convention here to use
popleft() for representing a queue. Visually, elements come into the queue from the right end and leave the queue from the left end. In later posts you will see that queue is heavily linked with BFS.
LeetCode #20. Valid Parentheses (easy) (memorize it)
Hash Table and Set:
In Python, hash table is represented by dictionary. It supports search, insertion and deletion in time, which is too good to be true. The trade off here is that dictionary is unordered. If you want to store elements in a relatively ordered data structure, you can use binary search, which supports search, insertion and deletion in time. All operations for both data structures mentioned above will decay to time in worst case. Both of them have worst case space complexity.
Also, make sure you understand the theory behind hash table, such as why hash table is fast for lookup (use hash function for indexing), how to design a proper hash function (should be fast and with minimum hash collision), and how to deal with hash collision if any occurs (open addressing and chaining).
Set is an unordered collection of unique elements, which means it contains no duplicates. It is often used to record the elements that had been visited, for example, the set
seen in Two Sum and BFS.
LeetCode #1. Two Sum (easy) (memorize it)
LeetCode #15. 3Sum (medium) (memorize it)
Binary Tree and Binary Search Tree (BST)
Binary tree has a natural divide and conquer flavor, as a problem on binary tree itself always reduces to subproblems regarding its left subtree and right subtree. Thus, most questions on binary trees have easy solutions using recursion. However, you should memorize the iteration solutions too, just in case any follow-up question comes after.
The performance of BST was mentioned in last section, in contrast with hash table. One important thing to note is that in the definition of BST, we need every node on the left subtree to has smaller value than the value of the root, and every node on the right subtree to has larger value than the value of the root. A very common mistake is to only compare left child and right child to the root, which is a misunderstanding on the definition of BST.