The post Facebook Interview – Longest Substring Without Repeating Characters appeared first on Gainlo Mock Interview Blog.

If you are following our blog, you might have noticed that we have started providing coding solutions at the end of each question. At Gainlo, many people feel that the questions asked by Google/Facebook are much easier than they expected, yet they still didn’t make it. The most common reason is that they failed to provide solid code in a short time.

The surprising truth is that 90% of questions asked in interviews are basic, but the expectation is high – candidates should be able to write bug-free code quickly. And this is exactly what our blog posts are focused on.

In this post, we’re going to talk about how to analyze this popular question and, more importantly, share tips and tricks on writing bug-free code.

Given a string, find the longest substring without repeating characters.

For example, for string “abccdefgh”, the longest substring is “cdefgh”.

This question has been asked by Facebook recently. In addition, the question is not too hard to solve, but providing solid code in a short time requires some work. And this is the most common type of question in interviews.

As you may already know, it’s recommended to come up with the simplest solution first and then keep optimizing. Obviously, we can go over all the substrings and find the longest one without repeating characters. This gives us O(N^4) time complexity (O(N^2) to enumerate all substrings and up to O(N^2) to check each one for duplicates) and O(1) space complexity. Although it’s clear that this solution is too slow, we should still mention it to the interviewer.

Despite the fact that the solution is naive, it makes it clear that the next step is to make it faster. And the most common approach to speeding up a program is to use more memory. In other words, **we can sacrifice space complexity to improve time complexity**.

If you are able to think in this way, things become much clearer. A hashset is the most common data structure to detect dups and with a hashset, we can check if a substring has dups in O(N) time.

Now, the bottleneck is at iterating over all substrings, which has O(N^2) complexity. How can we make it faster?

In fact, there are lots of unnecessary checks. If we already know a substring S has duplicate characters, there’s no need to check any substring that contains S. If you have read our previous post – If a String Contains an Anagram of Another String, you should realize how similar these two questions are.

Take string “abccdefgh” as an example. When we realize that substring “abcc” has dups, we can immediately abandon the whole substring and check substrings from “cdefgh”. Thus, we have the following approach:

- Keep two pointers (start, end) that denote the current substring we’re checking; both have an initial value of 0.
- Each time, move the end pointer forward by one character and use the hashset to check whether the newly added character already exists.
- If not, keep moving end pointer forward.
- If the new character is a duplicate, move the start pointer forward past the duplicate character and pop all of those characters out of the hashset.
- Repeat from step 2 and output the longest substring without duplicates after finishing the whole string.

In essence, since processing all substrings is unnecessary, the sliding window is a great technique to avoid that, and the same technique has been used in many problems like Subarray With Given Sum.

The final solution has O(N) time complexity and O(N) space complexity.

If you haven’t tried to write the code, please do so. Once you’ve finished, try the following three test cases:

- “abcadbef”
- “abac”
- “aaaaa”

Here’s the solution:
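A Python sketch of the sliding-window approach described above (the function name and the exact window bookkeeping are mine, so details may differ from the tips below):

```python
def longest_unique_substring(s):
    seen = set()              # characters inside the current window
    start = 0
    best_start, best_len = 0, 0
    for end, ch in enumerate(s):
        # shrink the window from the left until ch is no longer a duplicate
        while ch in seen:
            seen.remove(s[start])
            start += 1
        seen.add(ch)
        if end - start + 1 > best_len:
            best_start, best_len = start, end - start + 1
    return s[best_start:best_start + best_len]
```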

If you write the code to solve it, you may notice that it’s not easy to make it bug-free on the first try. You may need to address several edge cases to make it work. That’s exactly the point of this question. Many people are able to come up with the solution, but very few of them can write a perfect implementation, especially under a time constraint.

Here I’d like to share a couple of tips in terms of the code above:

- It’s important to validate the input in the beginning. This should be the first thing you do when writing the solution, without even thinking. If you don’t know what to do with invalid input, ask the interviewer.
- There are several edge cases you need to consider. To begin with, when you use start and end pointers to denote a substring, make it clear whether the interval is open or closed, i.e., whether the substring is [a, b) or [a, b]. In our code, we make it [a, b), but you can use either as long as it’s consistent across the code.
- When a duplicate character is found, you should be very careful about the logic in the loop that advances the start pointer (the “while start < end – 1” part). Although it’s only a few lines of code, it’s quite easy to make mistakes. What we are doing there is popping characters from start until we find the duplicate character; we don’t pop the duplicate itself since the new character hasn’t been added yet.
- It’s hard for anyone to make it bug-free on the first attempt, so you should examine your code with 2-3 test cases.

I hope the biggest takeaway from this post is to care more about the coding part if you are used to solving problems only in your head. Most interview questions are not too hard to reason about, but very few people can get the code right. That’s why most people find these questions simpler than expected but still fail in the end.

By the way, if you want more guidance from experienced interviewers, you can check out Gainlo, which allows you to have mock interviews with engineers from Google, Facebook, etc.

The post 3Sum appeared first on Gainlo Mock Interview Blog.

We haven’t covered many topics about numbers in the past. Since 3sum questions have been asked by Google and Facebook recently, it’s a great time for us to analyze this topic in detail.

*Determine if any 3 integers in an array sum to 0.*

For example, for array [4, 3, -1, 2, -2, 10], the function should return true since 3 + (-1) + (-2) = 0. To make things simple, each number can be used at most once.

If you have read The Interviewer’s Expectation, you should be aware of the strategy that starts with the simplest solution. In fact, you should come up with the following brute force approach within a second.

To brute force the result, we can simply write 3 nested loops to iterate over all the triplets and check if each of them sums up to 0. Apparently, this gives us an O(N^3) solution, which is not optimal.

Let’s see how we can optimize it. I believe most people have practiced the popular question 2sum – given a sorted array, find 2 numbers that sum up to S. To solve this in O(N) time, we can keep two indices – one at the beginning (start) and the other at the end (end). If the sum of the current two numbers is greater than S, we move the end pointer backward by one step. If the sum is smaller than S, we move the start pointer forward by one step.

When the two pointers meet each other, we know that no two numbers sum up to S. The reason we can make it O(N) is that the array is sorted and we don’t need to check all the combinations.

Note: If you are not familiar with 2sum, you should really practice with more questions. 2sum is something I expect candidates to solve in a minute.
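The 2sum routine described above can be sketched as follows (a minimal version assuming a sorted input; the function name is mine):

```python
def two_sum_sorted(nums, s):
    # two pointers moving toward each other over a sorted array
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        cur = nums[lo] + nums[hi]
        if cur == s:
            return True
        if cur < s:
            lo += 1   # need a bigger sum
        else:
            hi -= 1   # need a smaller sum
    return False
```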

With the previous analysis, we can use the same technique on the 3sum question. As in the 2sum solution, let’s sort the array first. Now if we fix one number X in the array, the problem becomes finding 2 numbers that sum up to -X, which is exactly the 2sum question and can be solved in O(N) time.

Therefore, we can iterate over each number and inside the loop, solve the sub-problem as 2sum. To calculate the time complexity, sorting is O(NlogN), the outside loop is O(N) and the inside 2sum is O(N). Therefore, the overall time complexity is O(N^2) and space complexity is O(1).
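Putting the pieces together, a Python sketch (sorting once, then running the two-pointer scan over the numbers after the fixed one; the function name is mine):

```python
def three_sum_zero(nums):
    nums = sorted(nums)
    n = len(nums)
    for i in range(n - 2):
        # fix nums[i]; look for a pair among the rest summing to -nums[i]
        target = -nums[i]
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = nums[lo] + nums[hi]
            if s == target:
                return True
            if s < target:
                lo += 1
            else:
                hi -= 1
    return False
```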

*Find 3 integers in an array whose sum is closest to 0.*

Many candidates find this variation difficult. However, after analyzing the problem, you’ll see how similar it is to the basic version and in essence, they are exactly the same.

Again, let’s start with the simple case – find 2 integers whose sum is closest to S. It should be very obvious to you that we can use exactly the same solution as before – sort the array and keep two pointers (start and end).

- If the sum of current two numbers is greater than S, we move end pointer backward by one step.
- If the sum is smaller than S, we move start pointer forward by one step.
- After start and end meet each other, we can output the pair whose sum is closest to S.

By sorting the array, we don’t need to check all the combinations, thus saving a lot of computation. So back to the 3sum question, we can get a very similar solution.

- Sort the array.
- Iterate over the array by fixing one integer X at a time.
- Find the 2 integers from the remaining numbers whose sum is closest to -X.
- At the end of the iteration, we can output the result.
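The four steps above might look like this in Python (a sketch; `target` defaults to 0 and the best sum seen so far is tracked as we go):

```python
def three_sum_closest(nums, target=0):
    nums = sorted(nums)
    n = len(nums)
    best = nums[0] + nums[1] + nums[2]
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = nums[i] + nums[lo] + nums[hi]
            if abs(s - target) < abs(best - target):
                best = s
            if s == target:
                return s          # can't get closer than an exact match
            if s < target:
                lo += 1
            else:
                hi -= 1
    return best
```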

The solution has the same time and space complexity.

*Determine if any 3 integers in an array sum to 0. Each number can be used multiple times.*

For example, for the array [4, 3, -1, 2, 5, 10], the function should return true because 2 + (-1) + (-1) = 0. In fact, compared to the basic solution, all we need to do is handle duplicate usage.

The first case is 0: if 0 is in the array, we can return true directly since three zeros sum up to 0.

The second case is using one number twice (as in the example). In the 2sum solution, the only change we need to make is checking whether S/2 is in the array; if we find S/2, we can return true. In the 3sum solution, only one tiny change is needed to allow duplicates: when iterating over the array, the 2sum sub-problem should use the whole array rather than excluding the current number. By doing this, the number currently being iterated can be used multiple times.
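One way to sketch this (an assumption on my part: letting the two pointers meet, i.e. `lo <= hi`, checks `2 * nums[lo]` against the target, which covers both the S/2 case and the triple-zero case):

```python
def three_sum_reuse(nums):
    nums = sorted(nums)
    for x in nums:
        target = -x
        # scan the WHOLE array, so x itself may be picked again;
        # lo == hi means using the same element twice
        lo, hi = 0, len(nums) - 1
        while lo <= hi:
            s = nums[lo] + nums[hi]
            if s == target:
                return True
            if s < target:
                lo += 1
            else:
                hi -= 1
    return False
```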

A few important takeaways from this question that I hope you keep in mind:

- 2sum is a very important question and the idea of sorting the array and using two pointers to skip unnecessary checks has been used in many problems.
- When the problem is complicated, try with a simpler version. In this post, we started with 2sum instead of solving 3sum directly.
- For array problems, sorting the array first may significantly simplify the problem.

As a homework, another variation is to find 3 integers whose sum is closest to 0 and each number can be used multiple times. Leave a comment if you can’t figure this out.

The post Meeting Room Scheduling Problem appeared first on Gainlo Mock Interview Blog.

If you have taken many coding interviews, you will know that a lot of questions are quite close to real-life projects and don’t focus on specific data structures. Some people find them hard at first glance. However, with some analysis, you’ll realize that they are no different from other questions.

In this post, we’ll mainly talk about interval problems and some common techniques you can use when getting stuck.

*Given a list of meeting times, check if any of them overlap. The follow-up question is to return the minimum number of rooms required to accommodate all the meetings.*

Let’s start with an example. Suppose we use interval (start, end) to denote the start and end time of the meeting, we have the following meeting times: (1, 4), (5, 6), (8, 9), (2, 6).

In the above example, apparently we should return true for the first question since (1, 4) and (2, 6) have overlaps. For the follow-up question, we need at least 2 meeting rooms so that (1, 4), (5, 6) will be put in one room and (2, 6), (8, 9) will be in the other.

The meeting room scheduling problem was asked by Facebook very recently and there are quite a few similar problems.

Again, if you haven’t seen this problem before, spend a couple of minutes to think about it.

Let’s start with the first question – check if there are overlaps. I used to interview people with a similar problem, and quite a few candidates tended to enumerate “all cases” with tons of if-else clauses. In reality, you are never done because there are always some corner cases your algorithm won’t cover.

There are basically two suggestions here. First, it’s always recommended to start coding only after you are confident about your solution. Many people like to write code while thinking, which won’t really work in an interview since you don’t have much time. If you think thoroughly about the enumeration approach, it shouldn’t be hard to see that it will fail in certain cases.

Secondly, **when you get stuck, a good approach is to manually try some examples** and understand how it works. Let me explain this more. If you manually check if there’re overlaps in the above example, you’ll need to draw each interval in a 1-D space and check if any two intervals overlap. One lesson we learned from this example is that only two “nearby” intervals are able to overlap, which indicates that we may sort all of them and then only check nearby intervals.

In fact, sorting intervals by start time is an extremely common approach in interval questions. After we sort all the intervals, we can just traverse them and check whether the next interval’s start point is smaller than the current one’s end point; if it is, the two intervals overlap.

There’s another way to do that. If you see the sorted intervals as an array of start and end points in order, you can keep a counter. Go over the points one by one in order. When you get a start point, increase the counter by one and decrease by one when it’s an end point. If there’s no overlap, the counter can never get above 1.

The time complexity of sorting is O(NlogN) and the checking overlap is O(N). So overall it’s O(NlogN).
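The sort-then-compare-neighbors check might be sketched as (function name is mine; meetings that merely touch, e.g. (1, 4) and (4, 5), are treated as non-overlapping):

```python
def has_overlap(intervals):
    intervals = sorted(intervals)  # sort by start time (then end time)
    for (s1, e1), (s2, e2) in zip(intervals, intervals[1:]):
        if s2 < e1:                # next meeting starts before this one ends
            return True
    return False
```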

How many meeting rooms do we need at least in order to accommodate all the meetings?

With all the analysis above, it shouldn’t be hard to solve the follow-up question. Think about this: if there’s a timestamp that has 3 intervals covered, we need at least 3 meeting rooms. Otherwise, at least two of those 3 meetings have conflict. With that in mind, we know that if we can find the timestamp with most intervals covered, then the number of covered intervals is the number of meeting rooms we need.

So how do we calculate the maximum number of covered intervals? In fact, following the counter analysis above, the max value of the counter is the maximum number of covered intervals. This is because, for a given timestamp, if there are N unclosed start points in front of it, that timestamp must be covered by N intervals.
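The counter idea can be sketched as follows (one assumption of mine: sorting end events before start events at the same timestamp, so back-to-back meetings can share a room):

```python
def min_rooms(intervals):
    events = []
    for s, e in intervals:
        events.append((s, 1))    # a meeting starts: one more room in use
        events.append((e, -1))   # a meeting ends: one room freed
    # tuples sort by time first; at equal times -1 sorts before 1,
    # so a room frees up before the next meeting claims it
    events.sort()
    rooms = best = 0
    for _, delta in events:
        rooms += delta
        best = max(best, rooms)  # peak concurrency = rooms needed
    return best
```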

Some people might find it hard to figure this out. And I’ll tell you that the solution doesn’t come from nowhere. If you try several examples and draw them on paper, you will realize how obvious it is.

The biggest takeaway from this post is to try with examples. Whenever you get stuck, consider trying a few examples to understand how you would solve the problem as a human. In fact, you might be able to copy the same strategy in a program. Also, some examples can simplify the problem and make it easier to grasp the essence.

Another thing is that sorting and iterating over the sorted intervals are very common techniques in similar problems. For instance, another popular question is to merge a list of intervals that might have overlaps. After reading this post, you should be able to solve this question with no problem at all.

The post Lowest Common Ancestor appeared first on Gainlo Mock Interview Blog.

On second thought, this makes a lot of sense. Tree is one of the most useful and fundamental data structures in real products. For instance, tree structures are widely used in machine learning, e.g., decision trees. What’s more, tree-related interview questions can cover a lot of topics like iteration and recursion.

This week, I’m going to talk about LCA (Lowest Common Ancestor) problem with some extensions. Again, the answer itself is not important and what this post focuses on is how to come up with the right idea and how to analyze the problem from scratch. I’ll cover topics including recursion, big-O analysis, iteration and so on.

*Given a binary tree and two nodes, how do we find the lowest common ancestor of the two nodes?*

In the above example, for nodes 5 and 4, the lowest common ancestor is 1. And for nodes 6 and 2, the ancestor is 0. The question has been asked by both Google and Microsoft recently.

Think about the problem by yourself before reading the rest of the post.

Here we go. IMHO, tree problems are relatively easier compared to other data structures in coding interviews. The reason is that there are clear patterns to solve tree problems and they are related to the basic concepts. Let me explain this in detail.

If you have read our previous post – Deepest Node In a Tree, you should know that generally there are two basic concepts about tree problems:

- Traverse. You should be very familiar with traversing a tree. Things like in-order traversal, BFS should come to your mind immediately.
- Recursion. Since it’s very easy to divide a tree problem to subproblems (left and right sub-trees), recursion is one of the most common techniques.

Let’s see how these two concepts can be used to solve LCA.

It should already be clear how traversal can be used here. For nodes A and B, it’s easy to get the path from the root to each node. I’ll skip the discussion about how to get the paths since it should be easy for you.

Once you have the two paths – root to A and root to B – you can iterate over them simultaneously, and the last common node is the lowest common ancestor. Some people find it hard to think of getting the paths; that’s because they are not familiar with traversal. Once tree traversal becomes one of your basic tools, everything comes naturally.

What is the complexity of the algorithm? To get two paths, we need to traverse the tree twice. Finding the common node also requires one iteration. So the time complexity is O(N). Space complexity is also O(N) as we need extra space to store the paths.

The disadvantage of the previous solution is that it needs to iterate 3 times with extra space. Let’s see if we can make it better.

To use recursion, you need to figure out two things: 1. What is the end point? 2. How to combine sub-problem solutions to solve the bigger problem?

First, if either A or B is the root of the current subtree, then that root is the answer for this subproblem and we just return it – this is the end point of the recursion. Otherwise, we recursively get the LCA from the left and right subtrees (if they exist). As we keep dividing the tree into subtrees, we’ll eventually hit either A or B.

To combine the sub-problem solutions: if LCA(left) returns a node and LCA(right) returns nothing, both A and B are located in the left subtree and the returned node is the final result (and symmetrically for the right subtree). If both LCA(left) and LCA(right) return non-empty nodes, A and B are in the left and right subtrees respectively. In this case, the root node is the lowest common ancestor.

You might be confused about why it’s possible for both left and right to return non-empty nodes. This is because we assume A and B must exist in the tree, so in any subproblem whose root is A or B, we just return that root.

Since we traverse each node at most once, the time complexity is O(N). The space complexity is O(h) for the recursion stack, where h is the height of the tree, which is O(N) in the worst case.
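The recursion above can be sketched as (assuming, per the discussion, that both nodes exist in the tree; the `Node` class is a minimal stand-in):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def lca(root, a, b):
    # end point of the recursion: empty subtree, or we hit a or b
    if root is None or root is a or root is b:
        return root
    left = lca(root.left, a, b)
    right = lca(root.right, a, b)
    if left and right:
        return root          # a and b are in different subtrees
    return left or right     # both in one subtree (or not found here)
```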

*What if each node has a parent pointer that points to its parent?*

In fact, this is an even simpler problem. With the parent pointer, you can get the path from A/B to the root. Then we can easily get the LCA as the first solution.

However, the time complexity here is O(h) where h is the height of the tree. This is because to get the path to the root, you don’t need to traverse the whole tree as before. Similarly, the space complexity is also O(h).
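A sketch of the parent-pointer version (the `Node` class with a `parent` field is my stand-in; recording one path in a set keeps the walk at O(h) time and space):

```python
class Node:
    def __init__(self, val, parent=None):
        self.val = val
        self.parent = parent

def lca_with_parent(a, b):
    # record all ancestors of a (including a itself), then walk up
    # from b until we hit one of them
    ancestors = set()
    while a:
        ancestors.add(a)
        a = a.parent
    while b:
        if b in ancestors:
            return b
        b = b.parent
    return None
```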

I believe that with more posts about tree interview questions, you are becoming more and more familiar with this data structure. The key points:

- Basic concepts like traversal. As you can see, once you are familiar with these basic techniques, it’s quite easy to come up with the right idea. If you find yourself confused about particular concepts, do go back and review the textbook.
- Recursion. We’ve been using recursion to solve tree problems for so many times.

The post Flatten a Linked List appeared first on Gainlo Mock Interview Blog.

In this post, I’ll focus on topics including linked list manipulation, queues, and BFS, and summarize some common techniques as before.

*Given a linked list, in addition to the next pointer, each node has a child pointer that can point to a separate list. With the head node, flatten the list to a single-level linked list.*

For instance, the above linked list should be flattened to 1->2->3->4->5->6->7->8->9. The idea is to flatten the linked list level by level. Note: this question was asked by Facebook a month ago.

It’s better to study this question together with the Print Binary Tree problem, since both of them are about traversal. In essence, a linked list with two pointers is almost the same as a binary tree.

If you think about the question for a little while, it shouldn’t be hard to realize that this question is exactly the same as print a binary tree by level.

**When we need to traverse a tree, graph, or any data structure by level, the first thing that comes to mind should be breadth-first search (BFS), and the data structure associated with it is the queue.** This should be something that comes to your mind in less than a second.

So a very straightforward solution can be described like this:

- Start from the head node and traverse the linked list following the next pointer.
- When a node has a child node, put the child node into the queue.
- When the next pointer of the current node is null, pop a node from the queue and repeat from step 1 using the popped node.

The complexity is linear for both time and space since we only need to traverse or store each node at most once.
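The queue-based approach above can be sketched as (the `Node` class and function name are mine; `deque` serves as the queue):

```python
from collections import deque

class Node:
    def __init__(self, val, next=None, child=None):
        self.val = val
        self.next = next
        self.child = child

def flatten(head):
    # traverse the current level; queue up child lists as we meet them,
    # and splice the next queued level on when the current level ends
    queue = deque()
    cur = head
    while cur:
        if cur.child:
            queue.append(cur.child)
            cur.child = None
        if cur.next is None and queue:
            cur.next = queue.popleft()
        cur = cur.next
    return head
```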

For linked list questions, sometimes you can do the operation in place without external storage. Let’s borrow the idea from reversing a linked list: if we can modify pointers while traversing the list, we can flatten it to a single level without a queue.

Similar to reversing a linked list, we need two pointers (A, B), both pointing to the head initially. The basic idea is to let A keep moving forward while B is used to locate the first node of the next level. Whenever A gets stuck, we point it at the next level’s first node found via B. The flow is as follows:

- Move A forward (following the next pointer) until the next node is null.
- Move B forward until the current node has a child.
- Set A’s next pointer to B’s child and clear B’s child pointer (set to null).
- Repeat the whole process. You’ll notice that the next time A gets stuck, B will go through the original path A has gone through.

I have to say that this solution is a really hard one and I expect that most candidates won’t be able to come up with this. In this solution, the time complexity is still linear (although some nodes will be traversed twice) but space complexity is O(1).
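One way to realize this in-place idea (a sketch of mine, with A as the scanning pointer `cur` and B’s role played by a `tail` pointer where child lists get appended):

```python
class Node:
    def __init__(self, val, next=None, child=None):
        self.val = val
        self.next = next
        self.child = child

def flatten_inplace(head):
    # tail always sits at the end of the (partially) flattened list
    tail = head
    while tail and tail.next:
        tail = tail.next
    cur = head
    while cur:
        if cur.child:
            tail.next = cur.child    # append the child list at the end
            cur.child = None
            while tail.next:         # advance tail to the new end
                tail = tail.next
        cur = cur.next               # the scan will also reach appended nodes
    return head
```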

- Traverse by level, BFS and queue go hand in hand. You should spend less than a second to come up with these.
- Linked list problems can sometimes be solved without external storage. Try to borrow ideas from the reverse a linked list problem.
- I would expect you to spend no more than a few seconds coming up with the time/space complexity. Otherwise, you need more practice.

The post Second Largest Element of a Binary Search Tree appeared first on Gainlo Mock Interview Blog.

BST is a very good data structure to ask about in coding interviews. Here are the reasons:

- BST is a widely used data structure and it’s important to know it.
- It’s neither too complicated nor too simple to ask in a coding interview.
- Many concepts, like traversal and recursion, can be covered with BST-related questions.

This post will cover topics like BST, recursion and some tips about coding as well.

*Given a BST (Binary search tree), find the second largest element of it.*

I won’t explain the question further since it’s very straightforward. It’s also worth mentioning that this question has been asked recently by companies including Microsoft, Google, and Facebook.

Unlike some array related interview questions, tree questions usually require some basic knowledge to solve. If this is the first time for you to know BST, I highly recommend you go check the definition and related operations of BST before moving on to the rest of this post.

More specifically, you are expected to know the following concepts:

- What a BST is and how BST lookup differs from hash table lookup.
- How to insert, look up, and delete a node from a BST, and **the corresponding time complexity**.
- How to traverse a BST.

If you are quite familiar with all of these, you should be able to come up with this very straightforward solution: First, find the largest element of the BST and delete it. Then, find the largest element of the current BST, which is the 2nd largest element of the original BST. Apparently, this is not efficient and I’ll cover the better solution next.

If you have read our previous posts about tree problems, you should know that **recursion is an extremely common technique for solving tree-related questions.** The idea is very simple: by solving the subproblems for the left and right subtrees, we can combine the two results with the root node to solve the whole problem.

Another thing you should know is that **in-order traversal of a BST returns all the elements in sorted order**. This is nothing fancy; you should just know it since it’s one of the fundamental properties of a BST.

By combining the above analysis, we have the following solution:

- Do a reverse in-order traversal (right subtree, node, left subtree) of the given BST, which visits the elements in descending order.
- Keep track of how many elements have been visited; the 2nd visited element is the 2nd largest.

As you can see, the solution doesn’t come from nowhere and I’d like to illustrate how to get the solution step by step, which is the whole point of this post. If you find it hard to implement the code, don’t worry since we’ll provide some tips soon.

*What if we want to get the k-th largest element of the BST?*

You can see that the above traversal solution can easily be extended to get the k-th largest element: all we need to do is output the k-th visited element instead of the second. In addition, I hope you can come up with this follow-up question during your own preparation. By removing or generalizing particular conditions, a question can become much more interesting.

Some people also find it hard to implement the code, so I’ll provide a few tips here:

- You don’t need to store all the visited elements in an array and then find the k-th element. Instead, use a variable i to record the number of visited elements. Inside the traversal function, every time you visit an element, increment i by one, and when i == k, output the current element.
- Be careful about cases where there are less than k elements in the BST.
- You should handle empty tree as well.
- Don’t forget to check if left and right are null when traversing the tree.
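Putting the tips together, a Python sketch (names are mine; visiting the right subtree first yields elements in descending order, so the k-th visited element is the k-th largest):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def kth_largest(root, k):
    count = 0        # number of elements visited so far
    result = None

    def visit(node):
        nonlocal count, result
        if node is None or result is not None:
            return
        visit(node.right)            # reverse in-order: right first
        if result is not None:
            return
        count += 1
        if count == k:
            result = node.val
            return
        visit(node.left)

    visit(root)
    return result    # None when the tree has fewer than k elements
```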

Tree problems are usually easier since many techniques can be reused to solve similar questions. Some takeaways include:

- Be very familiar with basic concepts/operations of BST, especially the time complexity and pros and cons when comparing with other data structures.
- In-order traversal will give you all elements in order.
- You should definitely consider recursion when solving tree-related problems. It may not work for every question, but it’s an extremely common technique and can make your code concise.

The post Subarray With Given Sum appeared first on Gainlo Mock Interview Blog.

Also, in our previous posts, we didn’t cover much about array problems, so it’s definitely worth analyzing this problem in depth. In this article, we will talk about topics including arrays, the sliding window, recursion, and DP (dynamic programming).

*Given an array of non-negative numbers, find a continuous subarray that sums to S.*

Not only has this question been asked by Facebook recently, but it can be extended with different follow-up questions. I’m pretty sure you’re going to learn a lot from this post. Again, the goal is not providing “standard answers”. Instead, we want to show you how you can come up with the right approach and how to analyze the problem from scratch.

After reading the description, there are two keywords that should catch your attention – “**non-negative”** and “**continuous”**.

When all the numbers are non-negative and the problem is about the sum of subarrays, one common idea is that adding more numbers to the subarray can only increase the current sum. In other words, if you already had a subarray with sum larger than S, you don’t need to consider including more numbers, which will significantly simplify the problem and make the algorithm faster. Keyword “continuous” is even easier to understand. It tremendously reduces the possible subarray candidates to consider.

Again, think about the problem by yourself before reading the rest of this post.

The most naive solution is obvious. You can use 2 loops to check all continuous subarrays and see if any of them sum up to S. The time complexity is O(N^2) and apparently is not ideal.

Let’s talk about how to optimize it. As analyzed above, we should take advantage of non-negative numbers. If the analysis doesn’t sound familiar to you, I highly recommend you check If a String Contains an Anagram of Another String. You’ll see how similar the approach is, although it’s a string problem.

We’ll use the same technique – sliding window.

- Start with two pointers that represent the subarray.
- If the sum of current subarray is smaller than S, move the end pointer one step forward.
- If the sum is larger than S, move the start pointer one step forward.
- Once the current sum equals S, we’ve found the target.

The reason we can use a sliding window here is that adding more numbers can only increase the current sum, so the two pointers never need to go back to search candidates. This gives the algorithm O(N) time complexity with no extra memory used.
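The sliding-window steps above can be sketched as (a version of mine that returns the subarray’s start and end indices, or None if no subarray sums to S):

```python
def subarray_with_sum(nums, s):
    # requires all numbers to be non-negative
    start = 0
    cur = 0                       # sum of nums[start..end]
    for end, x in enumerate(nums):
        cur += x
        # shrink from the left while the window sum is too big
        while cur > s and start < end:
            cur -= nums[start]
            start += 1
        if cur == s:
            return (start, end)
    return None
```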

What if we remove the two restrictions, so the problem becomes – *Given an array of numbers, find subarray with sum to S.*

Again, let’s start with the most naive approach – check all subarray candidates. As you can see, even this approach is not easy to implement. First, you can’t use a fixed number of nested loops to get all subarrays. Second, there are 2^N subarrays in total. The most important lesson here – **whenever you see issues like this (you can’t use loops to check all possibilities, or there is an exponential number of candidates), you should immediately think of recursion or dynamic programming.**

Another reason you should think of recursion is that the problem can be divided into sub-problems. Suppose you want to include the first element a[0], then the problem becomes to find subarray from a[1:N] with sum to S-a[0]. Similarly, if a[0] is not included, we just need to solve the problem from a[1:N] with sum to S.

This should be very straightforward to you and it shouldn’t be hard to implement the code.

The downside of the above recursive solution is time efficiency. As you can see, many sub-problems are calculated multiple times. Thus, the most common way to optimize this is to save intermediate results in memory, which is what dynamic programming does. Check A Step by Step Guide to Dynamic Programming if you want to know more.

The first step is to figure out how to denote a sub-problem. The sub-problem here is to find a subset of a[i:N] that sums to M. Therefore, we can use (i, M) to denote a sub-problem, and the result can be saved in a 2D array. The high-level pseudo code is like this:
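Here is one possible version in Python. It’s a sketch, not the post’s original code: a dictionary keyed by (i, m) plays the role of the 2D array, and the names are illustrative.

```python
def has_subset_sum(a, S):
    """Return True if some subset of a sums to S.
    Sub-problems are keyed by (i, m): can a[i:] produce sum m?"""
    memo = {}

    def solve(i, m):
        if m == 0:
            return True          # the empty subset already works
        if i == len(a):
            return False         # ran out of elements
        if (i, m) not in memo:
            # either include a[i] or skip it
            memo[(i, m)] = solve(i + 1, m - a[i]) or solve(i + 1, m)
        return memo[(i, m)]

    return solve(0, S)
```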

In a nutshell, if the sub-problem has already been solved, return the stored result directly. Otherwise, solve the two sub-problems (with and without a[i]). The last step is to store the result back in memory.

- I hope you can be very sensitive to words like “non-negative” and “continuous”.
- This is at least the 4th time we’ve seen the sliding window technique, although previously we were using it in string problems.
- Whenever you see issues like these (can’t use loops to check all possibilities, or an exponential number of candidates), you should immediately think of recursion or dynamic programming.
- The dynamic programming solution is optional here.

The post Subarray With Given Sum appeared first on Gainlo Mock Interview Blog.


There are many other factors being evaluated during an interview. For instance, your analysis process is at least equally important. More specifically, interviewers care a lot about how you approach a problem step by step, how you optimize your solution, how you compare different approaches, and so on.

So in this post, we want to focus more on discussion and analysis. You will learn a lot about what I mean by “solution is not important”. We start with a simple question, but there are a bunch of follow-up questions after that.

*Given an array of strings, find duplicate elements.*

For instance, in the array [“abc”, “dd”, “cc”, “abc”, “123”], the duplicate element is “abc”. Let’s start with this simple scenario and I’ll cover more follow-up questions soon. Also, as before, we only select questions that are asked by top companies. This one was asked by Uber, Facebook, and Amazon recently.

I’ll skip the O(N^2) brute force solution that compares every pair of strings, because it’s too obvious. One common technique is the **trade-off between time and space**. Since we want to make the algorithm faster, we can think about how to use more memory to achieve this.

I hope when you see “find duplicate”, you can think of a hash set immediately, since hashing is the most common technique to detect duplicates. If we store every element in a hash set, we can make it O(N) in both time and space complexity.
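A minimal sketch of this idea (the names are mine, not from the original post):

```python
def find_duplicates(strings):
    """Return the set of elements that appear more than once."""
    seen, dups = set(), set()
    for s in strings:
        if s in seen:
            dups.add(s)      # seen before: it's a duplicate
        else:
            seen.add(s)
    return dups
```

For the example above, `find_duplicates(["abc", "dd", "cc", "abc", "123"])` returns `{"abc"}`.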

Let’s extend this question a little bit. **What if the array is too large to put in memory?** Apparently, we have to store all those strings in files. Then how can we find duplicate elements?

Many people have almost no experience with “big data” that cannot fit into memory. But no worries, you will see the problem is not as hard as you thought. Let’s think about it this way. We can load as much data as possible into memory and find duplicates with the same approach above. However, the problem is that we can’t compare data from separate batches. Does this problem sound familiar to you?

Again, I hope you thought of external merge sort, which solves exactly this problem. So the most obvious solution is to do an external sort over all the strings, and then we can simply compare adjacent strings to find duplicates.
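A rough sketch of the external-sort approach in Python. Everything here is illustrative – the chunk size, the temp-file handling, and the names are my own, and a real implementation would tune the chunk size to the available memory:

```python
import heapq
import itertools
import tempfile

def external_find_duplicates(path, chunk_size=100_000):
    """Sort chunks that fit in memory, spill each sorted run to a temp
    file, then k-way merge the runs and compare adjacent lines."""
    runs = []
    with open(path) as f:
        while True:
            chunk = list(itertools.islice(f, chunk_size))
            if not chunk:
                break
            chunk.sort()
            run = tempfile.TemporaryFile("w+")
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
    dups, prev = set(), None
    for line in heapq.merge(*runs):  # streams the sorted runs lazily
        if line == prev:
            dups.add(line.rstrip("\n"))
        prev = line
    for run in runs:
        run.close()
    return dups
```

`heapq.merge` only keeps one line per run in memory at a time, which is exactly the property we need when the data doesn’t fit in RAM.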

There’s another way to do it. Since we can only load limited data into memory, we should only load strings that could possibly be duplicates of each other. Let’s say we pick k pivots, like in quicksort. Each time, we only load strings that fall between [pivot i, pivot i+1] into memory and find duplicates among them, if any.

How do we select k? We need to make sure each bucket can fit into memory, otherwise, we need to divide the bucket into multiple ones.

How do we evaluate the efficiency? Unlike normal big-O analysis, when file operations are involved, the bottleneck is almost always the number of file operations performed. So there’s no obvious answer as to which approach is better; as long as you try to estimate the number of file operations, you’re in good shape.

Let’s go one step further. **What if the array is too large to store on one machine and we have to distribute it across multiple nodes?** You will see how similar this problem is to the on-disk version.

We can first sort the array shard on each machine. Then we select a master machine, and every other machine sends its string elements one by one, in order, to the master. The master machine can then easily find duplicate elements. This is exactly the same as external merge sort, except it uses the network to communicate.

Similarly, we can also split the array into shards, with each machine storing one shard. More specifically, suppose machine k stores strings from “1000” to “5000”; then every other machine is responsible for sending strings within this range to machine k over the network. Once that’s done, we can just find duplicate strings within a single machine. This is the same as the pivot solution.

How do you evaluate the performance of the algorithm? This is not an easy question since in distributed systems there are quite a few factors we need to consider. The basic idea is that we need to quickly pinpoint the bottleneck. In a single machine, the key is to reduce the number of file operations. In a distributed system, more often than not the key is to reduce network requests.

If you can try to estimate the number of network requests needed with some reasonable assumption, interviewers will be impressed for sure. As you can see, for many interview questions, there’s no clear answer and even interviewers don’t know the solution. The point here is that as long as you are trying to solve the problem and provide reasonable analysis, you will get a good score.

I think the most important takeaway is that the analysis is way more important than the solution. As an interviewer, I don’t really like to hear answers like “I don’t know”. Instead, I’d like to see candidates try hard to figure out the solution and keep telling me what’s on their mind.

Besides, all the techniques used here, like external merge sort, are very common for disk problems and distributed system problems. You should not be scared when asked what happens if we scale the problem to disk or to multiple machines.

One more piece of advice: whenever you solve a question, try to ask yourself what happens if we expand the question to a larger scale.

The post Duplicate Elements of An Array appeared first on Gainlo Mock Interview Blog.


Instead, we focus on telling you how to analyze each question and how to re-use the same techniques in similar problems. At the end of each post, we’ll summarize some common strategies used in the question.

In this post, we are going to cover topics including hash maps, string manipulation, and sorting.

*Given a set of random strings, write a function that returns sets grouping all the anagrams together.*

For example, suppose that we have the following strings:

“cat”, “dog”, “act”, “door”, “odor”

Then we should return these sets: {“cat”, “act”}, {“dog”}, {“door”, “odor”}.

A few reasons I selected this problem:

- It was asked by Facebook a month ago.
- Anagram is a really popular topic in recent interviews.
- Tons of techniques used in this problem can be reused in similar questions.

Again, try to think about this problem before moving on.

If you keep following our blog posts, this shouldn’t be the first time you see anagrams. In question If a String Contains an Anagram of Another String, we also covered this topic and some techniques will be used here as well.

If you have tried some examples for this question, you should notice that the key is to check whether two strings are anagrams, because once this is solved, you can easily tell which strings should be grouped together.

To check if two strings are anagrams – i.e., they contain the same characters with the same counts – one approach is to sort the characters of each and compare whether the two sorted strings are identical. Since you will need to output the original string, you should keep it together with the sorted string. Therefore, we have this initial idea:

- Transform each string to a tuple (sorted string, original string). For instance, “cat” will be mapped to (“act”, “cat”).
- Sort all the tuples by the sorted string so that anagrams are grouped together.
- Output the original strings that share the same sorted string.

In fact, you will notice that step 2 is not efficient – O(NlogN) time complexity for sorting. To make this step linear, you can use a hash map whose key is the sorted string and whose value is an array of the corresponding original strings.

By doing this, you can reduce the time complexity of step 2 to O(N). However, the downside is that you need more space to store the hash map. Given that in step 1 we already need extra space (O(N)) for the sorted string, a hash map won’t change the final space complexity.
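Putting the hash map version together, a sketch might look like this (assuming plain lowercase strings; the names are mine):

```python
from collections import defaultdict

def group_anagrams(words):
    """Group words that share the same sorted-character key."""
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)  # sorted string as the key
    return list(groups.values())
```

For the example above, `group_anagrams(["cat", "dog", "act", "door", "odor"])` produces the groups {“cat”, “act”}, {“dog”}, {“door”, “odor”}.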

In our previous post, we mentioned another simple approach to checking anagrams. If we map each character to a prime number, and map the whole string to the product of the primes of its characters, anagrams will have the same product. The benefit of this approach is that we can check if two strings are anagrams in linear time instead of the O(MlogM) needed for sorting (M is the length of a string).
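A sketch of the prime-number key, assuming lowercase a–z only. Note that Python’s big integers sidestep overflow; in a fixed-width language the product can overflow for long strings, which is a caveat worth mentioning to the interviewer.

```python
# One prime per letter 'a'..'z'
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def anagram_key(word):
    """Anagrams multiply the same primes, so they share the same key."""
    key = 1
    for ch in word:
        key *= PRIMES[ord(ch) - ord("a")]
    return key
```

For instance, `anagram_key("cat") == anagram_key("act")` holds, while “dog” gets a different key.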

This question is a perfect example of the time-space trade-off. With a hash map, we can reduce the time complexity to linear – true both for the overall grouping and for the anagram check. However, it requires additional space. Without a hash map, we need to sort, which is slower.

The idea here is that when we want to make an algorithm faster, one direction to think about is using additional space. Hash maps and hash sets are among the most common data structures to consider. On the flip side, if we want to reduce memory usage, we may have to accept a slower program.

As before, let’s summarize a few techniques we used in this question:

- When we need to group similar things together, I expect data structures like hash maps to come to your mind within a second. This is commonly used not only in coding interview questions but in real-life projects as well.
- To check anagrams, we can use the prime number approach or sorting.
- Time space trade-off is a very common approach when optimizing algorithms.

The post Group Anagrams appeared first on Gainlo Mock Interview Blog.


After a second thought, this actually makes sense. A string is a quite flexible data structure, and many concepts can be covered by a string problem – hashing, memory and so on. In addition, it’s a data structure you’re gonna use almost every day. That’s why many string interview questions are quite relevant to real-world projects.

In this post, we’re going to talk about topics including string manipulation, dictionaries, time complexity, etc., and at the end I’ll summarize several commonly used techniques as before.

*Given a dictionary and a word, find the minimum number of deletions needed on the word in order to make it a valid word.*

For example, string “catn” needs one deletion to make it a valid word “cat” in the dictionary. And string “bcatn” needs two deletions.

Dictionary has always been an interesting topic in string interview problems, which is part of the reason I’d like to cover this here. Also, this question was asked by Google recently.

Since dictionaries are so common in coding interview questions, I’d like to briefly summarize a few strategies/techniques here.

- To store a dictionary, people usually use data structures like a hash set, a trie, or maybe just an array. You should understand the pros and cons of each.
- You may choose to have a pre-processing step that reads the whole dictionary and stores it in your preferred data structure. Once it’s loaded, you can use it as many times as you want.
- If the dictionary is not too large, you may treat the dictionary traversal time as a constant.

Coming back to this problem: if we assume the dictionary can be traversed quickly (not too many entries), one approach is to go through each word in the dictionary, calculate the number of deletions required, and return the minimum.

To calculate the number of deletions efficiently, we’ll use a common two-pointer technique. One key fact: if a longer string can be transformed into a shorter one by deleting characters, the longer string must contain all the characters of the shorter one, in order. Once you notice this, you should see that we only need to traverse the two strings once to get the deletion count.

More specifically, we keep two indices (L for the longer string, S for the shorter string), each pointing to the beginning of its string. If the two characters under the indices differ, move L forward by one character. If they are the same, move both forward. If S reaches the end, the longer string contains all the characters in order, so the number of deletions needed is just len(longer) – len(shorter).
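The two-index walk might look like this in Python (the names are mine; it returns None when the dictionary word is not an in-order subset of the given word):

```python
def deletions_needed(longer, shorter):
    """Deletions to turn `longer` into `shorter`, or None if impossible."""
    s = 0
    for ch in longer:                 # L advances every iteration
        if s < len(shorter) and ch == shorter[s]:
            s += 1                    # S advances only on a match
    return len(longer) - len(shorter) if s == len(shorter) else None
```

The dictionary-scan solution is then just the minimum of `deletions_needed(word, w)` over every dictionary word `w` for which the result is not None.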

Assuming the size of the dictionary is M and length of the given word is N, the time complexity is O(MN) because for each word in the dictionary, we may need to iterate over the given word.

What if the dictionary is really large? Actually, we can solve this problem from the other side – traverse all the possible words generated by deleting characters from the given word.

So for the given word, we try deleting each of the characters and check if the new word exists in the dictionary. **Since we need to quickly check the existence of a word in the dictionary, we should load the dictionary into a hash set.**

So the time complexity of pre-processing is O(M) (traversing the whole dictionary once), and the rest of the algorithm is O(2^N) because we may need to generate all the possible words derived from the given word. It’s also worth noting that once the dictionary is loaded, we don’t need to do the pre-processing again, which is why sometimes we can ignore the time spent here.

So which solution is better? It depends on the size of the dictionary and the length of the given word.

To sum up some techniques in this question:

- You should be aware of the common data structures for dictionaries and the pros and cons of each.
- Given that the size of the dictionary is fixed, it’s not a bad idea to just iterate over it.
- Using two indices to traverse/compare two strings or arrays is quite common. For example, we use the same approach when merging two sorted arrays.

You may notice that it’s not easy to write the code for the “traverse the word” solution, so please try to finish the code for this part.
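As a starting point, here is one possible sketch – a BFS over deletion results, so the first dictionary hit uses the fewest deletions. All the names are mine, and as discussed above, the worst case is still exponential in the word length:

```python
from collections import deque

def min_deletions(word, dictionary):
    """Fewest deletions turning `word` into a dictionary word, else -1."""
    dictionary = set(dictionary)   # hash set for O(1) membership checks
    queue = deque([(word, 0)])
    seen = {word}
    while queue:
        current, dels = queue.popleft()
        if current in dictionary:
            return dels
        for i in range(len(current)):
            nxt = current[:i] + current[i + 1:]  # delete one character
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dels + 1))
    return -1
```

For the earlier examples, `min_deletions("catn", ["cat"])` is 1 and `min_deletions("bcatn", ["cat"])` is 2.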

The post Minimum Number of Deletions Of a String appeared first on Gainlo Mock Interview Blog.
