This week, we’ll talk about question – longest substring without repeating characters.
If you are following our blog, you might notice that we start to provide coding solutions at the end of each question. At Gainlo, many people have the feeling that questions asked by Google/Facebook are much easier than they expected, but they still didn’t make it. The most common reason is that they failed to provide solid code in a short time.
The surprising truth is that 90% of questions asked in interviews are basic, but the expectation is high – candidates should be able to write bug-free code quickly. And this is exactly what our blog posts are focused on.
In this post, we’re going to talk about how to analysis this popular question and more importantly, tips and tricks on writing bug-free code.
Given a string, find the longest substring without repeating characters.
For example, for string “abccdefgh”, the longest substring is “cdefgh”.
This question has been asked by Facebook recently. In addition, the question is not too hard to solve, but providing solid code in a short time requires some work. And this is the most common type of question in interviews.
As you may already know, it’s recommended to come up with the simplest solution first and then you can keep optimizing. Obviously, we can go over all the substrings and find the longest one without repeating characters. This will give us O(N^4) time complexity (O(N^2) to get all substrings, O(N^2) to check if each substring has dups) and O(1) space complexity. Although it’s clear that the solution is too slow, we should still mention that to the interviewer.
Despite the fact that the solution is naive, it makes us clear that the next step is to make it faster. And the most common approach to speeding up your program is to use more memory. In other words, we can sacrifice the space complexity to improve the time complexity.
If you are able to think in this way, things become much clearer. A hashset is the most common data structure to detect dups and with a hashset, we can check if a substring has dups in O(N) time.
Now, the bottleneck is at iterating over all substrings, which has O(N^2) complexity. How can we make it faster?
In fact, there are lots of unnecessary checks. If we already know a substring S has duplicate characters, there’s no need to check any substring that contains S. If you have read our previous post – If a String Contains an Anagram of Another String, you should realize how similar these two questions are.
Take string “abccdefgh” as an example. When we realize that substring “abcc” has dups, we can immediately abandon the whole substring and check substrings from “cdefgh”. Thus, we have the following approach:
- Keep two pointers (start, end) that denotes the current substring we’re checking and both has an initial value of 0.
- Each time, move forward end pointer by 1 character and use the hashset to check if the newly added character has already existed.
- If not, keep moving end pointer forward.
- If the new character is a duplicate one, move the start pointer all the way to pass the duplicate character and pop out all these characters from the hashset.
- Repeating step 2 and output the longest substring without dup after finishing the whole string.
In essence, as we know processing all substrings are unnecessary, sliding window can be a great technique to improve that and the same technique has been used in many problems like Subarray With Given Sum.
The final solution has O(N) time complexity and O(N) space complexity.
If you haven’t tried to write the code please do so. Once you finished, try the following three test cases:
Here’s the solution:
If you write the code to solve it, you may notice that it’s not easy to make it bug-free easily. You may need to address several edge cases to make it work. That’s exactly the point of this question. Many people are able to come up with the solution, but very few of them can write the perfect solution, especially with a time constraint.
Here I’d like to share a couple tips in terms of the code above:
- It’s important to validate the input in the beginning. This should be the first thing when you write the solution without even thinking. If you don’t know what to do with invalid input, ask the interviewer.
- There are several edge cases you need to consider. To begin with, when you use start and end pointer to denote a substring, make it clear whether it’s an open or close interval, which means if the substring is [a, b) or [a, b]. In our code, we make it [a, b), but you can use either so long as it’s consistent across the code.
- When a duplicate character is found, you should be very careful about the logic after “while start < end – 1”. Although there are only 7 lines of code, it’s quite easy to make mistakes. What we are doing here is to pop all the characters from start until we find the duplicate character, in which case we won’t pop out the dup as it’s not added.
- It’s hard for everyone to make it bug-free for the first time, but you should examine with 2-3 test cases.
I hope the biggest takeaway from the post is to care more about the coding part if you are used to solving problem in your mind. Most of the interview questions are not too hard to think, but very few people can get the code right. That’s why most people found the question simpler than expected but still failed in the end.
By the way, if you want to have more guidance from experienced interviewers, you can check Gainlo that allows you to have mock interview with engineers from Google, Facebook etc..