If it’s your first time to see this series of posts, I’d like to emphasize again that the whole point of the article is not providing something like standard answers, but tell you how to analyze a coding interview question step by step. At the end of each post, I’ll summarize some common techniques that can be reused in other questions.
By reading this post, you can have a clear idea about how I would approach this problem from the beginning. I believe that solutions don’t come from nowhere and I would like to uncover all these hidden patterns that can help you come up with the right approach.
Find the longest substring with K unique characters.
Take string “aabbccdd” as an example.
if K is 1, the longest substring can be “aa”.
If K is 2, the longest substring can be “aabb”.
If K is 3, the longest substring can be “aabbcc”.
This question was asked by Facebook a month ago. It seems that string related problems are quite popular recently among companies like Google, Amazon etc..
If you want to come up with a naive solution, brute force should be definitely the number one thing in your head. In this example, it should be extremely straightforward for you to get this approach – Iterate all the substrings and then check if any of them contains K unique characters. After the loop, you can return the longest one.
Let’s say the string has length N, the substring iteration has time complexity of O(N^2). To check if substring contains K unique characters, you have to go over the substring once, which is O(N). Therefore, the overall time complexity would be O(N^3), which is quite slow as expected.
Time complexity analysis
Let’s talk about some high-level strategies first. If we want to make it faster than O(N^3), it should be something like O(N^2), O(NlogN), O(N) or even O(1).
Since we have to iterate the string at least once, the lower bound should be O(N). So O(1) or O(logN) can be ignored. Also, the problem seems unrelated to sorting or search (if you sort the array, the substring order is not kept), so it’s unlikely to have logN. Thus O(NlogN) is out. As a result, we can focus on O(N^2) and O(N).
If you have read our previous post – If a String Contains an Anagram of Another String, you should realize how similar these two questions are.
Similarly, you can take use of sliding window. Use a start pointer and an end pointer to represent a substring in a sliding window, if the current substring has K unique characters or less, it means the string can potentially be longer and you just need to move forward the end pointer by one character.
If after moving forward the end pointer the current substring exceeds K unique characters, it indicates that the previous substring is a potential candidate that has K unique characters. In this case, you just need to move forward the start pointer until the current substring has K or less unique characters.
Remember that the reason sliding window works here (and other places as well) is that you never need to move the end pointer back, which makes the time complexity O(N).
Along the way, you can use a hash map to keep track of the character frequency of each substring and a global counter to track current number of unique characters. When moving the sliding window, adjust the hash map accordingly and once new key is inserted or an old one gets cleared, increase/decrease the global counter.
It’s worth to summarize several techniques used in this coding interview question. Also, i highly recommend you compare this question with If a String Contains an Anagram of Another String.
- To come up with a naive solution, brute force can be the first thing in you head. It works a starting point and you can keep optimizing based on that.
- You can get a lower bound of the time complexity, which sometimes can give you ideas of what to optimize. In this question, apparently iteration over all substrings is the bottleneck.
- We use sliding window once again. If the end pointer don’t need to go back, this can make the iteration time to O(N).
- Remember that the extra hash map only need 256 slots (assuming all characters are ASCII), so the space complexity is still O(1).