2016-02-11-roman-drits-barnimages-003_small

Facebook Interview – Longest Substring Without Repeating Characters

This week, we’ll talk about question – longest substring without repeating characters.

If you are following our blog, you might notice that we start to provide coding solutions at the end of each question. At Gainlo, many people have the feeling that questions asked by Google/Facebook are much easier than they expected, but they still didn’t make it. The most common reason is that they failed to provide solid code in a short time.

The surprising truth is that 90% of questions asked in interviews are basic, but the expectation is high – candidates should be able to write bug-free code quickly. And this is exactly what our blog posts are focused on.

In this post, we’re going to talk about how to analysis this popular question and more importantly, tips and tricks on writing bug-free code.

 

Question

Given a string, find the longest substring without repeating characters.

For example, for string “abccdefgh”, the longest substring is “cdefgh”.

This question has been asked by Facebook recently. In addition, the question is not too hard to solve, but providing solid code in a short time requires some work. And this is the most common type of question in interviews.

 

Analysis

As you may already know, it’s recommended to come up with the simplest solution first and then you can keep optimizing. Obviously, we can go over all the substrings and find the longest one without repeating characters. This will give us O(N^4) time complexity (O(N^2) to get all substrings, O(N^2) to check if each substring has dups) and O(1) space complexity. Although it’s clear that the solution is too slow, we should still mention that to the interviewer.

Despite the fact that the solution is naive, it makes us clear that the next step is to make it faster. And the most common approach to speeding up your program is to use more memory. In other words, we can sacrifice the space complexity to improve the time complexity.

If you are able to think in this way, things become much clearer. A hashset is the most common data structure to detect dups and with a hashset, we can check if a substring has dups in O(N) time.

 

Sliding window

Now, the bottleneck is at iterating over all substrings, which has O(N^2) complexity. How can we make it faster?

In fact, there are lots of unnecessary checks. If we already know a substring S has duplicate characters, there’s no need to check any substring that contains S. If you have read our previous post – If a String Contains an Anagram of Another String, you should realize how similar these two questions are.

Take string “abccdefgh” as an example. When we realize that substring “abcc” has dups, we can immediately abandon the whole substring and check substrings from “cdefgh”. Thus, we have the following approach:

  1. Keep two pointers (start, end) that denotes the current substring we’re checking and both has an initial value of 0.
  2. Each time, move forward end pointer by 1 character and use the hashset to check if the newly added character has already existed.
    1. If not, keep moving end pointer forward.
    2. If the new character is a duplicate one, move the start pointer all the way to pass the duplicate character and pop out all these characters from the hashset.
  3. Repeating step 2 and output the longest substring without dup after finishing the whole string.

In essence, as we know processing all substrings are unnecessary, sliding window can be a great technique to improve that and the same technique has been used in many problems like Subarray With Given Sum.

The final solution has O(N) time complexity and O(N) space complexity.

 

Coding

If you haven’t tried to write the code please do so. Once you finished, try the following three test cases:

  • “abcadbef”
  • “abac”
  • “aaaaa”

Here’s the solution:

If you write the code to solve it, you may notice that it’s not easy to make it bug-free easily. You may need to address several edge cases to make it work. That’s exactly the point of this question. Many people are able to come up with the solution, but very few of them can write the perfect solution, especially with a time constraint.

Here I’d like to share a couple tips in terms of the code above:

  1. It’s important to validate the input in the beginning. This should be the first thing when you write the solution without even thinking. If you don’t know what to do with invalid input, ask the interviewer.
  2. There are several edge cases you need to consider. To begin with, when you use start and end pointer to denote a substring, make it clear whether it’s an open or close interval, which means if the substring is [a, b) or [a, b]. In our code, we make it [a, b), but you can use either so long as it’s consistent across the code.
  3. When a duplicate character is found, you should be very careful about the logic after “while start < end – 1”. Although there are only 7 lines of code, it’s quite easy to make mistakes. What we are doing here is to pop all the characters from start until we find the duplicate character, in which case we won’t pop out the dup as it’s not added.
  4. It’s hard for everyone to make it bug-free for the first time, but you should examine with 2-3 test cases.

 

Summary

I hope the biggest takeaway from the post is to care more about the coding part if you are used to solving problem in your mind. Most of the interview questions are not too hard to think, but very few people can get the code right. That’s why most people found the question simpler than expected but still failed in the end.

By the way, if you want to have more guidance from experienced interviewers, you can check Gainlo that allows you to have mock interview with engineers from Google, Facebook etc..

The post is written by Gainlo - a platform that allows you to have mock interviews with employees from Google, Amazon etc..

I'd like to learn more

Share on Facebook1Tweet about this on TwitterShare on LinkedIn0Share on Reddit0

13 thoughts on “Facebook Interview – Longest Substring Without Repeating Characters

  1. Your solution looks for duplicate characters, not repeated ones.

    e.g. your example string “abcadbef” the longest substring without duplicates is “dbef” but the longest substring without repeating characters is “abcadbef”.

      1. Hey Jake,

        Great article! It was very helpful.

        While writing my own solution, I came across a specific case that is very similar to the one mentioned by Pete, and he is right. However, since you clarified that this solution looks for the longest non-duplicate rather than non-repeating substring, the set becomes no longer necessary, right? All we need to do is to compare the current character in the iteration to the previous one. 🙂

  2. The space complexity isn’t O(N) (N being the length of the input), it is O(C) where C is the length of the alphabet (valid set of chars in the input string). In general it is considered O(1), but I guess that it is something to discuss with the interviewer.

    1. Hi Matias,

      I don’t think so. Suppose the input string is super long, you still need to traverse the whole string at least once in order to get the longest substring. So it’s definitely not O(C).

      1. I agree, but I was talking about the _space_ complexity. Going through the whole string adds up to the time complexity.

        Also, though we may want to store the complete input into memory (that would be O(N)) it is not strictly needed (since we are processing only one char at the time and not going backwards or randomly jumping around).

        However, I’ve missed that the space required to store the result is O(N) (when I first read the article, I thought the result would be the pair of indexes).

  3. Hi,

    I think this solution might miss a couple of cases, please correct me if I am wrong. Let’s say the input is ‘abcad’, according to the solution given, when the second ‘a’ is encountered, all the characters added to the set till then are removed, which in this case would be b, c (a won’t be removed but start is still incremented), since start is now 3, and then 4 is the last index, it would return a substring of length 3,(abc) but shouldn’t the answer be ‘bcad’ of length 4 ?
    We had to ‘bc’ again to get the solution. Thanks in advance.

  4. See here i have given it a try.
    Can anyone would test my code give me feedback.
    import java.util.*;

    public class LongSubString{
    public static void main(String[] args) {

    String str = “aaaaa”;

    Set char_set = new LinkedHashSet();
    List<Set> list_char = new ArrayList();

    for (int i = 0; i < str.length() ; i++){
    boolean duplicate = char_set.add(str.charAt(i));
    //list_char.add(char_Set);
    System.out.println("Set:\t\t\n"+char_set);
    if( !duplicate){
    list_char.add(char_set);
    char_set = new LinkedHashSet();
    char_set.add(str.charAt(i – 1));
    char_set.add(str.charAt(i));
    System.out.println(“new set:\t\t\n”+char_set);
    }

    }

    list_char.add(char_set);
    System.out.println(“List contains:”+list_char);
    Set sub_string = new LinkedHashSet();

    for(Set subString : list_char){
    if(subString.size() > sub_string.size()){
    sub_string = subString;
    }
    }
    System.out.println(“Long Sub string is:\t”+sub_string);
    }
    }

  5. Don’t you need an else statement if set contains the curr_char?

    // translated yours to Java and added the else statement for it to work
    public static String longestSubstring(String input){
    if(input == null)
    return “”;

    String longest = “”;
    HashSet set = new HashSet();
    int start = 0;
    int end = 0;
    int inputLen = input.length();

    while(end longest.length()){
    longest = input.substring(start, end + 1);
    // System.out.println(longest);
    }

    }

    else{
    while(start < end){
    if(input.charAt(start) != curr){
    set.remove(input.charAt(start));
    start++;
    }
    else{
    start++;
    break;
    }
    }
    }
    end++;
    }
    return longest;
    }

  6. Hi Jake,
    I have written the code in Java. Wish to seek your thoughts –

    public static String longestSubstring(String s) {
    if(s.isEmpty() || s==null)
    return “”;

    List result= new ArrayList();
    char[] a = s.toCharArray();
    Set buffer = new HashSet();
    StringBuilder temp=new StringBuilder();

    for(int i = 0 ; i < a.length ; i++) {
    if(!buffer.contains(""+ a[i])){
    buffer.add(""+ a[i]);
    temp.append(a[i]);
    } else {
    result.add(temp.toString());
    buffer.clear();
    temp.setLength(0);
    buffer.add(""+ a[i]);
    temp.append(a[i]);
    }
    }
    result.add(temp.toString());

    if(result.size() == 1) {
    return result.get(0);
    }

    String subStr = result.stream()
    .max(Comparator.comparing(String::length))
    .get();

    return subStr;
    }

  7. The code misses a corner case. Suppose string is “qpabcadef”. Your code will get output of “bcade”, it should be “bcadef”. Here is my fix to your code:

    //////
    from sets import Set

    def longest_substring(s):
    if not s:
    return None

    longest = ”
    start = 0
    end = 0
    char_set = Set()

    while end < len(s):
    end = end + 1
    cur_char = s[end – 1]
    print cur_char
    print char_set

    if cur_char not in char_set:
    char_set.add(cur_char)
    continue
    else:
    sub = s[start:end – 1]
    print "longest =" + longest
    print "sub=" + sub
    if len(longest) < len(sub):
    longest = sub

    while start len(longest):
    longest = s[start:end]

    return longest

    if __name__ == “__main__”:
    print longest_substring(“qpabcadef”)

  8. Here is a little more intuitive python implementation.

    ”’
    Evaluate each char from the start.
    Maintain two strings MAX_SO_FAR and CUR. If current char is not in CUR then
    it can be extended. Also, check and update CUR to MAX_SO_FAR if CUR length
    is greater. If current char is in CUR then start a new CUR with the char appended
    to the CUR[prev occurence of char + 1: end].
    ”’

    def longest_substring_no_repeat_char (s):

    max_so_far = “”
    cur = “”
    for c in s:
    if c not in cur:
    cur = cur + c
    if len(cur) > len(max_so_far):
    max_so_far = cur
    else:
    cur = cur[cur.index(c)+1:] + c

    print max_so_far

Leave a Reply

Your email address will not be published. Required fields are marked *