coding interview questions

If a String Contains an Anagram of Another String

From last week’s survey, most Gainlo users said that they want to read posts about how to come up with solutions to coding questions. So we decided to start talking about this topic from this week.

I think it’ll be much more helpful by analyzing popular coding interview questions as examples. I’d also like to highlight few points that make this article worth reading:

  • We manually select popular coding interview questions that are asked by big companies recently.
  • The whole point of the post is not providing something like standard answers. I want to tell you how to analyze and solve a coding question step by step.
  • In each post, I’ll summarize techniques that you may use in other problems.

Alright, so here we go.

Question

How to check if a string contains an anagram of another string?

This question was asked by Google few weeks ago and other companies as well (reported by Glassdoor). In case you don’t know what anagram is – An anagram is a type of word play, the result of rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once. For example, string “logain” is an anagram of “gainlo”.

To provide an example for this question, string “coding interview questions” contains an anagram of string “weivretni”, however not for “abcde”.

Start simple

If you have read our previous post – A Step-By-Step Guide to Solve Coding Problems, you should know that it’s recommended to start with a naive solution first and then optimize it step by step. Many people think that the naive solution is too trivial to mention, however, it can be used as a starting point and at the same time you can at least provide a solution.

A most straightforward solution is brute force. Suppose we want to check if string A contains an anagram of string B, we can get all the substrings of A and check each of them.

To check anagram, one way to do that is to compare the hash of two strings. More specifically, you can map each character to a different prime number and the hash of string is the multiples of all the prime numbers of each character. By doing this, you can check anagram in linear time.

If the length of the string is N and M, then the time complexity should be O(N^2 * M). Apparently, this is not ideal and you may tell the interview without even being asked.

Optimization

One way to think about optimization is to figure out at which point the algorithm is costly. In the above solution, there are two parts: get all substrings O(N^2) and check if it’s an anagram O(M). If we can make both faster, it’ll be great.

Obviously, if a string is an anagram of another, they should have the same length. In other words, we don’t need to check all substrings, but only substring of length M. This operation is of O(N) time.

To check if it’s an anagram, you should realize that you won’t be able to reduce it since you should at least iterate a string once. So there’s no point thinking about how to reduce O( M) to O(1). However, we can try to merge this iteration into the substring iteration process.

To sum up, we keep a sliding window of length M using two indices pointing to the start and end of the substring. The sliding window keeps moving forward. Each time it moves one character, we can re-calculate the hash of the current substring and check if it’s same as the hash of string B. By doing this, we can reduce the overall time complexity to O(N) without using extra memory.

Optimization – hash map

You should be able to know that O(N) is the minimal time we can have since you should at least iterate string A once. So don’t need to waste time thinking about how to make it O(logN).

Another way to check the anagram is to use a hash map whose key is the character and value is the frequency of the character in this substring. We first store the character frequency of B inside the hash map. When the sliding window moves forward by one step, update the hash map based on the current substring inside the window. Once the hash map gets zero for all keys, the current sliding window has the anagram substring.

Takeaways

This is the most important section. Again, the whole point of this article is not providing answers, but focused on analysis. Here are several techniques you can reuse in other problems:

  • When you need to compare things regardless of order (like anagram), you may consider hash, hash map.
  • Sliding window or keep two indices pointing to the start and end is a great way to iterate an array in linear time. This is also used in questions like “find 2 numbers in a sorted array that sum to M”.
  • You should be able to tell the bottleneck of an algorithm and how much it can be optimized. If you have to iterate an array at least once, you don’t need to think about O(logN) solution.

The post is written by Gainlo - a platform that allows you to have mock interviews with employees from Google, Amazon etc..

I'd like to learn more

Share on Facebook2Tweet about this on TwitterShare on LinkedIn8Share on Reddit0

15 thoughts on “If a String Contains an Anagram of Another String

  1. The Hash Map version of checking for substring would still need to check if the Hash Map has zero for all keys currently in the substring, which will take time O(M) for every window – so the overall time complexity in this case would be O(N*M). Whereas in the version where you compare the hash of the substring and the hash of the given string, you could use a rolling hash function to compute the hash which would then take just O(1) to compare if the two strings match – making the overall time complexity O(N). So, how did we optimize the solution using a Hash Map?

    1. Hi Tejas,

      Actually you don’t need to go thru the hash map every time. You can just keep a counter that counts the current number of keys in hash map. When moving the window, you need to keep updating the counter. When the counter gets to 0, it’s done.

      1. Could you provide an example of how the hash map solution with the counter would work (and how it’s O(N) and not O(NM))?

        1. Alright. I guess I should have made this clearer in the post.

          Initially the hash map stores the (character, frequency) of the string B and the counter is the length of B. For the string in “sliding window”, if the character is in the hash map, we decrease both the corresponding frequency and the counter. When moving the window forward, we may increase the counter and frequency since the leftmost character is gone. Keep moving the window forward until the counter is 0.

          If a character is not in the hash map, we should move the window until this character is outside.

          1. Do you make a copy of the hashmap and update the frequency in the copy hashmap? If you just update the original hashmap, how are you going to restore the frequency back to original every time you move the window forward?

          2. The hashmap initially stores (char, freq) for string B. When moving the window, you may increase/decrease the freq of the corresponding char. You don’t need to restore the frequency back to the original. As long as the hashmap becomes empty, you know the current string in the window is an anagram.

          3. Hi Jake,
            Instead of HashMap why can’t we use just boolean array of size 256 and flip to true for second string char’s based on the char ascii value as array index. Then start checking the first string by each char in it;s corresponding index in the array. If all indexes is set to true then its a anagram. By doing this we can avoid array re-sizing in HashMap and also no need to generate hash value. Do you see any issues with this approach? Below is the pseudo code.
            function bool IsAnagram(string string1, string string2)
            {
            bool[] array = new bool[256];
            foreach(char c in string2)
            array[c] = true;
            foreach(char c in string1)
            {
            if(!array[c])
            return false;
            }
            return true;
            }

  2. Isn’t the number of anagrams for a string of length N = N!
    Why are we considering substrings here? Can someone please clarify this?

    1. He means take all substrings of A, whose number is O(n**2) and then check each substring against B, since a substring is basically the original string from index i to j, where i < j < N.

      A super-naive solution would be to list all permutations of B, n! and check if it exists in A, but this is more expensive but a valid starting solution.

  3. Solution with counter O(N + M):

    boolean containsAnagram(String text, String anagram) {

    Map m = new HashMap();

    for (int i = 0; i < anagram.length(); i++) {
    m.put(anagram.charAt(i), m.getOrDefault(anagram.charAt(i), 0) + 1);
    }

    int currCount = anagram.length();

    for (int i = 0; i 0) {
    currCount–;
    }

    m.put(c, cnt – 1);
    }

    if (i >= anagram.length()) {
    char prev = text.charAt(i – anagram.length());
    if (m.containsKey(prev)) {
    int cnt = m.getOrDefault(prev, 0);
    if (cnt >= 0) {
    currCount++;
    }
    m.put(prev, cnt + 1);
    }
    }

    if (currCount == 0) {
    return true;
    }
    }

    return false;
    }

  4. There is famous technique to check anagram :
    Anagram splits your string in two substrings effectively and rotate / swap. Example : logain => lo + gain => gain + lo
    If we consider them in terms of say x and y , then => xy -> anagram = xy.
    Now the trick is , if we append the same input string to itself (xy + xy = xyxy => logain+logain = logainlogain)
    Notice yx is always substring of xyxy (gainlo is always substring of logainlogain)

    1. This is not a technique for checking anagram, this is for checking rotated string.

      abcd and dcba are anagrams but dcbadcba will not contain ‘abcd’ or the other way around.

  5. Another simple yet optimized solution could be to sort the two strings. Post sorting, both the strings should be same if they are anagrams of each other.
    Complexity – O(nlogn)

  6. We can use a simple XOR based solution to check if two strings are anagram.
    bool isanagram(string s1, string s2)
    {
    int retval = 0;
    for(int i = 0; i < s1.size(); i++)
    retval = retval ^ s1[i];
    for(int k = 0; k < s2.size(); k++)
    retval = retval ^ s2[k];
    if(retval == 0)
    return true;
    else
    return false;
    }

    For this particular question, we can split larger string to individual strings and then compare the split string with other string.
    This will give us O(NM) complexity. To bring it down to O(n), we can save the xor value of the smaller string (which is always same) and pass it on to the comparing function. Total complexity is O(n) I did not see this solution here, anything wrong with this that I cannot think of? Below is my complete code.

    bool isanagram(string s1, int s2Xor)
    {
    int retval = s2Xor;
    for(int i = 0; i < s1.size(); i++)
    retval = retval ^ s1[i];

    if(retval == 0)
    return true;
    else
    return false;

    }
    int main (int argc, char *argv[])
    {
    string s2 = "weivretni" ;
    string s1 = "coding interview questions";

    int s2Xor =0;
    for(int k = 0; k < s2.size(); k++)
    s2Xor = s2Xor ^ s2[k];

    vector v;
    string buf;
    stringstream ss(s1);

    while (ss >> buf)
    v.push_back(buf);

    bool retval = false;
    for(auto it= v.begin(); it!= v.end(); it++)
    {
    if((*it).length() == s2.length())
    retval = isanagram(*it, s2Xor);
    else continue;
    if(retval)
    { cout << "Anagram \n"; break;}
    }
    if(!retval)
    {cout << "Not Anagram \n";}
    }

Leave a Reply

Your email address will not be published. Required fields are marked *