Here's another string question for our coding interview questions collection. String problems seem to be getting really popular, and many companies, such as Google and Facebook, have been asking them in recent interviews.
On second thought, this makes sense. A string is a flexible data structure, and a single string problem can cover many concepts, such as hashing and memory. It's also a data structure you're going to use almost every day, which is why many string interview questions are quite relevant to real-world projects.
In this post, we're going to talk about topics including string manipulation, dictionaries, and time complexity. At the end, I'll summarize several commonly used techniques, as before.
Given a dictionary and a word, find the minimum number of deletions needed on the word in order to make it a valid word.
For example, the string "catn" needs one deletion to become the valid word "cat" in the dictionary, and the string "bcatn" needs two deletions.
Dictionaries have always been an interesting topic in string interview problems, which is part of the reason I'd like to cover this one here. Also, this question was asked by Google recently.
Since dictionaries are so common in coding interview questions, I'd like to briefly summarize a few strategies and techniques here.
- To store a dictionary, people usually use a hash set, a trie, or maybe just an array. You should understand the pros and cons of each.
- You may choose to have a pre-processing step that reads the whole dictionary and stores it in your preferred data structure. Once it's loaded, you can use it as many times as you want.
- If the dictionary is not too large, you may treat the time to traverse it as a constant.
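As a rough illustration of the pre-processing step, here is a minimal Python sketch that loads words into a hash set. The function name `load_dictionary` and the choice to strip whitespace and skip blank entries are my own assumptions, not part of the original problem.

```python
def load_dictionary(words):
    """Build a hash set from an iterable of words (one-time O(M) pre-processing).

    Assumption: entries may carry surrounding whitespace (e.g. lines read
    from a file), so each one is stripped and blank entries are skipped.
    """
    return {w.strip() for w in words if w.strip()}


dictionary = load_dictionary(["cat\n", "bat\n", "", "car"])
# Membership checks on a set take O(1) time on average.
print("cat" in dictionary)
print("dog" in dictionary)
```

Once the set is built, it can be reused for as many lookups as you need.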
Coming back to this problem, if we assume the dictionary can be traversed quickly (not too many entries), one approach is to go through each word in the dictionary, calculate the number of deletions required, and return the minimum.
To calculate the number of deletions efficiently, we can use a common technique: two indices. The key fact is that if a longer string can be transformed into a shorter one by deleting characters, the longer string must contain all the characters of the shorter one in order (that is, the shorter string is a subsequence of the longer one). Once you notice this, you know we only need to traverse the two strings once to get the number of deletions.
More specifically, we use two indices (L for the longer string, S for the shorter string), each pointing to the beginning of its string. If the two characters under the indices are different, move L forward by one character. If the two characters are the same, move both forward. If S reaches the end, the longer string contains all the characters of the shorter one in order, so the number of deletions needed is just len(longer) - len(shorter). If L reaches the end first, the shorter string cannot be obtained by deletions.
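The two-index walk described above might look like this in Python. This is a sketch only: the function name `deletions_needed` and the convention of returning `None` when the transformation is impossible are assumptions I made for illustration.

```python
def deletions_needed(longer, shorter):
    """Return how many deletions turn `longer` into `shorter`.

    Returns None (an assumed convention) if `shorter` cannot be
    obtained from `longer` by deleting characters.
    """
    l = s = 0  # l indexes `longer`, s indexes `shorter`
    while l < len(longer) and s < len(shorter):
        if longer[l] == shorter[s]:
            s += 1  # characters match: advance both indices
        l += 1      # the index into the longer string always advances
    if s == len(shorter):
        # Every character of `shorter` was found in order.
        return len(longer) - len(shorter)
    return None


print(deletions_needed("catn", "cat"))   # 1
print(deletions_needed("bcatn", "cat"))  # 2
print(deletions_needed("catn", "dog"))   # None
```

Each string is scanned at most once, so a single comparison takes time linear in the length of the longer string.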
Assuming the dictionary contains M words and the given word has length N, the time complexity is O(MN), because for each word in the dictionary we may need to iterate over the whole given word.
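Putting the pieces together, the "iterate over the dictionary" approach could be sketched as follows. The names `is_subsequence` and `min_deletions_by_dictionary`, and returning `None` when no dictionary word can be reached, are assumptions for this example.

```python
def is_subsequence(shorter, longer):
    """True if `shorter` can be obtained from `longer` by deleting characters."""
    s = 0
    for ch in longer:
        if s < len(shorter) and ch == shorter[s]:
            s += 1
    return s == len(shorter)


def min_deletions_by_dictionary(word, dictionary):
    """Scan every dictionary entry and keep the smallest deletion count.

    Returns None (an assumed convention) if no entry can be reached.
    Runs in O(M * N) for M entries and a word of length N.
    """
    best = None
    for candidate in dictionary:
        if len(candidate) > len(word):
            continue  # deletions can never make the word longer
        if is_subsequence(candidate, word):
            deletions = len(word) - len(candidate)
            if best is None or deletions < best:
                best = deletions
    return best


print(min_deletions_by_dictionary("catn", ["cat", "dog", "at"]))  # 1
print(min_deletions_by_dictionary("bcatn", ["cat", "dog"]))       # 2
```

The length check is just a cheap shortcut; the subsequence test alone would already reject longer candidates.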
Traverse the word
What if the dictionary is really large? We can actually solve this problem from the other side: traverse all the possible words generated by deleting characters from the given word.
So for the given word, we try deleting each of its characters and check whether the new word exists in the dictionary. If none does, we repeat the process on each of the shortened words. Since we need to check quickly whether a word exists in the dictionary, we load the dictionary into a hash set.
So the time complexity of pre-processing is O(M) (traverse the whole dictionary once), and the rest of the algorithm is O(2^N), because in the worst case we need to generate all the possible words from the given word (each character is either kept or deleted). Strictly speaking, building and hashing each candidate adds another factor of N. It's also worth noting that once the dictionary is loaded, we don't need to do the pre-processing again, which is why we can sometimes ignore the time spent here.
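If you'd like to try this yourself first, skip the block below. Otherwise, here is one possible sketch of the "traverse the word" idea, done level by level so that the first match found uses the fewest deletions. The name `min_deletions_by_word`, the level-by-level (breadth-first) ordering, and the `None` return value are my assumptions; the post itself does not prescribe a particular implementation.

```python
def min_deletions_by_word(word, dictionary):
    """Generate candidates by deleting characters and look each one up.

    Candidates are explored level by level (0 deletions, then 1, ...),
    so the first hit is the minimum. Returns None (an assumed
    convention) if no candidate is in the dictionary.
    """
    words = set(dictionary)  # O(M) pre-processing into a hash set
    frontier = {word}        # all candidates with `deletions` characters removed
    deletions = 0
    while frontier:
        if any(candidate in words for candidate in frontier):
            return deletions
        next_frontier = set()  # a set removes duplicate candidates
        for candidate in frontier:
            for i in range(len(candidate)):
                # Remove the character at position i.
                next_frontier.add(candidate[:i] + candidate[i + 1:])
        frontier = next_frontier
        deletions += 1
    return None


print(min_deletions_by_word("catn", ["cat", "dog"]))   # 1
print(min_deletions_by_word("bcatn", ["cat", "dog"]))  # 2
```

In the worst case this still visits every subsequence of the word, so it only pays off when the word is short.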
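So which solution is better? It depends on the size of the dictionary and the length of the given word: iterating over the dictionary works well when the dictionary is small or the word is long, while traversing the word works well when the word is short and the dictionary is large.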
To sum up some techniques in this question:
- You should be aware of the common data structures for storing a dictionary and the pros and cons of each.
- If the size of the dictionary is fixed and not too large, it's not a bad idea to just iterate over it.
- Using two indices to traverse or compare two strings or arrays is quite common. For example, we use the same approach to merge two sorted arrays.
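To show how the two-index idea carries over to merging two sorted arrays, here is a small sketch. The function name `merge_sorted` is an assumption, and both inputs are assumed to be sorted in ascending order already.

```python
def merge_sorted(a, b):
    """Merge two ascending lists into one ascending list using two indices."""
    i = j = 0
    merged = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of these slices is non-empty: append the leftovers.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


print(merge_sorted([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```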
You may notice that the code for the "traverse the word" solution is not easy to write, so it's worth trying to write that part yourself.