# Minimum Number of Deletions Of a String

Another string question in our coding interview questions collection. It seems that string is getting really popular and many companies like Google, Facebook are asking about it in recent interviews.

After a second thought, this makes sense in fact. String is a quite flexible data structure and many concepts can be covered from a string problem like hash, memory and so on so forth. In addition, it’s also a data structure you’re gonna use almost every day. That’s why many string interview questions are quite relevant to real world projects.

In this post, we’re going to talk about topics including string manipulation, dictionary, time complexity etc. and in the end, I’ll summarize several commonly used techniques as before.

## Question

Given a dictionary and a word, find the minimum number of deletions needed on the word in order to make it a valid word.

For example, string “catn” needs one deletion to make it a valid word “cat” in the dictionary. And string “bcatn” needs two deletions.

Dictionary has always been an interesting topic in string interview problems, which is part of the reason I’d like to cover this here. Also, this question was asked by Google recently.

## Dictionary

Given that dictionary is so common in coding interview questions that I’d like to briefly summarize few strategies/techniques here.

• To store a dictionary, usually people will use data structures including Hash set, Trie or maybe just array. You’d better understand pros and cons of each of them.
• You may choose to have a pre-processing step to read the whole dictionary and store into your preferred data structure. Since once it’s loaded, you can use it as many times as you want.
• If the dictionary is not too large, you may take the dictionary traverse time as a constant.

## Traverse dictionary

Coming back to this problem, if we assume the dictionary can be traversed quickly (not too many entries), one approach is to go through each word in the dictionary, calculate the number deletions required, and return the minimum one.

To calculate the number of deletions efficiently, we’ll use the common technique here. One fact is that if a longer string can be transformed to a shorter one by deleting characters, the longer string must contain all the characters of the smaller one in order. If you have noticed this fact, then you should know that we only need to traverse the two strings once in order to get the deletion number.

More specifically, we put two indices (L for the longer string, S for the shorter string) pointing to beginning of each string. If the two characters under the indices are different, move L forward by one character. If the two characters are same, move both forward. If S comes to the end, it means the longer string contains all the characters in order, so the number of deletion needed is just len(longer) – len(shorter).

Assuming the size of the dictionary is M and length of the given word is N, the time complexity is O(MN) because for each word in the dictionary, we may need to iterate over the given word.

## Traverse the word

What if the dictionary is really large? Actually we can solve this problem from the other side – traverse all the possible words generated from deletion of the given word.

So for the given word, we try to delete each of the characters and check if the new word exists in the dictionary. Since we need to quickly check the existence of a work in dictionary, we need to load the dictionary into a hash set.

So the time complexity for pre-processing is O(M) (traverse the whole dictionary once) and for the rest of the algorithm is O(2^N) because we need to get all the possible words generated from the given word. It’s also worth to note that once the dictionary is loaded, we don’t need to do the pre-processing again and that’s why sometimes we can ignore the time spent here.

So which solution is better? It depends on the size of the dictionary and length of the given word.

## Takeaways

To sum up some techniques in this question:

• You should be aware of common data structures for dictionary and pros and cons of each of them.
• Given that the size of the dictionary is fixed, it’s not a bad idea to just iterate over it.
• Having two indices to traverse/compare two string/arrays is quite common. For example, we use the same approach to merging two sorted arrays.

You may notice that it’s not easy to write the code for “traverse the word” solution. So please try to finish the code for this part.

The post is written by Gainlo - a platform that allows you to have mock interviews with employees from Google, Amazon etc..

000

## 9 thoughts on “Minimum Number of Deletions Of a String”

1. Luca says:

Hi, nice post! Aren’t the possible words generated 2^N?

1. Jake says:

Good catch. Fixed the typo.

1. Abhishek Ghosh says:

Well if there are N characters, won’t the possible words be of the order of N!?

1. Prateek Jassal says:

You can only delete characters from the string not change their order. So it can’t be N!

1. max says:

Isn’t possible words generated gives us N^2 (N squared) not 2^N? Because by removing 1 letter we’re getting N + (N – 1) + (N – 1) … = N * (N – 1) / 2 which is N^2.

2. There’s an O(N) [~3N] solution for traversing the word. Consider the following. I did this code off the top of my head like in an interview, so it’s untested and possibly buggy. In an interview that doesn’t matter; only the concept matters.

class SpellCheck
{
private \$dictionary = [];

// Initialize \$dictionary as a mapping. [“word” => 1]

public function getMinDeletes(\$misspelled)
{
\$dictWord = \$this->getWord(\$misspelled);
return strlen(\$misspelled) – strlen(\$dictWord);
}

private function getWord(\$word)
{
if (\$start >= \$end) {
return ”;
}

if (array_key_exists(\$word, \$this->dictionary)) {
return \$word;
}

\$leftWord = \$this->getWord(substr(\$word, 0, strlen(\$word)-1));
if (\$leftWord !== ”) {
return \$leftWord;
}

return \$this->getWord(substr(\$word, 1, strlen(\$word)));
}
}

1. Tay Yang Shun says:

I think you are assuming that the deletions can only occur at the start or at the end of the word, which is not necessarily the case. It can occur in the middle: “cabt” -> “cat”

I just wrote below solution.
Store dictionary as unordered_map, i.e. a hashtable as dict key, dict value pair.
For each key in dictionary perform c strstr operation, i.e. find if the smaller string exists in the larger one.
If it does find the different in length. Maintain a global minimum of these differences. Return the global minimum.
strstr basically uses similar logic to find existence of string within another as logic mentioned in this description. Before reading the blog wrote it so though of mentioning here for a short solution.

int mindeletions(unordered_map dict, string s)
{
int mindelta = INT_MAX;
bool flag = false;
for(auto it= dict.begin(); it != dict.end(); it++){

string temp = it->first;
if(strstr((char*)s.c_str(), (char *)temp.c_str())){

flag = true;
int diff = (int)s.length() – (int)temp.length();
if(diff < mindelta)
mindelta = diff;
}
}
if(flag)
return mindelta;
else
return -1;
}

4. bodla says:

Solution written in C#:

public int FindMinimumDeletionsFromAString(HashSet dictionary, string str)
{
// Edge Case
if (str.Length > 0 && dictionary.Count == 0)
throw new ArgumentException(“No words found in the dictionary”, nameof(dictionary));
else if (str.Length == 0 || (dictionary.Count > 0 && dictionary.Contains(str)))
return 0;

string longest = str.ToLower();
int minimumDeletions = str.Length;

foreach (string word in dictionary)
{
string shortest = word.ToLower();

// Only iterate through dictionary word if it’s length is <= length of input string.
if (shortest.Length <= longest.Length)
{
int longestStringIndex = 0; // Index of input string.
int shortestStringIndex = 0; // Index of dictionary word.

while(shortestStringIndex < word.Length)
{
if (shortest[shortestStringIndex] == longest[longestStringIndex])
{
shortestStringIndex++;
longestStringIndex++;
}
else if(shortest[shortestStringIndex] != longest[longestStringIndex])
{
longestStringIndex++;
}

// If the dictionary word index reached the end, calculate the characters in the original string to be deleted.
if (shortestStringIndex == shortest.Length – 1)
{
minimumDeletions = Math.Min(minimumDeletions, (longest.Length – shortest.Length));
}
}
}
}
return minimumDeletions;
}