Recursion
This problem works fits well with a recursive way of thinking:
does the phrase match a single word? If so, that we’re done
otherwise, find all the words that start that phrase
and then recursively do the same thing for the remainder of the phrase
def parse(phrase, vocab):
"""Break a string into words.
Input:
- string of words without space breaks
- vocabulary (set of allowed words)
Output:
set of all possible ways to break this down, given a vocab
"""
# START SOLUTION
possible_breaks = set()
# idea: find all words that match beginning of string
# and recursively parse the remainder of string
for word in vocab:
if phrase == word:
# base case: no parsing required; add word to result set
possible_breaks.add(word)
elif phrase.startswith(word):
# general case: word matches beginning of string
rest = phrase[len(word):] # rest of string
# Use this word and recurse to solve the rest of it
word_and_rest = {word + ' ' + parsed
for parsed in parse(rest, vocab)}
# finished answer to list of answers
possible_breaks.update(word_and_rest)
return possible_breaks
This solution works well — but we can make it more efficient.
For our sample data, consider cases like this:
i a ten oodles for dinner to night
i a ten oodles for dinner tonight
i ate noodles for dinner to night
i ate noodles for dinner tonight
There are two possible ways to complete this phrase once we start at the word “dinner”. However, we can reach this case of “dinner is starting word” in two ways (either “i ate noodles for” and “i a ten oodles for”. Regardless of the two starts that lead here, we should be able to re-use the rest of the phrase, not have to re-calculate it each time.
We can do this with a technique called “memoization”, where you remember previously done work.
Here’s a version that adds memoization to our solution: we keep a dictionary of previously-done work and, if this function has been called before with the same values, we can simply return that without having to recompute the work.
def parse_memoized(phrase, vocab, cache=None):
"""Break a string into words.
Input:
- string of words without space breaks
- vocabulary (set of allowed words)
Output:
set of all possible ways to break this down, given a vocab
"""
# First time: let's make a shared dictionary for our cache
# of "rest-of-phrase": "computed solution"
if cache is None:
cache = {}
# Do the memoization: if we've already done this exact work,
# we can return it
if phrase in cache:
return cache[phrase]
# We can't use our cache, so go ahead with the normal work
possible_breaks = set()
# idea: find all words that match beginning of string
# and recursively parse the remainder of string
for word in vocab:
if phrase == word:
# base case: no parsing required; add word to result set
possible_breaks.add(word)
elif phrase.startswith(word):
# general case: word matches beginning of string
rest = phrase[len(word):] # rest of string
# Use this word and recurse to solve the rest of it
word_and_rest = {word + ' ' + parsed
for parsed in parse(rest, vocab)}
# finished answer to list of answers
possible_breaks.update(word_and_rest)
return possible_breaks
When you combine recursion with memoization, it is often called “Dynamic programming”.
Advanced Memoization
This is a pretty straightforward implementation of memoization — you could make this more general and clever by using much more advanced techniques, like those shown at https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize