Hackbright Code Challenges

Word Break: Solution

Word Break: Solution

Problem

Word Break

Whiteboard

Harder

Challenge

Medium

Concepts

Recursion

Download

word-break-solution.zip


This problem works fits well with a recursive way of thinking:

  • does the phrase match a single word? If so, that we’re done

  • otherwise, find all the words that start that phrase

    • and then recursively do the same thing for the remainder of the phrase

word_break.py
def parse(phrase, vocab):
    """Break a string into words.

    Input:
        - string of words without space breaks
        - vocabulary (set of allowed words)

    Output:
        set of all possible ways to break this down, given a vocab
    """

    # START SOLUTION

    possible_breaks = set()

    # idea: find all words that match beginning of string
    # and recursively parse the remainder of string

    for word in vocab:
        if phrase == word:
            # base case: no parsing required; add word to result set
            possible_breaks.add(word)

        elif phrase.startswith(word):
            # general case: word matches beginning of string

            rest = phrase[len(word):]  # rest of string

            # Use this word and recurse to solve the rest of it
            word_and_rest = {word + ' ' + parsed
                             for parsed in parse(rest, vocab)}

            # finished answer to list of answers
            possible_breaks.update(word_and_rest)

    return possible_breaks

Advanced: Memoization

This solution works well — but we can make it more efficient.

For our sample data, consider cases like this:

i a ten oodles for dinner to night
i a ten oodles for dinner tonight
i ate noodles for dinner to night
i ate noodles for dinner tonight

There are two possible ways to complete this phrase once we start at the word “dinner”. However, we can reach this case of “dinner is starting word” in two ways (either “i ate noodles for” and “i a ten oodles for”. Regardless of the two starts that lead here, we should be able to re-use the rest of the phrase, not have to re-calculate it each time.

We can do this with a technique called “memoization”, where you remember previously done work.

Here’s a version that adds memoization to our solution: we keep a dictionary of previously-done work and, if this function has been called before with the same values, we can simply return that without having to recompute the work.

word_break.py
def parse_memoized(phrase, vocab, cache=None):
    """Break a string into words.

    Input:
        - string of words without space breaks
        - vocabulary (set of allowed words)

    Output:
        set of all possible ways to break this down, given a vocab
    """

    # First time: let's make a shared dictionary for our cache
    # of "rest-of-phrase": "computed solution"

    if cache is None:
        cache = {}

    # Do the memoization: if we've already done this exact work,
    # we can return it

    if phrase in cache:
        return cache[phrase]

    # We can't use our cache, so go ahead with the normal work

    possible_breaks = set()

    # idea: find all words that match beginning of string
    # and recursively parse the remainder of string

    for word in vocab:
        if phrase == word:
            # base case: no parsing required; add word to result set
            possible_breaks.add(word)

        elif phrase.startswith(word):
            # general case: word matches beginning of string

            rest = phrase[len(word):]  # rest of string

            # Use this word and recurse to solve the rest of it
            word_and_rest = {word + ' ' + parsed
                             for parsed in parse(rest, vocab)}

            # finished answer to list of answers
            possible_breaks.update(word_and_rest)

    return possible_breaks

When you combine recursion with memoization, it is often called “Dynamic programming”.

Advanced Memoization

This is a pretty straightforward implementation of memoization — you could make this more general and clever by using much more advanced techniques, like those shown at https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize