2014-02-16

Memoization for lazier functions.


What is memoization and why is it useful?

While working on the first major assignment for CSC148 I found myself with a recursive algorithm that successfully found the value I needed but took an unacceptably long time when given anything but very small inputs. I asked about the problem on the course forum and instructor Dustin Wehr suggested a I look up a technique called memoization.

The basic idea of memoization is to have a function that has predictable inputs and expensive computation remember things it has already computed and use these remembered results instead of recalculating whenever possible.

Simple Python memoization using a mutable default argument.

Undoubtedly I'm not the first to think of this, but if you only have a function or two you need to memoize you can take advantage of the way Python handles default arguments that are mutable data structures. Ever since I first found out that changes to mutable default arguments persist from one call on a function to the next it has bothered me. While it's still not the choice I would have made were it up to me, it's a lot less irritating now that I've found a way to make use of it in certain situations.

Since the Wikipedia entry on memoization uses a pseudocode factorial calculating function in its example, I'll use a Python factorial function in mine:

def factorial(n: 'non-negative int') -> int: '''Ruturn n factorial == n! == (n * n-1 * n-2 * n-3 * ... * 1)''' if n == 0: # By convention 0! = 1 return 1 else: return n * factorial(n - 1)

On a modern computer this function won't take all that long to run even up to the maximum n allowed by Python's default level of recursion limit. We'll just imagine we need to optimize it for a slow ancient machine. To do this we create an argument with a dictionary as its default value. We use this dictionary to store previously computed factorials and then check the dictionary before computing a factorial.

def factorial(n: 'non-negative int', known_factorials: dict={0: 1}) -> int: '''Ruturn n factorial == n! == (n * n-1 * n-2 * n-3 * ... * 1)''' if n in known_factorials: return known_factorials[n] else: return n * factorial(n - 1)

In this case, because we can start with the base case of 0! = 1 in the known_factorials dictionary, the memoized version doesn't even have more lines of code.

To memoize all things.

If you have more than one or two functions that need to be memoized, doing this to each of them would get wastefully repetitive. Instead, make a generalized memoizer to wrap around all these functions or use the least-recently-used (LRU) cache memoizer functools.lru_cache from the Python library.

Image made from a CC image by Marc Wisniak.

No comments:

Post a Comment