Computer Science Journal: February 2014

2014-02-24

On recursion

General recursion.

The English word recursion is derived from a Late Latin word meaning to run back. Something is recursive, in the general sense of the word, whenever a part of the thing is a copy of the entire thing. As with the curves in the image above, at each step of examining it, either more closely or more broadly, the same form is observed.

Recursive computing algorithms.

In computer programming, a recursive function is one which, in some cases, calls itself as part of its process. To be a useful function it is important that there be at least one case where it does not call itself and that each level of recursion (each call to itself) move closer to this case. If both of these conditions are not met the function may became stuck in a state of infinite recursion. The program will hang, never reaching the next step it is meant to accomplish.

The cases where recursion depth does not increase are refered to as the base cases. For example, in the following CSC148 example function for finding the sum of all numbers in a set of nested lists the base case is when the item being examined is a number rather than a list.

def sum_list(L: list) -> float: '''Return sum of the numbers in L. L - a possibly nested list of numbers. L, and all sublists, are non-empty >>> sum_list([1, 2, 3]) 6 >>> sum_list([1, [2, 3, [4]], 5]) 15 ''' return sum([sum_list(x) if isinstance(x, list) else x for x in L])

A draw back of recursion.

The main problem I am aware of with use of recursion in computer programming is memory usage. Each time a function calls itself a new frame has to be added to the stack containing new instances of every object used by the function. This can reduce performance of other programs or even crash the computer if allowed to continue for a very large number of levels of recursion. To prevent this Python, by default, only allows 1000 levels of recursion. This prevents the worst problems, but also places some limits on the usefulness of recursion.

Tail recursion.

The problem of recursion gobbling up memory can be fixed in some cases by using tail call optimization. A tail call recursion is when a function calls itself as the very last thing it does before returning. In this case, unlike a recursive call anywhere else in the function, any information in the stack frame that isn't being passed to the new instance of the function can be safely discarded. If a language supports tail recursion optimization it will identify these situations and clean up the completed function call before beginning the next one. This keeps the stack from growing at all, no matter how many times a tail recursive function call itself.

As interesting as this is, Python does not have tail recursion optimization and is very unlikely to support it any time soon. Working in Python, very deep recursion may need to be either rewritten as iteration or manually tail call optimized using helper functions.

The last visible dog.

Talking about recursion reminds me of the film The Mouse and His Child (based on the excellent and bizarre book of the same name). In this story some characters ascribe a mystical significance to the "last visible dog" in a recursive logo image printed on cans of a specific brand of dog food.

Image made from CC images by Elijah Porter and Kevin Dooley.

2014-02-16

Memoization for lazier functions.

What is memoization and why is it useful?

While working on the first major assignment for CSC148 I found myself with a recursive algorithm that successfully found the value I needed but took an unacceptably long time when given anything but very small inputs. I asked about the problem on the course forum and instructor Dustin Wehr suggested a I look up a technique called memoization.

The basic idea of memoization is to have a function that has predictable inputs and expensive computation remember things it has already computed and use these remembered results instead of recalculating whenever possible.

Simple Python memoization using a mutable default argument.

Undoubtedly I'm not the first to think of this, but if you only have a function or two you need to memoize you can take advantage of the way Python handles default arguments that are mutable data structures. Ever since I first found out that changes to mutable default arguments persist from one call on a function to the next it has bothered me. While it's still not the choice I would have made were it up to me, it's a lot less irritating now that I've found a way to make use of it in certain situations.

Since the Wikipedia entry on memoization uses a pseudocode factorial calculating function in its example, I'll use a Python factorial function in mine:

def factorial(n: 'non-negative int') -> int: '''Ruturn n factorial == n! == (n * n-1 * n-2 * n-3 * ... * 1)''' if n == 0: # By convention 0! = 1 return 1 else: return n * factorial(n - 1)

On a modern computer this function won't take all that long to run even up to the maximum n allowed by Python's default level of recursion limit. We'll just imagine we need to optimize it for a slow ancient machine. To do this we create an argument with a dictionary as its default value. We use this dictionary to store previously computed factorials and then check the dictionary before computing a factorial.

def factorial(n: 'non-negative int', known_factorials: dict={0: 1}) -> int: '''Ruturn n factorial == n! == (n * n-1 * n-2 * n-3 * ... * 1)''' if n in known_factorials: return known_factorials[n] else: return n * factorial(n - 1)

In this case, because we can start with the base case of 0! = 1 in the known_factorials dictionary, the memoized version doesn't even have more lines of code.

To memoize all things.

If you have more than one or two functions that need to be memoized, doing this to each of them would get wastefully repetitive. Instead, make a generalized memoizer to wrap around all these functions or use the least-recently-used (LRU) cache memoizer functools.lru_cache from the Python library.

Image made from a CC image by Marc Wisniak.

2014-02-08

So many SLOGs. Random samples and mass searchs.

Part of keeping a proper course log is reading and referencing other students' logs. To that end I browsed a random sample of the SLOGs listed on the course side. I found some nice looking and pokéfied ones and an interesting post about separation of concerns, a concept I didn't pick up on in class. But, what about when I'm writing on a specific topic and want to find out what others have said about it?

Searching slogs with help from Python.

Most search engines have me covered when it comes to checking if a certain person has written about a specific topic. For example, to find out if I or a given other student have written about OOP I can search something like:

object oriented site:cscij.blogspot.ca OR site:another_slog.blog_site.something

But, with hundreds of SLOGs, typing the addresses out in the proper format by hand would take far too long. My approach to this problem was to copy all the SLOG URLs from the course page into a text file and write a short Python script to reformat them for use in a search engine.

multi_site_search.py

input_file_name = input('Enter name of .txt file with sites to search: ') input_file = open(input_file_name) output_text = [] output_file = open('multi_search_output.txt', 'w') for line in input_file.readlines(): line = line.strip('/ \n') if line.startswith('https://') or line.startswith('http://'): line = line[line.find('/') + 2:] if line.startswith('www.'): line = line[4:] output_text.append(line) output_file.write('site:' + ' OR site:'.join(output_text))

An imperfect solution.

Unfortunately, all the search engines I tried have a relatively short limit on the length of search string they will accept or freeze when given the full SLOGs list. Dividing the list into 25 pieces allowed me to search them all in a fairly reasonable amount of time but needing to do this lessened my sense of accomplishment considerably. Perhaps, when I've learned some more about using Python directly on the web, I can automate those 25 searches further and present the integrated results.

Another approach.

Interestingly, while testing out these searches, I came across another student using Python to manipulate the SLOG list. You should check out David's script to read the addresses directly from the site and randomly select some number of them to visit here.

Image CC by See-ming Lee.