2014-02-08

So many SLOGs. Random samples and mass searches.


Part of keeping a proper course log is reading and referencing other students' logs. To that end, I browsed a random sample of the SLOGs listed on the course site. I found some nice-looking and pokéfied ones, and an interesting post about separation of concerns, a concept I didn't pick up on in class. But what about when I'm writing on a specific topic and want to find out what others have said about it?

Searching SLOGs with help from Python.

Most search engines have me covered when it comes to checking whether a certain person has written about a specific topic. For example, to find out whether I or another student has written about OOP, I can search for something like:
object oriented site:cscij.blogspot.ca OR site:another_slog.blog_site.something
But, with hundreds of SLOGs, typing the addresses out in the proper format by hand would take far too long. My approach to this problem was to copy all the SLOG URLs from the course page into a text file and write a short Python script to reformat them for use in a search engine.

multi_site_search.py

input_file_name = input('Enter name of .txt file with sites to search: ')
input_file = open(input_file_name)
output_text = []
output_file = open('multi_search_output.txt', 'w')
for line in input_file.readlines():
    line = line.strip('/ \n')
    if line.startswith('https://') or line.startswith('http://'):
        line = line[line.find('/') + 2:]
    if line.startswith('www.'):
        line = line[4:]
    output_text.append(line)
output_file.write('site:' + ' OR site:'.join(output_text))

An imperfect solution.

Unfortunately, every search engine I tried either has a relatively short limit on the length of the search string it will accept or freezes when given the full SLOG list. Dividing the list into 25 pieces allowed me to search them all in a fairly reasonable amount of time, but needing to do this lessened my sense of accomplishment considerably. Perhaps, when I've learned some more about using Python directly on the web, I can automate those 25 searches further and present the integrated results.
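The splitting itself could also be scripted. What follows is only a rough sketch of one way to do it, not what I actually ran: the function name build_queries, its pieces parameter, and the example addresses are all made up for illustration. It reuses the same 'site:' and ' OR site:' format as the script above, and takes a list of already cleaned-up addresses.

```python
def build_queries(sites, topic, pieces=25):
    """Split sites into at most `pieces` groups and build one search
    string per group. (Hypothetical helper, not part of the original script.)
    """
    # Ceiling division, so that no more than `pieces` groups are produced.
    size = max(1, -(-len(sites) // pieces))
    queries = []
    for start in range(0, len(sites), size):
        group = sites[start:start + size]
        queries.append(topic + ' site:' + ' OR site:'.join(group))
    return queries


if __name__ == '__main__':
    # Placeholder addresses, not real SLOGs.
    example_sites = ['a.example', 'b.example', 'c.example']
    for query in build_queries(example_sites, 'object oriented', pieces=2):
        print(query)
```

Each string it returns can be pasted into a search engine on its own. Whether 25 pieces is enough still depends on the length limit of the search engine in question.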

Another approach.

Interestingly, while testing out these searches, I came across another student using Python to manipulate the SLOG list. You should check out David's script here; it reads the addresses directly from the site and randomly selects some number of them to visit.
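For comparison, the random selection step on its own can be sketched with the standard library's random.sample. To be clear, this is not David's script: it assumes the addresses are already in a list rather than read from the course site, and the name pick_random_sites and its seed parameter are my own inventions.

```python
import random


def pick_random_sites(sites, count, seed=None):
    """Return up to `count` distinct addresses chosen at random.
    (Hypothetical helper; works on a list already held in memory.)
    """
    # A seed is optional and only useful for repeatable picks.
    rng = random.Random(seed)
    # random.sample raises ValueError if asked for more items than exist,
    # so cap the request at the length of the list.
    return rng.sample(sites, min(count, len(sites)))


if __name__ == '__main__':
    # Placeholder addresses, not real SLOGs.
    example_sites = ['a.example', 'b.example', 'c.example', 'd.example']
    for site in pick_random_sites(example_sites, 2):
        print(site)
```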

Image CC by See-ming Lee.
