the answers

Week 3: The List (Part 2)

Finally, it's here: all the answers. Two hundred and ninety-seven of them in a tidy paperback. Thanks to the Village Copier I now have my book of answers, scraped from DuckDuckGo's autosuggest. For a prototype I'm very pleased. Next time I might increase the gutter a tad more (and somehow the last page ended up on the back cover), but overall it's very gratifying to hold. I'm especially fond of the answer pairings across the page spreads, considering that I printed the list in order.

the answers is only a quarter of the size of my list sometimes i. That one is too many pages to publish with the Village Copier or online at Blurb, Lulu, or even with Issuu (shown here) without breaking it into volumes. Of all the lists I scraped, I'm most moved by the range of human experience represented in these saved search submissions. Note for future Ellen: code your own site for an uninterrupted page-turning experience.

The future is on stickers! Excited to explore an alternative physical manifestation of a list, I printed autocompletions from the phrase "the future is" onto 568 round, clear stickers to share with others and deposit around the city.

Week 2: The List (Part 1)

We conducted our first scraping exercises this week. After reviewing some command line and Python basics, we installed pip, a Python package manager, and virtualenv to create isolated Python environments—useful if projects require different libraries and/or versions of Python.

For our assignment to generate a looong list from scraped web text, I thought about search engines as both oracles and confessionals. Initially, I hoped to scrape the headlines from the returns of search queries, specifically the results to “the answer is.” I mean, aren’t ALL the answers online? Seriously, where do you turn if you have a question? Your phone, a person, the card catalog? More importantly, what might you ask if you thought you were anonymous or perhaps didn't realize that your query was being logged for future publication by an algorithm or someone like me?

Scraping from Google’s search page proved a different animal from the comparatively straightforward examples in class with Craigslist and my experiments with the NYTimes and Reddit. After many repeated attempts to solve the puzzle, it didn’t take long for Google to block my IP. Sam suggested I try Bing or DuckDuckGo, and in the process of exploring those options, we couldn’t help but notice the search engines’ autocomplete suggestions for my query. Though I could not locate the specifics of DuckDuckGo's auto-suggest algorithm, Google's autocomplete predictions are "based on several factors, like how often others have searched for a term" and trending, popular topics.


With Sam’s help using the browser’s Developer Tools, we figured out that on DuckDuckGo this information was formatted as JSON. Fortunately, there’s a JSON parser built directly into Python (so no need for the beautifulsoup library required for parsing HTML), and together we walked through writing the initial lines of the scraping code.
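Parsing that JSON really does take nothing beyond the standard library. Here's a minimal sketch, assuming the response is a JSON array of objects with a `phrase` key (the sample string below is illustrative, not real output):

```python
import json

# A sample payload in the shape the autosuggest endpoint returns
# (an assumption for illustration: a JSON array of {"phrase": ...} objects).
raw = '[{"phrase": "the answer is 42"}, {"phrase": "the answer is no"}]'

# json.loads turns the string into a list of dicts; pull out each phrase.
suggestions = [entry["phrase"] for entry in json.loads(raw)]
print(suggestions)  # ['the answer is 42', 'the answer is no']
```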

My program passes any phrase into the URL request parameters along with each letter of the alphabet, so “the answer is a…” followed by “the answer is b…”. Each pass generates a list of auto-suggestions. After it exhausts all 26 letters, it then passes the phrase plus double letters to increase the number of possible returns. So again, “the answer is aa…” followed by “the answer is ab…” Once I figured out the code and created a working template, I could request results with any phrase I wished. SO MUCH FUN! Thank you, Sam!!
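The looping logic can be sketched roughly like this. This is a reconstruction, not my exact program: the `generate_queries` and `scrape_suggestions` names are mine, and the `https://duckduckgo.com/ac/` endpoint and its response shape are assumptions based on what we saw in the Developer Tools:

```python
import string
from itertools import product

def generate_queries(phrase):
    """Yield the phrase plus each single letter a-z, then each letter pair aa-zz."""
    letters = string.ascii_lowercase
    for letter in letters:
        yield f"{phrase} {letter}"
    for a, b in product(letters, repeat=2):
        yield f"{phrase} {a}{b}"

def scrape_suggestions(phrase):
    """Collect the unique autosuggest phrases returned across every query."""
    import requests  # third-party: installed in step 3 below (pip install requests)
    seen = set()
    for query in generate_queries(phrase):
        response = requests.get("https://duckduckgo.com/ac/", params={"q": query})
        # Assumed response shape: a JSON array of {"phrase": ...} objects.
        for entry in response.json():
            seen.add(entry["phrase"])
    return sorted(seen)

# usage:
# for suggestion in scrape_suggestions("the answer is"):
#     print(suggestion)
```

That's 26 single-letter queries plus 676 two-letter queries, or 702 requests per phrase, which is also why collecting into a set matters: many queries return overlapping suggestions.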

Here are my steps for this process:
1. Create a project directory
2. Within that directory, create a virtual environment and activate it
3. Install the requests library
4. Run my Python program and save the results to a new file (scrape.py stands in for whatever the script is named):
    python scrape.py > rawtext.txt
5. Sort the results, remove duplicate lines, and save to a new file:
    sort rawtext.txt | uniq > sorted_noduplicates.txt
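For anyone who'd rather stay in Python, step 5 can also live in the script itself. A small sketch of a `sort | uniq` equivalent, using the same filenames as above:

```python
def dedupe_and_sort(src, dest):
    """Python equivalent of `sort src | uniq > dest`: unique lines, sorted."""
    with open(src) as f:
        # A set drops duplicate lines; skip blank lines along the way.
        lines = {line.rstrip("\n") for line in f if line.strip()}
    with open(dest, "w") as f:
        f.writelines(line + "\n" for line in sorted(lines))

# usage: dedupe_and_sort("rawtext.txt", "sorted_noduplicates.txt")
```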

The code is on GitHub, along with my favorite and most poignant auto-suggest lists so far:

the answer is…
why is my…
the future is…
sometimes i…