Week 9: Vintage Fountains

Learning how to automate Twitter status updates this week inspired me to do the same with images. I'm still a reserved social media user, but an unused email account I had lying around presented the perfect opportunity to practice my scraping skills and poke my head into the Twitterverse.

To some, the background story is familiar: in the spring of 1917, Marcel Duchamp anonymously submitted an artwork titled Fountain, signed R. Mutt, to the inaugural exhibition of the Society of Independent Artists, of which he himself was a board member. According to the show's rules, all submissions would be shown, and all were, except for Fountain, which the exhibition committee deemed not art on account of its being a urinal. This was not Duchamp’s first readymade, but it is perhaps one of his better-known pieces and a hallmark of an emerging conceptual art movement.

My project tweets vintage urinals as R. Mutt at #fountain and #arthistory. The images are randomly selected from DuckDuckGo’s image search results. At first I found some success using Selenium and the method I used for Top Rep$ (finding and moving through elements using XPath), but this returned only the first 50 results, most likely because of a scrolling issue. Indeed, from scrolling down the page and inspecting the last image, I knew that there were ~330 possibilities. Sam reminded me to check the Ajax calls through the browser’s developer console (Network > XHR), and from there I pulled a link within which I found another link that gave me the image sources as JSON-formatted data. I quickly discovered that I could iterate through all of the results by manipulating a value in this URL.
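
For anyone curious what that iteration might look like, here is a minimal sketch rather than the project's actual code. It assumes the value being manipulated is an offset query parameter (called "s" here purely for illustration), that each page is a JSON object with a "results" list, and that each result carries an "image" key. The base URL, the parameter name, and the JSON shape are all assumptions; check them against what the Network > XHR tab really shows.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse
from urllib.request import urlopen


def with_offset(url, offset, param="s"):
    """Return `url` with the query parameter `param` set to `offset`.

    The parameter name "s" is a placeholder, not a documented API.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[param] = [str(offset)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def fetch_json(url):
    """Download `url` and decode the body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def collect_image_urls(base_url, fetch=fetch_json, step=50, limit=400):
    """Walk the result pages and gather every image URL found.

    Assumes each page looks like {"results": [{"image": "..."}, ...]}.
    Stops at the first empty page or once `limit` is reached.
    """
    urls = []
    for offset in range(0, limit, step):
        page = fetch(with_offset(base_url, offset))
        results = page.get("results", [])
        if not results:
            break
        urls.extend(item["image"] for item in results if "image" in item)
    return urls
```

Passing the fetcher in as an argument keeps the paging logic separate from the network call, so the loop can be tried against canned data before pointing it at a live URL.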

From there I wrote two scripts: one to search, scan, and download a picture of a vintage urinal, and another to authenticate into and post the photo to @iamrmutt’s Twitter account. The first script stores all image URLs into an array from which a random one is selected, and the associated file is subsequently downloaded to disk. (Update! In retrospect, after running this for a week, I should also store which links are randomly chosen into a text file and check against that to prevent repeat posts.) From the first script, the second script imports the variable containing the filename of the saved photo and then uses the Tweepy library to post it via Twitter’s API. So though I have two scripts, I only have to call one to complete the entire process. (Of note, since I download the photo with the same filename every time, and because my first script will not download an image if the same name already exists, my update status script deletes the file on my local disk after sending it to Twitter to prevent the same image from posting each time.) 
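
The repeat-prevention idea from the update above could be sketched roughly as follows. This is one possible approach, not what the scripts currently do: the helper names and the history file are hypothetical, and the list of URLs is assumed to come from the search step.

```python
import random
from pathlib import Path


def load_history(path):
    """Return the set of URLs already recorded in the history file."""
    p = Path(path)
    if not p.exists():
        return set()
    return {line.strip() for line in p.read_text().splitlines() if line.strip()}


def pick_unposted(urls, history_path, rng=random):
    """Pick a random URL that is not in the history file and record it.

    Returns None when every candidate has already been used.
    """
    seen = load_history(history_path)
    fresh = [u for u in urls if u not in seen]
    if not fresh:
        return None
    choice = rng.choice(fresh)
    # Append immediately so a later run cannot pick the same link again.
    with open(history_path, "a") as f:
        f.write(choice + "\n")
    return choice
```

Recording the choice at selection time is the simplest option; a stricter version might only record a link after the tweet has actually gone out.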

Troubleshooting update! Since I drafted this post, two issues arose. The first was that my requests to the original DuckDuckGo URL stopped working with the keywords vintage urinal. Plopping it into my browser returned a blank page except for, "If this error persists, please let us know: ops@duckduckgo.com." However, after making it plural, I was back in business...for a while, until that broke, and I changed it to urinals vintage... The second was a Tweepy error that I received twice in a row: "tweepy.error.TweepError: [{u'message': u'Error creating status.', u'code': 189}]." I was still able to post text-only updates at the time, and the issue eventually resolved itself after a couple of hours. (Update on the update: Adding the vqd number to the request URL allows me to search with the original keywords, but I have yet to uncover what this is exactly. Also, I noticed that the Tweepy error occurs when the downloaded image is zero bytes. Could this be because the image no longer exists online?) All good to know for future projects.
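
Given the zero-byte observation, one possible guard is to check the file size before handing the image to Twitter. The sketch below is an assumption-laden illustration, not the current script: post_and_cleanup is a hypothetical helper, and post stands in for whatever function wraps the Tweepy upload call.

```python
import os


def post_and_cleanup(path, post):
    """Post the image at `path` via `post(path)`, then delete the file.

    Returns True if the image was posted, and False if the file was
    missing or empty. `post` is a placeholder for the real upload call.
    """
    if not os.path.isfile(path):
        return False
    try:
        if os.path.getsize(path) == 0:
            # An empty download would only trigger an API error, so skip it.
            return False
        post(path)
        return True
    finally:
        # Always remove the file so a stale copy never blocks the next download.
        os.remove(path)
```

Deleting in the finally clause means the file is removed even when the upload fails, which fits the fixed-filename setup described earlier, since a leftover file would stop the next image from downloading.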

Code on GitHub