Working with Word Lists & Office Hours with Allison Parrish (Monday, April 1)
I’ve been wondering how to create dictionaries of “Goldilocks words”: words that aren’t too easy to slip into conversation on the sly (e.g. an, the, in, but) but are not too obscure either. One way to estimate a word’s difficulty is to use its frequency within a particular corpus.
I got this idea from noodling around with wordsapi.com (a dataset of 350,000 words, of which 18% include a zipf, or frequency, score), and it was later confirmed in my conversation with Clay Shirky. I started pulling random words for different parts of speech along with their zipf, a decimal number to the hundredths between 1 and 7. My notes are incomplete here, but it seems that I didn’t trust the data for some reason, and I kept getting duplicates. In retrospect, I could have kept track of repeats, but in any event, I researched word frequencies and discovered the work of Mark Davies, a linguistics professor at Brigham Young University. His projects include Word Frequency Data, from which I retrieved a clean and robust word sampling with frequency and part-of-speech data from the Corpus of Contemporary American English (COCA). The corpus contains over 560 million words and is the “largest freely-available corpus of English, and the only large and balanced corpus of American English.” I trusted this data, but it listed raw word frequencies on a scale that I wasn’t sure how to handle at 3am, from 21 to 6332195, so I queried the words against Words API to get friendlier numbers. (Allison later taught me how to calculate the zipf score myself: it’s called math.)
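For the record, here is a minimal sketch of that math. It assumes the usual definition of the Zipf scale (the base-10 logarithm of a word's occurrences per billion words) and takes the corpus size from the 560 million figure quoted above. The function name and the constant are my own, and Words API may well compute its zipf values from a different corpus, so the numbers need not match theirs.

```python
import math

# Assumption: total corpus size, taken from the COCA figure quoted above.
COCA_TOKENS = 560_000_000


def zipf_score(raw_count, corpus_tokens=COCA_TOKENS):
    """Return the Zipf score: log10 of occurrences per billion words."""
    if raw_count <= 0:
        raise ValueError("raw_count must be positive")
    per_billion = raw_count / corpus_tokens * 1_000_000_000
    return math.log10(per_billion)


# The two extremes of the raw frequency range mentioned above.
lowest = round(zipf_score(21), 2)
highest = round(zipf_score(6_332_195), 2)
print(lowest, highest)
```

Under these assumptions the raw range of 21 to 6332195 lands at roughly 1.57 to 7.05, which lines up with the 1 to 7 scale.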
In the end, I created three lists of words from different ranges of frequencies, each with an equal allotment of nouns, adjectives, verbs, and adverbs. Each word was labeled with one part of speech, which of course is problematic considering that words can be several parts of speech depending on how they are used. I created physical word cards from these lists, with the thought that one day I’d test a point system: for example, the easiest words worth two points, some worth three, and the hardest ones worth four. What kind of mechanics are needed when you’re aiming to reach a total point score instead of a total word score?
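The list-building step could be sketched roughly like this. It is not the process I actually followed by hand. The Zipf cut-offs in `BANDS` are hypothetical placeholders (the real ranges aren't recorded here), and only the 2/3/4 point values come from the idea above.

```python
import random
from collections import defaultdict

# Hypothetical bands: (name, min zipf inclusive, max zipf exclusive, points).
# Only the point values (2, 3, 4) come from the post; the cut-offs are made up.
BANDS = [
    ("easy", 4.5, 5.5, 2),
    ("medium", 3.5, 4.5, 3),
    ("hard", 2.5, 3.5, 4),
]
PARTS_OF_SPEECH = ("noun", "adjective", "verb", "adverb")


def build_decks(entries, per_pos, seed=None):
    """Group (word, pos, zipf) entries into bands, then draw up to
    `per_pos` random words for each part of speech in each band.

    Returns {band_name: [(word, pos, points), ...]}.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for word, pos, zipf in entries:
        if pos not in PARTS_OF_SPEECH:
            continue
        for name, low, high, points in BANDS:
            if low <= zipf < high:
                pools[(name, pos)].append((word, pos, points))
    decks = {}
    for name, _low, _high, _points in BANDS:
        deck = []
        for pos in PARTS_OF_SPEECH:
            pool = pools[(name, pos)]
            # Take fewer cards if a pool is too small to fill the allotment.
            deck.extend(rng.sample(pool, min(per_pos, len(pool))))
        decks[name] = deck
    return decks
```

Calling `build_decks(entries, per_pos=10)` on a list of `(word, part_of_speech, zipf)` tuples would give three decks of up to 40 cards each, with every card carrying its point value. Words falling outside all three bands are simply dropped.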
Allison and I spoke about technical ways to create word lists from texts, a topic that some of her tutorials cover.
She also pointed me to an open source word list here and books of parlor games from the 1800s! Many of the word games in those reminded me of mid-20th century games that I found on boardgamegeek.com during some of my early research, many of which involve some form of deciphering hidden words from rhyming clues or charades play.
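As a rough illustration of what "creating a word list from a text" can look like in code, here is a minimal sketch using only the Python standard library. It is not taken from Allison's tutorials. The stop list, the function names, and the `min_length` filter are my own assumptions, and a real version would want a much longer stop list.

```python
import re
from collections import Counter

# A tiny hand-picked stop list for illustration only.
STOP_WORDS = {"a", "an", "the", "in", "but", "and", "of", "to", "is", "it"}


def word_counts(text):
    """Count lowercase words in `text`, skipping the stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def keywords(text, top_n=10, min_length=4):
    """Return the `top_n` most frequent words of at least `min_length` letters."""
    counts = word_counts(text)
    ranked = [w for w, _count in counts.most_common() if len(w) >= min_length]
    return ranked[:top_n]
```

Running `keywords(some_text, top_n=50)` on the plain text of a book or speech gives a crude themed word list. Raw counts favor whatever the text repeats most, so the output still needs a human pass before it becomes cards.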
I also ran two very different playtests this week! Here are the highlights:
Playtest 6 • Tuesday, April 2 (ITP Quick & Dirty Show)
The Quick & Dirty Show provided an opportunity to test out a few things:
Parts of speech: are there some that are easier to detect / sneak than others?
A new introduction script with more precise language and organization
A new mechanic when guessing someone else’s word: if you’re right, you take the other person’s card-in-hand and add it to your pile; if you’re wrong, they take your current card-in-hand and add it to their stack of points.
Slogans and logo design: which resonates more?
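The new guessing mechanic in the list above is simple enough to sketch in code, which may be handy for a digital version later. This is only one possible reading of the rule. The `Player` class and `resolve_guess` function are hypothetical names, and drawing a replacement card after losing one is left out because the rule doesn't cover it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Player:
    name: str
    hand: Optional[str] = None  # the secret word currently being snuck in
    pile: List[str] = field(default_factory=list)  # cards won so far


def resolve_guess(guesser, target, guessed_word):
    """Apply the guessing rule and return True if the guess was right.

    Right: the guesser takes the target's card-in-hand.
    Wrong: the target takes the guesser's card-in-hand.
    """
    if target.hand is None:
        raise ValueError("target has no card in hand to guess")
    correct = guessed_word.strip().lower() == target.hand.lower()
    if correct:
        guesser.pile.append(target.hand)
        target.hand = None
    elif guesser.hand is not None:
        target.pile.append(guesser.hand)
        guesser.hand = None
    return correct
```

Keeping the card-in-hand separate from the pile mirrors the paper version, where the stakes of a guess are always the card each player is currently holding.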
In total, seven different groups of people played for nearly 2.5 hours straight. They included a mix of current ITP students (1st-years, 2nd-years, and residents), friends of current students, and prospective students. I personally knew about half of the people who played. Of course the context of the event is to test work, so folks who sat at the table did so ready to play. It was a blast! The introduction and the guessing mechanic felt right: there were far fewer questions overall compared to past playtest sessions. International students suggested it would be a fun way to practice English (this is a recurring theme). My unscientific assessment was that adjectives and adverbs are too easy. “Sincere Competitive Chitchat” seems to be the winner.
Playtest 7 • Wednesday, April 3 (ITP Feedback Collective)
Early on in this process, Greg Trefry suggested that I play the game in a variety of groups, even with folks who don’t really want to play. My sense is that I checked this box by forcing the game into the context of the only formal crit group at ITP. The atmosphere was completely different from the festiveness of the night before. In comparison it was eerily quiet, and only three people played across wide classroom tables while others looked on, which added an off-putting performance vibe to the mix. Attendees included one professor, one resident, four 1st-year students, and myself. It was useful, however, because it helped me see a recurring theme when people who do not know each other well, and who are not seeking to play the game, are wrangled into it: the conversations invariably lag, and the feedback is to include a timer to pressure people into speaking. I also tested something new and presented a choice of themed words: the most-searched Shakespeare keywords (source), keywords from A Brief History of Time (the Foreword to Chapter 6) (source), and keywords from the two most recent State of the Union Addresses (source). Players chose Stephen Hawking. This throws an extra layer of meta into the game: 1) keeping up with the conversation, 2) planning how to insert your word, and now 3) considering the context of the words’ theme to help you catch other players’ words.
Going Forward, some notes for the next three weeks:
Pick a target audience: This game has a different feel and might need different mechanics for different contexts (an ice-breaker for people who just met, a parlor game for friends and family, or language fluency practice with vocab words, an inkling that I’ve had since the midterm presentation). I probably need to focus on just one group for the remainder of this semester: let’s do friends!
Schedule events: I need to plan rendezvous with my friends.
Documentation: …and start filming said rendezvous. Since the final thesis assessment is a presentation, it will be imperative that I show the game in action in order to explain it well to the audience. I’ve ordered a shotgun mic for my smartphone and additional filming accessories for a lightweight yet quality video recording rig.
A digital version? Ideally, I’d like to code a digital version to test out a variety of dictionaries (as it takes so much time to make word cards), so I’ve started coding a possibility. I feel good about the mechanics of the current paper prototype, but I’m not sure exactly how to translate them into a web app. This will be the focus of the upcoming week.