Creating Computer-Generated Poetry

This is the notebook that Sven Anderson and I used for our “Family Weekend” class over the weekend. We began the class with two poems from an NPR article called “Human or Machine: Can You Tell Who Wrote These Poems?”

Privacy Performed at Scale

You wear a Fitbit during your jog in the morning, swipe your grocery loyalty card when you check out in the evening, and share your contacts when you play Candy Crush at night. This “Internet of Things” offers automation, customization, and convenience and, in turn, demands access. The daily choices to allow access or not repeatedly require you to produce the contours of your private life, to open certain spaces and keep others closed. In this critical potluck, we asked: how do those contours take shape? (If your first question is, what the heck is a critical potluck? In short, it’s a workshop-like collaborative event; this one was held on September 15 at Bard. Check our handy guide for more details.)

The Language of Notation and Imaginative Writing

What are the conditions under which very different words are brought together in writing? Are varied word combinations predisposed to particular genres or discourses? Are there types of words that could be said to constitute lexical situations that would not otherwise occur? And, when words that are not typically used in the same context co-occur, what are they doing? I am going to report the results of a text analysis experiment designed to begin to address these questions.

Recent advances in semantic modeling (from topic modeling to word embeddings) make it relatively easy to describe the statistical likelihood that a given set of words will co-occur. Methods in semantic modeling start from the premise that words tend to occur in particular linguistic contexts, and that we can decipher the meaning of an unknown word from the words that appear near it. While many humanists (like me) interested in text analysis have begun to explore the mathematics of operationalizing these premises, corpus linguists have been thinking about modeling language in this way since the 1940s and 50s. Zellig Harris, an early theorist of the “distributional structure” of language, explains that, “The perennial man in the street believes that when he speaks he freely puts together whatever elements have the meanings he intends; but he does so only by choosing members of those classes that regularly occur together and in the order in which these classes occur.”1 His contemporary, J. R. Firth, put the claim even more plainly in his now famous formulation: “You shall know a word by the company it keeps.”2

Since these foundational theories, linguists have produced methods for mathematically representing the tendencies of language. In this experiment, I focus on word space models, which represent the distribution of words in a corpus as vectors with hundreds or thousands of dimensions, wherein proximity in the vector space corresponds to semantic similarity. The vector position for any given word represents a probabilistic profile of the lexical contexts in which one would expect to find that word. As a consequence, word vectors can be added and subtracted to find the words in the model most similar to the resulting composite vector.
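To make the vector arithmetic described above concrete, here is a minimal sketch using the gensim library and one of its downloadable pretrained models. The specific model name (“glove-wiki-gigaword-100”) and the example words are illustrative assumptions, not the corpus or vocabulary used in this experiment.

```python
# A minimal sketch of querying a word space model with gensim.
# Assumes an internet connection for the one-time model download.
import gensim.downloader as api

# Load a pretrained set of word vectors (word -> dense vector).
vectors = api.load("glove-wiki-gigaword-100")

# Proximity in the vector space corresponds to semantic similarity:
# the nearest neighbors of a word are its most "companionable" terms.
print(vectors.most_similar("poem", topn=5))

# Vectors can be added and subtracted; most_similar() returns the words
# closest to the resulting composite vector.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=5))
```

The second query is the familiar analogy-style composite (king − man + woman); in practice one would substitute words drawn from the corpus under study.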