In the nineteenth century, most lyric poetry written in English was rhymed: 95% of the 108,142 nineteenth-century poems contained in the Chadwyck-Healey English Poetry database are rhymed, and as Peter McDonald has argued, most nineteenth-century poets simply assumed that to write poetry meant writing in rhyme (McDonald 2012: 5-6). McDonald poses a provocative question about how far the constraints of rhyme influenced poetic practice: “What is meant, then, when we say that a poet ‘uses’ a rhyme? Are we quite sure that the rhyme isn’t in fact ‘using’ him or her?” (McDonald 2012: 5). In other words, we might ask how (or if) rhyme words are related to the ideas expressed in a poem. Although literary critics may analyze individual words in individual poems as being significant, this question reaches beyond specific poems to the larger relationship between the constraints of poetic form and the ideas, themes, or emotions expressed in poetry. This paper presents a series of experiments that explore the relationship between rhyme words in English sonnets and their semantic meaning as discovered through LDA topic models.
Rhyme in English exists between two words that end in syllables which “have identical stressed vowels and subsequent phonemes but differ in initial consonant(s) if any are present” (Brogan / Cushman 2012: 1184). Line-end rhymes of one syllable predominate in English poetry, in part because English is less inflected than other languages. Rhymes in English verse can range from “perfect” or complete rhymes, such as cat/hat, to “imperfect” or near rhymes, such as fact/hat. Some poets also use “eye rhymes,” which match in spelling on the page but not in sound, such as love/prove.
The sonnet is a tightly defined poetic form consisting of 14 lines rhymed in a set pattern. The two main rhyme schemes used for sonnets in English are the “Petrarchan” or “Italian” sonnet form, rhymed ABBA ABBA CDECDE (where the three rhyme pairs that make up the last six lines can be arranged in various permutations) and the “Shakespearean” or “English” sonnet form, rhymed ABAB CDCD EFEF GG. Sonnets are thus generally very similar in the number of words contained in the text. Nineteenth-century theories of the sonnet argued that because of its brevity and formal complexity, each sonnet should focus on only one main idea (Russell 1876: 408). These qualities make sonnets inherently well-suited to topic modeling. A dataset consisting of 12,230 sonnets written by 297 nineteenth-century poets selected from the Chadwyck-Healey database was created for this project.
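As a minimal sketch of how the rhyme schemes above translate into data, the following Python groups line-end words by their rhyme label. The function names and the regular-expression tokenizer are illustrative assumptions, not the procedure used to build the dataset, and the example lines (the opening quatrain of Shakespeare's Sonnet 18) are not drawn from the nineteenth-century corpus.

```python
import re
from collections import defaultdict

# Rhyme-scheme strings: one label per line (spaces removed).
SHAKESPEAREAN = "ABABCDCDEFEFGG"
PETRARCHAN = "ABBAABBACDECDE"  # one of several permitted sestet arrangements


def line_end_word(line):
    """Return the last word of a line, lowercased, or None if the line is empty."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else None


def group_rhyme_words(lines, scheme):
    """Group line-end words by rhyme label (hypothetical helper)."""
    if len(lines) != len(scheme):
        raise ValueError("scheme must have one label per line")
    groups = defaultdict(list)
    for line, label in zip(lines, scheme):
        groups[label].append(line_end_word(line))
    return dict(groups)


# Illustration only: a single quatrain with the scheme ABAB.
quatrain = [
    "Shall I compare thee to a summer's day?",
    "Thou art more lovely and more temperate:",
    "Rough winds do shake the darling buds of May,",
    "And summer's lease hath all too short a date:",
]
groups = group_rhyme_words(quatrain, "ABAB")
print(groups)
```

For a full sonnet, the same function can be called with `SHAKESPEAREAN` or `PETRARCHAN` as the scheme. Note that the temperate/date pair is grouped by position alone, so the sketch makes no claim about whether a given pair is a perfect, near, or eye rhyme.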
Latent Dirichlet Allocation (LDA) topic modeling is used in this study to discover the semantic topics in the corpus (Blei et al. 2003). LDA has been shown to be effective for exploring and classifying short poetic texts like sonnets (Navarro-Colorado 2018; Plecháč / Haider 2020).
In order to explore the question of how rhyme words are related to semantic content, several approaches are used. Three approaches examine the distribution of rhyme words in the corpus and their position within the probabilities output by the LDA model.
(1) Examine the frequency and distribution of rhyme words in the corpus. As shown in prior work, rhyme word distribution can usefully be examined through three metrics: rhyme frequency ranking; rhyme word keyness; and the ratio of a word’s frequency in rhyme position to its frequency in the entire corpus, which represents the likelihood that a reader would encounter a specific word as a rhyme word (Houston 2022).
(2) Examine where rhyme words are ranked in the topic-word probabilities to assess how specific rhyme words contribute to each topic in an LDA model.
(3) Examine which rhyme words in a given document get assigned to the top document-topic probabilities for that document in order to assess how specific rhyme words contribute to the topics within each document in an LDA model.
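Two of these measures can be sketched in a few lines of Python: the rhyme-position ratio from approach (1) and the rank of a rhyme word in a topic's word probabilities from approach (2). This is an illustrative sketch under stated assumptions. The function names, the tokenizer, and the toy data are hypothetical, and the precise definitions used in Houston 2022 and in this study may differ.

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a line and split it into word tokens (simplified tokenizer)."""
    return re.findall(r"[a-z']+", line.lower())


def rhyme_ratios(poems):
    """For each line-end word, return (count in rhyme position) / (count in corpus).

    `poems` is a list of poems, each a list of lines.
    """
    rhyme_counts = Counter()
    corpus_counts = Counter()
    for poem in poems:
        for line in poem:
            tokens = tokenize(line)
            if not tokens:
                continue
            corpus_counts.update(tokens)
            rhyme_counts[tokens[-1]] += 1
    return {w: rhyme_counts[w] / corpus_counts[w] for w in rhyme_counts}


def rank_in_topic(topic_word_probs, words):
    """Return the 1-based rank of each word in one topic's probability list.

    `topic_word_probs` maps word -> probability for a single topic; words
    absent from the topic map to None.
    """
    ranked = sorted(topic_word_probs, key=topic_word_probs.get, reverse=True)
    position = {w: i + 1 for i, w in enumerate(ranked)}
    return {w: position.get(w) for w in words}


# Toy data, invented for illustration only.
poems = [
    ["the night is cold", "my heart is old"],
    ["cold winds blow at night", "the stars are bright"],
]
ratios = rhyme_ratios(poems)
print(ratios["cold"], ratios["bright"])  # 0.5 1.0

topic = {"love": 0.30, "heart": 0.25, "night": 0.20, "bright": 0.05}
ranks = rank_in_topic(topic, ["night", "bright"])
print(ranks)  # {'night': 3, 'bright': 4}
```

A ratio near 1 indicates a word that appears almost exclusively in rhyme position, while a low ratio indicates a word that is common throughout the line.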
Three additional approaches compare models run with and without rhyme words, to further test whether rhyme words contribute to the semantic discourses of the poems. These approaches are inspired in part by Schofield et al. (2017), who assessed the effects of stopword removal before and after model training. If rhyme words are highly important to the semantic meaning of a poem, then we would expect a significant difference between models trained with and without rhyme words.
(4) Examine the similarity of topic models run with and without rhyme words, using Jaccard similarity on the topic-word lists.
(5) Examine the similarity of document clusters in topic models run with and without rhyme words, using the Adjusted Rand Index.
(6) Examine the topic coherence of topic models run with and without rhyme words, using a coherence measure based on the co-occurrence of words within documents (Mimno et al. 2011).
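The comparisons in approaches (4) and (6) can be sketched as follows. This is a minimal illustration, not the study's implementation: the best-match alignment of topics between the two models is an assumption (the text does not specify how topics are paired), the function names and toy data are hypothetical, and the coherence function follows the co-document formula of Mimno et al. 2011, summing log((D(v_m, v_l) + 1) / D(v_l)) over ordered pairs of a topic's top words.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two word lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match_jaccard(topics_a, topics_b):
    """For each topic in model A, the highest Jaccard score with any topic in B.

    Topics are unordered across models, so each topic is compared against
    all topics of the other model (an assumed alignment strategy).
    """
    return [max(jaccard(t, u) for u in topics_b) for t in topics_a]


def umass_coherence(top_words, documents):
    """Co-document coherence of one topic's top words.

    `top_words` is ordered from most to least probable; `documents` is a
    list of token lists.
    """
    doc_sets = [set(d) for d in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # word absent from the reference documents
            d_ml = doc_freq(top_words[m], top_words[l])
            score += math.log((d_ml + 1) / d_l)
    return score


# Toy data, invented for illustration only.
with_rhymes = [["love", "heart", "night"], ["sea", "wave", "shore"]]
without_rhymes = [["love", "heart", "day"], ["sea", "wind", "sky"]]
matches = best_match_jaccard(with_rhymes, without_rhymes)
print(matches)  # [0.5, 0.2]

docs = [["love", "heart"], ["love", "night"], ["heart", "love", "night"]]
coherence = umass_coherence(["love", "heart"], docs)
print(coherence)  # 0.0
```

For approach (5), one option would be scikit-learn's `adjusted_rand_score`, applied to the cluster labels that each model assigns to the same documents. How documents are assigned to clusters (for example, by highest-probability topic) is left open here.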
To further assess the contribution of rhyme words, each of these methods is also conducted on topic models with a random selection of 14 words removed from each poem (equivalent to the number of rhyme words in a sonnet).
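A sketch of how the two ablated versions of a poem might be prepared is given below. The function names and tokenizer are hypothetical, and one detail is an assumption: the random baseline here samples from all tokens in the poem, so it may occasionally remove a rhyme word. The text does not state whether rhyme words are excluded from the random selection.

```python
import random
import re


def tokenize(line):
    """Lowercase a line and split it into word tokens (simplified tokenizer)."""
    return re.findall(r"[a-z']+", line.lower())


def remove_rhyme_words(poem_lines):
    """Return the poem's tokens with the final word of every line dropped."""
    kept = []
    for line in poem_lines:
        kept.extend(tokenize(line)[:-1])
    return kept


def remove_random_words(poem_lines, n_remove, rng):
    """Return the poem's tokens with `n_remove` randomly chosen tokens dropped."""
    tokens = [t for line in poem_lines for t in tokenize(line)]
    n_remove = min(n_remove, len(tokens))
    drop = set(rng.sample(range(len(tokens)), n_remove))
    return [t for i, t in enumerate(tokens) if i not in drop]


# Illustration on two lines; for a full sonnet n_remove would be 14.
poem = [
    "Shall I compare thee to a summer's day?",
    "Thou art more lovely and more temperate:",
]
rng = random.Random(0)  # fixed seed so the baseline is reproducible
no_rhyme = remove_rhyme_words(poem)
baseline = remove_random_words(poem, n_remove=len(poem), rng=rng)
print(len(no_rhyme), len(baseline))  # 13 13
```

Removing the same number of tokens in both conditions keeps document length constant, so any difference between the resulting models is less likely to be an effect of shorter texts alone.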
This paper discusses which of the outlined approaches are most effective in examining the semantic contribution of rhyme words in poetry and how this research contributes to computational poetics.