Is rhyme meaningful? Examining rhyme words in topic models of nineteenth-century English sonnets

1. Introduction

In the nineteenth century, most lyric poetry written in English was rhymed: 95% of the 108,142 nineteenth-century poems contained in the Chadwyck-Healey English Poetry database are rhymed, and as Peter McDonald has argued, most nineteenth-century poets simply assumed that to write poetry meant writing in rhyme (McDonald 2012: 5-6). McDonald poses a provocative question about how far the constraints of rhyme influenced poetic practice: “What is meant, then, when we say that a poet ‘uses’ a rhyme? Are we quite sure that the rhyme isn’t in fact ‘using’ him or her?” (McDonald 2012: 5). In other words, we might ask how (or if) rhyme words are related to the ideas expressed in a poem. Although literary critics may analyze individual words in individual poems as being significant, this question reaches beyond specific poems to the larger relationship between the constraints of poetic form and the ideas, themes, or emotions expressed in poetry. This paper presents a series of experiments that explore the relationship between rhyme words in English sonnets and their semantic meaning as discovered through LDA topic models.

2. Context

Rhyme in English exists between two words that end in syllables which “have identical stressed vowels and subsequent phonemes but differ in initial consonant(s) if any are present” (Brogan / Cushman 2012: 1184). Line-end rhymes of one syllable predominate in English poetry, in part because English is less inflected than other languages. Rhymes in English verse can range from “perfect” or complete rhymes, such as cat/hat, to “imperfect” or near rhymes, such as fact/hat. Some poets also use “eye rhymes” that are only visible on the page but not linked in sound, such as love/prove.
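The definition of perfect rhyme quoted above can be illustrated with a minimal sketch. It assumes that pronunciations are available as ARPAbet-style phoneme lists in which vowels carry a stress digit; the transcriptions below are entered by hand for illustration, and the function names are hypothetical rather than part of this study.

```python
def rhyme_parts(phonemes):
    """Split a pronunciation into (onset, rhyme) around the last primary-stressed vowel.

    Assumes ARPAbet-style symbols: vowels end in a stress digit, consonants do not.
    """
    stressed = [i for i, p in enumerate(phonemes) if p.endswith("1")]
    if not stressed:
        raise ValueError("no primary-stressed vowel found")
    i = stressed[-1]
    # Walk back over the consonants that directly precede the stressed vowel.
    j = i
    while j > 0 and not phonemes[j - 1][-1].isdigit():
        j -= 1
    return tuple(phonemes[j:i]), tuple(phonemes[i:])


def is_perfect_rhyme(a, b):
    """True if the stressed vowel and all later phonemes match but the onsets differ."""
    onset_a, rhyme_a = rhyme_parts(a)
    onset_b, rhyme_b = rhyme_parts(b)
    return rhyme_a == rhyme_b and onset_a != onset_b


# Hand-entered transcriptions (illustrative assumptions).
CAT = ["K", "AE1", "T"]
HAT = ["HH", "AE1", "T"]
FACT = ["F", "AE1", "K", "T"]
LOVE = ["L", "AH1", "V"]
PROVE = ["P", "R", "UW1", "V"]

print(is_perfect_rhyme(CAT, HAT))     # perfect rhyme
print(is_perfect_rhyme(FACT, HAT))    # near rhyme, so not perfect
print(is_perfect_rhyme(LOVE, PROVE))  # eye rhyme, so not perfect
```

Under these assumptions only cat/hat passes the test; detecting near rhymes or eye rhymes would require a looser comparison of sound or spelling.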

The sonnet is a tightly defined poetic form consisting of 14 lines rhymed in a set pattern. The two main rhyme schemes used for sonnets in English are the “Petrarchan” or “Italian” sonnet form, rhymed ABBA ABBA CDECDE (where the three rhyme pairs that make up the last six lines can be arranged in various permutations) and the “Shakespearean” or “English” sonnet form, rhymed ABAB CDCD EFEF GG. Sonnets are thus generally very similar in the number of words contained in the text. Nineteenth-century theories of the sonnet argued that because of its brevity and formal complexity, each sonnet should focus on only one main idea (Russell 1876: 408). These qualities make sonnets inherently well-suited to topic modeling. A dataset consisting of 12,230 sonnets written by 297 nineteenth-century poets selected from the Chadwyck-Healey database was created for this project.

Latent Dirichlet Allocation (LDA) topic modeling is used in this study to discover the semantic topics in the corpus (Blei et al. 2003). LDA has been shown to be effective for exploring and classifying short poetic texts like sonnets (Navarro-Colorado 2018; Plecháč / Haider 2020).

3. Method

In order to explore the question of how rhyme words are related to semantic content, several approaches are used. Three approaches examine the distribution of rhyme words in the corpus and their position within the probabilities output by the LDA model.

(1) Examine the frequency and distribution of rhyme words in the corpus. As shown in prior work, rhyme word distribution can usefully be examined through three metrics: rhyme frequency ranking; rhyme word keyness; and the ratio of a word’s frequency as a rhyme word to its frequency in the entire corpus, which represents the likelihood that a reader would encounter a specific word as a rhyme word (Houston 2022).

(2) Examine where rhyme words are ranked in the topic-word probabilities to assess how specific rhyme words contribute to each topic in an LDA model.

(3) Examine which rhyme words in a given document get assigned to the top document-topic probabilities for that document in order to assess how specific rhyme words contribute to the topics within each document in an LDA model.
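Two of these measures can be sketched in a few lines: the rhyme-to-corpus frequency ratio from approach (1) and the rank of a word within a topic from approach (2). This is a minimal illustration under stated assumptions, not the study's implementation: rhyme words are taken to be the last word of each line, tokenization is a simple whitespace split, and the function names and toy data are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"


def rhyme_ratios(poems):
    """Map each line-end word to (count at line end) / (count anywhere in the corpus).

    `poems` is a list of poems, each a list of line strings.
    """
    rhyme_counts = Counter()
    corpus_counts = Counter()
    for poem in poems:
        for line in poem:
            tokens = [w.strip(PUNCT) for w in line.lower().split()]
            tokens = [t for t in tokens if t]
            if not tokens:
                continue
            corpus_counts.update(tokens)
            rhyme_counts[tokens[-1]] += 1  # assumption: rhyme word = last word of the line
    return {w: rhyme_counts[w] / corpus_counts[w] for w in rhyme_counts}


def topic_rank(topic_word_probs, word):
    """Return the 1-based rank of `word` in a topic, or None if the word is absent.

    `topic_word_probs` maps words to their probability in one topic.
    """
    if word not in topic_word_probs:
        return None
    ranked = sorted(topic_word_probs, key=topic_word_probs.get, reverse=True)
    return ranked.index(word) + 1


# Toy data for illustration only.
poems = [
    ["My love is like the day,", "The day will pass away."],
    ["Away we go at night,", "The day gives way to night."],
]
topic = {"night": 0.4, "day": 0.3, "love": 0.2, "away": 0.1}

ratios = rhyme_ratios(poems)
print(ratios["night"], topic_rank(topic, "night"))
```

A ratio close to 1 indicates a word that appears almost exclusively in rhyme position; comparing such ratios with topic ranks is one way to see whether frequent rhyme words also carry high topic probability.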

Three additional approaches compare models run with and without rhyme words, to further test whether rhyme words contribute to the semantic discourses of the poems. These approaches are inspired in part by Schofield et al. (2017), which assessed the effects of stopword removal before and after model training. If rhyme words are highly important to the semantic meaning of the poem, then we would expect a significant difference between models trained with and without rhyme words.

(4) Examine the similarity of topic models run with and without rhyme words, using Jaccard similarity on the topic-word lists.

(5) Examine the similarity of document clusters in topic models run with and without rhyme words, using the Adjusted Rand Index.

(6) Examine the topic coherence of topic models run with and without rhyme words, using a coherence measure based on word co-occurrence within documents (Mimno et al. 2011).
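The three comparison measures named above can be sketched in plain Python. This is a hedged illustration rather than the study's code: the Adjusted Rand Index follows the standard pair-counting formula, the coherence function follows the co-document frequency score described by Mimno et al. (2011) with a smoothing count of 1, and all function names and toy inputs are hypothetical.

```python
from collections import Counter
from math import comb, log


def jaccard(words_a, words_b):
    """Jaccard similarity between two topic-word lists, treated as sets."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same documents."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every document in one cluster
    return (sum_cells - expected) / (max_index - expected)


def coherence(top_words, documents):
    """Sum of log((D(v_m, v_l) + 1) / D(v_l)) over ordered pairs of top words.

    `top_words` is ordered by descending topic probability; every word is
    assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co = doc_freq(top_words[m], top_words[l])
            score += log((co + 1) / doc_freq(top_words[l]))
    return score


# Toy inputs for illustration only.
print(jaccard(["love", "heart", "night"], ["love", "night", "sea"]))
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
docs = [["night", "star", "moon"], ["night", "moon"], ["day", "sun"]]
print(coherence(["night", "moon", "star"], docs))
```

Because topic indices are arbitrary across separately trained models, the Jaccard comparison presupposes some way of pairing topics between the two models (for example, matching each topic to its most similar counterpart); that pairing step is not shown here.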

To further assess the contribution of rhyme words, each of these methods is also conducted on topic models with a random selection of 14 words removed from each poem (equivalent to the number of rhyme words in a sonnet).
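The two preprocessing conditions can be sketched as follows. The sketch assumes that rhyme words are the last word of each line and that the random baseline samples 14 positions from the whole poem, so a randomly removed word may itself be a rhyme word; the text does not specify this detail. Function names and the placeholder sonnet are hypothetical.

```python
import random


def remove_rhyme_words(lines):
    """Return the poem's tokens with the last word of every line dropped."""
    tokens = []
    for line in lines:
        tokens.extend(line.split()[:-1])
    return tokens


def remove_random_words(lines, k, rng):
    """Return the poem's tokens with k randomly chosen positions dropped.

    Assumption: positions are sampled from the whole poem, rhyme words included.
    """
    tokens = [w for line in lines for w in line.split()]
    k = min(k, len(tokens))
    drop = set(rng.sample(range(len(tokens)), k))
    return [w for i, w in enumerate(tokens) if i not in drop]


# A placeholder 14-line "sonnet" with three tokens per line (illustration only).
sonnet = [f"a{i} b{i} c{i}" for i in range(14)]
rng = random.Random(0)  # fixed seed so the baseline is reproducible

no_rhymes = remove_rhyme_words(sonnet)
baseline = remove_random_words(sonnet, 14, rng)
print(len(no_rhymes), len(baseline))
```

Both conditions remove the same number of tokens per poem, so any difference between the resulting models can be attributed to which words were removed rather than to how many.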

4. Discussion

This paper discusses which of the outlined approaches are most effective in examining the semantic contribution of rhyme words in poetry and how this research contributes to computational poetics.

Appendix A

Bibliography

Blei, David M. / Ng, Andrew Y. / Jordan, Michael I. (2003): “Latent Dirichlet Allocation”, in: Journal of Machine Learning Research 3: 993-1022.
Brogan, Terry V. F. / Cushman, Stephen (2012): “Rhyme”, in: Greene, Roland / Cushman, Stephen / Cavanagh, Claire (eds.): The Princeton Encyclopedia of Poetry and Poetics. Princeton: Princeton University Press 1182-1192.
Houston, Natalie M. (2022): “Rhyme Frequency in Nineteenth-Century English Poetry”, in: Bories, Anne-Sophie / Plecháč, Petr / Ruiz Fabo, Pablo (eds.): Computational Stylistics in Poetry, Prose, and Drama. Berlin, Boston: De Gruyter 117-132.
McDonald, Peter (2012): Sound Intentions: The Workings of Rhyme in Nineteenth-Century Poetry. Oxford: Oxford University Press.
Mimno, David / Wallach, Hanna M. / Talley, Edmund / Leenders, Miriam / McCallum, Andrew (2011): “Optimizing semantic coherence in topic models”, in: Association for Computational Linguistics (ed.): Proceedings of the 2011 conference on empirical methods in natural language processing, Edinburgh, July 2011: 262-272.
Navarro-Colorado, Borja (2018): “On Poetic Topic Modeling: Extracting Themes and Motifs From a Corpus of Spanish Poetry”, in: Frontiers in Digital Humanities 5, 15: 1-12.
Plecháč, Petr / Haider, Thomas (2020): “Mapping Topic Evolution Across Poetic Traditions”, in: Alliance of Digital Humanities Organizations (ed.): Proceedings of the International Digital Humanities Conference DH2020, Ottawa. <https://dh2020.adho.org/wp-content/uploads/2020/07/600_MappingTopicEvolutionAcrossPoeticTraditions.html> [12.01.23].
[Russell, Charles W.] (1876): “Critical History of the Sonnet”, Part 1, in: Dublin Review n.s. 27: 407-408.
Schofield, Alexandra / Magnusson, Måns / Mimno, David (2017): “Pulling out the stops: Rethinking stopword removal for topic models”, in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Vol. 2, Valencia, April 2017: 432-436.