Our project sits at the intersection of two critical areas of interest for the Digital Humanities. First, there have been increasing calls for Digital Humanities scholars to take an active interest in human-caused climate change and the ongoing environmental catastrophe (Baillot et al. 2021; Whitmarsh et al. 2021). In response, a number of recent projects have centered environmental concerns within the framework of Digital Humanities investigations (Nowviskie 2015; Pendergrass 2019; Supran and Oreskes 2021; Grubert and Algee-Hewitt 2017). Simultaneously, genre has been a key area of interest for scholars in the Digital Humanities (Underwood 2018; Piper 2018; King 2021; Hart 2015). By using computational models to classify texts within (and without) a given generic category, scholars have put pressure on the established genres of literary history. Our project combines these interests by exploring the emerging genre category of Climate Fiction (CliFi) in terms of its relationship to public education and environmental activism.
A major hurdle facing contemporary climate action is the large-scale public disbelief in climate change itself. Public-facing education on climate, like other forms of science communication, has struggled to gain traction with a resistant public due to a fundamental principle of science communication: simply giving people more information does not change minds (Bolsen et al. 2019). The recent movement towards “empirical ecocriticism” seeks to redress this, in part by exploring how other forms of entertainment media can play a role in public education (Schneider-Mayerson 2018). It is generally accepted that CliFi novels have played a role in spreading awareness about anthropogenic climate change; however, what has largely gone unstudied are the ways in which they incorporate real-world information about the environment into their fictional worlds. Our project leverages the computational methods of the Digital Humanities to explore how climate information is embedded within this emerging genre.
For our investigation, we assembled both a corpus of climate fiction novels and a comparative corpus of 20th-century novels. Our climate fiction corpus was sourced from an extensive survey of online resources for Climate Fiction. Drawing all of these sources together, we created a list of 435 titles written in the 20th and 21st centuries. Because the Climate Fever dataset, described below, is currently an English-only resource, we elected to assemble an English-language corpus. We were able to legally purchase 275 of these texts as born-digital works, and we extracted the text of the novels as text files. Our comparison corpus was drawn from the Chicago TextLab corpus of 20th-century novels (Chicago TextLab 2015). From the overall corpus of 9,449 novels, we randomly sampled 275 novels within the same date range and length range as our climate corpus.
Finally, as we are interested in identifying the presence of climate facts within CliFi, we turned to the Climate Fever dataset (Diggelmann et al. 2020). It consists of 1,535 sentences sampled from the internet containing claims about the environment and climate change, including both climate-positive sentences and climate-negative sentences that deny climate change. Each sentence was independently rated by climate scientists as supported by the science, refuted by the science, or lacking enough information to judge. With this data, we can not only locate sentences within novels that appear to be “climate fact-like”, but also, replicating Diggelmann et al.’s study, classify whether facts in novels appear to be supported or refuted by the science.
Given that the Climate Fever dataset is labeled at the scale of the sentence, we elected to classify individual sentences in our two novel corpora using a machine learning (ML) model that predicts whether each sentence has a higher probability of belonging to the Climate Fever dataset or to a novel. We hypothesized that a robust model would classify most sentences in our novel corpora as “novel sentences,” but that sentences which resembled the Climate Fever sentences in both form and content would be surfaced within the novels of both corpora. For a model, we adopted a TensorFlow-based transfer learning approach using the Keras API, built on Google’s multilingual sentence encoders (Chollet et al. 2015; Cer et al. 2018).
We extracted 1,535 sentences from our Chicago corpus, labeling them as “novel sentences”, and combined them with the 1,535 “fever sentences” from the Climate Fever dataset. The sentence encoder model proved exceptionally capable of differentiating novel from fever sentences, with accuracy and F1 scores (on a withheld one-third test sample) well above 0.99 (Figure 1).
Figure 1: Model accuracy, recall, precision and F1 scores on the task of classifying Climate Fever sentences vs novel sentences.
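For readers unfamiliar with the evaluation reported above, the following sketch shows how a one-third test sample can be withheld and how accuracy, precision, recall and F1 are derived from predicted labels. It is an illustration only: the function names are hypothetical, treating "fever" as the positive class is an assumption, and the classifier itself (the Keras model) is not reproduced here.

```python
import random


def split_withheld(items, test_fraction=1 / 3, seed=0):
    """Shuffle items and withhold a fraction of them as a test sample."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def binary_metrics(y_true, y_pred, positive="fever"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With 3,070 labeled sentences (1,535 per class), this split withholds 1,023 sentences for testing and leaves 2,047 for training.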
As we also wanted to classify whether the fever sentences that we identified in novels were climate positive or negative, we replicated Diggelmann et al.’s study, training a separate Keras model on the Climate Fever dataset itself to differentiate between climate-positive and climate-negative sentences. Although not as robust as our fever vs novel model, this second model still performed more than adequately, with accuracy and F1 scores approaching 80% (Figure 2).
Figure 2: Model accuracy, recall, precision and F1 scores on the task of classifying supported vs refuted sentences.
Our initial study classified all sentences from both corpora using our fever vs novel sentence model. As we predicted, the vast majority of sentences in both corpora were classified as novel sentences. However, a non-trivial number of sentences in both corpora were classified as belonging to the Climate Fever dataset. More importantly, there is a large discrepancy between our two corpora, with over 4% of sentences in our CliFi corpus classified as fever sentences vs 0.61% of the Chicago corpus (Figure 3). This indicates that Climate Fiction incorporates far more real-world facts, particularly those about climate, than a randomly sampled novel from the same period.
Figure 3: Number of sentences classified as novel (green) or fever (red) sentences in both the sampled Chicago corpus and the CliFi corpus (Wilcoxon Rank Sum Test p-value on the difference between corpora was ~2 × 10⁻¹⁶).
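A comparison of this kind can be sketched as a Wilcoxon rank-sum (Mann-Whitney U) test. The source does not state the unit of analysis, so this sketch assumes one value per novel (for example, its share of fever-classified sentences). It uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those of standard statistical packages; the function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, p-value). Ties receive midranks and the
    variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold equal values; they share ranks i+1..j.
        midrank = (i + 1 + j) / 2
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return u, p_value
```

Given two lists of per-novel shares, `rank_sum_test(clifi_shares, chicago_shares)` returns the U statistic and the two-sided p-value; a rank-based test suits this setting because per-novel shares are unlikely to be normally distributed.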
There was a similar discrepancy in our results for whether the fever sentences in novels are supported or refuted by climate science. Taking just those sentences from both corpora that were classified as fever sentences, we used our second model to predict whether each was supported or refuted by climate science. Once again, there was a significant difference between our two corpora: a larger share of fever sentences from the CliFi corpus was classified as supported by the science than of those from the Chicago corpus, a margin that was smaller but still highly significant (Figure 4).
Figure 4: Percentage of fever-like sentences classified as supported (red) or refuted (yellow) in both the sampled Chicago corpus and the CliFi corpus (Wilcoxon Rank Sum Test p-value on the difference between corpora was ~2 × 10⁻¹⁶).
As a final analysis, we selected a sample of novels from our CliFi dataset to close read for how the climate facts that the model discovered were embedded within the narrative. In addition to the literary analysis, we also tracked where in the narrative the fever-like sentences were located to see how facts were distributed within novels. In a surprising result, we discovered that climate fever-like sentences are not evenly dispersed throughout narratives, but instead occur within discrete segments (Figure 5).
Figure 5: Raw number of climate fever-like (or fact) sentences per 100-sentence segment of Jemisin’s The Fifth Season. Red points indicate segments with a number of fever-like sentences more than 2 standard deviations above the mean.
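The segment-level count behind a plot like Figure 5 can be sketched as follows. The input is assumed to be one boolean per sentence, in narrative order, marking whether the model classified it as fever-like. Two details are assumptions not stated in the source: the sample standard deviation is used, and a final segment shorter than 100 sentences is kept as is. The function name is hypothetical.

```python
from statistics import mean, stdev


def dense_segments(is_fever, segment_size=100, n_sd=2.0):
    """Count fever-like sentences per segment and flag unusually dense ones.

    Returns (counts, flagged), where flagged lists the indices of segments
    whose count exceeds the mean by more than n_sd standard deviations.
    Requires at least two segments so that a deviation can be computed.
    """
    # Sum the True values in each consecutive block of sentences.
    counts = [
        sum(is_fever[start:start + segment_size])
        for start in range(0, len(is_fever), segment_size)
    ]
    threshold = mean(counts) + n_sd * stdev(counts)
    flagged = [index for index, count in enumerate(counts) if count > threshold]
    return counts, flagged
```

Because the threshold is computed per novel, a segment is flagged relative to that novel's own baseline rather than to a corpus-wide rate.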
Our results show that climate fiction is unique in its use of climate facts, both in how many factual sentences are embedded within the fictional narratives and in how many are “true” facts that are supported by current environmental science. Moreover, these facts are not arranged evenly throughout the narrative, but are instead grouped together at discrete points of the text. Taken together, these results indicate that climate fiction takes a unique approach, embedding many more “true” fact-based sentences within discrete parts of its narratives as it seeks to teach its readers about the real-world dangers of climate change.