Sustaining Cultural Analytics Research and Teaching: The Future of Code and Data Work in DH

1. Overview

In recent years, cultural analytics (CA) has emerged as a core constituency of the “big tent” of digital humanities (DH). Although the field has previously been known as culturomics, distant reading, computational literary studies, or macroanalysis, many prefer the label “cultural analytics” because it attempts to be inclusive of humanities subjects outside of literary studies and suggests a confluence among humanities, social sciences, and computational disciplines. Under any name, CA generally aspires to go beyond “computer science applied to culture” to propose “a wholesale rethinking of both of these categories” (Piper 2). And, as Ted Underwood has argued, acknowledging CA as a multi-disciplinary project requires recognizing that it belongs equally to the humanities and quantitative social science (“A Broader Purpose”).

The distinct nature of CA as a confluent area of specialization with a diverse set of stakeholders creates an environment where many different demands are made on its practitioners. Expectations for teaching and research, existing incentives, and institutional priorities perpetuate and exacerbate these demands. The concept of “code and data work” is crucial to understanding the resulting imbalances because it encompasses both the labor of data collection, curation, wrangling, etc., and the degree to which the ostensibly mechanical or trivial details of coding can have “heightened downstream impact” (Sambasivan et al.). These kinds of labor, as Nithya Sambasivan et al. and Lauren Klein have both argued in very different contexts, are often undervalued “relative to the lionized work of building novel models and algorithms” (Sambasivan et al. 1). In turn, they are more likely to be designated as carework; that is, feminized labor that is meant to be conducted “because you care,” and thus requires little or no compensation (Klein).

This panel of short papers focuses on the role of code and data work in CA teaching and research, with an emphasis on areas where we might reimagine the socio-technical infrastructure supporting this labor in order to have a positive impact on CA specifically and, by extension, DH as a whole. More specifically, we discuss the following topics: a new genre of peer-reviewed, article-length scholarship called “praxis papers” as a means to encapsulate forms of code and data work distinct to CA in the form of a citable publication (Lavin); the complexities of representing the humanities in classroom spaces where the main learning goal is a computational skill (Ladd); the importance of critically examining our coding practices, with an emphasis on sustainability (LeBlanc); how CA pedagogy has taken on the challenge of evaluating data-driven and code-based assignments (Preus); and how we might develop a collective framework of best practices to address the unique challenges that CA practitioners face when collecting social media content (McNulty).

2. Individual Abstracts 

Can DH Do Details? Toward a Praxis Paper Model of Publication

Matthew J. Lavin

Many in digital humanities (DH) have argued that the “mechanical processes” of computational inquiry have direct and substantive impact on downstream findings and their interpretation (Drucker 628; see also Bode, Cordell, and D’Ignazio and Klein). Such claims of interconnection are generally made without providing detailed examples of automation systems and the values they may inscribe. In this presentation, I describe the idea of a “praxis paper” as one possible solution to a range of structural factors in DH that make “doing details” challenging, which is especially problematic for computational humanities or cultural analytics concerns. DH scholarship has established a conceptual foundation for such a genre, but it is too often the case that the specifics of data and code are regarded as either trivially straightforward or too technically detailed to understand. Procedural decisions are often omitted entirely, glossed over, or addressed only in code files or project documentation. Journals more focused on computational humanities or cultural analytics, meanwhile, tend to require either methodological novelty or an emphasis on new findings, which excludes some of the most important considerations that scholars may wish to engage with at different stages of their research. I imagine a praxis paper as a new genre of article-length, peer-reviewed scholarship that synthesizes algorithmic precision with an attempt to consider the broader consequences of seemingly neutral computing decisions. In some ways analogous to a data paper, which encapsulates the preparation, curation, and description of a dataset in the form of a citable publication, a praxis paper would focus on using procedural specificity to demonstrate how methodological decisions might embody and encourage particular sets of values.

Humanities in the Data Classroom

John R. Ladd

Definitions of digital humanities (DH) tend to emphasize its bidirectional interdisciplinarity: the digital (meaning computation, data, and digital media) is meant to inform the humanities, while the humanities in turn is meant to inform and apply to the digital (Fitzpatrick, Risam). Cultural analytics is often considered as comprising the first part of that formula: CA research and teaching involves applying computational techniques to humanities objects. The reverse, applying humanist techniques to computational objects, has traditionally been the purview of critical digital studies, STS, and other fields which examine the historical and cultural impacts of digital media. Discussions of cultural analytics pedagogy often focus on computational tools and techniques that can be brought into humanities learning spaces: ways of applying the digital to the humanities. But introductory data science or computer science courses seldom include the humanist style of critical engagement that DH promises. In this short paper I hope to explore this imbalance and to think about ways of applying the humanities to the digital in classroom spaces where the main learning goal is a computational skill. I will share my experiences as a humanist scholar who teaches non-humanities courses and will showcase the ways that I have brought the humanities into the coding and data classroom. Rather than sitting on one side of the digital-to-humanities / humanities-to-digital divide, cultural analytics can be a bridge that allows students outside of the humanities to be introduced to the field in the context of their technical learning.

File Path Not Found: (Re)Coding Practices in Cultural Analytics

Zoe LeBlanc

Coding holds an influential yet ill-defined role in digital humanities (DH). DH scholars have debated whether coding is a requisite skill or a defining activity of the field (Posner, Ramsay, Dombrowski). Scholars in Critical Code and Software Studies have advocated for recognizing code as both a scholarly object and a form of scholarship (Van Zundert and Haentjens Dekker). Simultaneously, resources that teach DH scholars to code have proliferated, and these increasingly focus on Cultural Analytics (CA) methods. This paper argues, however, that we need to shift from debating Python versus R towards a critical examination of DH, and specifically CA, coding practices. Historically, coding in DH has been primarily for web development, which, though imperfect, has some established best practices. Coding for CA, by contrast, is primarily data science focused and relatively novel, leading many scholars and students to reinvent the wheel, especially when creating collaborative and replicable coding practices.

Drawing from my research into CA coding practices on GitHub, experiences from teaching a graduate seminar that has students replicate existing CA code, and challenges faced in maintaining my own CA code, this paper proposes ways we might improve these practices. Specifically, it considers how we might prioritize sustainability to make CA coding not only more accessible but also supportive of the long-term viability of CA scholarship and teaching. Given this brave new AI world, it is increasingly urgent that we transcend the binary view of coding – to code or not, open or closed – to ensure that coding is both inclusive and impactful in CA.

Assessment and Data Work in Humanities Classes 

Anna Preus

Doing data work in humanities classes presents particular challenges in relation to assignments and evaluation. Courses in cultural analytics (CA) and humanities data science cover skills-based and content-focused material, requiring students to learn technical proficiencies for manipulating data as well as skills in CA and knowledge of historical and contemporary contexts. The data work involved in answering even straightforward-seeming questions about aspects of human culture is often beyond the scope of a semester-long project, but working only with pre-existing datasets risks de-emphasizing key areas underlying CA research: careful data collection, clear rationales for data curation and annotation, contextually relevant metadata standards, etc. Intensifying these challenges, courses in CA and humanities data science are often aimed at broad audiences, drawing students from a range of disciplinary backgrounds who have differing levels of confidence when it comes to implementing technical tools and crafting cultural arguments. How do we create assignments that illustrate the value of slow, careful data work and also invite students to address interesting, substantive questions? How do we credit different kinds of work undertaken by students with differing levels of experience with technical approaches, especially programming? And at the graduate level, how can we use assessment as an opportunity to make legible labor that has not traditionally been credited in humanities fields? Drawing on experience in undergraduate and graduate courses in digital humanities and humanities data science, I will discuss designing and grading open-ended technical assignments, using writing prompts to encourage students to narrate choices and labor involved in data work, and assigning value to in-progress work on larger-scale or longer-term digital projects in humanities graduate education. 

Towards Best Practices for (Humanistic) Collection of Social Media-Based Content: The Case of TikTok

Tess McNulty

Throughout the past decade, much emerging work in cultural analytics (CA) has focused on traditional objects of humanistic study, considered at scale, like large collections of Victorian novels or Renaissance paintings. In recent years, however, multiple practitioners in the field have begun to turn their attention to social-media-based “content,” from tweets about poetry (Walsh 2018) to Instagram images of heritage sites (Loke et al. 2022). The collection of this content, however, has presented new challenges (like API closures or the lack of users’ consent). And while researchers in the social sciences have devoted significant discussion to these issues, often generating best practices (see, for example, AOIR’s ethical guidelines), humanists have yet to collectively reckon with the complications that they uniquely face.

In this brief talk, I use the example of TikTok to propose three areas in which humanists face particular challenges regarding social-media-based data collection, relative to their peers in the social sciences: first, in the kinds of data they might want to collect (e.g., “historical” vs. present-tense); second, in the ways in which legal and TOS-based restrictions complicate this process (e.g., in publishing for popular vs. scholarly venues); and third, in the ethical concerns surrounding this data’s presentation (e.g., how the need to paraphrase posts may take greater tolls on interpretive work). While I will broach some solutions to these problems, the aim of this brief talk will be to open, rather than close, the discussion in these areas, so that, as practitioners of CA, we can together develop best practices.

Appendix A

Bibliography
  1. Bode, Katherine. “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History.” Modern Language Quarterly 78, no. 1 (March 1, 2017): 77–106. https://doi.org/10.1215/00267929-3699787.
  2. Cordell, Ryan. “‘Q i-Jtb the Raven’: Taking Dirty OCR Seriously.” Book History 20, no. 1 (October 25, 2017): 188–225. https://doi.org/10.1353/bh.2017.0006.
  3. D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. MIT Press, 2020.
  4. Dombrowski, Quinn. “Does Coding Matter for Doing Digital Humanities?” In Bloomsbury Handbook to DH, ed. James O’Sullivan. 2022.
  5. Drucker, Johanna. “Why Distant Reading Isn’t.” PMLA 132, no. 3 (2017): 628–35.
  6. Klein, Lauren. “The Carework and Codework of Digital Humanities,” Digital Antiquarian Society Conference, May 29-30, 2015. https://lklein.com/archives/the-carework-and-codework-of-the-digital-humanities/
  7. Loke, Tania, et al. “Heritage Site-Seeing through the Visitor’s Lens on Instagram.” Journal of Cultural Analytics, October 24, 2022. https://culturalanalytics.org/article/38966
  8. Lopez, Andrew, et al. “On Scholarly Communication and the Digital Humanities: An Interview with Kathleen Fitzpatrick.” In the Library with the Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2015/on-scholarly-communication-and-the-digital-humanities-an-interview-with-kathleen-fitzpatrick/. Accessed 5 Dec. 2023.
  9. Piper, Andrew. “There Will Be Numbers,” Journal of Cultural Analytics, May 23, 2016. https://doi.org/10.22148/16.006.
  10. Posner, Miriam. “Some Things to Think about before You Exhort Everyone to Code,” February 29, 2012. https://miriamposner.com/blog/some-things-to-think-about-before-you-exhort-everyone-to-code/.
  11. Ramsay, Stephen. “Who’s In and Who’s Out.” In Defining Digital Humanities: A Reader, edited by Melissa M. Terras, Julianne Nyhan, and Edward Vanhoutte. London: Routledge, 2016.
  12. Risam, Roopika. New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy. Northwestern UP, 2019.
  13. Sambasivan, Nithya, et al. “‘Everyone Wants to Do the Model Work, Not the Data Work’: Data Cascades in High-Stakes AI,” Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI ’21 (Association for Computing Machinery, 2021). https://doi.org/10.1145/3411764.3445518.
  14. Underwood, Ted. “A Broader Purpose,” Varieties of Digital Humanities panel, MLA, Jan 5, 2018. https://tedunderwood.com/2018/01/04/a-broader-purpose
  15. Walsh, Melanie. “Tweets of a Native Son: The Quotation and Recirculation of James Baldwin from Black Power to #BlackLivesMatter.” American Quarterly 70, no. 3 (2018): 531–59.
  16. Zundert, Joris J. van, and Ronald Haentjens Dekker. “Code, Scholarship, and Criticism: When Is Code Scholarship and When Is It Not?” Digital Scholarship in the Humanities 32, no. suppl_1 (April 1, 2017): i121–33. https://doi.org/10.1093/llc/fqx006.

Matthew Lavin (lavinm@denison.edu), Denison University, United States of America
Zoe LeBlanc (zleblanc@illinois.edu), University of Illinois Urbana-Champaign, United States of America
John Ladd (jladd@washjeff.edu), Washington & Jefferson College, United States of America
Anna Preus (apreus@uw.edu), University of Washington, United States of America
Tess McNulty (tmcnulty@illinois.edu), University of Illinois Urbana-Champaign, United States of America