Divergent Data Cultures in the Humanities

Unlike other examples of data-intensive scholarship, digital and computational research in the humanities did not emerge from a common or shared ‘data culture.’ Rather, the production and use of data in the humanities evolved within a subset of communities of practice, often concerned with the use of statistical records (Crymble 2021), technical issues involving computation (Nyhan / Flinn 2016), and digital infrastructure development (Kálmán et al. 2019; Niccolucci / Richards 2013; Oldman 2021). While it is possible to document data practices across the broad history of the humanities (Borgman 2017), the emergence of data-intensive humanities research, when compared to other areas of scholarship such as biology, does not follow a pattern of simple intensification, in which research transitions from one type of data production and use to another (Agar 2006). This paper presents a new model for understanding the trajectory of this development in the humanities, premised upon the notion of ‘data cultures.’ It documents and analyzes four cases of ‘divergent data cultures’ and draws out the implications of this approach to the emergence of data-intensive scholarship for the study of humanities data practices, for the design and development of research data management infrastructure, and for humanities research policy more broadly.

We argue that data in the humanities is best understood as a type of cultural ‘entity under redescription,’ and hence as a fluctuating concept that is actively being defined and made relevant for the scholarly and cultural work of humanities researchers. The development of data practices in humanities research has not simply resulted from a jump in scale, but rather from the introduction of ‘data,’ as concept, entity, and practice, through a multifaceted reorganization of the disciplinary relationships between methods, epistemologies, and research objects. We explore the implications of this portrait by examining how this approach departs from extant approaches to humanities data, while advocating for a more nuanced history of the humanities that takes context seriously and also examines patterns of variation and change.

Approaching the history of data-intensive scholarship in the humanities from this angle is particularly important given the unique status of ‘data’ in the humanities as a much-debated concept (Posner 2015; Schöch 2013). Although the history of humanities scholarship offers examples of references to, or uses of, ‘data’ (Greg 1927), the incorporation of ‘data,’ as ontology, entity, and practice, has either been argued against directly (Marche 2012; Lepore 2023) or treated as a contentious concept: as a distortion of cultural practice (Edmond et al. 2021) or as the result of the scientization of humanities research (Pawlicka 2017). As a result, the broad incorporation of data within humanities disciplines has often occurred in tension with humanities researchers’ perceptions of appropriate scholarly identity (Antonijević 2015) or standards of good scholarship (Drucker 2011), and hence in tension with issues such as research methods (Underwood 2014; Antonijevic / Cahoy 2018), epistemology (Michel et al. 2011), and broader normative questions involving the value of humanities research (Hayot 2021). This stands in stark contrast to the history of the sciences, where, although data practices developed in context- and discipline-specific ways, the incorporation of the very idea of data was less contested.

Although recent work on the development of data-intensive practice in the sciences has focused on in-depth examination of how an intensification of data use transforms existing research practices (Leonelli / Tempini 2020), the same cannot be said for the humanities. Rather, research on humanities data is often pursued with an explicitly assumed contrast to data work and research practices in the sciences (De Rijke / Penders 2018; Peels 2019). In many cases, this assumed contrast has been adopted into the emerging dialogue on research data sharing, and into research data management more broadly, as part of the evidence base of humanities research policy (Ruediger / MacDougall 2023). This development potentially distorts how we conceive of the work of humanities researchers, with path-dependent consequences for large-scale ventures such as humanities research funding and infrastructure development (Dombrowski 2014; Kaltenbrunner 2017; Waters 2023).

To correct for this, we construe data practices in the humanities as part of broader data cultures, and hence as dynamic elements in the organization of scholarly research, subject to change through contestation, innovation, and redefinition. Although there is little current consensus on the definition of data cultures (Oliver et al. 2023), we provisionally adopt their definition of data cultures as “the social, technical, and cultural characteristics, values and practices that influence/determine the nature of data production, generation, acquisition, cultivation, use, curation, preservation, sharing, and reuse by individuals, organizations, governments, and societies” (Oliver et al. 2023). We find specific value in their insistence that data cultures “may co-exist and compete at multiple levels and are dynamic and normative in nature” (Oliver et al. 2023). However, in mobilizing this definition, we emphasize that ‘data cultures’ in our treatment are best understood as a subset of epistemic cultures (Cetina 1999): the practices, norms, methods, and styles of reasoning by which different disciplines make knowledge, establishing standards of evidence, defining the relationships among theory, explanation, and interpretation, and making sense of norms for the analysis of research objects.

We argue that the emergence or redefinition of a data culture within an established epistemic culture can produce shifts in how evidence is evaluated, how methods are authorized, and, more broadly, how the work of research and scholarship is evaluated and used. We find evidence for this in contemporary humanities research and identify four pathways, or prevalent yet divergent data cultures, through which data-intensive scholarship is being constructed in the humanities: representational data, distant evidence, humanities analytics, and cultural data. We document these four pathways and demonstrate how the ‘data work’ articulated in each indicates a different configuration of concerns regarding the relationship between methods, research objects, and evidence.

To build out and explore this portrait of divergent data cultures in the humanities, we draw on data and analysis from the Humanities Data Inquiry, a three-year, ethnographically informed, mixed-methods examination of data issues in the humanities and cultural heritage sector. We present analysis of interviews with humanities researchers, together with analysis from a series of comparative case studies of scholar-led projects that deploy digital and data-intensive research infrastructure for scholarly ends. Cases were purposively selected according to three criteria: (a) they are scholar-driven; (b) they involve the representation of ‘real world’ objects using small, intensively curated datasets; and (c) they have a documented record of accomplishment and innovation. Additionally, we interviewed researchers from a variety of humanities disciplines, including both those whose research is not data-intensive and those actively pursuing data-intensive research programs. These researchers were interviewed about their professional background and scholarly formation; their disciplinary formation and attitudes; their research practices involving methods and research objects; their considerations of evidence; their relationship to research infrastructure and scholarly tools; and, where appropriate, their data practices.

By emphasizing data cultures in the humanities, we seek to advance and broaden the conversation on humanities data, which has been largely shaped by questions of research data management. These debates have developed as an intensive dialogue among researchers working in humanities disciplines, and between humanities scholars and information scientists. As a result, they have primarily focused on two interlinked issues: data types (Lavin 2021), concerning the special qualities of humanities data, and humanities data practices (Palmer / Neumann 2002), concerning the distinctive ways humanities scholars recognize and use data (Borgman 2017; Leonelli 2015).

Although we recognize the value of both approaches, we also recognize their respective limitations. With a focus on data type, for example, where data is often parsed as scholarly evidence (Gualandi et al. 2022; Mohr et al. 2015; Thoegersen 2018), it is difficult to determine how scholars distinguish between data, research objects, and other information objects such as documents (Hjørland 2019) and records (Caswell 2023). In the humanities, this distinction is particularly important for understanding how different disciplines carve up the relationship between, for example, data and sources (Lipartito 2014), and it grows more important as more scholars adjust their research practices to work with large digital or data-intensive collections, a scenario made increasingly prevalent by the COVID-19 health emergency (Appleton 2021; Noehrer et al. 2021).

Similarly, while the focus on data types and data practices has emphasized the importance of context and variation, this approach often neglects questions of change over time in practices, concepts, and norms. Such questions matter when considering how scholarly dialogue innovates with historical and emerging research infrastructure, and they are especially important for the long-term planning and development of large-scale, data-intensive research infrastructure.

In conclusion, we advocate for reconsidering the uniqueness of data-intensive scholarship in the humanities, particularly when compared to other histories of scholarship, such as that of the sciences. To appropriately understand the history and contemporary status of data-intensive scholarship in the humanities, we advocate for models that help us understand holistically the relationships between research practices, data types, and broader cultures of evidence and evaluation. Adopting a data cultures model has the potential to help us do this for the recent history of data-intensive scholarship, where understanding the conditions for the emergence of ‘divergent data cultures’ has broad implications for the current shape of humanities research policy and practice.

Bibliography
  1. Agar, Jon (2006): “What difference did computers make?” in: Social Studies of Science 36,6: 869-907.
  2. Almas, Bridget (2017): “Perseids: Experimenting with Infrastructure for Creating and Sharing Research Data in the Digital Humanities” in: Data Science Journal 16: 19.
  3. Antonijević, Smiljana (2015): Amongst Digital Humanists: An Ethnographic Study of Digital Knowledge Production. New York: Palgrave Macmillan.
  4. Antonijevic, Smiljana / Cahoy, Ellysa Stern (2018): “Researcher as Bricoleur: Contextualizing humanists' digital workflows” in: Digital Humanities Quarterly 12,3.
  5. Appleton, Leo (2021): “Accelerating the digital shift: how a global pandemic has created an environment for rapid change in academic libraries” in: New Review of Academic Librarianship 27,3: 257-258.
  6. Cetina, Karin Knorr (1999): Epistemic Cultures: How the Sciences Make Knowledge. Harvard University Press.
  7. Borgman, Christine L. (2017): Big Data, Little Data, No Data: Scholarship in the Networked World. MIT Press.
  8. Boyd, Ceilyn (2022): “Data as assemblage” in: Journal of Documentation 78,6: 1338-1352. https://doi.org/10.1108/JD-08-2021-0159
  9. Caswell, Michelle (2023): “Against Archival Collections as Data” in: Chambers, S.: “Position Statements: Collections as Data: State of the field and future directions.” Zenodo. https://doi.org/10.5281/zenodo.7897735
  10. Crymble, Adam (2021): Technology and the Historian: Transformations in the Digital Age (Vol. 1). Chicago: University of Illinois Press.
  11. De Rijke, Sarah / Penders, Bart (2018): “Resist Calls for Replicability in the Humanities” in: Nature 560: 29.
  12. Dombrowski, Quinn (2014): “What ever happened to Project Bamboo?” in: Literary and Linguistic Computing 29,3: 326-339.
  13. Drucker, Johanna (2011): “Humanities approaches to graphical display” in: Digital Humanities Quarterly 5,1.
  14. Edmond, Jennifer / Horsley, Nicola / Lehmann, Jörg / Priddy, Mike (2021): The Trouble with Big Data: How Datafication Displaces Cultural Practices. Bloomsbury Academic.
  15. Greg, W. W. (1927): The Calculus of Variants: An Essay on Textual Criticism. Clarendon Press.
  16. Gualandi, Bianca / Pareschi, Luca / Peroni, Silvio (2022): “What do we mean by ‘data’? A proposed classification of data types in the arts and humanities” in: Journal of Documentation 79,7: 51-71.
  17. Hayot, Eric (2021): Humanist Reason: A History. An Argument. A Plan. Columbia University Press.
  18. Hjørland, Birger (2019): “Data (with big data and database semantics)” in: Knowledge Organization 45,8: 685-708.
  19. Kálmán, Tibor / Ďurčo, Matej / Fischer, Frank / Larrousse, Nicolas / Leone, Claudio / Mörth, Karlheinz / Thiel, Carsten (2019): “A landscape of data – working with digital resources within and beyond DARIAH” in: International Journal of Digital Humanities 1: 113-131.
  20. Kaltenbrunner, Wolfgang (2017): “Digital infrastructure for the humanities in Europe and the US: Governing scholarship through coordinated tool development” in: Computer Supported Cooperative Work (CSCW) 26: 275-308.
  21. Lavin, Matthew (2021): “Why Digital Humanists Should Emphasize Situated Data over Capta” in: Digital Humanities Quarterly 15: 13.
  22. Leonelli, Sabina / Tempini, Niccolò (eds.) (2020): Data Journeys in the Sciences. Springer Nature.
  23. Leonelli, Sabina (2015): “What counts as scientific data? A relational framework” in: Philosophy of Science 82,5: 810-821.
  24. Lepore, Jill (2023): “The Data Delusion” in: The New Yorker.
  25. Lipartito, Kenneth (2014): “Historical sources and data” in: Bucheli, Marcelo / Wadhwani, R. Daniel (eds.): Organizations in Time: History, Theory, Methods: 284-304.
  26. Marche, Stephen (2012): “Literature is not data: Against digital humanities” in: Los Angeles Review of Books 28.
  27. Michel, Jean-Baptiste / Shen, Yuan Kui / Aiden, Aviva Presser / Veres, Adrian / Gray, Matthew K. / Google Books Team / Pickett, Joseph / Hoiberg, Dale / Clancy, Dan / Norvig, Peter / Orwant, Jon (2011): “Quantitative analysis of culture using millions of digitized books” in: Science 331,6014: 176-182.
  28. Mohr, Alicia Hofelich / Bishoff, Josh / Bishoff, Carolyn / Braun, Steven / Storino, Christine / Johnston, Lisa R. (2015): “When data is a dirty word: A survey to understand data management needs across diverse research disciplines” in: Bulletin of the Association for Information Science and Technology 42,1: 51-53.
  29. Niccolucci, Franco / Richards, Julian D. (2013): “ARIADNE: Advanced research infrastructures for archaeological dataset networking in Europe” in: International Journal of Humanities and Arts Computing 7,1-2: 70-88.
  30. Noehrer, Lukas / Gilmore, Abigail / Jay, Caroline / Yehudi, Yo (2021): “The impact of COVID-19 on digital data practices in museums and art galleries in the UK and the US” in: Humanities and Social Sciences Communications 8: 236.
  31. Nyhan, Julianne / Flinn, Andrew (2016): Computation and the Humanities: Towards an Oral History of Digital Humanities. Springer Nature.
  32. Oldman, Dominic (2021): “Digital research, the legacy of form and structure and the ResearchSpace system” in: Golub, Koraljka / Liu, Ying-Hsang (eds.): Information and Knowledge Organisation in Digital Humanities: 131-153. Routledge.
  33. Oliver, Gillian / Cranefield, Jocelyn / Lilley, Spencer / Lewellen, Matthew (2023): “Data Cultures: A scoping literature review” in: Information Research 28,1: 3-29.
  34. Palmer, Carole L. / Neumann, Laura J. (2002): “The information work of interdisciplinary humanities scholars: Exploration and translation” in: The Library Quarterly 72,1: 85-117.
  35. Pawlicka, Urszula (2017): “Data, collaboration, laboratory: Bringing concepts from science into humanities practice” in: English Studies 98,5: 526-541.
  36. Peels, Rik (2019): “Replicability and replication in the humanities” in: Research Integrity and Peer Review 4,1: 1-12.
  37. Posner, Miriam (2015): “Humanities data: A necessary contradiction” in: Miriam Posner’s Blog 25. https://miriamposner.com/blog/humanities-data-a-necessary-contradiction/
  38. Ruediger, Dylan / MacDougall, Ruby (2023): Are the Humanities Ready for Data Sharing? Ithaka S+R. Last modified 6 March 2023. https://www.jstor.org/stable/resrep49500
  39. Schöch, Christof (2013): “Big? Smart? Clean? Messy? Data in the Humanities?” in: Journal of Digital Humanities 2,3.
  40. Thoegersen, Jennifer L. (2018): “‘Yeah, I guess that’s data’: Data practices and conceptions among humanities faculty” in: portal: Libraries and the Academy 18,3: 491-504.
  41. Underwood, Ted (2014): “Theorizing research practices we forgot to theorize twenty years ago” in: Representations 127,1: 64-72.
  42. Waters, Donald J. (2023): “The emerging digital infrastructure for research in the humanities” in: International Journal on Digital Libraries 24,2: 87-102.
Nathan D. Woods (nathan.woods@uleth.ca), University of Lethbridge, Canada and Daniel Paul O'Donnell (daniel.odonnell@uleth.ca), University of Lethbridge, Canada and Barbara Bordalejo (barbara.bordalejo@uleth.ca), University of Lethbridge, Canada