Reinventing (Literary) Operationalization

1. Introduction

Advances in the digital humanities often involve reinventing methodologies and informal humanities practices in a more structured, computational, and formal way, using mathematics, statistics, and computer science. The aim is to make workflows and research more tractable. Toolkits and libraries that support these goals exist in almost all fields, with one exception at the heart of humanistic thinking: conceptual work. Conceptual work in the digital humanities is often carried out as the operationalization of the concepts in question (cf. Pichler / Reiter 2022).

In this contribution, we introduce KatKit, a Python toolkit designed to fill this gap. KatKit allows the modeling of humanistic concepts within a Jupyter notebook, enabling the documentation of theoretical assumptions and operationalizations alongside data analysis.

2. Operationalization

Operationalization takes a theoretical concept and transforms it into a set of application conditions that enable classification and measurement. Humanistic concepts are complex, open to interpretation, and context-dependent. The concept in question must therefore first be explicated with respect to its intended use. This explication yields elements (objects and the relations between them) that form the basis of a conceptual model. The model links the general terms of the theory to specific textual phenomena, which cannot be assigned to the theoretical terms directly. In this way it serves as the conduit between theoretical concepts and empirical data.

In text-based digital humanities, creating a conceptual model often amounts to constructing category systems such as tagsets and annotation guidelines. Although there are aids for this step (cf. Reiter 2020, Pichler / Reiter 2020), the process tends not to be clearly structured or guided by coherent principles.

3. Using Applied Category Theory

To give the operationalization process more structure, KatKit introduces principles derived from applied category theory (ACT). ACT can be understood as a mathematical framework intended to help us think clearly about complex matters. In ACT, everything is either an object (an idea, a thing, a word, a concept, a number, etc.) or a relationship between objects. A category C consists of a collection of objects ob(C) and the relations (or morphisms) mor(C) between them. Morphisms are composable: a morphism from A to B and a morphism from B to C combine into a morphism from A to C. Composition is associative, and every object has an identity morphism. These simple elements can be combined in various ways, which makes formal definitions, or re-definitions of concepts, possible (see e.g. the practical examples in Spivak 2014).¹
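The definition of a category can be made concrete in a few lines of Python. The following is a minimal sketch and not part of KatKit. It builds the category freely generated by a directed graph:

- morphisms are paths of edges;
- composition is path concatenation, and is therefore associative;
- identities are empty paths.

The class names `Morphism` and `FreeCategory`, as well as the example objects and edge labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morphism:
    source: str
    target: str
    path: tuple[str, ...]  # edge labels, applied left to right


class FreeCategory:
    """Hypothetical sketch: the category freely generated by a graph."""

    def __init__(self, objects, edges):
        # edges: mapping from edge label to (source, target)
        self.objects = set(objects)
        self.edges = dict(edges)
        for label, (src, tgt) in self.edges.items():
            if src not in self.objects or tgt not in self.objects:
                raise ValueError(f"edge {label!r} has an unknown endpoint")

    def generator(self, label):
        """The morphism consisting of a single edge."""
        src, tgt = self.edges[label]
        return Morphism(src, tgt, (label,))

    def identity(self, obj):
        """The identity on obj is the empty path."""
        return Morphism(obj, obj, ())

    def compose(self, f, g):
        """Return 'f then g'; only defined if f ends where g starts."""
        if f.target != g.source:
            raise ValueError(f"cannot compose {f.path} with {g.path}")
        return Morphism(f.source, g.target, f.path + g.path)


C = FreeCategory(
    objects=["minimal sentence", "verb", "in-world reference"],
    edges={"contains": ("minimal sentence", "verb"),
           "has": ("verb", "in-world reference")},
)
f = C.generator("contains")
g = C.generator("has")
h = C.compose(f, g)
print(h.source, "->", h.target, h.path)
# minimal sentence -> in-world reference ('contains', 'has')
```

Composing with an identity leaves a morphism unchanged. For example, `C.compose(C.identity("minimal sentence"), f) == f`.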

One application of category theory in the sciences is the ontology log (oLog, Spivak 2014). Its purpose is to help create formal ontologies from observations and verbal theories. This approach is the basis of the KatKit toolbox, which provides a Jupyter Notebook for documenting the operationalization process and supports visualizing and modeling data structures. Using Jupyter Notebooks for sketching helps bridge the gap between humanistic conceptual work and coding.

4. Example: Operationalizing events

We use the EvENT concept (Gius / Vauth 2022) as an example to demonstrate the application of the KatKit toolbox. The EvENT project used annotations of minimal sentences to model plot, distinguishing between changes of state, processes, stative events, and non-events.

Since the guidelines are structured as a decision tree, it is easy to extract functional relationships from them. From these relationships we create oLogs that match the information in a minimal sentence, i.e. the circumstances modeled on the discourse level (fig. 1). Because of the way oLogs are conceptualized, they can be mapped to natural language sentences. Transcribing the oLog for a change of state, for instance, yields the following definition of the concept: “a minimal sentence unit with a verb that has an in-world reference and a temporal occurrence, where the in-world reference affects an entity’s property”.

Fig. 1. The four tags of the EvENT concept as oLogs: a. non-event, b. stative event, c. process, d. change of state.
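The step from decision tree to tag can be sketched in Python. The sketch is hypothetical. For illustration only, it assumes that each tag in fig. 1 adds exactly one aspect to the previous one, in this order:

1. in-world reference;
2. temporal occurrence;
3. effect on an entity's property.

This mirrors the change-of-state definition quoted above but is not taken from the EvENT guidelines, whose actual criteria may differ. The class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class MinimalSentence:
    """A minimal sentence with three hypothetical yes/no features."""
    text: str
    in_world_reference: bool
    temporal_occurrence: bool
    affects_entity_property: bool


def classify(s: MinimalSentence) -> str:
    """Walk an ASSUMED decision tree: each 'yes' adds one aspect (fig. 1 order)."""
    if not s.in_world_reference:
        return "non-event"
    if not s.temporal_occurrence:
        return "stative event"
    if not s.affects_entity_property:
        return "process"
    return "change of state"


samsa = MinimalSentence(
    "Gregor Samsa one morning from uneasy dreams awoke.",
    in_world_reference=True,
    temporal_occurrence=True,
    affects_entity_property=True,
)
print(classify(samsa))  # change of state
```

Each path from the root to a leaf corresponds to one oLog. The answers collected along the path are the aspects that oLog contains.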

Having modeled the EvENT concept with functional relationships, our next objective is to integrate it into a category-theoretical structure: a preorder. A preorder is a collection of objects with a relation ≤ that is reflexive (A ≤ A) and transitive. Transitivity means that if A ≤ B and B ≤ C, then A ≤ C, so the relation between A and C can be inferred directly.

To build such a structure, we decompose an event into three components: a starting point (S), an end point (E), and a process (P) connecting them. A complete event is represented as SPE and a 'non-event' as •••. We define the concept as a preorder, as shown in the Hasse diagram in Figure 2. Because a preorder is reflexive and transitive, no event type in the taxonomy can be defined without regard to its logical relations to the rest of the taxonomy.

We can think of the event types as built up from the 'non-event' (•••) by successively adding one component (e.g. S••), two (SP•), or three (the complete event, SPE). How an event type with two components is defined therefore depends on how the event types with a single component are defined. The definition of SP•, for example, has to build on the definitions of S•• and •P•, and vice versa. For more information on conceptualizing tagsets in this manner, e.g. for annotation using CATMA, see Sperberg-McQueen and Huitfeldt (2023).

Fig. 2. Hasse diagram of the event concept as a preorder.
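Transitivity is what allows relations to be inferred rather than stated. The following sketch assumes that the Hasse diagram in Figure 2 orders event types by inclusion of components (e.g. S•• below SP• below SPE). It writes down only the covering edges and computes the smallest reflexive and transitive relation containing them. The names `NODES`, `COVERS`, and `preorder_closure` are hypothetical and not KatKit API.

```python
from itertools import product

NODES = ["•••", "S••", "•P•", "••E", "SP•", "S•E", "•PE", "SPE"]

# Covering edges (assumed reading of fig. 2): add exactly one component.
COVERS = {
    ("•••", "S••"), ("•••", "•P•"), ("•••", "••E"),
    ("S••", "SP•"), ("S••", "S•E"), ("•P•", "SP•"),
    ("•P•", "•PE"), ("••E", "S•E"), ("••E", "•PE"),
    ("SP•", "SPE"), ("S•E", "SPE"), ("•PE", "SPE"),
}


def preorder_closure(nodes, relations):
    """Smallest reflexive and transitive relation containing `relations`."""
    leq = set(relations) | {(a, a) for a in nodes}  # reflexivity
    changed = True
    while changed:
        changed = False
        # Transitivity: add (a, d) whenever (a, b) and (b, d) are present.
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


LEQ = preorder_closure(NODES, COVERS)
print(("•••", "SPE") in LEQ)  # True: inferred, never stated directly
print(("S••", "•PE") in LEQ)  # False: the two types are incomparable
```

Only 12 edges are stated, yet the closure contains all 27 pairs (A, B) in which A's components are a subset of B's.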

To demonstrate how this reformulation can aid in analyzing texts and building categories for text analysis, consider changes of state. They are defined as alterations to the physical or mental states of animate or inanimate entities, as in "Gregor Samsa one morning from uneasy dreams awoke." (Vauth / Gius 2021). From this definition, we can extract the concept of an end point (the altered physical or mental condition of an entity) and a process (the alteration itself). In Figure 2, this definition therefore corresponds to the •PE node.

To define the SPE node, setting aside its dependence on the other nodes, we only need to add a definition of a starting point to this description, e.g. "Physical or mental state changes of animate or inanimate entities, where we can differentiate between start and end point." Each node in the lattice has to be examined in this way to find definitions that capture its properties and are practical for the annotation process. This example demonstrates how the workflow can support the re-evaluation of such guidelines so that they cover phenomena more broadly and/or more concretely.
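The node-by-node work described above can be tracked in a notebook. This is a hypothetical sketch, not KatKit code, and it assumes the same inclusion ordering as Figure 2. It stores draft definitions per node, reusing the two definitions discussed in the text. For every node, it lists the nodes one level below whose definitions it has to build on, so that open nodes are easy to spot.

```python
COMPONENTS = "SPE"
NODES = ["•••", "S••", "•P•", "••E", "SP•", "S•E", "•PE", "SPE"]


def parts(node: str) -> set[str]:
    """Components present in a node, e.g. 'S•E' -> {'S', 'E'}."""
    return {c for c in node if c in COMPONENTS}


def lower_covers(node: str) -> list[str]:
    """Nodes one level below `node`: its components minus exactly one."""
    return [m for m in NODES
            if parts(m) < parts(node) and len(parts(m)) == len(parts(node)) - 1]


# Draft definitions taken from the text; all other nodes are still open.
definitions = {
    "•PE": "Alterations to the physical or mental states of animate or "
           "inanimate entities.",
    "SPE": "Physical or mental state changes of animate or inanimate "
           "entities, where we can differentiate between start and end point.",
}


def open_nodes() -> list[str]:
    """Nodes that still lack a definition."""
    return [n for n in NODES if n not in definitions]


for node in NODES:
    status = "defined" if node in definitions else "open"
    below = ", ".join(lower_covers(node)) or "-"
    print(f"{node:4} {status:8} builds on: {below}")
```

For example, the report shows that SPE builds on SP•, S•E and •PE, and that only •PE among these has a draft definition so far.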

5. Why use the KatKit toolkit for ‘category theorizing’ literary concepts?

The KatKit toolkit supports conceptual modeling in two main ways. First, an oLog class lets us outline oLogs easily: we define a starting aspect and then progressively add further aspects and facts, freely choosing the existing aspect to which each new aspect is appended. Second, oLogs are defined in a way that allows them to be converted directly into English phrases (see Spivak 2014):

event = oLog("verb phrase", p)
event.add("in-world reference", r, p)
event.add("temporal occurrence", t, p)
print(event)

“A/an verb phrase p, with a/an in-world reference r, with a/an temporal occurrence t”.
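To make the example above executable outside KatKit, the following hypothetical stand-in class reproduces the printed reading. It is not KatKit's implementation. The class name `OLog`, the argument order of `add` (label, variable, parent), and passing variable names as strings are all assumptions based on the call pattern shown. For simplicity, the class reads every aspect as attached to the starting type, even when it hangs from another type.

```python
class OLog:
    """Hypothetical minimal oLog builder (a stand-in, not KatKit's class)."""

    def __init__(self, label: str, var: str):
        self.root = var
        self.types = {var: label}  # variable name -> type label
        self.aspects = []          # (parent variable, new variable)

    def add(self, label: str, var: str, parent: str) -> "OLog":
        """Attach a new type `var` to the existing type `parent`."""
        if parent not in self.types:
            raise KeyError(f"unknown type {parent!r}")
        if var in self.types:
            raise ValueError(f"variable {var!r} already in use")
        self.types[var] = label
        self.aspects.append((parent, var))
        return self

    def __str__(self) -> str:
        # Flat Spivak-style reading; ignores which type an aspect hangs from.
        text = f"A/an {self.types[self.root]} {self.root}"
        for _, var in self.aspects:
            text += f", with a/an {self.types[var]} {var}"
        return text


event = OLog("verb phrase", "p")
event.add("in-world reference", "r", "p")
event.add("temporal occurrence", "t", "p")
print(event)
# A/an verb phrase p, with a/an in-world reference r, with a/an temporal occurrence t
```

A fuller version would nest the phrases according to the parent of each aspect.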

For the benefits of defining oLogs as a data structure and their potential use as databases, see Fong and Spivak (2019). In future work, we will investigate the use of databases structured as oLogs.
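Following Spivak (2014) and Fong and Spivak (2019), an oLog can serve as a database schema. Each type becomes a table, and each aspect becomes a column pointing from rows of one table to rows of another. The sketch below is a hypothetical illustration with invented toy rows, not KatKit functionality. It checks that every aspect is a total function between tables. That is the condition a database instance must satisfy when the oLog states no facts (path equations). The aspect names `reference` and `occurrence` are assumptions.

```python
# Schema read off the oLog: aspect -> (source type, target type).
SCHEMA = {
    "reference": ("verb phrase", "in-world reference"),
    "occurrence": ("verb phrase", "temporal occurrence"),
}

# Invented toy instance: table -> {row id: {aspect: target row id}}.
instance = {
    "verb phrase": {"vp1": {"reference": "r1", "occurrence": "t1"}},
    "in-world reference": {"r1": {}},
    "temporal occurrence": {"t1": {}},
}


def violations(schema, data):
    """List every aspect value that does not point to an existing row."""
    problems = []
    for aspect, (src, tgt) in schema.items():
        for row_id, row in data.get(src, {}).items():
            value = row.get(aspect)  # None if the column is missing
            if value not in data.get(tgt, {}):
                problems.append(f"{src}.{row_id}.{aspect} -> {value!r} not in {tgt}")
    return problems


print(violations(SCHEMA, instance))  # []
```

A missing or dangling value is reported, because an aspect has to assign exactly one target to every row.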

The oLog data structure could also be used to take the inputs of the oLogs as possible states from which to construct a model, as demonstrated in Section 4.

event_new = Concept("S", "P", "E")
event_new.as_hasse()

This would return the Hasse diagram in fig. 2. The data structure could also be used to change the denotation of the nodes or to record data for each node.
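The `Concept` call above can be approximated as follows. This sketch is hypothetical and not KatKit's code. It assumes the following:

- the nodes are all subsets of the given components, ordered by inclusion;
- `as_hasse` returns the edge list of the Hasse diagram instead of drawing Figure 2.

The methods `rename` and `record` (hypothetical names) illustrate changing the denotation of a node and recording data for it.

```python
from itertools import combinations


class Concept:
    """Hypothetical stand-in for a Concept class (not KatKit's code)."""

    def __init__(self, *components: str):
        self.components = components
        self.names = {}  # node key -> custom denotation
        self.data = {}   # node key -> list of recorded items
        # All subsets of the components, smallest first.
        self.nodes = [
            frozenset(c)
            for k in range(len(components) + 1)
            for c in combinations(components, k)
        ]

    def key(self, node: frozenset) -> str:
        """Positional notation with '•' for a missing component, e.g. 'S•E'."""
        return "".join(c if c in node else "•" for c in self.components)

    def label(self, node: frozenset) -> str:
        return self.names.get(self.key(node), self.key(node))

    def rename(self, key: str, name: str) -> None:
        self.names[key] = name

    def record(self, key: str, item) -> None:
        self.data.setdefault(key, []).append(item)

    def as_hasse(self) -> list[tuple[str, str]]:
        """Covering edges: the upper node adds exactly one component."""
        return [
            (self.label(a), self.label(b))
            for a in self.nodes
            for b in self.nodes
            if a < b and len(b) == len(a) + 1
        ]


event_new = Concept("S", "P", "E")
event_new.rename("•••", "non-event")
event_new.record("•PE", "Gregor Samsa one morning from uneasy dreams awoke.")
edges = event_new.as_hasse()
print(len(edges))  # 12
print(edges[0])    # ('non-event', 'S••')
```

The edge list could be passed to a graph-drawing library to render the diagram.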

6. Conclusion

The KatKit toolkit helps integrate informal humanities activities, such as conceptual modeling, a core activity of operationalization, into digital humanities workflows. This allows for easy documentation, visualization, and reuse of the resulting data structures in analysis. We hope that the KatKit toolbox will support careful thinking and sound operationalization. It can help both at the start of the operationalization workflow and when comparing the resulting models with, or mapping them to, established standards, ontologies, etc.

7. Data availability

The Jupyter Notebook of the implementation will be published at: https://github.com/forTEXT/katkit_toolbox.

Appendix A

Bibliography

Fong, Brendan / Spivak, David (2019): An Invitation to Applied Category Theory: Seven Sketches in Compositionality. Cambridge; New York, NY: Cambridge University Press.
Gius, Evelyn / Vauth, Michael (2022): “Towards an Event Based Plot Model. A Computational Narratology Approach”, in: Journal of Computational Literary Studies 1, 1. DOI: 10.48694/jcls.110.
Pichler, Axel / Reiter, Nils (2020): “Reflektierte Textanalyse”, in: Reiter, Nils / Pichler, Axel / Kuhn, Jonas (eds.): Reflektierte algorithmische Textanalyse: Interdisziplinäre(s) Arbeiten in der CRETA-Werkstatt. Berlin, Boston: De Gruyter, 43–60. DOI: 10.1515/9783110693973-003.
Pichler, Axel / Reiter, Nils (2022): “From Concepts to Texts and Back: Operationalization as a Core Activity of Digital Humanities”, in: Journal of Cultural Analytics 7, 4. DOI: 10.22148/001c.57195.
Reiter, Nils (2020): “Anleitung zur Erstellung von Annotationsrichtlinien”, in: Reiter, Nils / Pichler, Axel / Kuhn, Jonas (eds.): Reflektierte algorithmische Textanalyse: Interdisziplinäre(s) Arbeiten in der CRETA-Werkstatt. Berlin, Boston: De Gruyter, 193–202. DOI: 10.1515/9783110693973-009.
Sperberg-McQueen, C.M. / Huitfeldt, Claus (2023): “Are Ontologies Trees or Lattices?”, in: Digital Humanities Quarterly 17, 3. http://www.digitalhumanities.org/dhq/vol/17/3/000725/000725.html.
Spivak, David (2014): Category Theory for the Sciences. Cambridge, Massachusetts: The MIT Press.
Vauth, Michael / Gius, Evelyn (2021): Richtlinien für die Annotation narratologischer Ereigniskonzepte. Zenodo. DOI: 10.5281/zenodo.5078175.

¹ In the terminology of programming language theory, objects correspond to types and morphisms to functions.