Reinventing historical source criticism with style and culture. The evidential value of computational authorship analysis and its consequences for data culture in historical scholarship using the example of an alleged autobiography by Adolf Hitler from 1923

Using a case of contested authorship from the early Nazi period, this project aims to explore the benefits of data-driven source criticism and to better understand the implications of these methods for historical research, particularly with regard to the need to discuss and redefine the discipline's data culture.

1. A Case Study

In 1923, the recently founded Nazi Party expanded its reach and targeted national-conservatives to propagate its ideology. Recruiting authors rooted in this milieu was an obvious strategy, which gives the book 'Adolf Hitler. Sein Leben, seine Reden' (Adolf Hitler. His Life, His Speeches) a prominent place in historical research. The 112-page book, containing seven of Hitler's speeches and an eleven-page biography, was published under the name of the conservative author Adolf-Viktor von Koerber. With a print run of 70,000 copies, the book, and especially its biography, is crucial for understanding the early phase of National Socialism and Hitler's rise. However, the authorship of this text has been called into question.

1.1. Controversial Question

The authorship of the biography sparked debate in 2016/2017: some researchers suggested that Hitler himself wrote it (Weber 2016), while others disagreed. A detailed rebuttal, which contextualized the competing authorship claims and drew on newly discovered sources, made a convincing case for von Koerber as the author (Meyer 2017). The article also included a cursory comparison of the text's content and style, but its case rested primarily on external, circumstantial evidence. The source itself, i.e., the text of the biography, played only a minor role, and its stylistic features were not examined in detail.

Traditional source criticism, which often relies on circumstantial evidence and plausibility, can, however, be enhanced by computational techniques such as stylometric authorship analysis. These offer a more text-immanent perspective, which is the starting point for our case study.

1.2. Data

Our analysis uses the disputed biography and four comparative corpora of contemporary texts similar in genre and topic. These include five works by Adolf-Viktor von Koerber (1917-1924), 42 essays and three memoranda by Hitler (1921-1924), and, to serve as control corpora, four publications by the later NSDAP chief ideologist Alfred Rosenberg (1920-1924), as well as excerpts from Hitler’s 'Mein Kampf' (1925/26).

1.3. Methods

To address the question of the biography's authorship in a data-driven manner, we applied stylometry, analyzing stylistic similarities across texts. In our case study, we used three well-established computational methods¹:

Burrows' Delta, a distance measure that uses straightforward statistics to quantify the average absolute difference between the standardized (z-scored) frequencies of the most frequent words,
Principal Component Analysis (PCA), a dimensionality reduction technique that condenses the many word-frequency variables into a few components while preserving the most pronounced stylistic differences between texts, and
Hierarchical Agglomerative Clustering (HAC), an unsupervised learning method that successively merges the most similar texts or groups of texts, thereby clustering texts by their stylistic proximity.
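To make the first of these methods more concrete, Burrows' Delta can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration and not the code used in this study, which mainly followed Karsdorp et al. 2021. The function names `tokenize`, `burrows_delta_matrix` and `nearest_candidate`, the simple regex tokenizer, the use of the population standard deviation, and the toy strings are all our own assumptions. The sketch takes the n most frequent words of the whole corpus (75 by default, matching the setting reported below), computes their relative frequencies per text, standardizes each word's frequency across all texts as a z-score, and defines Delta as the mean absolute difference of these z-scores.

```python
import re
import statistics
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a deliberately simple tokenizer (lowercased word characters)
    return re.findall(r"\w+", text.lower())


def burrows_delta_matrix(texts: dict[str, str], n_mfw: int = 75) -> dict[tuple[str, str], float]:
    """Pairwise Burrows' Delta between all texts (hypothetical helper)."""
    tokenized = {name: tokenize(text) for name, text in texts.items()}

    # 1. Vocabulary: the n most frequent words across the whole corpus
    corpus_counts = Counter()
    for tokens in tokenized.values():
        corpus_counts.update(tokens)
    vocabulary = [word for word, _ in corpus_counts.most_common(n_mfw)]

    # 2. Relative frequency of each vocabulary word in each text
    freqs = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        freqs[name] = [counts[word] / len(tokens) for word in vocabulary]

    # 3. Standardize each word's frequency (z-score) across all texts;
    #    population std is an assumption, words with zero spread get z = 0
    columns = list(zip(*freqs.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    zscores = {
        name: [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(vec, means, stdevs)]
        for name, vec in freqs.items()
    }

    # 4. Delta = mean absolute difference of z-scores over the vocabulary
    return {
        (a, b): statistics.mean(abs(x - y) for x, y in zip(zscores[a], zscores[b]))
        for a in texts
        for b in texts
    }


def nearest_candidate(deltas, disputed: str, candidates: list[str]) -> str:
    # Attribute the disputed text to the stylistically closest candidate text
    return min(candidates, key=lambda c: deltas[(disputed, c)])


# Toy strings, invented purely for illustration
toy_texts = {
    "A1": "the cat and the dog and the bird",
    "A2": "the cat and the dog and the fish",
    "B1": "a cat of a dog of a bird",
    "disputed": "the fish and the dog and the cat",
}
deltas = burrows_delta_matrix(toy_texts)
print(nearest_candidate(deltas, "disputed", ["A1", "A2", "B1"]))  # → A2
```

The toy "disputed" string has exactly the same word counts as "A2", so their Delta is zero and the sketch attributes it to "A2"; the example only illustrates the mechanics and says nothing about the actual corpora. PCA and HAC would typically operate on the same z-scored frequency vectors or on the resulting Delta distances.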

1.4. Results

Using the 75 most frequent words, all three methods consistently attributed the text, on the basis of its inherent textual properties, to Adolf-Viktor von Koerber, ruling out Hitler as the author. The computational analysis thus provides strong and potentially conclusive evidence on the question of Hitler's (non-)authorship of his own biography. It complements and corroborates earlier research, which rested on contextual conjecture and impression, with a methodologically sound and reproducible analysis, demonstrating the value of data-driven methods in source criticism, especially where circumstantial sources are lacking.

2. … and new perspectives on Data Culture

Beyond underscoring the value of data-driven methods in historical source criticism, this study prompts a reconsideration of how such methods can be integrated into historians' everyday practice. It points to a shift towards a more data-driven research paradigm and to the need to discuss and define these practices within a domain-specific data culture. This involves legal questions, such as whether copyrighted material may be used for the analysis and published as evidence for the study, as well as ethical questions, such as whether this research justifies making sensitive data with National Socialist content accessible and visible. Finally, it raises the question of how to publish studies that rest not only on a historical narrative but also on data and code, including the visual and interactive presentation of the results and their interpretation.

These questions are at the centre of the German NFDI4Memory initiative (Paulmann et al. 2022) and its task area on Data Culture, which aims to discuss, define, and help shape the evolving role of data culture in data-driven historical research, based on concrete real-world examples such as this one.

Appendix A

Bibliography

Argamon, Shlomo (2008): “Interpreting Burrows’s Delta: Geometric and Probabilistic Foundations”, in: Literary and Linguistic Computing 23, No. 3, pp. 131–147, doi.org/10.1093/llc/fqn003.
Burrows, John (2002): “‘Delta’: a Measure of Stylistic Difference and a Guide to Likely Authorship”, in: Literary and Linguistic Computing 17, No. 3, pp. 267–287, doi.org/10.1093/llc/17.3.267.
Jäckel, Eberhard / Kuhn, Axel (eds.) (1980): Hitler. Sämtliche Aufzeichnungen: 1905–1924, Stuttgart.
Juola, Patrick (2006): “Authorship Attribution”, in: Foundations and Trends in Information Retrieval 1, No. 3, pp. 233–334, doi.org/10.1561/1500000005.
Karsdorp, Folgert et al. (2021): “Stylometry and the Voice of Hildegard”, in: Humanities Data Analysis. Case Studies with Python, Princeton, pp. 248–280, https://www.humanitiesdataanalysis.org/stylometry/notebook.html.
Meyer, Winfried (2017): “Eine Autobiografie Hitlers aus dem Jahr 1923? Kritische Sichtung einer vermeintlichen Entdeckung”, in: Zeitschrift für Geschichtswissenschaft 65, No. 3, pp. 213–235.
Paulmann, Johannes et al. (2022): NFDI4Memory. Consortium for the historically oriented humanities. Proposal for the National Research Data Infrastructure (NFDI), Zenodo, doi.org/10.5281/zenodo.7428489.
Stamatatos, Efstathios (2009): “A survey of modern authorship attribution methods”, in: Journal of the American Society for Information Science and Technology 60, No. 3, pp. 538–556, doi.org/10.1002/asi.21001.
Weber, Thomas (2016): Wie Adolf Hitler zum Nazi wurde. Vom unpolitischen Soldaten zum Autor von “Mein Kampf”, Berlin.

¹ We mainly followed the approach of Karsdorp et al. 2021: 248–280.