Using a case of contested authorship from the early Nazi period, this project aims to explore the benefits of data-driven source criticism while better understanding the implications of these methods for historical research, particularly with regard to the need to discuss and redefine its data culture.
In 1923, the recently founded Nazi Party expanded its reach and targeted national-conservative circles to propagate its ideology. Recruiting authors anchored in this milieu was an obvious strategy, which gives the book 'Adolf Hitler. Sein Leben, seine Reden' (Adolf Hitler. His Life, His Speeches) a prominent place in historical research. The 112-page book, featuring seven of Hitler's speeches and an eleven-page biography, was published under the name of the conservative author Adolf-Viktor von Koerber. With a print run of 70,000 copies, the book, and especially the biography, is crucial for understanding the early phase of National Socialism and the rise of Hitler. Yet the authorship of this text has been called into question.
The authorship of the biography sparked debate in 2016/2017, with some researchers suggesting that Hitler himself wrote it (Weber 2016), while others disagreed. A detailed rebuttal, which contextualized the competing claims of authorship and drew on newly discovered sources, argued convincingly for von Koerber as the author (Meyer 2017). The article also included a cursory comparison of the content and style of the text, but its case rested primarily on external, circumstantial evidence. The source itself, i.e., the text of the biography, played no essential part in the argument, and its stylistic features were not examined in detail.
Traditional source criticism, which often relies on circumstantial evidence and plausibility, can be enhanced with computational techniques such as stylometric authorship analysis, which provide a more text-immanent perspective. This is the starting point for our case study.
Our analysis uses the disputed biography and four comparative corpora of contemporary texts similar in genre and topic. These include five works by Adolf-Viktor von Koerber (1917-1924), 42 essays and three memoranda by Hitler (1921-1924), and, to serve as control corpora, four publications by the later NSDAP chief ideologist Alfred Rosenberg (1920-1924), as well as excerpts from Hitler’s 'Mein Kampf' (1925/26).
To explore the question of authorship in a data-driven manner, we applied stylometry, which analyzes stylistic similarities across texts. In our case study, we used three well-established computational methods. [1]
Using the 75 most frequent words, all three methods attributed the text, on the basis of its inherent textual properties, to Adolf-Viktor von Koerber and ruled out Hitler as the author. The computational methods thus provided a significant and potentially conclusive resolution to the question of Hitler's (non-)authorship of his own biography. This approach complements and validates prior conjectural and impression-based research with a methodologically sound and reproducible analysis, demonstrating the value of data-driven methods in source criticism, especially when circumstantial sources are lacking.
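To make the idea of a comparison based on the most frequent words more concrete, the following is a minimal sketch of one common stylometric distance, Burrows's Delta. It is an illustration only: the abstract does not name the three methods used, so the choice of Delta, the function names (`tokenize`, `burrows_delta`), the tokenization rule, and the toy inputs are all assumptions and not the study's actual pipeline.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (an assumed, simple rule)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one non-empty token list."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def burrows_delta(candidates, disputed, n_mfw=75):
    """Return {author: Delta distance to the disputed text}.

    candidates: dict mapping an author label to that author's (non-empty) text.
    disputed:   the (non-empty) text of unknown authorship.
    n_mfw:      number of most frequent words used as features.
    """
    cand_tokens = {author: tokenize(text) for author, text in candidates.items()}

    # Vocabulary: the n most frequent words in the pooled candidate texts.
    pooled = Counter()
    for tokens in cand_tokens.values():
        pooled.update(tokens)
    vocab = [word for word, _ in pooled.most_common(n_mfw)]

    freqs = {a: relative_freqs(t, vocab) for a, t in cand_tokens.items()}
    columns = list(zip(*freqs.values()))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]

    # Words with zero variance across candidates carry no signal; skip them.
    keep = [i for i, s in enumerate(stds) if s > 0]
    if not keep:
        raise ValueError("no word varies across the candidate texts")

    def zscores(vector):
        return [(vector[i] - means[i]) / stds[i] for i in keep]

    z_disputed = zscores(relative_freqs(tokenize(disputed), vocab))

    # Delta: mean absolute difference between z-scored frequency profiles.
    return {
        author: mean(abs(a - b) for a, b in zip(zscores(vec), z_disputed))
        for author, vec in freqs.items()
    }
```

Called with a dictionary of candidate-author texts and the disputed text, the function returns one distance per candidate, and the smallest value marks the stylistically closest candidate. In practice, each candidate would be represented by several texts or text segments of comparable length, and a single distance measure would be cross-checked against other methods, as the study does with three.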
This study not only underscores the importance of data-driven methods in historical source criticism but also prompts a reconsideration of how these methods can be integrated into historians' daily practice. It points to a shift towards a more data-driven research paradigm and to the need to discuss and define such practices within a domain-specific data culture. This includes legal questions, such as whether copyrighted material may be used for the analysis and published as evidence for the study, as well as ethical questions, such as whether this research justifies making sensitive data with National Socialist content accessible and visible. Finally, it includes questions about how to publish studies that rely on a historical narrative as well as on data and code, up to the visual and interactive presentation of the results and their interpretation.
These questions are at the centre of the German NFDI4Memory initiative (Paulmann et al. 2022) and its task area on Data Culture, which aims to discuss, define, and help shape the evolving role of data culture in data-driven historical research, based on concrete real-world examples such as this one.
[1] We mainly followed the approach of Karsdorp et al. 2021: 248–80.