Since the millennial turn, access to new archives has invigorated scholarship on the cultural Cold War, but archival information silos and the conflict’s global scope prevent researchers from viewing large-scale networks and relationships. “Linked data in the Cold War archive” describes an early proof-of-concept for a substantive effort to link archival authority records documenting how British and American literary figures were implicated in covert propaganda networks during WWII and the Cold War era.
Drawing on declassified archival material produced by government agencies such as MI5, MI6, the British Foreign Office, the CIA, and the US State Department, we extract and graph relationships among literary figures (writers, agents, publishers, editors, translators) and non-profit or commercial enterprises covertly funded by the Cold War state. Our work will enrich contextual metadata about literary actors and institutions while increasing discoverability and interpretability for government document collections. The present paper outlines a proof of concept focusing on the British novelist George Orwell.
Our broader project responds to a need among Cold War researchers for a comprehensive view of large-scale networks and lines of influence. Research into the cultural Cold War has burgeoned since the late nineties (Deery 1997, Saunders 1999, Aldrich 2001, Shaw 2001, Wilford 2003, Hammond 2006, Wilford 2008, Piette 2009, Rubin 2012, Smith 2013, Davies 2013, Wilford 2013, Barnhisel 2015, Davis 2020, Wilford 2024). Most studies, however, centre on individual cases and narratives. Saunders’s exposé Who Paid the Piper? tells the story of a single agency, the CIA-backed Congress for Cultural Freedom (Saunders 1999). Adam Piette’s The Literary Cold War organises its chapters around roughly a dozen individual literary figures grouped thematically (Piette 2009). Greg Barnhisel’s Cold War Modernists approaches its subject via book programs, publishing history, and literary magazines like Encounter and Perspectives USA (Barnhisel 2015). A large-scale graph visualisation would allow researchers to contextualise individual narratives more comprehensively and to discover previously unknown nodes of interest.
Our work joins a wider discourse about the efficacy of linked open data, semantic web technologies, and graph analysis for literary and cultural research. Here we note several recent and ongoing projects concerned with metadata aggregation, network analysis, and knowledge graph technology for the heritage sector (e.g., Ahnert et al. 2020, Ahnert and Ahnert 2023, Hotson and Walling 2019, Kerschbaumer 2020, Collar 2022, Winters et al. 2022, Golub and Liu 2022, Nurmikko-Fuller 2023, Hannaford et al. 2024, Halsey and Sangster 2020–2023, and Towsey et al. 2019–2024). We also note a robust community of researchers and developers engaged in building linked data and knowledge graph technologies for the cultural heritage sector: e.g., Pompeii Linked Open Data (Heath 2024); Linked Data from TEI (Giovannetti and Tomasi 2022); Façade-X and SPARQL Anything (Asprino et al. 2023); Microsoft Academic Knowledge Graph (Färber and Ao 2022); Deep Graph Library (Wang et al. 2019); and gBuilder (Li and Zhou 2023).
Within this wider context, the novelty of our project comes in part from the fact that Cold War power networks were programmatically occluded by governmental agencies executing initiatives in cultural diplomacy. Obscurity in historical fact has led to obscurity within the archival record itself. By improving discoverability and interpretability for declassified Cold War documents, we make large-scale networks of influence newly legible.
The present case study draws on declassified archival material related to Orwell to demonstrate the value of a wider effort to use graph theory and network analysis to understand obscure relationships between state agencies and twentieth-century writers. Our primary archive consisted of Orwell’s notorious Information Research Department (IRD) snitch list of ‘cryptos’ and alleged Stalinists as well as his MI5 and Special Branch files (TNA: FO 1110/189, TNA: KV 2/2699, TNA: MEPO 38/69). We supplemented our source materials with a selection of secondary literature outlining Orwell’s known and suspected links to the British government (Deery 1997, Aldrich 2001, Shaw 2001).
To ensure quality within the limited dataset of the case study, we extracted named entities through manual reading and documented them in XML using the Encoded Archival Context–Corporate Bodies, Persons, and Families (EAC–CPF) schema (Society of American Archivists TS-EAS 2022). We combined our manually extracted “covert” dataset with open access data from the Social Networks and Archival Context (SNAC) Cooperative (Pitti et al. 2015) to produce graph objects mapping Orwell’s publicly available and covert connections. We visualised our graph objects in Gephi, manually colouring nodes according to whether they derive from SNAC data or from declassified sources. We then exported the graphs in JSON and represented them as 3D force graphs using ThreeJS/WebGL (see https://github.com/vasturiano/3d-force-graph). Interactive visualisations are available at https://krmuth.github.io/orwell.node/.
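As an illustration of the step from an encoded record to a graph object, the following minimal sketch reads relation entries from a record and emits the `nodes`/`links` JSON shape that 3d-force-graph accepts. It is not the project's own pipeline. The sample record is hypothetical and heavily simplified (no namespace, one placeholder relation), the element names follow the EAC-CPF 1.x pattern of `cpfRelation` and `relationEntry` as an assumption, and the `provenance` field and the function names are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified record. Real EAC-CPF files are namespaced and far
# richer; the second relation below is a placeholder, not archival data.
SAMPLE_RECORD = """
<eac-cpf>
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry><part>Orwell, George</part></nameEntry>
    </identity>
    <relations>
      <cpfRelation cpfRelationType="associative">
        <relationEntry>Information Research Department</relationEntry>
      </cpfRelation>
      <cpfRelation cpfRelationType="associative">
        <relationEntry>Placeholder Entity</relationEntry>
      </cpfRelation>
    </relations>
  </cpfDescription>
</eac-cpf>
"""


def local_name(tag):
    """Drop any '{namespace}' prefix so namespaced records parse the same way."""
    return tag.rsplit("}", 1)[-1]


def record_to_graph(xml_text, provenance):
    """Turn one record into a star graph: the described entity plus its relations."""
    root = ET.fromstring(xml_text.strip())
    focus = None
    related = []
    for element in root.iter():
        tag = local_name(element.tag)
        text = (element.text or "").strip()
        if tag == "part" and focus is None:
            focus = text  # first name part in document order (assumed to be the identity)
        elif tag == "relationEntry" and text:
            related.append(text)
    if focus is None:
        raise ValueError("record has no name entry")
    # 'provenance' is an assumed field recording where a node came from,
    # so that a viewer could colour SNAC and declassified nodes differently.
    nodes = [{"id": focus, "provenance": provenance}]
    nodes += [{"id": entity, "provenance": provenance} for entity in related]
    links = [{"source": focus, "target": entity} for entity in related]
    return {"nodes": nodes, "links": links}


graph = record_to_graph(SAMPLE_RECORD, provenance="declassified")
print(json.dumps(graph, indent=2, ensure_ascii=False))
```

Keeping provenance on each node, rather than colouring by hand, would let the same JSON drive both the colouring and any later filtering by source.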
From the available SNAC data, we begin with a relation constellation consisting of 51 entities associated with Orwell (figure 1). At this stage, we do not look for further connections among these first-order entities. We then incorporate the named entities extracted from our declassified documents, adding a further 44 nodes and 45 edges for a graph of 96 nodes (figure 2). The associations we identify in our declassified test case archive nearly double Orwell’s first-degree relation constellation as mapped by the SNAC Cooperative. From here, we can begin to build out the graph by adding relation constellations for known entities. In this case, we include the SNAC relation constellations for two figures with known ties to CIA activities: Stephen Spender, editor of the covertly funded literary magazine Encounter from 1953 to 1967, and Arthur Koestler, who consulted for both the IRD and the CIA on cultural matters (figure 3).
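The node arithmetic above (Orwell, plus 51 SNAC entities, plus 44 declassified entities, for 96 nodes) can be checked with a small merge routine. The sketch below is a hedged illustration using placeholder identifiers rather than the actual dataset. The function name `merge_graphs` is invented for the example, and the placement of the 45th edge (linking a declassified entity to an existing SNAC node) is purely an assumption, since the text does not say where that edge falls.

```python
def merge_graphs(*graphs):
    """Union several graphs, deduplicating nodes and undirected edges.

    Each graph is a dict with 'nodes' (id -> provenance label) and
    'edges' (list of id pairs). Nodes seen in more than one source
    keep every provenance label.
    """
    nodes = {}
    edges = set()
    for graph in graphs:
        for node_id, provenance in graph["nodes"].items():
            nodes.setdefault(node_id, set()).add(provenance)
        for a, b in graph["edges"]:
            edges.add(frozenset((a, b)))  # undirected: (a, b) equals (b, a)
    return nodes, edges


FOCUS = "Orwell, George"

# Placeholder identifiers standing in for the 51 SNAC entities.
snac_ids = [f"snac_entity_{i}" for i in range(1, 52)]
snac = {
    "nodes": {FOCUS: "snac", **{s: "snac" for s in snac_ids}},
    "edges": [(FOCUS, s) for s in snac_ids],
}

# Placeholder identifiers standing in for the 44 declassified entities.
covert_ids = [f"covert_entity_{i}" for i in range(1, 45)]
covert = {
    "nodes": {FOCUS: "declassified", **{c: "declassified" for c in covert_ids}},
    # 44 edges to Orwell plus one assumed cross-link to reach 45 edges.
    "edges": [(FOCUS, c) for c in covert_ids]
    + [("covert_entity_1", "snac_entity_1")],
}

nodes, edges = merge_graphs(snac, covert)
print(len(nodes), "nodes;", len(edges), "edges")
print("Provenance of focus node:", sorted(nodes[FOCUS]))
```

Merging on a shared identifier is the step most sensitive to name variants, so in practice an authority identifier would be a safer key than a name string.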
This triad begins to reveal opportunities for document or entity discovery that specialist researchers in literary studies or intelligence history could exploit. We note, for example (figure 4), the proximity of The New Yorker to three entities with known or suspected links to covert state activities: the Congress for Cultural Freedom (CIA funded), Encounter (CIA funded), and Horizon (possible intelligence links). The location of a node within a particular set of constellations does not necessarily implicate the named entity in covert cultural diplomacy. It does, however, suggest that an entity may be of interest as a previously unacknowledged recipient of state support or as an object with ideological sympathies towards instruments of state influence.
Finally, to demonstrate potential scalability for our broader project, we expand our case study network one more time by incorporating SNAC constellations for each neighbour node in the original graph of 96 nodes to produce a network of 3655 nodes and 4044 edges (figure 5). We filter the network to show only nodes with a degree of 2 or greater, resulting in a final graph of 347 nodes and 737 edges (figure 6). Fuller statistical analysis of this graph object is needed. Nevertheless, it points toward the generative potential in combining publicly available linked data with networks extracted from declassified documents to reveal large-scale patterns and relationships.
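A degree filter of this kind can be sketched in a few lines. The example below assumes a single-pass filter: degrees are computed once on the full graph, nodes below the threshold are dropped, and only edges between surviving nodes are kept. This is one reading of the step, not a record of the exact Gephi settings used. The function name and the toy edge list are hypothetical.

```python
from collections import Counter


def filter_by_degree(edges, min_degree=2):
    """Keep nodes whose degree in the full graph is at least min_degree,
    together with the edges that join two kept nodes."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    kept_nodes = {node for node, d in degree.items() if d >= min_degree}
    kept_edges = [(a, b) for a, b in edges if a in kept_nodes and b in kept_nodes]
    return kept_nodes, kept_edges


# Toy undirected graph with placeholder labels (not project data).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

kept_nodes, kept_edges = filter_by_degree(toy_edges, min_degree=2)
print(sorted(kept_nodes))
print(kept_edges)
```

In the toy graph, node D survives because it has degree 2 in the full graph, even though it retains only one edge after E is removed. A recursive variant (a k-core) would prune D as well, so the choice between the two affects the final node and edge counts.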
Figure 1. Screenshot of SNAC relation constellation (blue) for George Orwell (yellow)
Figure 2. Screenshot of SNAC relation constellation (blue) for George Orwell plus named entities extracted from declassified documents (yellow/red)
Figure 3. Screenshot of Orwell’s SNAC and covert relation constellation with two additional nodes: Stephen Spender (green) and Arthur Koestler (red)
Figure 4. Screenshot detail of Orwell’s SNAC and covert relation constellation with two additional nodes (Spender and Koestler) highlighting the location of The New Yorker magazine in the neighbourhood of three covert (or suspected) enterprises
Figure 5. Screenshot of Orwell’s SNAC and covert relation constellation expanded to include SNAC relation constellations for all named entities in the SNAC database
Figure 6. Screenshot of Orwell’s SNAC and covert relation constellation expanded to include SNAC relation constellations for all named entities in the SNAC database and filtered to show only entities of degree 2 or greater
We have shown how network analysis might produce novel understandings of covert cultural intelligence by making structures of influence more legible. We have also demonstrated how specialist knowledge can augment ongoing linked data initiatives like the SNAC Cooperative. Sample visualisations of our proof-of-concept test case show the potential usefulness of our project for researchers specialising in Cold War literary and intelligence history. Future work will pursue three main aims: (1) scalability through existing digital collections of British and American declassified documents; (2) fuller statistical analysis of the resulting graph objects; and (3) ensuring RDF compliance and interoperability with existing standards and conceptual models such as Records in Contexts (Pitti et al. 2016; Clavaud, Francart and Charbonnier 2023), Matterhorn (Wildi and Dubois 2019), and CIDOC CRM (Bekiari et al. 2024).