Linked data in the Cold War archive

2. Motivation and context

Our broader project responds to a need among Cold War researchers for a comprehensive view of large-scale networks and lines of influence. Research into the cultural Cold War has burgeoned since the late 1990s (Deery 1997, Saunders 1999, Aldrich 2001, Shaw 2001, Wilford 2003, Hammond 2006, Wilford 2008, Piette 2009, Rubin 2012, Smith 2013, Davies 2013, Wilford 2013, Barnhisel 2015, Davis 2020, Wilford 2024). Most studies, however, centre on individual cases and narratives. Saunders’s exposé Who Paid the Piper? tells the story of a single agency, the CIA-backed Congress for Cultural Freedom (Saunders 1999). Adam Piette’s The Literary Cold War organises its chapters around roughly a dozen individual literary figures grouped thematically (Piette 2009). Greg Barnhisel’s Cold War Modernists approaches its subject via book programs, publishing history, and literary magazines like Encounter and Perspectives USA (Barnhisel 2015). A large-scale graph visualisation would allow researchers to contextualise individual narratives more comprehensively and to discover previously unknown nodes of interest.

Our work joins a wider discourse around the efficacy of linked open data, semantic web technologies, and graph analysis for literary and cultural research. Here we note several recent and ongoing projects interested in metadata aggregation, network analysis, and knowledge graph technology for the heritage sector (e.g., Ahnert et al. 2020, Ahnert and Ahnert 2023, Hotson and Wallnig 2019, Kerschbaumer 2020, Collar 2022, Winters et al. 2022, Golub and Liu 2022, Nurmikko-Fuller 2023, Hannaford et al. 2024, Halsey and Sangster 2020–2023, and Towsey et al. 2019–2024). We also note a robust community of researchers and developers engaged in building linked data and knowledge graph technologies for the cultural heritage sector: e.g., Pompeii Linked Open Data (Heath 2024); Linked Data from TEI (Giovannetti and Tomasi 2022); Façade-X and SPARQL Anything (Asprino et al. 2023); Microsoft Academic Knowledge Graph (Färber and Ao 2022); Deep Graph Library (Wang et al. 2019); and gBuilder (Li and Zou 2023).

Within this wider context, the novelty of our project comes in part from the fact that Cold War power networks were programmatically occluded by governmental agencies executing initiatives in cultural diplomacy. Obscurity in historical fact has led to obscurity within the archival record itself. By improving discoverability and interpretability for declassified Cold War documents, we make large-scale networks of influence newly legible.

3. Method

The present case study draws on declassified archival material related to Orwell to demonstrate the value of a wider effort to use graph theory and network analysis to understand obscure relationships between state agencies and twentieth-century writers. Our primary archive consisted of Orwell’s notorious Information Research Department (IRD) snitch list of ‘cryptos’ and alleged Stalinists as well as his MI5 and Special Branch files (TNA: FO 1110/189, TNA: KV 2/2699, TNA: MEPO 38/69). We supplemented our source materials with a selection of secondary literature outlining Orwell’s known and suspected links to the British government (Deery 1997, Aldrich 2001, Shaw 2001).

To ensure quality within the limited dataset of the case study, we extracted named entities through manual reading and documented them in XML using the Encoded Archival Context–Corporate Bodies, Persons, and Families (EAC–CPF) schema (Society of American Archivists TS-EAS 2022). We combined our manually extracted “covert” dataset with open access data from the Social Networks and Archival Context (SNAC) Cooperative (Pitti et al. 2015) to produce graph objects mapping Orwell’s publicly available and covert connections. We visualised our graph objects in Gephi, manually colouring nodes according to whether they derive from SNAC data or from declassified sources. We then exported the graphs in JSON and represented them as 3D force graphs using ThreeJS/WebGL (see https://github.com/vasturiano/3d-force-graph). Interactive visualisations are available at https://krmuth.github.io/orwell.node/.
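The merge-and-export step can be illustrated with a minimal sketch. It is not the project's actual pipeline, which ran through Gephi; it assumes each source has already been reduced to a plain list of entity pairs, and it uses placeholder entity names rather than project data. The function name `build_graph` and the `group` attribute are hypothetical choices. The output follows the `nodes`/`links` shape, with `id`, `source`, and `target` keys, that 3d-force-graph accepts as graph data.

```python
import json

def build_graph(snac_edges, covert_edges):
    """Merge two edge lists into one node/link structure, tagging provenance.

    Hypothetical helper: 'snac' and 'declassified' are illustrative labels.
    """
    nodes = {}     # node id -> provenance label
    links = set()  # (source, target, provenance); a set drops exact duplicates
    # SNAC is processed first, so a node found in both sources keeps 'snac'.
    for label, edges in (("snac", snac_edges), ("declassified", covert_edges)):
        for a, b in edges:
            nodes.setdefault(a, label)
            nodes.setdefault(b, label)
            links.add((a, b, label))
    return {
        "nodes": [{"id": n, "group": g} for n, g in sorted(nodes.items())],
        "links": [{"source": a, "target": b, "group": g}
                  for a, b, g in sorted(links)],
    }

# Placeholder data only; these are not entities from the case study.
snac_edges = [("Ego", "Entity A"), ("Ego", "Entity B")]
covert_edges = [("Ego", "Entity C"), ("Entity C", "Entity A")]

graph = build_graph(snac_edges, covert_edges)
graph_json = json.dumps(graph, indent=2)  # text that could be saved as a .json file
```

Recording provenance as a node attribute means the colouring by source could be driven by the data itself rather than applied by hand, although the manual colouring described above achieves the same visual result.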

4. Results

From the available SNAC data, we begin with a relation constellation consisting of 51 entities associated with Orwell (figure 1). At this stage, we did not look for further links among these first-order connections. We then incorporate the named entities extracted from our declassified documents, adding a further 44 nodes and 45 edges for a graph of 96 nodes: Orwell himself, the 51 SNAC entities, and the 44 newly extracted entities (figure 2). The associations we identify in our declassified test case archive nearly double Orwell’s first-degree relation constellation as mapped by the SNAC Cooperative. From here, we can begin to build out the graph by adding relation constellations for known entities. In this case, we include the SNAC relation constellations for two figures with known ties to CIA activities: Stephen Spender, editor of the covertly funded literary magazine Encounter from 1953 to 1967, and Arthur Koestler, who consulted for both the IRD and the CIA on cultural matters (figure 3).

This triad begins to reveal opportunities for document or entity discovery that specialist researchers in literary studies or intelligence history could exploit. We note, for example (figure 4), the proximity of The New Yorker to entities with known or suspected links to covert state activities: the Congress for Cultural Freedom (CIA funded), Encounter (CIA funded), and Horizon (possible intelligence links). The location of a node within a particular set of constellations does not necessarily implicate the named entity in covert cultural diplomacy. It does, however, suggest an entity may be of interest as a previously unacknowledged recipient of state support or as an object with ideological sympathies toward instruments of state influence.
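One simple way to make this kind of proximity explicit is to list, for every unflagged node, which flagged entities it is directly linked to. The sketch below assumes that "proximity" means direct adjacency, which is narrower than visual closeness in a force-directed layout. It uses placeholder labels rather than project data, and the function name `flagged_neighbours` is hypothetical.

```python
from collections import defaultdict

def flagged_neighbours(edges, flagged):
    """Rank unflagged nodes by how many flagged entities they touch directly.

    Assumption: the graph is undirected and 'flagged' is a set of node ids
    with known or suspected covert links.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    hits = []
    for node, neighbours in adjacency.items():
        if node in flagged:
            continue
        shared = sorted(neighbours & flagged)
        if shared:
            hits.append((node, shared))
    # Most flagged neighbours first; ties broken alphabetically by node id.
    return sorted(hits, key=lambda item: (-len(item[1]), item[0]))

# Placeholder data only: F1 and F2 stand in for flagged entities.
edges = [("F1", "X"), ("F2", "X"), ("F1", "Y"), ("Y", "Z"), ("F1", "F2")]
ranked = flagged_neighbours(edges, flagged={"F1", "F2"})
```

As the paragraph above cautions, a high rank in such a list would mark a candidate for closer archival reading, not evidence of involvement.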

Finally, to demonstrate potential scalability for our broader project, we expand our case study network once more by incorporating SNAC constellations for each neighbour node in the original graph of 96 nodes to produce a network of 3655 nodes and 4044 edges (figure 5). We filter the network to show only nodes with a degree of 2 or greater, resulting in a final graph of 347 nodes and 737 edges (figure 6). Fuller statistical analysis of this graph object is needed. Nevertheless, it points toward the generative potential in combining publicly available linked data with networks extracted from declassified documents to reveal large-scale patterns and relationships.
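The degree filter can be sketched as follows. This is an illustrative single-pass version that assumes an undirected edge list with no duplicate edges or self-loops; it is not a record of the filter settings actually used, and the function name `filter_by_degree` is hypothetical. The data are placeholders.

```python
from collections import Counter

def filter_by_degree(edges, min_degree=2):
    """Keep nodes whose degree in the full graph is at least min_degree.

    Degrees are computed once, before any node is removed.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    # Retain only edges whose two endpoints both survive the filter.
    kept_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return keep, kept_edges

# Placeholder data only: a hub with three neighbours and a short chain.
edges = [("hub", "x1"), ("hub", "x2"), ("hub", "x3"), ("x1", "y"), ("y", "z")]
nodes, kept = filter_by_degree(edges, min_degree=2)
```

Because degrees are measured before removal, a surviving node can end up with fewer than two links in the filtered graph (here, `y` keeps only its link to `x1`). Repeating the filter until nothing changes would instead yield the 2-core, which is a stricter criterion.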

Figure 1. Screenshot of SNAC relation constellation (blue) for George Orwell (yellow)

Figure 2. Screenshot of SNAC relation constellation (blue) for George Orwell plus named entities extracted from declassified documents (yellow/red)

Figure 3. Screenshot of Orwell’s SNAC and covert relation constellation with two additional nodes: Stephen Spender (green) and Arthur Koestler (red)

Figure 4. Screenshot detail of Orwell’s SNAC and covert relation constellation with two additional nodes (Spender and Koestler) highlighting the location of The New Yorker magazine in the neighbourhood of three covert (or suspected) enterprises

Figure 5. Screenshot of Orwell’s SNAC and covert relation constellation expanded to include SNAC relation constellations for all named entities in the SNAC database

Figure 6. Screenshot of Orwell’s SNAC and covert relation constellation expanded to include SNAC relation constellations for all named entities in the SNAC database and filtered to show only entities of degree 2 or greater

Appendix A

Bibliography

Ahnert, Ruth et al. (2020): The Network Turn: Changing Perspectives in the Humanities. Cambridge: Cambridge University Press.
Ahnert, Ruth / Ahnert, Sebastian E. (2023): Tudor Networks of Power. Oxford: Oxford University Press.
Aldrich, Richard (2001): The Hidden Hand: Britain, America, and Cold War Secret Intelligence. London: John Murray Press.
Asprino, Luigi et al. (2023): “Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web,” in: ACM Transactions on Internet Technology 23, 1 <https://doi.org/10.1145/3555312>.
Barnhisel, Greg (2015): Cold War Modernists: Art, Literature, and American Cultural Diplomacy. New York: Columbia University Press.
Bekiari, Chryssoula et al. (eds.) (2024): Volume A: Definition of the CIDOC Conceptual Reference Model Version 7.1.3. <https://www.cidoc-crm.org/sites/default/files/cidoc_crm_version_7.1.3.pdf>.
Boon, Tim (2022): “Origins and Ambitions of the Congruence Engine Project,” in: Science Museum Group Journal 18 <https://dx.doi.org/10.15180/221801>.
Clavaud, Florence / Francart, Thomas / Charbonnier, Pauline (2023): “Ric-O Converter: A Software to Convert EAC-CPF and EAD 2002 XML Files to RDF Datasets Conforming to Records in Contexts Ontology,” in: ACM Journal on Computing and Cultural Heritage 16, 3: 1–13 <https://doi.org/10.1145/3583592>.
Collar, Anna (ed.) (2022): Networks and the Spread of Ideas in the Past: Strong Ties, Innovation and Knowledge Exchange. London: Routledge.
Davies, Sarah (2013): “The Soft Power of Anglia: British Cold War Cultural Diplomacy in the USSR,” in: Contemporary British History 27, 3: 297–323.
Davis, Caroline (2020): African Literature and the CIA: Networks of Authorship and Publishing. Cambridge: Cambridge University Press.
Deery, Phillip (1997): “Confronting the Cominform: George Orwell and the Cold War Offensive of the Information Research Department, 1948–50,” in: Labour History 73: 219–225.
Färber, Michael / Ao, Lin (2022): “The Microsoft Academic Knowledge Graph Enhanced: Author Name Disambiguation, Publication Classification, and Embeddings,” in: Quantitative Science Studies 22, 1: 51–98 <https://doi.org/10.1162/qss_a_00183>.
Giovannetti, Francesca / Tomasi, Francesca (2022): “Linked Data from TEI (LIFT): A Teaching Tool for TEI to Linked Data Transformation,” in: Digital Humanities Quarterly 16, 2 <https://www.digitalhumanities.org/dhq/vol/16/2/000605/000605.html>.
Golub, Koraljka / Liu, Ying-Hsang (eds.) (2021): Information and Knowledge Organisation in Digital Humanities: Global Perspectives. London: Routledge.
Halsey, Katie / Sangster, Matthew (2020–2023): Books and Borrowing, 1750–1830. AHRC Reference: AH/T003960/1 <https://borrowing.stir.ac.uk/>.
Hammond, Andrew (ed.) (2006): Cold War Literature: Writing the Global Conflict. London: Routledge.
Hannaford, Ewan D. et al. (2024): “Our Heritage, Our Stories: Developing AI Tools to Link and Support Community-Generated Digital Cultural Heritage,” in: Journal of Documentation <https://doi.org/10.1108/JD-03-2024-0057>.
Heath, Sebastian (2024): “Moving Forward with Linked Open Data at Pompeii,” in: Institute for the Study of the Ancient World (ISAW) Library Blog <https://isaw.nyu.edu/library/blog/pompeii-lod-part-one>.
Hotson, Howard / Wallnig, Thomas (eds.) (2019): Reassembling the Republic of Letters in the Digital Age: Standards, Systems, Scholarship. Göttingen: Universitätsverlag Göttingen.
Kerschbaumer, Florian et al. (eds.) (2020): The Power of Networks: Prospects of Historical Network Research. London: Routledge.
Li, Yanzeng / Zou, Lei (2023): “gBuilder: A Scalable Knowledge Graph Construction System for Unstructured Corpus,” in: arXiv preprint <https://doi.org/10.48550/arXiv.2208.09705>.
Nurmikko-Fuller, Terhi (2023): Linked Data for Digital Humanities. London: Routledge.
Piette, Adam (2009): The Literary Cold War: 1945 to Vietnam. Edinburgh: Edinburgh University Press.
Pitti, Daniel et al. (2015): “Social Networks and Archival Context: from Project to Cooperative Archival Program,” in: Journal of Archival Organization 12, 1–2: 77–97 <https://doi.org/10.1080/15332748.2015.999544>.
Pitti, Daniel / Stockting, Bill / Clavaud, Florence (2016): “Records in Contexts (RiC): a standard for archival description developed by the ICA Experts Group on Archival Description,” in: ICA Seoul Congress 2016, Paris, France, 8 September.
Rubin, Andrew (2012): Archives of Authority: Empire, Culture, and the Cold War. Princeton, N.J.: Princeton University Press.
Saunders, Frances Stonor (1999): Who Paid the Piper? The CIA and the Cultural Cold War. London: Granta Books.
Shaw, Tony (2001): British Cinema and the Cold War: The State, Propaganda and Consensus. London: I.B. Tauris.
Smith, James (2013): British Writers and MI5 Surveillance 1930–1960. Cambridge: Cambridge University Press.
Society of American Archivists Technical Subcommittee on Encoded Archival Standards (TS-EAS) (2022): EAC-CPF 2.0 <https://eac.staatsbibliothek-berlin.de/schemata-and-tag-library/>.
The National Archives of the UK (TNA): FO 1110/189, Meeting with George Orwell; progress report from RIO(S); paper on Communist Strategy in South East Asia, 1949.
The National Archives of the UK (TNA): KV 2/2699, George ORWELL alias Eric Arthur BLAIR: British, 10 January 1929 – 17 November 1952.
The National Archives of the UK (TNA): MEPO 38/69, Special Branch file on Eric Blair alias George Orwell, author and journalist, 1 January 1936 – 31 December 1977.
Towsey, Mark et al. (2019–2024): Libraries, Reading Communities and Cultural Formation in the Eighteenth-Century Atlantic. AHRC Reference: AH/S007083/1 <https://c18librariesonline.org/>.
Wang, Minjie et al. (2019): “Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks,” in: arXiv preprint <https://doi.org/10.48550/arXiv.1909.01315>.
Wildi, Tobias / Dubois, Alain (2019): “The Matterhorn RDF data model: formalising archival metadata with SHACL,” in: 16th International Conference on Digital Preservation (iPRES 2019), Amsterdam, The Netherlands <https://doi.org/10.17605/OSF.IO/EGCHJ>.
Wilford, Hugh (2003): The CIA, the British Left, and the Cold War: Calling the Tune? London: Frank Cass.
Wilford, Hugh (2008): The Mighty Wurlitzer: How the CIA Played America. Cambridge, Mass.: Harvard University Press.
Wilford, Hugh (2013): America’s Great Game: The CIA’s Secret Arabists and the Shaping of the Modern Middle East. New York: Basic Books.
Wilford, Hugh (2024): The CIA: An Imperial History. New York: Basic Books.
Winters, Jane et al. (2022): “Heritage Connector: A Towards a National Collection Foundation Project Final Report,” in: Zenodo <https://doi.org/10.5281/zenodo.6022678>.
