Machine learning technology provides scholars with powerful tools for constructing cultural heritage datasets. However, scholars now generally appreciate that the humans who build these systems and the complex algorithms involved can introduce biases (LC Labs 2020, Hua 2023). Consequently, scholars need to understand how these systems work before they can use their outputs. Scholars who do understand can incorporate the particular outputs of machine learning tools into their own scholarship, and the systems themselves can also provide them with new ways of understanding their subjects.
Machine learning encompasses a variety of methods, from so-called ‘black-box’ methods such as neural networks, whose internal workings can be difficult to grasp, to methods that are relatively easy to explain and interpret, such as decision trees. ‘Black-box’ methods are generally regarded as more accurate, but in a cultural heritage context, is the additional accuracy an acceptable trade-off for the obscurity of the decision-making process? Clarity can be extremely valuable when the computer is asked to perform a task of scholarly significance, such as dating artifacts (Brickler 2021) like medieval seal matrices.
During the Middle Ages, people used seals to authenticate, validate and securely close documents, but as seals present words and images, people could also use them to make statements of identity. Thus, seals offer scholars from many different disciplines, including history, art history, literature, and archaeology, evidence for social, family and occupational networks, as well as such topics as devotional practices, political ideas, gender, and visual culture (New 2019, Pastoureau 1981). Seals are valuable sources of historical information when accurately dated.
Cataloguers have traditionally assigned dates to seal matrices with only brief explanations of their reasoning, which makes those dates difficult to revise or critique (Linenthal 2004). Furthermore, in the absence of full explanations, it is difficult to discern how consistent cataloguers are, individually or collectively, in their dating.
This paper presents and evaluates a machine-learning-based system designed, in the first instance, to date the approximately 6000 medieval seal matrices listed by England’s Portable Antiquities Scheme (PAS: https://finds.org.uk/). PAS is an exceptionally important archaeological project that involves documenting artifacts discovered by the public. Most of the seals that PAS records were probably discarded or lost by their original owners, and thus it is likely that the PAS dataset includes many seals used by relatively humble people. If scholars aim to establish what seals people outside the aristocracy were using, PAS data is uniquely valuable, but the seal matrices first need to be dated as closely and accurately as possible.
The proposed machine learning system uses training data (7000 cases) from the DIGISIG project (McEwan 2022). I will show that a machine-learning-based approach, employing Scikit-Learn’s decision trees method (Pedregosa 2011), can revise the dates of the seal matrices listed in PAS. The machine learning tool’s dates are not as reliable as those of a skilled cataloguer, but they are very close and can be produced at little cost (http://www.digisig.org/analyze/dates). More importantly, the machine learning tool’s decision-making processes are consistent and can be graphed, so they are easy to explain to scholars. Indeed, that clarity challenges cataloguers to document their own implicit decision-making processes (Epstein 2008). However, the way that the computer dates seal matrices is fundamentally different from how human cataloguers do it. While humans emphasize style and content, the computer finds significance in small variations in size and shape. The machine learning tool thus reveals an alternative way of dating the artifacts, which is perhaps even more valuable than the individual date predictions that the tool outputs. The machine learning tool offers us insights into how the artifacts develop over time and, in doing so, suggests new approaches to learning and accessibility in sigillography. That work shifts the dating tool from an aid to cataloguers to a model (Blanke 2018, Weingart 2013) that contributes to the ongoing scholarly conversation about how cultures in the past developed.
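To make the idea of a readable decision process concrete, the following is a minimal sketch of how a shallow decision tree might be fitted and printed as rules with scikit-learn. It is not the DIGISIG implementation: the feature names (height_mm, width_mm, is_pointed_oval), the choice of a regression tree, the depth limit, and the synthetic data are all assumptions made for illustration, and the invented relationship between size, shape, and year carries no historical meaning.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in data: NOT drawn from DIGISIG or PAS.
rng = np.random.default_rng(0)
n = 200
height_mm = rng.uniform(15, 50, n)        # hypothetical measurement
width_mm = rng.uniform(15, 40, n)         # hypothetical measurement
is_pointed_oval = rng.integers(0, 2, n)   # hypothetical shape flag (0 or 1)

# Invented relationship, used only so the tree has a pattern to learn.
year = (1200 + 3.0 * (height_mm - 15)
        + 40 * is_pointed_oval
        + rng.normal(0, 10, n))

X = np.column_stack([height_mm, width_mm, is_pointed_oval])
feature_names = ["height_mm", "width_mm", "is_pointed_oval"]

# A shallow tree keeps the full set of decision rules short enough to read.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, year)

# Print the learned thresholds as indented if/else rules.
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Predict a date for one hypothetical matrix: 30 mm x 20 mm, pointed oval.
predicted_year = tree.predict([[30.0, 20.0, 1]])[0]
print(round(predicted_year))
```

Each printed branch is a threshold on a measurable property, which is the kind of explicit rule a cataloguer can inspect, dispute, or compare against their own practice. The same fitted tree can also be drawn as a diagram with sklearn.tree.plot_tree if matplotlib is available.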