eScriptorium belongs to the long list of “research software” (Pianosi et al. 2020) anchored in the Digital Humanities, along with applications like Transkribus (Muehlberger et al. 2019), Voyant Tools (Sinclair and Rockwell 2016) and TXM (Heiden 2010). It was developed in the context of the SCRIPTA-PSL research project[1] as an open-source web application for automatic text recognition (ATR) campaigns (Stokes, Kiessling, et al. 2021). ATR is now an essential technology in the toolbox of heritage institutions and researchers in (digital) humanities: it enables users to obtain transcriptions of printed or handwritten documents swiftly and seemingly without effort, making them compatible with a spectrum of computational investigation techniques. However, ATR workflows are complex and involve several steps, which makes eScriptorium, like the other tools mentioned above, “expert software”: these tools offer a large range of functionalities, which is a challenge for newcomers, who must familiarize themselves with a substantial amount of information.
The success of software extends beyond its ability to meet a specific need; it must also be welcoming to new users, which can be ensured by several means: the design of the interface (UX), the compatibility of the tool with the rest of the software environment, but also the availability of reliable documentation. As for eScriptorium, given the limited size of the team (eScripta) responsible for the development, no official extensive documentation was created. Instead, most of the available documentation was user-generated content scattered across the web and tailored to the needs of the user group which generated it.
In 2023, the ALMAnaCH team from Inria-Paris formed a small group of expert users and worked on a solution to create centralized documentation for all user groups. Our motivations were twofold. First, as the authors of the first extensive eScriptorium tutorial (in French) and as the administrators of one of the largest instances of eScriptorium, we are frequently asked to update the tutorial or share our expertise with new users. Since the tutorial was published on a restricted, project-specific Hypotheses blog, we needed a new publication pipeline for our documentation. Second, we wanted to take the opportunity of this reconfiguration to find a solution to the dispersion of the pre-existing documentation, in a way that would contribute positively to the open-source and scientific communities around eScriptorium.
In this paper, we use the documentation created for eScriptorium as a case study to explore how documentation can be created by a group of people outside the team in charge of developing the software, and under which conditions this can succeed. Additionally, we examine how such an initiative contributes to accelerating the integration of open-source research software into larger infrastructures.
Prior to our initiative, information about eScriptorium’s features was scattered across various media:
This situation posed various challenges: locating tutorials or videos could be cumbersome, and they might not cover all features and workflows, especially if they were not updated later on. The absence of updates could be caused by an inadequate publication format, a lack of incentive, or simply the unavailability of the people able to perform the updates.
Considering the difficulty for eScripta to provide comprehensive reference documentation (Jiang et al. 2022), we proposed to create a website designed to meet two essential criteria: it needed to be easily maintainable, since eScriptorium has yet to achieve a stable official release, and it had to be open to external contributions while supporting multilingualism.
The key to creating easily maintained documentation lies in its modularity, the use of a lightweight markup language, and a commitment to transparency. We operationalized this vision by adopting the “continuous documentation” paradigm through ReadTheDocs[10] (RTD), with the source code openly accessible on GitHub (Chagué et al. 2023). This documentation follows a versioned structure, composed of multiple Markdown files, each addressing a specific aspect of the application. When the source code is updated,[11] RTD runs MkDocs, which builds web pages from the Markdown files, and publishes the resulting website at escriptorium.readthedocs.io (see Fig. 1).
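To make the build step more concrete, the short Python sketch below shows how a set of modular Markdown sources could be mapped to the web pages generated from them. It is an illustration under stated assumptions, not the code used by MkDocs or ReadTheDocs: the functions `output_path` and `plan_build` and the file names are hypothetical, and the mapping only imitates the common "one folder per page" URL convention of static site generators.

```python
from pathlib import PurePosixPath


def output_path(markdown_file: str) -> str:
    """Map a Markdown source path to the HTML path a site builder might emit.

    Assumption: this imitates a "directory URL" convention
    (page.md -> page/index.html); it is not MkDocs' own implementation.
    """
    src = PurePosixPath(markdown_file)
    if src.suffix != ".md":
        raise ValueError(f"not a Markdown file: {markdown_file}")
    if src.stem == "index":
        # An index page stays in its folder and only changes extension.
        return str(src.with_suffix(".html"))
    # Any other page gets its own folder containing an index.html.
    return str(src.with_suffix("") / "index.html")


def plan_build(markdown_files: list[str]) -> dict[str, str]:
    """Return {source: target} for every Markdown file, in sorted order."""
    return {md: output_path(md) for md in sorted(markdown_files)}


if __name__ == "__main__":
    # Hypothetical file names, chosen only for illustration.
    sources = ["index.md", "walkthrough/import.md", "walkthrough/index.md"]
    for source, target in plan_build(sources).items():
        print(f"{source} -> {target}")
```

Because each Markdown file maps to exactly one page, a contributor can update or add a single file without touching the rest of the documentation, which is the modularity argued for above.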
Since April 2023 (eScriptorium v0.13.6), the RTD-hosted eScriptorium documentation has replaced the older English tutorial on the application homepage.
Our solution comes with inherent limitations. Like the resources mentioned earlier, it is susceptible to partial obsolescence as the developer team integrates new features. Also, since we are primarily users of eScriptorium ourselves, the content we propose may initially have blind spots. Thus, a question worth exploring is that of the trustworthiness of documentation generated by users. The transparency of the process and its openness to any contribution are the keys to remedying these limitations.
Our proposal solved our initial problem: the need to redesign the existing documentation for French- and English-speaking users, which had become impossible to maintain. Our efforts focused on a first version written in English only, but its compatibility with multilingualism makes it possible to envision adding a French version later, or even integrating the German and Polish tutorials. Additionally, we realized that contributions to open-source software can come in diverse forms (Puhlfürß et al. 2022), including rationalizing its documentation.
Finally, to answer our initial question: the creation of comprehensive reference documentation for project-generated open-source software, such as the eScriptorium documentation initiative discussed here, can serve as a catalyst for accelerated integration into larger infrastructures. First, by facilitating knowledge diffusion and enhancing accessibility, (well-)documented projects break down entry barriers, ensuring that a broader audience can understand and engage with the software. Second, the collaborative nature of documentation projects has the potential to foster community engagement, which could lead to the creation of a network of users actively involved in the software’s development and adoption (Santos and Correia 2022). Lastly, clear documentation can also attract developers and organizations looking for software solutions that align with their existing systems.
Figure 1: Model of the editorial workflow for updating the content of the documentation: manual edits are only made to local Markdown or static files, while MkDocs and ReadTheDocs automatically build, deploy and serve the updated website.
[1] See https://psl.eu/en/scripta.
[2] The manual and Docker-based installations are described here: https://gitlab.com/scripta/escriptorium/-/wikis.
[3] See https://lectaurep.hypotheses.org/documentation/escriptorium-tutorial-en.
[4] See https://lectaurep.hypotheses.org/documentation/prendre-en-main-escriptorium.
[5] See https://ub-mannheim.github.io/eScriptorium_Dokumentation/Nutzungsanleitung_eScriptorium.html.
[6] See https://github.com/pjaskulski/escriptorium_tutorial/blob/master/escriptorium_tutorial.md.
[7] For example, the English tutorial was published in February 2021 and was referred to on the homepage of eScriptorium. However, although new features were added to the application, it was never updated.
[8] One such workshop occurred during the DH2022 conference (Stokes and Stökl Ben Ezra 2022). Another recent workshop was held at the University of Pennsylvania (Stokes 2023).
[9] For example, the HTRomance project organized a series of three workshops at the French National Library (BnF) at the end of 2023 (https://bnf.hypotheses.org/35711).
[10] Refer to https://docs.readthedocs.io/.
[11] GitHub’s Pull Request system allows for editorial control of proposed updates.