Collaboration and Transparency: A User-Generated Documentation for eScriptorium

1. Introduction

eScriptorium belongs to the long list of “research software” (Pianosi et al., 2020) anchored in the Digital Humanities, along with applications such as Transkribus (Muehlberger et al. 2019), Voyant Tools (Sinclair and Rockwell 2016) and TXM (Heiden 2010). It was developed within the SCRIPTA-PSL research project¹ as an open-source web application for automatic text recognition (ATR) campaigns (Stokes, Kiessling, et al. 2021). ATR is now an essential technology in the toolbox of heritage institutions and researchers in the (digital) humanities: it enables users to obtain transcriptions of printed or handwritten documents swiftly and seemingly without effort, making these documents amenable to a wide range of computational methods. However, ATR workflows are complex and involve several steps, which makes eScriptorium, like the other tools mentioned above, “expert software”: such tools offer a large range of functionalities, which is a challenge for newcomers, who need to familiarize themselves with a substantial amount of information.

The success of a piece of software depends on more than its ability to meet a specific need; it must also be welcoming to new users. This can be ensured in several ways: through the design of the interface (UX), through the compatibility of the tool with the rest of the software environment, and through the availability of reliable documentation. In the case of eScriptorium, given the limited size of the team responsible for its development (eScripta), no extensive official documentation was created. Instead, most of the available documentation consisted of user-generated content scattered across the web and tailored to the needs of the user group that had produced it.

In 2023, the ALMAnaCH team at Inria Paris formed a small group of expert users to design centralized documentation for all user groups. Our motivations were twofold. First, as the authors of the first extensive eScriptorium tutorial (in French) and as the administrators of one of the largest eScriptorium instances, we are frequently asked to update the tutorial or to share our expertise with new users. Since the tutorial had been published on a restricted, project-specific Hypothèses blog, we needed a new publication pipeline for our documentation. Second, we wanted to take the opportunity of this reconfiguration to address the dispersion of the pre-existing documentation, in a way that would benefit the open-source and scientific communities around eScriptorium.

In this paper, we use the documentation created for eScriptorium as a case study to explore how documentation can be created by people outside the team in charge of developing a piece of software, and under which conditions such an effort can succeed. Additionally, we examine how such an initiative can help accelerate the integration of open-source research software into larger infrastructures.

2. Description of the proposed documentation

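Prior to our initiative, information about eScriptorium’s features was scattered across various media:

- installation instructions, on the project’s GitLab wiki;²
- tutorials written by different user groups in English,³ French,⁴ German⁵ and Polish,⁶ not all of which were kept up to date;⁷
- hands-on workshops, held at conferences and institutions⁸ or organized by research projects.⁹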

This situation posed various challenges: tutorials or videos could be hard to locate, and they might not cover all features and workflows, especially if they had not been updated since their publication. The absence of updates could stem from an inadequate publication format, a lack of incentive to update, or simply the unavailability of people to carry out the updates.

Given how difficult it is for eScripta to produce comprehensive reference documentation (Jiang et al., 2022), we proposed to create a website meeting two essential criteria: it had to be easy to maintain, since eScriptorium has yet to reach a stable official release; and it had to be open to external contributions while supporting multilingualism.

The key to creating easily maintainable documentation lies in modularity, the use of a lightweight markup language, and a commitment to transparency. We put this vision into practice by adopting the “continuous documentation” paradigm through ReadTheDocs¹⁰ (RTD), with the source code openly accessible on GitHub (Chagué et al. 2023). The documentation follows a versioned structure composed of multiple Markdown files, each addressing a specific aspect of the application. When the source code is updated,¹¹ RTD triggers MkDocs, which builds web pages from the Markdown files, and publishes the resulting website at escriptorium.readthedocs.io (see Fig. 1).
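Because the documentation is split into many Markdown files tied together by a navigation tree, one cheap safeguard for maintainability is to check, before a change is merged, that the navigation and the files on disk still agree. The sketch below is illustrative only and is not part of the eScriptorium documentation repository: it assumes the `nav` section of an MkDocs configuration has already been parsed into Python lists and dicts (MkDocs nav entries are either a bare path or a one-key mapping from a title to a path or nested list), and the functions `nav_pages` and `check_nav` as well as the page names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Any, Iterable


def nav_pages(nav: list[Any]) -> set[str]:
    """Collect every local page path referenced in a MkDocs-style ``nav`` list.

    Each entry is either a bare path string or a one-key mapping
    {title: path or nested list}; external URLs are skipped.
    """
    pages: set[str] = set()
    for item in nav:
        if isinstance(item, str):
            targets = [item]
        elif isinstance(item, dict):
            targets = list(item.values())
        else:
            raise TypeError(f"unexpected nav entry: {item!r}")
        for target in targets:
            if isinstance(target, list):
                pages |= nav_pages(target)  # recurse into a nested section
            elif not target.startswith(("http://", "https://")):
                pages.add(target)
    return pages


def check_nav(nav: list[Any], files: Iterable[str]) -> tuple[set[str], set[str]]:
    """Compare the navigation with the Markdown files present in the docs folder.

    Returns (missing, orphaned): pages listed in the nav but absent from
    disk, and Markdown files on disk that no nav entry points to.
    """
    listed = nav_pages(nav)
    markdown = {f for f in files if PurePosixPath(f).suffix == ".md"}
    return listed - markdown, markdown - listed


if __name__ == "__main__":
    # Hypothetical page names, for illustration only.
    nav = [
        {"Home": "index.md"},
        {"Transcription": [{"Segmentation": "segmentation.md"},
                           {"Training": "training.md"}]},
    ]
    files = ["index.md", "segmentation.md", "import.md", "logo.png"]
    missing, orphaned = check_nav(nav, files)
    print("missing:", sorted(missing))    # missing: ['training.md']
    print("orphaned:", sorted(orphaned))  # orphaned: ['import.md']
```

Run as a step in a pull-request check, such a test would flag a page that was added without being linked, or a link left behind after a page was removed. MkDocs itself also reports some of these problems during a build, so this is a complement rather than a replacement.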

As of April 2023 (eScriptorium v0.13.6), the RTD-hosted eScriptorium documentation replaced the older English tutorial on the application homepage.

3. Discussion

Our solution comes with inherent limitations. Like the resources mentioned earlier, it is susceptible to partial obsolescence as the development team integrates new features. Moreover, since we are primarily users of eScriptorium rather than its developers, the content we propose may initially have blind spots. This raises the question of how trustworthy user-generated documentation can be. The transparency of the process and its openness to contributions from anyone are key to remedying these limitations.
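One way to make partial obsolescence visible, in keeping with the transparency argued for here, is to record for each page the eScriptorium release it was last checked against and to list the pages lagging behind the current release. The following is a minimal sketch under stated assumptions: the `reviewed_for` field, the `Page` type and the `stale_pages` helper are hypothetical (the documentation does not necessarily store such metadata), and release tags are assumed to be plain numeric tags with the same number of components, such as `v0.13.6`.

```python
from dataclasses import dataclass


def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a plain numeric release tag such as 'v0.13.6' into (0, 13, 6)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


@dataclass
class Page:
    path: str
    reviewed_for: str  # hypothetical field: last release the page was checked against


def stale_pages(pages: list[Page], current_release: str) -> list[str]:
    """Return, sorted, the paths of pages reviewed against an older release."""
    current = parse_version(current_release)
    return sorted(p.path for p in pages if parse_version(p.reviewed_for) < current)


if __name__ == "__main__":
    # Hypothetical pages and review versions, for illustration only.
    pages = [
        Page("index.md", "v0.13.6"),
        Page("training.md", "v0.12.0"),
        Page("export.md", "v0.13.2"),
    ]
    print(stale_pages(pages, "v0.13.6"))  # ['export.md', 'training.md']
```

Comparing versions as integer tuples avoids the string-ordering trap where "0.9" would sort after "0.13"; tags with pre-release suffixes would need a proper version parser instead.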

Our proposal solved our initial problem: the need to redesign the existing documentation for French- and English-speaking users, which had become impossible to maintain. We focused on a first version written only in English, but because the setup supports multilingualism, a French version could be added later, and the German and Polish tutorials could even be integrated. Additionally, we realized that contributions to open-source software can take diverse forms (Puhlfürß et al., 2022), including the rationalization of its documentation.

Finally, to answer our initial question: creating comprehensive reference documentation for open-source software developed within research projects, as in the eScriptorium documentation initiative discussed here, can act as a catalyst for its integration into larger infrastructures. First, by facilitating the diffusion of knowledge and improving accessibility, (well-)documented projects lower barriers to entry, so that a broader audience can understand and engage with the software. Second, the collaborative nature of documentation projects can foster community engagement, which could lead to a network of users actively involved in the software’s development and adoption (Santos & Correia, 2022). Lastly, clear documentation can also attract developers and organizations looking for software solutions that fit their existing systems.

Figure 1: Model of the editorial workflow for updating the documentation: manual edits are made only to local Markdown or static files, while MkDocs and ReadTheDocs automatically build, deploy and serve the updated website.

Appendix A

Bibliography
  1. Chagué, Alix / Chiffoleau, Floriane / Scheithauer, Hugo / Carrow, Jennifer (2023): eScriptorium Documentation (source code). https://escriptorium.readthedocs.io [last accessed on June 6, 2024].
  2. Heiden, Serge (2010): The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. Institute for Digital Enhancement of Cognitive Development, Waseda University. 389–398. https://shs.hal.science/halshs-00549764 [last accessed on June 6, 2024].
  3. Jiang, Huaxi / Zhu, Jie / Yang, Li / Liang, Geng / Zuo, Chun (2022): DeepRelease: Language-agnostic Release Notes Generation from Pull Requests of Open-source Software. arXiv. http://arxiv.org/abs/2201.06720 [last accessed on May 31, 2024].
  4. Muehlberger, Guenter et al. (2019): "Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study", in: Journal of Documentation 75 (5): 954–976. 10.1108/JD-07-2018-0114.
  5. Pianosi, Francesca / Sarrazin, Fanny / Wagener, Thorsten (2020): "How successfully is open-source research software adopted? Results and implications of surveying the users of a sensitivity analysis toolbox", in: Environmental Modelling & Software 124: 104579. 10.1016/j.envsoft.2019.104579.
  6. Puhlfürß, Tim / Montgomery, Lloyd / Maalej, Walid (2022): An Exploratory Study of Documentation Strategies for Product Features in Popular GitHub Projects. arXiv. http://arxiv.org/abs/2208.01317 [last accessed on May 31, 2024].
  7. Santos, João / Correia, Filipe (2022): Patterns for Documenting Open Source Frameworks. arXiv. http://arxiv.org/abs/2203.13871 [last accessed on May 31, 2024].
  8. Sinclair, Stéfan / Rockwell, Geoffrey (2016): Voyant Tools. http://voyant-tools.org/ [last accessed on June 6, 2024].
  9. Stokes, Peter Anthony / Kiessling, Benjamin / Stökl Ben Ezra, Daniel / Tissot, Robin / Gargem, El Hassane (2021): "The eScriptorium VRE for Manuscript Cultures", in: Classics@ Journal 18.
  10. Stokes, Peter Anthony / Stökl Ben Ezra, Daniel (2022): Hands-on Introduction to eScriptorium, an Open-Source Platform for HTR (WT-20). Tokyo, Japan. https://dh2022.adho.org/workshops-and-tutorials/wt-20 [last accessed on June 6, 2024].
  11. Stokes, Peter Anthony (2023): How to Transcribe a Million Manuscripts with eScriptorium, in: Penn Libraries. https://www.library.upenn.edu/events/escriptorium [last accessed on June 6, 2024].
Notes
1.

See https://psl.eu/en/scripta.

2.

The manual and Docker-based installations are described here: https://gitlab.com/scripta/escriptorium/-/wikis.

3.

See https://lectaurep.hypotheses.org/documentation/escriptorium-tutorial-en.

4.

See https://lectaurep.hypotheses.org/documentation/prendre-en-main-escriptorium.

5.

See https://ub-mannheim.github.io/eScriptorium_Dokumentation/Nutzungsanleitung_eScriptorium.html.

6.

See https://github.com/pjaskulski/escriptorium_tutorial/blob/master/escriptorium_tutorial.md.

7.

For example, the English tutorial was published in February 2021 and was linked from the eScriptorium homepage. However, although new features were added to the application, it was never updated.

8.

One such workshop occurred during the DH2022 conference (Stokes and Stökl Ben Ezra 2022). Another recent workshop was held at the University of Pennsylvania (Stokes 2023).

9.

For example, the HTRomance project organized a series of three workshops at the French National Library (BnF) at the end of 2023 (https://bnf.hypotheses.org/35711).

10.

Refer to https://docs.readthedocs.io/.

11.

GitHub’s Pull Request system allows for editorial control of proposed updates.

Alix Chagué (alix.chague@inria.fr), ALMAnaCH, Inria, France; Université de Montréal, Canada; Ecole Pratique des Hautes Etudes, France and Floriane Chiffoleau (floriane.chiffoleau@inria.fr), ALMAnaCH, Inria, France; Le Mans Université, France and Hugo Scheithauer (hugo.scheithauer@inria.fr), ALMAnaCH, Inria, France; Ecole Pratique des Hautes Etudes, France