Disability Representation in DH and Book Studies: Digitizing Braille Materials

Scholars have called for digital humanities to become more accessible (Williams 2012; Ellcessor 2018; Pirrone et al. 2022). This connection between digital humanities (DH) and disability studies is needed at a conceptual and theoretical level, but DH scholars must also consider how to adapt DH tools to study accessible materials specifically. Humanities research on braille materials remains scarce. Book history and publishing studies have increasingly examined some accessible book formats, such as ebooks and audiobooks (Rubery 2011 & 2016; Rowberry 2017; Sterne 2003 & 2012), but braille is not yet a common research subject. A large part of the reason is that many of the tools and methods book history researchers typically use are transferable between manuscripts, printed books, and ebooks (and in some cases even audiobooks), whereas braille presents its own set of challenges due to its tactile nature.

This paper will discuss the challenges, methods, and results of two braille-focused DH projects: training an optical character recognition (OCR) model to read pages of embossed braille, and writing guidelines for working with braille materials in Text Encoding Initiative (TEI) schemas. The overall goal of these projects is to make it easier for researchers to include braille materials in their research, even if they cannot read braille themselves. The tools will be particularly useful for corpus-based research in which a small subset of the available corpus is braille. Researchers who already use OCR and TEI for the rest of their corpus will be able to include braille materials in their projects using the same tools, thus increasing the amount of research available on braille as a tactile medium and as a literary script.

The braille character recognition (BCR) model is being trained on Transkribus and will be made openly available soon. Unexpected complications unique to braille have delayed its release, and the paper will explain them. The model is being trained on pages of double-sided embossed 6-dot English braille, and it will be applicable to any language that uses the 6-dot braille system (it will not work for languages that use 8-dot braille). I will discuss the challenges specific to working with braille materials in OCR systems, the error rates of successive versions of the model, the most common errors, and how non-braille users can proofread their BCR output to correct those errors.
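As an illustration of what an error rate can mean for BCR output, the following minimal sketch computes a character error rate (CER) as the edit distance between a proofread reference transcription and the model output, divided by the length of the reference. It assumes both transcriptions are available as strings of Unicode braille patterns; the function names are hypothetical, and this is not necessarily the metric or implementation that Transkribus reports.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length (hypothetical helper)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "\u2813\u2811\u2807\u2807\u2815"   # five braille cells
    hypothesis = "\u2813\u2811\u2807\u2807\u282a"  # last cell misread
    print(character_error_rate(reference, hypothesis))
```

Because the comparison works on characters alone, a researcher who cannot read braille could still run it against a proofread sample page.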

The TEI guidelines are currently a work in progress and will be nearly complete by the time of the conference. As I began working on my dissertation project, I realized that there were many aspects of braille that I wanted to tag and describe using TEI, but that TEI offered no accurate options for them. TEI is a highly customizable markup language, and the TEI organization is actively working to improve the inclusivity and accessibility of its guidelines and schemas. With the support of multiple people on the board and council of TEI, I have begun working on guidelines and a custom schema for braille materials, which will allow for direct textual comparisons between braille materials and other textual materials. I will briefly discuss the two dissertation case studies that I am using as prototypes for the guidelines. The first is a single text published in multiple formats, including embossed braille, which are being encoded together for comparison between formats. The second is a multi-lingual, multi-script, illustrated book.

An open access BCR model will allow researchers, whether or not they know braille, to include braille materials in their corpus-based research projects. It will also allow for the digitization of, and access to, materials that are currently unavailable. The output of BCR on Transkribus can easily be converted to BRF files, which can then be embossed on paper or read with refreshable braille displays for public or private access. The TEI guidelines will similarly allow for greater access to material that has been difficult to digitize. They will allow researchers to create scholarly digital editions of braille materials, or to include braille materials as a point of comparison with other materials in a corpus. Both OCR and TEI are commonly used tools in digital humanities, book history, bibliography, and history, among other disciplines, and they are designed to work with text-based materials. My adaptations of these tools will allow for greater accessibility of braille materials and research within these disciplines, regardless of the researcher's ability to read braille.
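The conversion to BRF mentioned above can be sketched as follows. This is a minimal illustration, assuming the BCR transcription has been exported as Unicode braille patterns (U+2800 to U+283F for 6-dot cells) and that the BRF file uses the North American Braille ASCII encoding. The function name is hypothetical, and page formatting such as line length and page breaks is left out.

```python
# Braille ASCII characters indexed by the offset of a 6-dot Unicode braille
# pattern from U+2800 (bit 0 = dot 1, bit 1 = dot 2, ..., bit 5 = dot 6).
BRAILLE_ASCII = (
    " A1B'K2L@CIF/MSP"
    "\"E3H9O6R^DJG>NTQ"
    ",*5<-U8V.%[$+X!&"
    ";:4\\0Z7(_?W]#Y)="
)


def unicode_braille_to_brf(text: str) -> str:
    """Convert Unicode braille patterns to Braille ASCII (hypothetical helper).

    Line breaks and ordinary spaces pass through unchanged. Cells that use
    dots 7 or 8 raise ValueError, since Braille ASCII only covers 6-dot cells.
    """
    out = []
    for ch in text:
        if ch in "\n\r ":
            out.append(ch)
            continue
        offset = ord(ch) - 0x2800
        if not 0 <= offset < 64:
            raise ValueError(f"not a 6-dot braille cell: {ch!r}")
        out.append(BRAILLE_ASCII[offset])
    return "".join(out)


if __name__ == "__main__":
    # Cells for dots 125, 15, 123, 123, 135 (h, e, l, l, o in uncontracted braille).
    print(unicode_braille_to_brf("\u2813\u2811\u2807\u2807\u2815"))
```

Rejecting cells outside the 6-dot range, instead of silently dropping them, mirrors the model's stated limitation to 6-dot braille.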

Appendix A

Bibliography
  1. Ellcessor, Elizabeth (2018): “A Glitch in the Tower: Academia, Disability, and Digital Humanities.” In The Routledge Companion to Media Studies and Digital Humanities, 108–16. New York: Routledge.
  2. Pirrone, Maria / Sicarl, Christian / Galletta, Antonino / Villari, Massimo (2022): “Digital Humanities and Disability: A Systematic Literature Review of Cultural Accessibility for People with Disability.” Digital Scholarship in the Humanities, July.
  3. Rowberry, Simon Peter (2017): “Ebookness.” Convergence 23, 3: 289–305.
  4. Rubery, Matthew, ed. (2011): Audiobooks, Literature, and Sound Studies. Routledge.
  5. Rubery, Matthew (2016): The Untold Story of the Talking Book. Cambridge: Harvard University Press.
  6. Sterne, Jonathan (2003): The Audible Past: Cultural Origins of Sound Reproduction. London: Duke University Press.
  7. Sterne, Jonathan (2012): MP3: The Meaning of Format . Durham, N.C.: Duke University Press.
  8. Williams, George H. (2012): “Disability, Universal Design, and Digital Humanities.” In Debates in the Digital Humanities, 202–12. Minneapolis: University of Minnesota Press.
Ellen Forget (ellen.michelle@mail.utoronto.ca), University of Toronto, Canada