SILICON: Supporting Digitally-Disadvantaged Languages

For reading and writing the Latin alphabet, as used for English, the technical landscape has "just worked" for several decades. To this day, though, reliably writing and reading digital text in other scripts can be a struggle, or simply impossible for writing systems that have not been added to the Unicode Standard. Usable digital text for most languages requires that several things be in place: official code points in the Unicode Standard; an input method (which can have complex requirements for languages like Japanese that use multiple scripts with many characters, or for scripts like Egyptian hieroglyphs that require complex layout); and at least one font (which may need built-in computational logic to handle positional glyph variants and other contextual substitutions) available across all platforms and devices. Each step of this process is time-consuming and draws on specific expertise: knowledge of the Unicode proposal process, experience with UI/UX combined with familiarity with the user community, and both design and coding skills to develop the fonts. It can take years, or even a decade, for all these pieces to be created and made available to user communities, not least because there is little commercial incentive to undertake this work.
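As a small illustration of the first requirement, the sketch below checks whether each character in a string has an assigned code point in the version of the Unicode Character Database bundled with the Python interpreter. The function name `encoding_status` and the sample characters are hypothetical choices made for this example. It says nothing about whether an input method or font exists for the script, which are the later and often harder steps described above.

```python
import unicodedata


def encoding_status(text):
    """Return (code point label, is_assigned, name or None) for each character."""
    report = []
    for ch in text:
        # General category "Cn" marks a code point that is unassigned in the
        # Unicode version this Python build ships with. Private-use and
        # surrogate code points have other categories and count as assigned here.
        assigned = unicodedata.category(ch) != "Cn"
        # Some assigned code points (e.g. control characters) have no name,
        # so a default of None is supplied.
        name = unicodedata.name(ch, None) if assigned else None
        report.append((f"U+{ord(ch):04X}", assigned, name))
    return report


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # Latin A, Cyrillic ZHE, an Egyptian hieroglyph, and a reserved code point.
    for row in encoding_status("A\u0416\U00013000\u0378"):
        print(row)
```

A character from a script still awaiting encoding cannot be written as input to this function at all, other than through a private-use code point, which is the situation the paragraph above describes.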

While large tech companies invested considerable resources in the 1980s and 1990s to make their operating systems compatible with Unicode and to develop input methods and fonts for the languages used in emerging global markets (including those used in East Asia), several indigenous scripts of the Americas, as well as of South and Southeast Asia, remain to be encoded. Corporate interest in robust multilingual computing has dwindled as the remaining work has shifted to the long tail of unencoded scripts, which do not represent profitable new markets. The absence of an encoding, input method, or fonts effectively shuts a script out of the digital world, which significantly raises the likelihood that the language it records will fall into disuse and be lost within a generation or two (Bromham et al. 2022).

There is a long history of digital humanists advocating for better multilingual computing by engaging with the Unicode Consortium. David Birnbaum consulted on early proposals for expanding the Cyrillic code block to support medieval Slavic characters (Unicode Collection). Deborah Anderson's 20-year Script Encoding Initiative project (Anderson 2023) has ushered around 120 proposals through the encoding process. Stanford University’s SILICON project was created at the intersection of multilingual DH, design, and human-computer interaction, with the goal of advancing the usability of new scripts once they have passed the encoding stage.

SILICON's approach is to address the many instances of the "dongle problem" (Mullaney 2023), where good work comes to a halt due to a small but meaningful barrier. For example, a font designer may need only a small amount of money to visit an archive, yet the paperwork burden of traditional grants (and the fact that the work may not be seen as "scholarly") leaves few options for funding it. SILICON also brings in undergraduates as collaborators, who partner with members of Unicode committees to write missing documentation on their processes, for both internal and external communication. This gives the students exposure to technical writing and public engagement, and serves as a pipeline into tech for humanists.

This poster will present the work of SILICON's summer 2024 student interns, who will share the project's accomplishments and goals to date and show how DH scholars can support a more linguistically diverse technical ecosystem using Unicode resources like the Common Locale Data Repository.

Appendix A

Bibliography
  1. Anderson, Deborah (2023). “Digital Vitality for Linguistic Diversity: The Script Encoding Initiative”. In Global Language Justice, ed. Lydia Liu and Anupama Rao. Columbia University Press.
  2. Bromham, Lindell et al. (2022). “Global predictors of language endangerment and the future of linguistic diversity”. Nature Ecology & Evolution, vol. 6, pp. 163-173.
  3. Mullaney, Thomas (2023). Introductory remarks at “Face/Interface 2023: Global Type Design and Human-Computer Interaction”. December 1, 2023.
  4. Unicode Collection. Material housed at Stanford University Special Collections. Cyrillic 1981-1993, box 61, folders 2-4. https://oac.cdlib.org/findaid/ark:/13030/c8nz8gd8/dsc
Quinn Dombrowski (qad@stanford.edu), Stanford University, United States of America and Anne Ladyem McDivitt (ladyem@stanford.edu), Stanford University, United States of America and Thomas Mullaney (tsmullaney@stanford.edu), Stanford University, United States of America and Kathryn Starkey (starkey@stanford.edu), Stanford University, United States of America and Elaine Treharne (treharne@stanford.edu), Stanford University, United States of America