Construction of a Corpus of the Jesuit Mission Press <em>“Qincuxǔ”</em> (1593)

The Jesuit Mission Press publication "Xixo, xixxo nadono vchiyori nuqi idaxi, qincuxǔto nasu mono nari. Vôcata soresoreni chǔsuru mono nari." (1593, Figure 1, hereinafter "Qincuxǔ") is a compilation of golden sayings drawn from classical Chinese literature (Xixo, xixxo). According to Fukushima (1969), "Qincuxǔ" is believed to have been compiled from several collections of golden sayings that had circulated in Japan since the Heian period.

The quoted golden sayings are transcribed in the Roman alphabet in literary Japanese and listed in alphabetical order. The meaning (Cocoro) of each saying is given below it in colloquial Japanese of the Muromachi period.

"Qincuxǔ" is bound together with the Amakusa editions of "Feiqe no Monogatari" and "Esopo no Fabulas" (hereinafter "Feiqe" and "Esopo"). This volume is a valuable artifact: it is the only extant copy in the world and is held by the British Library (shelfmark: Or.59.aa.1).

Collaboration between the British Library and the National Institute for Japanese Language and Linguistics (NINJAL) has made the images publicly accessible. A fully digitized version of this unique volume and its transcriptions have been available online since 2019 as part of NINJAL's Corpus of Historical Japanese, Muromachi Period Series II: Christian Materials. NINJAL has already constructed corpora of "Feiqe" and "Esopo"; Katayama et al. (2019) describe the construction method. In this study, we constructed a corpus of "Qincuxǔ."

Figure 1. Image of Qincuxǔ (1593), p. 507

This corpus possesses three distinctive features.

1. The entire text of the Qincuxǔ corpus is annotated with morphological information. We used the morphological analyzer MeCab with the UniDic dictionary to segment the text into morphemes and to add details such as lemma, reading, and part of speech. UniDic, a dictionary designed for Japanese morphological analysis, can lemmatize variations in orthography and word form.

2. The original Roman alphabet text has been an essential source for studying Medieval Japanese. To facilitate analysis, we converted it into electronic form: we transcribed the text of the Jesuit Mission Press "Qincuxǔ" (1593) and encoded it in XML according to the P5 Guidelines of the Text Encoding Initiative (TEI). However, MeCab and UniDic do not support romanized text, and Japanese readers do not welcome romanized text either. To address this, we transliterated the text into kana-kanji, using the formOrthBase field of UniDic, which gives the representative orthography of each word. We then annotated this text with morphological information and aligned the Roman alphabet and Japanese character texts as parallel versions. This approach links the Roman alphabet text to the Japanese character text while preserving the morphological information.

3. The web application "Chūnagon" provides access to the corpus. It offers direct links to high-quality photographic images of the original print held by the British Library. With "Chūnagon," users can run detailed searches that combine morphological information such as lemma, part of speech, and conjugation type.
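The second and third features can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the corpus's actual data model or the "Chūnagon" implementation: the `Token` class, the `search` and `parallel_lines` helpers, the simplified English part-of-speech labels, and the hand-made token values are all hypothetical. Each token carries its Roman alphabet spelling, its kana-kanji transliteration, and its morphological annotation, so both text versions and the annotation stay linked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One morpheme, linking both scripts to its annotation (hypothetical model)."""
    roman: str       # spelling in the original Roman alphabet text
    kana_kanji: str  # transliterated Japanese character form
    lemma: str       # dictionary headword
    pos: str         # simplified part-of-speech label (not UniDic's tag set)


# Hand-made illustrative values, not taken from the released corpus.
SAMPLE = [
    Token("qincuxǔ", "金句集", "金句集", "noun"),
    Token("to", "と", "と", "particle"),
    Token("nasu", "なす", "なす", "verb"),
    Token("mono", "物", "物", "noun"),
    Token("nari", "なり", "なり", "auxiliary"),
]


def search(tokens, **conditions):
    """Return tokens whose fields match every given condition, e.g. pos='noun'."""
    return [
        t for t in tokens
        if all(getattr(t, field) == value for field, value in conditions.items())
    ]


def parallel_lines(tokens):
    """Rebuild the two parallel text versions from the same token sequence."""
    roman = " ".join(t.roman for t in tokens)
    japanese = "".join(t.kana_kanji for t in tokens)
    return roman, japanese


if __name__ == "__main__":
    for hit in search(SAMPLE, pos="noun"):
        print(hit.roman, hit.kana_kanji, hit.lemma)
    print(parallel_lines(SAMPLE))
```

Because both text versions are derived from one token sequence, a hit found through the lemma or part of speech can always be shown in either script.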

Acknowledgments

This presentation was supported by the NINJAL collaborative research project 'The Construction of Diachronic Corpora and New Developments in Research on the History of Japanese.'

Appendix A

Bibliography
  1. Fukushima, Kunimichi. (1969): Amakusaban Kinkushū no shutten ni tsuite. Kokugogaku 79, 90-99.
  2. Katayama, Kurumi / Ogiso, Toshinobu / Nakamura, Takenori. (2018): Kirishitan shiryō no rōmaji gembun taiō wabuntekisuto no sakusei. Jinmonkon 2018 Ronbunsyu, 89-96.
  3. Katayama, Kurumi / Ogiso, Toshinobu / Watanabe, Yuki. (2019): Construction of a Corpus of "Christian Materials" for the Study of Colloquial Japanese of the Muromachi Period. Poster presented at Digital Humanities Conference 2019; July 11, 2019; Utrecht University, the Netherlands.
  4. National Institute for Japanese Language and Linguistics. (2024): Corpus of Historical Japanese, Muromachi Period Series, Volume II: Christian Materials.
    https://ccd.ninjal.ac.jp/chj/muromachi.html (accessed 13 June 2024).
  5. National Institute for Japanese Language and Linguistics. (2019, March 1): Images of the Amakusa edition of Heike monogatari, Isoho monogatari and Kinkushū in the British Library collection. https://dglb01.ninjal.ac.jp/BL_amakusa/en.php (accessed 13 June 2024).
Mari KUROKAWA (kmari@ninjal.ac.jp), National Institute for Japanese Language and Linguistics, Japan; Japan Society for the Promotion of Science and Kurumi KATAYAMA (kurumi_katayama@ninjal.ac.jp), National Institute for Japanese Language and Linguistics, Japan and Toshinobu OGISO (togiso@ninjal.ac.jp), National Institute for Japanese Language and Linguistics, Japan